Cache warning: linecache error reading tile #261
It sounds like it's caching a sequential read. This shouldn't be possible (there's a mechanism to block it), but perhaps it's happening somehow. Could you show me how to run the failing case on my machine? The workaround would be to disable the cache, or not use sequential mode.
You could also try turning on cache tracing. It'll print a line for every vips operation it runs, something like:
The '+' means the operation has been executed and added to the cache; a space means executed but not added to the cache. Cache hits are not shown (perhaps they should be). It'll make a huge log, of course. Look for the lines before your error and see if the jpegload is being reused when it shouldn't be.
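As a rough illustration, here's a cache-tracing sketch using the Python GI binding that appears later in this thread. It assumes the binding exposes the C call vips_cache_set_trace() as Vips.cache_set_trace(); the file names are placeholders.

```python
from gi.repository import Vips

# Assumed GI name for the C call vips_cache_set_trace(); turn tracing on
# before doing any work so every operation that runs is logged.
Vips.cache_set_trace(True)

im = Vips.Image.new_from_file('test.jpg', access=Vips.Access.SEQUENTIAL)
im.write_to_file('out.jpg')
# In the trace, '+' marks an operation that was executed and added to the
# cache, and a leading space marks one that was executed but not cached.
```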
Here's the output, merging both stdout and stderr.
Sorry, I've just updated the gist with proper, longer traces. You can search for the warning in it.
I've performed other tests and the issue is only reproducible with JPEG images. PNG and WebP images work well, even with higher loads than JPEG.
Oh, interesting. I guess you tried PNG as well? That's closest to JPG in the way the reader works.
I've tried both PNG and WebP without any trouble.
Maybe increasing the cache size could help with that? If so, I think it's still a workaround, since it only mitigates part of the problem. What about inferring the cache size, or something similar, and switching to another read mode at runtime? Any other ideas that wouldn't have a huge impact on performance? An alternative but reliable solution could be to limit the maximum number of concurrent vips operations based on the cache/memory status, like a simple throttle strategy, but I'd prefer to avoid that and stay as fast and responsive as possible.
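To make the throttle idea concrete, here's a minimal sketch of one way it could look outside libvips itself, using a plain semaphore in Python; MAX_CONCURRENT, process_image and the file handling are illustrative names, not anything libvips provides.

```python
import threading
from gi.repository import Vips

MAX_CONCURRENT = 8                               # illustrative cap, tune to the memory budget
_slots = threading.BoundedSemaphore(MAX_CONCURRENT)

def process_image(src, dst):
    # Block until a slot is free, so at most MAX_CONCURRENT vips pipelines
    # run at once, no matter how many requests arrive.
    with _slots:
        im = Vips.Image.new_from_file(src, access=Vips.Access.SEQUENTIAL)
        im.write_to_file(dst)
```

The same shape works in Go with a buffered channel acting as the semaphore.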
I can't work out what's gone wrong from the trace :-( I'll see if I can make the output more useful.
No problem. I'll try to investigate as well, but regarding the trace, it's not simple to split the traces, since the issue only seems to be reproducible after high load... Do you have at hand some libvips benchmarks based on hundreds of operations and high concurrency? I'll try doing some analysis of the libvips memory stats and increasing the cache size.
@lovell has done some soak testing for sharp and (I think?) it's worked OK. He may not have had so much threading.
@h2non If you're letting golang deal with concurrency, perhaps try lowering the libvips concurrency setting.

Over the past year I've carried out some fairly extensive memory/soak testing of the parts of libvips that sharp uses, which has led to the discovery (and subsequent fix) of a few leaks and buffer overruns. IIRC I was running a max of 16 threads.
I made a test program: https://gist.github.com/jcupitt/7de81323573665d8aafb

It works fine with fewer than about 15 threads, but I start to see a range of bad things with (for example) 50 threads:
I'll try without Orc and with other file formats. |
Definitely it seems like there's a problem with thread-unsafe code. I'm not too happy about the idea of using a lower concurrency.
How often does it need to be called?
I believe the (maximum) number of images Go will process concurrently is the number of threads the golang runtime is managing on your behalf, limited by GOMAXPROCS.

If you're not already doing so, you may need to wrap all the vips methods that constitute processing each image.
I've made some progress on this: I've found a couple of threading errors with that soak program and stability is improving. Not quite there yet though, I'll keep digging.

This was the big patch: jcupitt@54c37e7

It removes vips_image_wio_input() and adds vips_image_copy_memory() instead. vips_image_wio_input() was fine when we had a single main thread, and fine if you don't share images between threads, but once images are being used simultaneously by more than one thread, which can happen implicitly sometimes via the operation cache, you have problems.
Some more progress: git master now runs this test correctly, as long as you disable orc and don't use jpegload, but use vips_black() or some similar pixel source. I'm going to dig into jpegload next and see if I can find a problem there.
I found two more problems. First, the soak test needs to sleep for a short time before exiting, or vips_shutdown() may try to throw things away before the thread cleanup code runs. Second, there seems to be a bug in openslide which causes memory corruption under heavy load. I've opened an issue here: openslide/openslide#161

I've updated the soak test: https://gist.github.com/jcupitt/7de81323573665d8aafb

If I build git master with openslide and orc disabled and run that test, I see:
That's 10,000 shrink operations, with 50 main threads working in parallel, and four worker threads within each of those, with no reported problems. Phew! That's with the operation cache enabled as well. Hopefully this is now fixed.
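For reference, a small Python sketch in the same spirit as that soak run; the thread and iteration counts match the numbers above, but the file name and the use of avg() as the pixel-touching operation are stand-ins for the shrink in the gist.

```python
import threading
import time
from gi.repository import Vips

THREADS = 50        # "main" threads, as in the run described above
ITERATIONS = 200    # 50 * 200 = 10,000 operations in total

def worker():
    for _ in range(ITERATIONS):
        # A sequential image can only be scanned once, so open it fresh each time.
        im = Vips.Image.new_from_file('k2.jpg', access=Vips.Access.SEQUENTIAL)
        im.avg()    # force a full scan of the decoded pixels

threads = [threading.Thread(target=worker) for _ in range(THREADS)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# As noted above, sleep briefly before exiting so thread cleanup can run
# before vips shuts down.
time.sleep(1)
```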
Summary: use git master vips, disable orc, disable openslide, and it should be OK.
Thanks John. Will this also be applied in the maintenance releases for 7.42?
I made one quite large change to vips, probably too big for 7.42 really :-( so it'll be vips-8.0 and later only, sorry.
I've updated the test program: https://gist.github.com/jcupitt/7de81323573665d8aafb

It works against 7.42 as well, as long as you disable openslide and orc. That's orc-0.4.18; other orcs may work.

It's looking like the openslide problem could possibly be a glib bug, see openslide/openslide#161
It is a glib bug! See openslide/openslide#161 (comment)

I've added a workaround to vips 8.0.1, so you can enable openslide there. I'm seeing crashes with orc-0.4.22 and orc-0.4.18, so they need to stay off. @lovell, do you have a list of known-bad orc versions anywhere? I think I remember you talking about this at one point.

Otherwise, all OK now. I'll close this issue.
Ok. Thank you John for mitigating this particular issue in such a great way!
The list of known-good orc versions is easier :) I've found 0.4.22 to be the most stable and least leaky. Problems in prior versions were summarised by John here: lovell/sharp#21 (comment)

lovell/sharp#172 is a report of 0.4.23 segfaulting, but I've yet to reproduce or track down the problem.

Thanks for persisting with this John. Memory leaks and thread-related segfaults are the worst of all things to debug.
Hey ho, so we now have 0.4.22 as a confirmed crasher. I wonder if any orcs are really stable. Maybe we should switch to OpenCL.

Yes, "system unstable under heavy load" is about the most depressing error report you can see :( I was very surprised and pleased it turned out to not be a vips problem.
@jcupitt I'm also seeing this problem when reading from a jpeg file in sequential or sequential-unbuffered access mode and saving it twice. On the first save the cache is not populated and I get no warnings; on the second save it uses the line cache and emits warnings. The problem does not occur when opening the jpeg in random access mode. Interestingly, in both my and @h2non's warnings it seems to be complaining about the last line (I used a 10000x10000px image; I'm assuming h2non had a 1920x1080px image).

Sample code:

```python
im = Vips.Image.new_from_file('Earth10k.jpg', access=Vips.Access.SEQUENTIAL)
# First write generates no warnings:
im.write_to_file('Earth10k2.jpg')
# Second write emits warnings:
im.write_to_file('Earth10k2.jpg')
```

Warnings:
@felixbuenemann yes, you can only use sequential images once. For example, this is fine:

```python
>>> im = Vips.Image.new_from_file("/home/john/pics/k2.jpg")
>>> im.avg()
102.79218031609196
>>> im.deviate()
67.69273812492462
```

This will fail:

```python
>>> im = Vips.Image.new_from_file("/home/john/pics/k2.jpg", access = Vips.Access.SEQUENTIAL)
>>> im.avg()
102.79218031609196
>>> im.deviate()
vips warning: linecache: error reading tile 0x0: VipsJpeg: out of order read at line 2048
...
```

Random access images are cached, sequential ones are not, so the solution is just to open each time you need to scan an image:

```python
>>> im = Vips.Image.new_from_file("/home/john/pics/k2.jpg", access = Vips.Access.SEQUENTIAL)
>>> im.avg()
102.79218031609196
>>> im = Vips.Image.new_from_file("/home/john/pics/k2.jpg", access = Vips.Access.SEQUENTIAL)
>>> im.deviate()
67.69273812492462
```

The second open will "rewind" the image.
If you open a jpeg in random access mode, it'll decode the whole image on the first pixel access and store the decoded image in either a memory buffer (for small images) or a temporary file (for large ones). That large decoded image is put into the operation cache and can be reused many times from many threads.

If you open in seq mode, small buffers of pixels are decoded as required as you scan down the image. This saves the memory and the big upfront CPU hit of decoding the whole thing, but it means you can only scan once, and only from one thread.
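A tiny sketch of that trade-off in the Python binding; open_image and the scans parameter are just illustrative names.

```python
from gi.repository import Vips

def open_image(path, scans=1):
    # If the pixels will be read more than once (or from more than one thread),
    # pay for random access: full decode up front, cached and reusable.
    if scans > 1:
        return Vips.Image.new_from_file(path)
    # Otherwise sequential access avoids the memory and CPU hit of a full
    # decode, at the cost of a single top-to-bottom scan from one thread.
    return Vips.Image.new_from_file(path, access=Vips.Access.SEQUENTIAL)

im = open_image('k2.jpg', scans=2)
print(im.avg(), im.deviate())
```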
@jcupitt Thanks for the explanation, John. I've also read through http://libvips.blogspot.de/2012/06/how-libvips-opens-file.html and now have a better understanding. The blog post, however, mentions that sequential-unbuffered mode can't be used for resize, while it seems to work fine for me in that case (using vips_resize()).
Yes, that's changed; vips_resize() can be used in sequential mode now.
Since migrating from 8.4.4 to 8.5.2 we see this issue (can you re-open it?). It isn't reproducible in test environments, and not even in production when requesting the same images; it looks like it has to do with high load and lots of concurrent requests. Errors look like "Message: VipsJpeg: out of order read at line 3016VipsJpeg: out of order read at line 3016VipsJpeg: out of order read at line 3016VipsJpeg: out of order read at line 3016", for ~0.03% of the requests (hence the not-reproducible part). Only for JPEGs, it seems. Really strange, and we cannot debug it at this time...
Oh no! Drat. Could you open a new issue? The underlying code has mostly changed, so the fix (if we can find one) will be different.
I've opened a new issue: #639
I'm doing some performance tests and I'm stressing an HTTP service which uses libvips internally.
After a lot of requests (> 1000) with high concurrency, I'm getting hundreds of warnings in stdout:
I guess this could be a cache memory issue, since the resident memory of the main process is increasing by more than 1GB (but this is probably because Go doesn't hesitate to allocate a lot of memory).
My current cache limit is 100MB.
Any idea about the scope of this issue? Thanks!
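For context on the 100MB figure, here's a sketch of how the operation cache limits are typically set; the assumption is that the Python GI binding exposes the C calls vips_cache_set_max_mem() and vips_cache_set_max() under the same names.

```python
from gi.repository import Vips

# Assumed GI names for vips_cache_set_max_mem() / vips_cache_set_max().
Vips.cache_set_max_mem(100 * 1024 * 1024)   # cap cached pixel memory at ~100MB
Vips.cache_set_max(500)                     # cap the number of cached operations
```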