Skip to content

Cache warning: linecache error reading tile #261

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
h2non opened this issue Apr 13, 2015 · 36 comments
Closed

Cache warning: linecache error reading tile #261

h2non opened this issue Apr 13, 2015 · 36 comments

Comments

@h2non
Copy link

h2non commented Apr 13, 2015

I'm doing some performance tests and I'm stressing an HTTP service which uses libvips internally.

After a lot of requests (> 1000) with high concurrency, I'm getting hundred of warnings in stdout:

vips warning: linecache: error reading tile 0x0: VipsJpeg: out of order read at line 1080
VipsJpeg: out of order read at line 1080
VipsJpeg: out of order read at line 1080
VipsJpeg: out of order read at line 1080
VipsJpeg: out of order read at line 1080
VipsJpeg: out of order read at line 1080
VipsJpeg: out of order read at line 1080

I guess this could be a cache memory issue, since the resident memory of the main process is increasing more than 1GB (but this is probably due to Go don't hesitate in allocating a lot of memory).
My current cache limit is 100MB.

Any idea about the scope of this issue? Thanks!

@jcupitt
Copy link
Member

jcupitt commented Apr 13, 2015

It sounds like it's caching a sequential read. This shouldn't be possible, there's a thing to block this, but perhaps it's happening somehow. Could you show me how to run the thing that's failing on my machine?

The workaround would be to disable the cache, or not use sequential mode. Try vips_cache_set_max(0); in your startup somewhere.

http://www.vips.ecs.soton.ac.uk/supported/current/doc/html/libvips/VipsOperation.html#vips-cache-set-max

Or don't use ACCESS_SEQ. Both will have an effect on performance and memory use, of course :-(

@jcupitt
Copy link
Member

jcupitt commented Apr 13, 2015

You could also try turning on cache tracing with vips_cache_set_trace():

http://www.vips.ecs.soton.ac.uk/supported/current/doc/html/libvips/VipsOperation.html#vips-cache-set-trace

It'll print a line for every vips operation it runs, something like:

$ vips embed k2.jpg x.jpg 10 10 1000 1000 --vips-cache-trace
vips cache : jpegload filename="k2.jpg" access=((VipsAccess) VIPS_ACCESS_SEQUENTIAL_UNBUFFERED) -
vips cache+: black width=1 height=1 bands=3 -
vips cache : copy in=((VipsImage*) 0xdc94b0) -
vips cache+: cast in=((VipsImage*) 0xdc97d0) format=((VipsBandFormat) VIPS_FORMAT_UCHAR) -
...etc.

The '+' means the operation has been executed and added to the cache, a space means executed but not added to cache. Cache hits are not shown (perhaps they should be).

It'll make a huge log, of course. Look for the lines before your error and see if the jpegload is being reused when it shouldn't be.

@h2non
Copy link
Author

h2non commented Apr 13, 2015

Here it's the output, merging both stdout and stderr.
You can see also the HTTP log between the traces, but the order isn't too much relevant due to the concurrency.

https://gist.github.com/h2non/4d5466ff3127812a21df

@h2non
Copy link
Author

h2non commented Apr 13, 2015

Sorry, I've just updated the gist with a proper longer traces. You can search for vips warning

@h2non
Copy link
Author

h2non commented Apr 13, 2015

I've performed other tests and the issue is only reproducible with JPEG images. PNG and WEBP images works well, also with higher loads than JPEG

@jcupitt
Copy link
Member

jcupitt commented Apr 13, 2015

Oh, interesting. I guess you tried PNG as well? That's closest to JPG in the way the reader works.

@h2non
Copy link
Author

h2non commented Apr 13, 2015

I've tried both PNG and WebP without any trouble

@h2non
Copy link
Author

h2non commented Apr 13, 2015

Maybe increasing the cache size could help with that? If yes, I think it's still a workaround solution since it only mitigates part of the problem.

What about inferring the cache size or whatever and switching to another read mode at runtime? Any other ideas without having huge impact in the performance?

As alternative but reliable solution could be defining a limit of the maximum number of concurrent vips processes based on the cache/memory status, like a simple throttle strategy, but I'd prefer avoid that and be faster and responsible as max as possible

@jcupitt
Copy link
Member

jcupitt commented Apr 13, 2015

I can't work out what's gone wrong from the trace :-( I'll see I can make the output more useful.

@h2non
Copy link
Author

h2non commented Apr 13, 2015

No problem. I'll try to investigate as well, but regarding to the trace, it's not simple to split the traces since the issue seems to be only reproducible after a high load...

Do you have at hand some libvips benchmarks based on hundred of operations and high concurrency?

I'll try doing some analysis of the libvips memory stats and increasing the cache size

@jcupitt
Copy link
Member

jcupitt commented Apr 13, 2015

@lovell has done some soak testing for sharp and (I think?) it's worked OK. He may not have had so much threading.

@lovell
Copy link
Member

lovell commented Apr 13, 2015

@h2non If you're letting golang deal with concurrency, perhaps try setting the VIPS_CONCURRENCY=1 environment variable to prevent libvips spawning nCPU additional threads per image.

Over the past year I've carried out some fairly extensive memory/soak testing of the parts of libvips that sharp uses, which has led to the discovery (and subsequent fix) of a few leaks and buffer overruns. IIRC I was running a max of 16 threads.

@lovell
Copy link
Member

lovell commented Apr 13, 2015

The linecache: error reading tile warning is generated at https://github.com/jcupitt/libvips/blob/master/libvips/conversion/tilecache.c#L702-L706 but https://github.com/jcupitt/libvips/issues/15 suggests vips_error_buffer() is not thread-safe. If so, the message you get may not match the error condition.

@jcupitt
Copy link
Member

jcupitt commented Apr 14, 2015

I made a test program:

https://gist.github.com/jcupitt/7de81323573665d8aafb

It works fine with fewer than about 15 threads, but I start to see a range of bad things with (for example) 50 threads:

$ ./a.out /data/john/pics/k2.jpg 
Segmentation fault (core dumped)
$ ./a.out /data/john/pics/k2.jpg 
$ ./a.out /data/john/pics/k2.jpg 
*** Error in `./a.out': free(): invalid next size (fast): 0x00007f8b1800af70 ***
Aborted (core dumped)
$ ./a.out /data/john/pics/k2.jpg 
vips warning: vips_image_wio_input: rewinding image with active regions
Segmentation fault (core dumped)

I'll try without Orc and with other file formats.

@h2non
Copy link
Author

h2non commented Apr 15, 2015

Definitively it seems like here's a problem with thread-unsafe code.
I made some tests based on multiple scenarios with different configurations, from limiting the number of libvips concurrency until using a simple window time frame based throttle strategy in from the HTTP server layer. As conclusion, with a high load, which increases the probability of experimenting some issue with thread-unsafe related concurrent access, the problem still persist, with a panic unexpected exit in some cases.

I'm not too happy about the idea of using a concurrency of 1 thread in libvips. This is an important bottleneck in my case, since Go is natively highly thread-focused, specially in HTTP

@h2non
Copy link
Author

h2non commented Apr 15, 2015

How much is required calling to vips_thread_shutdown()? Maybe the trouble could be around there, since I'm calling it after a set of image operations has been performed.

@lovell
Copy link
Member

lovell commented Apr 15, 2015

"I'm not too happy about the idea of using a concurrency of 1 thread in libvips. This is an important bottleneck in my case, since Go is natively highly thread-focused, specially in HTTP"

I believe vips_concurrency_set(1) refers to a maximum of 1 thread per image processing pipeline, i.e. use the current thread and don't spawn any more.

The (maximum) number of images Go will process concurrently is the number of threads the golang runtime is managing on your behalf, limited by GOMAXPROCS.

If you're not already doing so, you may need to wrap all the vips methods that constitute processing each image with runtime.LockOSThread() and runtime.UnlockOSThread() to guarantee they execute on the same thread.

@jcupitt
Copy link
Member

jcupitt commented Apr 19, 2015

I've made some progress on this: I've found a couple of threading errors with that soak program and stability is improving. Not quite there yet though, I'll keep digging.

This was the big patch: jcupitt@54c37e7 it removes vips_image_wio_input() and adds vips_image_copy_memory() instead. vips_image_wio_input() was fine when we had a single main thread, and fine if you don't share image between threads, but once images are being used simultaneously by more than one thread, which can happen implicitly sometimes via the operation cache, you have problems.

@jcupitt
Copy link
Member

jcupitt commented Apr 20, 2015

Some more progress: git master now runs this test correctly, as long as you disable orc and don't use jpegload but use vips_black() or some similar pixel source.

I'm going to dig into jpegload next and see if I can find a problem there.

@jcupitt
Copy link
Member

jcupitt commented Apr 23, 2015

I found two more problems: first, the soak test needs to sleep for a short time before exiting, or vips_shutdown() may try to throw things away before the thread cleanup code runs.

Second, there seems to be a bug in openslide which causes memory corruption under heavy load. I've opened an issue here:

openslide/openslide#161

I've updated the soak test:

https://gist.github.com/jcupitt/7de81323573665d8aafb

If I build git master with openslide and orc disabled and run that test, I see:

$ for i in {1..10}; do ./a.out ~/pics/k2.jpg; done
memory: high-water mark 192.07 MB
memory: high-water mark 238.37 MB
memory: high-water mark 226.02 MB
memory: high-water mark 233.05 MB
memory: high-water mark 182.09 MB
memory: high-water mark 195.88 MB
memory: high-water mark 202.06 MB
memory: high-water mark 208.51 MB
memory: high-water mark 239.40 MB
memory: high-water mark 164.51 MB

That's 10,000 shrink operations, with 50 main threads working in parallel, and four worker threads within each of those, with no reported problems. Phew! That's with the operation cache enabled as well.

Hopefully this is now fixed.

@jcupitt
Copy link
Member

jcupitt commented Apr 24, 2015

Summary: use git master vips, disable orc, disable openslide, it should be OK.

@h2non
Copy link
Author

h2non commented Apr 24, 2015

Thanks John,

This will be applied also in maintenance releases for 7.42.x and future 8.x versions?

@jcupitt
Copy link
Member

jcupitt commented Apr 24, 2015

I made one quite large change to vips, probably too big for 7.42 really :-( so vips-8.0 and later only, sorry.

@jcupitt
Copy link
Member

jcupitt commented Apr 25, 2015

I've updated the test program:

https://gist.github.com/jcupitt/7de81323573665d8aafb

It works against 7.42 as well, as long as you disable openslide and orc. That's orc-0.4.18, other orcs may work.

It's looking like the openslide problem could possibly be a glib bug, see openslide/openslide#161

@jcupitt
Copy link
Member

jcupitt commented Apr 26, 2015

It is a glib bug! See openslide/openslide#161 (comment)

I've added a workaround to vips 8.0.1, so you can enable openslide there.

I'm seeing crashes with orc-0.4.22 and orc-0.4.18, so they need to be off. @lovell, do you have a list of known-bad orc versions anywhere? I think I remember you talking about this at one point.

Otherwise, all OK now. I'll close this issue.

@jcupitt jcupitt closed this as completed Apr 26, 2015
@h2non
Copy link
Author

h2non commented Apr 26, 2015

Ok. Thank you John for mitigating this particular issue in great way!

@lovell
Copy link
Member

lovell commented Apr 26, 2015

The list of known-good orc versions is easier :) I've found 0.4.22 to be the most stable and least leaky.

Problems in prior versions were summarised by John here - lovell/sharp#21 (comment)

lovell/sharp#172 is a report of 0.4.23 segfaulting but I've yet to reproduce or track down the problem.

Thanks for persisting with this John. Memory leaks and thread-related segfaults are the worst of all things to debug.

@jcupitt
Copy link
Member

jcupitt commented Apr 26, 2015

Hey ho, so we now have 0.4.22 as a confirmed crasher. I wonder if any orcs are really stable. Maybe we should switch to OpenCL.

Yes, "system unstable under heavy load" is about the most depressing error report you can see :( I was very surprised and pleased it turned out to not be a vips problem.

@felixbuenemann
Copy link
Collaborator

@jcupitt I'm also seeing this problem, when reading from a jpeg file in sequential or sequential-unbuffered access mode and saving it twice. On the first save the cache is not populated and I get no warnings, on the second save it uses the line cache and emits warnings. The problem does not occur when opening the jpeg in random access mode.

Interestingly in both my and @h2non warnings it seems to be complaining about the last line (I used a 10000x10000px image, I'm assuming h2non had a 1920x1080px image).

Sample Code:

im = Vips.Image.new_from_file('Earth10k.jpg', access=Vips.Access.SEQUENTIAL)
# First write generates no warnings:
im.write_to_file('Earth10k2.jpg')
# Second write emits warnings:
im.write_to_file('Earth10k2.jpg')

Warnings:

vips warning: linecache: error reading tile 0x0: VipsJpeg: out of order read at line 10000
vips warning: linecache: error reading tile 0x8: VipsJpeg: out of order read at line 10000
vips warning: linecache: error reading tile 0x64: VipsJpeg: out of order read at line 10000
vips warning: linecache: error reading tile 0x72: VipsJpeg: out of order read at line 10000
vips warning: linecache: error reading tile 0x16: VipsJpeg: out of order read at line 10000
vips warning: linecache: error reading tile 0x24: VipsJpeg: out of order read at line 10000
[snip]
vips warning: linecache: error reading tile 0x9984: VipsJpeg: out of order read at line 10000
vips warning: linecache: error reading tile 0x9992: VipsJpeg: out of order read at line 10000

@jcupitt
Copy link
Member

jcupitt commented May 13, 2016

@felixbuenemann yes, you can only use sequential images once. For example, this is fine:

>>> im = Vips.Image.new_from_file("/home/john/pics/k2.jpg")
>>> im.avg()
102.79218031609196
>>> im.deviate()
67.69273812492462

This will fail:

>>> im = Vips.Image.new_from_file("/home/john/pics/k2.jpg", access = Vips.Access.SEQUENTIAL)
>>> im.avg()
102.79218031609196
>>> im.deviate()
vips warning: linecache: error reading tile 0x0: VipsJpeg: out of order read at line 2048
...

Random access images are cached, sequential ones are not, so the solution is just to open each time you need to scan an image:

>>> im = Vips.Image.new_from_file("/home/john/pics/k2.jpg", access = Vips.Access.SEQUENTIAL)
>>> im.avg()
102.79218031609196
>>> im = Vips.Image.new_from_file("/home/john/pics/k2.jpg", access = Vips.Access.SEQUENTIAL)
>>> im.deviate()
67.69273812492462

The second open will "rewind" the image.

@jcupitt
Copy link
Member

jcupitt commented May 13, 2016

If you open a jpeg in random access mode, it'll decode the whole image on the first pixel access and store the decoded image in either a memory buffer (for small images) or a temporary file (for large ones). That large decoded image is put into the operation cache and can be reused many times from many threads.

If you open in seq mode, small buffers of pixels are decoded as required as you scan down the image. This saves the memory and the big upfront CPU hit of decoding the whole thing, but it means you can only scan once, and only scan from one thread.

@felixbuenemann
Copy link
Collaborator

@jcupitt Thanks For the explanation John, I've also read through http://libvips.blogspot.de/2012/06/how-libvips-opens-file.html and now have a better understanding.

The blog post however mentions that sequential-unbuffered mode can't be used for resize, while it seems to work fine for me in that case (using vips_resize()).

@jcupitt
Copy link
Member

jcupitt commented May 13, 2016

Yes, that's changed, vips_resize() should work in seq-unbuf mode now. The new resizer is better about memory access.

@andrieslouw
Copy link

Since migrating from 8.4.4 to 8.5.2 we see this issue (can you re-open it?), it isn't reproducible in test environments, not even in production when requesting the same images. It looks like it has to do with high load and lots of concurrent requests. Errors outputted look like "Message: VipsJpeg: out of order read at line 3016VipsJpeg: out of order read at line 3016VipsJpeg: out of order read at line 3016VipsJpeg: out of order read at line 3016", for ~0.03% of the requests (hence the not reproducible part). Only for JPEG's it seems. Really strange, and we cannot debug it at this time..

@jcupitt
Copy link
Member

jcupitt commented Apr 12, 2017

Oh no! Drat.

Could you open a new issue? The underlying code has mostly changed, so the fix (if we can find one) will be different.

@jcupitt
Copy link
Member

jcupitt commented Apr 12, 2017

I've opened a new issue: #639

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants