Cache warning: linecache error reading tile #261

h2non · 2015-04-13T07:45:05Z

I'm doing some performance tests and I'm stressing an HTTP service which uses libvips internally.

After a lot of requests (> 1000) with high concurrency, I'm getting hundred of warnings in stdout:

vips warning: linecache: error reading tile 0x0: VipsJpeg: out of order read at line 1080
VipsJpeg: out of order read at line 1080
VipsJpeg: out of order read at line 1080
VipsJpeg: out of order read at line 1080
VipsJpeg: out of order read at line 1080
VipsJpeg: out of order read at line 1080
VipsJpeg: out of order read at line 1080

I guess this could be a cache memory issue, since the resident memory of the main process is increasing more than 1GB (but this is probably due to Go don't hesitate in allocating a lot of memory).
My current cache limit is 100MB.

Any idea about the scope of this issue? Thanks!

jcupitt · 2015-04-13T08:53:15Z

It sounds like it's caching a sequential read. This shouldn't be possible, there's a thing to block this, but perhaps it's happening somehow. Could you show me how to run the thing that's failing on my machine?

The workaround would be to disable the cache, or not use sequential mode. Try vips_cache_set_max(0); in your startup somewhere.

http://www.vips.ecs.soton.ac.uk/supported/current/doc/html/libvips/VipsOperation.html#vips-cache-set-max

Or don't use ACCESS_SEQ. Both will have an effect on performance and memory use, of course :-(

jcupitt · 2015-04-13T10:04:05Z

You could also try turning on cache tracing with vips_cache_set_trace():

http://www.vips.ecs.soton.ac.uk/supported/current/doc/html/libvips/VipsOperation.html#vips-cache-set-trace

It'll print a line for every vips operation it runs, something like:

$ vips embed k2.jpg x.jpg 10 10 1000 1000 --vips-cache-trace
vips cache : jpegload filename="k2.jpg" access=((VipsAccess) VIPS_ACCESS_SEQUENTIAL_UNBUFFERED) -
vips cache+: black width=1 height=1 bands=3 -
vips cache : copy in=((VipsImage*) 0xdc94b0) -
vips cache+: cast in=((VipsImage*) 0xdc97d0) format=((VipsBandFormat) VIPS_FORMAT_UCHAR) -
...etc.

The '+' means the operation has been executed and added to the cache, a space means executed but not added to cache. Cache hits are not shown (perhaps they should be).

It'll make a huge log, of course. Look for the lines before your error and see if the jpegload is being reused when it shouldn't be.

h2non · 2015-04-13T12:19:06Z

Here it's the output, merging both stdout and stderr.
You can see also the HTTP log between the traces, but the order isn't too much relevant due to the concurrency.

https://gist.github.com/h2non/4d5466ff3127812a21df

h2non · 2015-04-13T12:22:33Z

Sorry, I've just updated the gist with a proper longer traces. You can search for vips warning

h2non · 2015-04-13T13:50:26Z

I've performed other tests and the issue is only reproducible with JPEG images. PNG and WEBP images works well, also with higher loads than JPEG

jcupitt · 2015-04-13T14:16:05Z

Oh, interesting. I guess you tried PNG as well? That's closest to JPG in the way the reader works.

h2non · 2015-04-13T14:36:07Z

I've tried both PNG and WebP without any trouble

h2non · 2015-04-13T14:45:39Z

Maybe increasing the cache size could help with that? If yes, I think it's still a workaround solution since it only mitigates part of the problem.

What about inferring the cache size or whatever and switching to another read mode at runtime? Any other ideas without having huge impact in the performance?

As alternative but reliable solution could be defining a limit of the maximum number of concurrent vips processes based on the cache/memory status, like a simple throttle strategy, but I'd prefer avoid that and be faster and responsible as max as possible

jcupitt · 2015-04-13T14:59:09Z

I can't work out what's gone wrong from the trace :-( I'll see I can make the output more useful.

h2non · 2015-04-13T15:10:06Z

No problem. I'll try to investigate as well, but regarding to the trace, it's not simple to split the traces since the issue seems to be only reproducible after a high load...

Do you have at hand some libvips benchmarks based on hundred of operations and high concurrency?

I'll try doing some analysis of the libvips memory stats and increasing the cache size

jcupitt · 2015-04-13T16:02:18Z

@lovell has done some soak testing for sharp and (I think?) it's worked OK. He may not have had so much threading.

lovell · 2015-04-13T17:39:59Z

@h2non If you're letting golang deal with concurrency, perhaps try setting the VIPS_CONCURRENCY=1 environment variable to prevent libvips spawning nCPU additional threads per image.

Over the past year I've carried out some fairly extensive memory/soak testing of the parts of libvips that sharp uses, which has led to the discovery (and subsequent fix) of a few leaks and buffer overruns. IIRC I was running a max of 16 threads.

lovell · 2015-04-13T17:47:14Z

The linecache: error reading tile warning is generated at https://github.com/jcupitt/libvips/blob/master/libvips/conversion/tilecache.c#L702-L706 but https://github.com/jcupitt/libvips/issues/15 suggests vips_error_buffer() is not thread-safe. If so, the message you get may not match the error condition.

jcupitt · 2015-04-14T13:43:59Z

I made a test program:

https://gist.github.com/jcupitt/7de81323573665d8aafb

It works fine with fewer than about 15 threads, but I start to see a range of bad things with (for example) 50 threads:

$ ./a.out /data/john/pics/k2.jpg 
Segmentation fault (core dumped)
$ ./a.out /data/john/pics/k2.jpg 
$ ./a.out /data/john/pics/k2.jpg 
*** Error in `./a.out': free(): invalid next size (fast): 0x00007f8b1800af70 ***
Aborted (core dumped)
$ ./a.out /data/john/pics/k2.jpg 
vips warning: vips_image_wio_input: rewinding image with active regions
Segmentation fault (core dumped)

I'll try without Orc and with other file formats.

h2non · 2015-04-15T00:01:38Z

Definitively it seems like here's a problem with thread-unsafe code.
I made some tests based on multiple scenarios with different configurations, from limiting the number of libvips concurrency until using a simple window time frame based throttle strategy in from the HTTP server layer. As conclusion, with a high load, which increases the probability of experimenting some issue with thread-unsafe related concurrent access, the problem still persist, with a panic unexpected exit in some cases.

I'm not too happy about the idea of using a concurrency of 1 thread in libvips. This is an important bottleneck in my case, since Go is natively highly thread-focused, specially in HTTP

h2non · 2015-04-15T19:16:49Z

How much is required calling to vips_thread_shutdown()? Maybe the trouble could be around there, since I'm calling it after a set of image operations has been performed.

lovell · 2015-04-15T19:44:47Z

"I'm not too happy about the idea of using a concurrency of 1 thread in libvips. This is an important bottleneck in my case, since Go is natively highly thread-focused, specially in HTTP"

I believe vips_concurrency_set(1) refers to a maximum of 1 thread per image processing pipeline, i.e. use the current thread and don't spawn any more.

The (maximum) number of images Go will process concurrently is the number of threads the golang runtime is managing on your behalf, limited by GOMAXPROCS.

If you're not already doing so, you may need to wrap all the vips methods that constitute processing each image with runtime.LockOSThread() and runtime.UnlockOSThread() to guarantee they execute on the same thread.

jcupitt · 2015-04-19T12:24:01Z

I've made some progress on this: I've found a couple of threading errors with that soak program and stability is improving. Not quite there yet though, I'll keep digging.

This was the big patch: jcupitt@54c37e7 it removes vips_image_wio_input() and adds vips_image_copy_memory() instead. vips_image_wio_input() was fine when we had a single main thread, and fine if you don't share image between threads, but once images are being used simultaneously by more than one thread, which can happen implicitly sometimes via the operation cache, you have problems.

jcupitt · 2015-04-20T17:00:32Z

Some more progress: git master now runs this test correctly, as long as you disable orc and don't use jpegload but use vips_black() or some similar pixel source.

I'm going to dig into jpegload next and see if I can find a problem there.

jcupitt · 2015-04-23T20:39:38Z

I found two more problems: first, the soak test needs to sleep for a short time before exiting, or vips_shutdown() may try to throw things away before the thread cleanup code runs.

Second, there seems to be a bug in openslide which causes memory corruption under heavy load. I've opened an issue here:

openslide/openslide#161

I've updated the soak test:

https://gist.github.com/jcupitt/7de81323573665d8aafb

If I build git master with openslide and orc disabled and run that test, I see:

$ for i in {1..10}; do ./a.out ~/pics/k2.jpg; done
memory: high-water mark 192.07 MB
memory: high-water mark 238.37 MB
memory: high-water mark 226.02 MB
memory: high-water mark 233.05 MB
memory: high-water mark 182.09 MB
memory: high-water mark 195.88 MB
memory: high-water mark 202.06 MB
memory: high-water mark 208.51 MB
memory: high-water mark 239.40 MB
memory: high-water mark 164.51 MB

That's 10,000 shrink operations, with 50 main threads working in parallel, and four worker threads within each of those, with no reported problems. Phew! That's with the operation cache enabled as well.

Hopefully this is now fixed.

jcupitt · 2015-04-24T08:43:15Z

Summary: use git master vips, disable orc, disable openslide, it should be OK.

h2non · 2015-04-24T09:28:43Z

Thanks John,

This will be applied also in maintenance releases for 7.42.x and future 8.x versions?

jcupitt · 2015-04-24T09:32:50Z

I made one quite large change to vips, probably too big for 7.42 really :-( so vips-8.0 and later only, sorry.

jcupitt · 2015-04-25T08:56:47Z

I've updated the test program:

https://gist.github.com/jcupitt/7de81323573665d8aafb

It works against 7.42 as well, as long as you disable openslide and orc. That's orc-0.4.18, other orcs may work.

It's looking like the openslide problem could possibly be a glib bug, see openslide/openslide#161

jcupitt · 2015-04-26T09:37:49Z

It is a glib bug! See openslide/openslide#161 (comment)

I've added a workaround to vips 8.0.1, so you can enable openslide there.

I'm seeing crashes with orc-0.4.22 and orc-0.4.18, so they need to be off. @lovell, do you have a list of known-bad orc versions anywhere? I think I remember you talking about this at one point.

Otherwise, all OK now. I'll close this issue.

h2non · 2015-04-26T11:04:01Z

Ok. Thank you John for mitigating this particular issue in great way!

lovell · 2015-04-26T12:45:44Z

The list of known-good orc versions is easier :) I've found 0.4.22 to be the most stable and least leaky.

Problems in prior versions were summarised by John here - lovell/sharp#21 (comment)

lovell/sharp#172 is a report of 0.4.23 segfaulting but I've yet to reproduce or track down the problem.

Thanks for persisting with this John. Memory leaks and thread-related segfaults are the worst of all things to debug.

jcupitt · 2015-04-26T15:28:33Z

Hey ho, so we now have 0.4.22 as a confirmed crasher. I wonder if any orcs are really stable. Maybe we should switch to OpenCL.

Yes, "system unstable under heavy load" is about the most depressing error report you can see :( I was very surprised and pleased it turned out to not be a vips problem.

felixbuenemann · 2016-05-12T23:23:19Z

@jcupitt I'm also seeing this problem, when reading from a jpeg file in sequential or sequential-unbuffered access mode and saving it twice. On the first save the cache is not populated and I get no warnings, on the second save it uses the line cache and emits warnings. The problem does not occur when opening the jpeg in random access mode.

Interestingly in both my and @h2non warnings it seems to be complaining about the last line (I used a 10000x10000px image, I'm assuming h2non had a 1920x1080px image).

Sample Code:

im = Vips.Image.new_from_file('Earth10k.jpg', access=Vips.Access.SEQUENTIAL)
# First write generates no warnings:
im.write_to_file('Earth10k2.jpg')
# Second write emits warnings:
im.write_to_file('Earth10k2.jpg')

Warnings:

vips warning: linecache: error reading tile 0x0: VipsJpeg: out of order read at line 10000
vips warning: linecache: error reading tile 0x8: VipsJpeg: out of order read at line 10000
vips warning: linecache: error reading tile 0x64: VipsJpeg: out of order read at line 10000
vips warning: linecache: error reading tile 0x72: VipsJpeg: out of order read at line 10000
vips warning: linecache: error reading tile 0x16: VipsJpeg: out of order read at line 10000
vips warning: linecache: error reading tile 0x24: VipsJpeg: out of order read at line 10000
[snip]
vips warning: linecache: error reading tile 0x9984: VipsJpeg: out of order read at line 10000
vips warning: linecache: error reading tile 0x9992: VipsJpeg: out of order read at line 10000

jcupitt · 2016-05-13T08:45:22Z

@felixbuenemann yes, you can only use sequential images once. For example, this is fine:

>>> im = Vips.Image.new_from_file("/home/john/pics/k2.jpg")
>>> im.avg()
102.79218031609196
>>> im.deviate()
67.69273812492462

This will fail:

>>> im = Vips.Image.new_from_file("/home/john/pics/k2.jpg", access = Vips.Access.SEQUENTIAL)
>>> im.avg()
102.79218031609196
>>> im.deviate()
vips warning: linecache: error reading tile 0x0: VipsJpeg: out of order read at line 2048
...

Random access images are cached, sequential ones are not, so the solution is just to open each time you need to scan an image:

>>> im = Vips.Image.new_from_file("/home/john/pics/k2.jpg", access = Vips.Access.SEQUENTIAL)
>>> im.avg()
102.79218031609196
>>> im = Vips.Image.new_from_file("/home/john/pics/k2.jpg", access = Vips.Access.SEQUENTIAL)
>>> im.deviate()
67.69273812492462

The second open will "rewind" the image.

jcupitt · 2016-05-13T08:52:10Z

If you open a jpeg in random access mode, it'll decode the whole image on the first pixel access and store the decoded image in either a memory buffer (for small images) or a temporary file (for large ones). That large decoded image is put into the operation cache and can be reused many times from many threads.

If you open in seq mode, small buffers of pixels are decoded as required as you scan down the image. This saves the memory and the big upfront CPU hit of decoding the whole thing, but it means you can only scan once, and only scan from one thread.

felixbuenemann · 2016-05-13T13:00:26Z

@jcupitt Thanks For the explanation John, I've also read through http://libvips.blogspot.de/2012/06/how-libvips-opens-file.html and now have a better understanding.

The blog post however mentions that sequential-unbuffered mode can't be used for resize, while it seems to work fine for me in that case (using vips_resize()).

jcupitt · 2016-05-13T13:15:55Z

Yes, that's changed, vips_resize() should work in seq-unbuf mode now. The new resizer is better about memory access.

andrieslouw · 2017-04-11T17:52:06Z

Since migrating from 8.4.4 to 8.5.2 we see this issue (can you re-open it?), it isn't reproducible in test environments, not even in production when requesting the same images. It looks like it has to do with high load and lots of concurrent requests. Errors outputted look like "Message: VipsJpeg: out of order read at line 3016VipsJpeg: out of order read at line 3016VipsJpeg: out of order read at line 3016VipsJpeg: out of order read at line 3016", for ~0.03% of the requests (hence the not reproducible part). Only for JPEG's it seems. Really strange, and we cannot debug it at this time..

jcupitt · 2017-04-12T09:41:54Z

Oh no! Drat.

Could you open a new issue? The underlying code has mostly changed, so the fix (if we can find one) will be different.

jcupitt · 2017-04-12T09:44:26Z

I've opened a new issue: #639

h2non mentioned this issue Apr 13, 2015

Process panic after heavy HTTP load h2non/imaginary#22

Closed

jcupitt added the bug label Apr 14, 2015

lovell mentioned this issue Apr 14, 2015

Fix small memory leak in openslide loader #263

Merged

h2non mentioned this issue Apr 15, 2015

Limit concurrent vips threads h2non/bimg#14

Closed

jcupitt mentioned this issue Apr 22, 2015

Crash on repeated JPEG writes with specific file #274

Closed

jcupitt added the in development label Apr 23, 2015

jcupitt closed this as completed Apr 26, 2015

lovell mentioned this issue May 25, 2015

GLib-GObject-CRITICAL G_IS_VALUE lovell/sharp#216

Closed

KrissKulins mentioned this issue Feb 10, 2017

Lambda storage space shrinking lovell/sharp#707

Closed

jcupitt mentioned this issue Apr 12, 2017

out of order read error #639

Closed

andrieslouw mentioned this issue Apr 15, 2017

Issue affecting 0.03% of JPEG-resizes weserv/images#91

Closed

priyankatapar mentioned this issue Apr 17, 2019

Segmentation fault for specific pdf on alpine linux docker container #1287

Closed

zzjin mentioned this issue Dec 16, 2020

Wired png performance vs pure go thumbnail davidbyttow/govips#143

Open

Uh oh!

Cache warning: linecache error reading tile #261

Cache warning: linecache error reading tile #261

Comments

h2non commented Apr 13, 2015

jcupitt commented Apr 13, 2015

Uh oh!

jcupitt commented Apr 13, 2015

Uh oh!

h2non commented Apr 13, 2015

Uh oh!

h2non commented Apr 13, 2015

Uh oh!

h2non commented Apr 13, 2015

Uh oh!

jcupitt commented Apr 13, 2015

Uh oh!

h2non commented Apr 13, 2015

Uh oh!

h2non commented Apr 13, 2015

Uh oh!

jcupitt commented Apr 13, 2015

Uh oh!

h2non commented Apr 13, 2015

Uh oh!

jcupitt commented Apr 13, 2015

Uh oh!

lovell commented Apr 13, 2015

Uh oh!

lovell commented Apr 13, 2015

Uh oh!

jcupitt commented Apr 14, 2015

Uh oh!

h2non commented Apr 15, 2015

Uh oh!

h2non commented Apr 15, 2015

Uh oh!

lovell commented Apr 15, 2015

Uh oh!

jcupitt commented Apr 19, 2015

Uh oh!

jcupitt commented Apr 20, 2015

Uh oh!

jcupitt commented Apr 23, 2015

Uh oh!

jcupitt commented Apr 24, 2015

Uh oh!

h2non commented Apr 24, 2015

Uh oh!

jcupitt commented Apr 24, 2015

Uh oh!

jcupitt commented Apr 25, 2015

Uh oh!

jcupitt commented Apr 26, 2015

Uh oh!

h2non commented Apr 26, 2015

Uh oh!

lovell commented Apr 26, 2015

Uh oh!

jcupitt commented Apr 26, 2015

Uh oh!

felixbuenemann commented May 12, 2016

Uh oh!

jcupitt commented May 13, 2016

Uh oh!

jcupitt commented May 13, 2016

Uh oh!

felixbuenemann commented May 13, 2016

Uh oh!

jcupitt commented May 13, 2016

Uh oh!

andrieslouw commented Apr 11, 2017

Uh oh!

jcupitt commented Apr 12, 2017

Uh oh!

jcupitt commented Apr 12, 2017

Uh oh!