-
-
Notifications
You must be signed in to change notification settings - Fork 700
out of order read error #639
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
Related issue: #261 Test program from the previous issue: https://gist.github.com/jcupitt/7de81323573665d8aafb |
I've updated the gist slightly, but it runs cleanly here. That's 50 parallel workers, each running @andrieslouw, what is your workload? Is there some other operator we could test? |
After some debugging and testing, I could make a reproducible test case (in PHP) which gives sporadic
And we see these errors in our PHP-FPM log:
Attachment: reproducible test case |
@jcupitt: The testcase from @kleisauke is based on our workload. Production does 45 req/s average, with 26 simultaneous threads running (~500ms response time, including fetching of images). Peakload is 80 req/s with 85 simultaneous requests (~1100ms response time including fetching). Processing by libvips takes ~100ms at average, and issue looks independent of load (percentage wise). I guess it could be reproduced with 40 requests per second, and 4 or 5 simultaneous threads. |
Hey great, I'll test here. Thanks! |
I tried:
Which (on this laptop) will run four copies of |
I think I've found a problem in buffer sizing. Hang on, I'll try a patch. |
we had output buffers too large, input caches too small see https://github.com/jcupitt/libvips/issues/639
Could you try with HEAD in the 8.5 branch? I've revised the cache sizing, it might fix your issue. |
Tested it with the 8.5 branch (in our test environment) and it seems that this problem has been fixed, thanks John! 🎉 We can't test this on our production server (yet) because we depend on the libvips provided by Remi's RPM repository. I've also updated the reproducible test case (to only output if there's a libvips exception). You can test it with: parallel "php test-loop.php" ::: {1..10} With the 8.5 branch there are no errors and with the 8.5.2 release there are many Attachment: updated reproducible test case |
\o/ Hey, that's great! I pushed another small improvement, but your test still passes. I'll do 8.5.3 tomorrow. Have you looked at using the new
That's shrinking a 9400 x 9400 pixel image to 94 x 94. The quality is the same. |
Thanks for releasing 8.5.3! The test case still passes in our test environment. I'll test it out in production if 8.5.3 appears in the RPM repository. We've considered the I think the speed up (you've mentioned) is due to shrink-on-load. We've already implemented this in our resize class, it's indeed a nice improvement in speed. |
Ah, OK, good to hear you're doing that. I don't see code for doing the last 200%+ with reduce, do you have that? You need to do at least the final 200% with https://github.com/jcupitt/libvips/wiki/HOWTO----Image-shrinking#block-shrink Or maybe you have this and I'm being dumb. Do you do auto-rotate? That has a slightly tricky thing where you'll need to render to a memory buffer before the rotate operation. Same for smartcrop, if you do that. |
Ah I see, we're not leaving at least a factor of two for the final resize. The thumbnail operator does. sharp doesn't do this either. I'll do some experimenting to see if there's a big difference. About the auto-rotate: The trick to render to a memory buffer is currently not possible for us because the |
You're right, the php API would need some more things to allow render to memory, I'll try to think of the simplest. That test image of the house is a good one for exposing aliasing problems. You don't need to render to memory for trim with 8.5, the new sequential mode can do this case. |
There's indeed an aliasing problem due to shrink-on-load (see here and the fixed one here). Not sure why we haven't previously encountered this problem. Thanks for the heads-up. I've tried to implement the thumbnail operator in our current setting (on this branch). I found a few problems (should I open a new issue?):
For the rest, thanks for moving the |
You could trim the output of For pages, you could embed the page number in the filename, for example: $shrunk_image = Vips\Image\thumbnail($filename + "[page=] + $page_no + "]", 100); Not ideal I guess. You'd need to sniff the file format before doing this, of course. You're right, there's no way to break the aspect ratio. Could you explain why people want to do this? It seems puzzling to me. |
I added $new_image = |
I think we're done, I'll close. Thanks for reporting this! |
@jcupitt Today we've updated libvips to 8.5.3 on our production server. Unfortunately we're still seeing
You'll see this appear in the shell:
(looks like If you want, I can send you all images and target dimensions (which generate these errors) by email. Attachment: updated reproducible test case |
Argh! Sorry @kleisauke, I was sure this had been nailed. Yes, I see the error here too, thank you for the test case. I'll have a look. |
I've found the problem: two bits of vips are disagreeing about cache sizing, so one ends up smaller than the other expects. I should have a fix in a few hours. |
we were sizing buffers partly by image width, which could cause caches to be too small if width changed down a pipeline see https://github.com/jcupitt/libvips/issues/639
OK, pushed a fix, it seems to work now. I've added a lot more tests too, so hopefully this can't happen again. You could trigger the problem on a 4 thread machine with just:
Where
To fix this, caches are now sized the same everywhere, independent of image width. I've added a new thing too, The tests for I'll do an 8.5.4 later today. |
... and thanks for the great bug report! It was (relatively) easy to track down thanks to your testcase! |
OK, 8.5.4 is up, I'll nudge homebrew too. |
Remi-repo has been updated, code is running in production, @kleisauke is monitoring our production-servers for errors. Huge thanks already! |
After monitoring our production server for errors, it seems that this issue has been solved. Thanks John! |
Since migrating from 8.4.4 to 8.5.2 we see this issue (can you re-open it?), it isn't reproducible in test environments, not even in production when requesting the same images. It looks like it has to do with high load and lots of concurrent requests. Errors outputted look like
for ~0.03% of the requests (hence the not reproducible part). Only for JPEG's it seems. Really strange, and we cannot debug it at this time..
The text was updated successfully, but these errors were encountered: