Different resize issues #148
We can now do high-quality reduce for any scale factor, though it'll be slow for very large reductions. Thanks homm, see libvips/pyvips#148
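The basic idea of the two-step split can be sketched in a few lines (a simplification for illustration; libvips's actual factor selection and rounding are discussed later in this thread):

```python
import math

def split_shrink(total_shrink: float, gap: float = 2.0) -> tuple[int, float]:
    """Split a total shrink factor into an integer box-filter shrink
    followed by a residual convolution reduce. With gap=2.0 the
    intermediate image stays roughly twice as large as the target,
    so the final convolution only has a small factor left to do."""
    box = max(1, math.floor(total_shrink / gap))
    residual = total_shrink / box
    return box, residual

print(split_shrink(9.4))  # box shrink by 4, then a 2.35x convolution reduce
```

With small total factors the box step drops out entirely (`split_shrink(1.5)` gives a box factor of 1), which is why the speedup matters most for large reductions.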
Hi @homm, this looks very interesting. As you say, a two-step resize with a box filter first is an easy way to get a nice speedup, especially for large reduce factors. Regarding the documentation note "This will not work well for shrink factors greater than three": I think that's left over from the previous implementation, I should remove it. The libvips resize operations do round-to-nearest for image dimensions. I'll have a look at the reduce behaviour; as you say, it does look like there are some problems there. I'll check the speed too, it does look odd. |
I had a look at the moire from reduce. It's caused by the fixed-point arithmetic reduce is using for uint8 images, plus the fixed set of masks that libvips precomputes. There's a slower high-quality mode (cast the input to double) which fixes this. I'll have a look at the performance strangeness. |
Here's the output before and after applying this patch (from issue libvips/libvips#703).

Pillow reduce() vs VIPS shrink()
Pillow resize() vs VIPS reduce()
Pillow resize() vs VIPS resize()
|
That's very nice Kleis. I'm sorry your patch hasn't been merged, I've been so distracted :( Do you think we should merge for 8.9? |
Ah, now I check on a bigger screen I can see the extra moire you were talking about more clearly. It seems you have the vector path enabled, so it's the lower precision there. If I do this:
I see: So that's the C path on the left (20.12 bit fixed point) and the vector path on the right (2.6 bits for the coefficients, 10.6 for the accumulator). |
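The precision difference is easy to demonstrate: quantizing kernel coefficients to only a few fractional bits makes their sum drift away from 1.0, which shows up as banding and moire. The tap values below are made up for illustration and are not libvips's actual masks:

```python
def quantize(coeff: float, frac_bits: int) -> float:
    """Round a coefficient to fixed point with frac_bits fractional
    bits, then convert back to float."""
    scale = 1 << frac_bits
    return round(coeff * scale) / scale

# illustrative kernel taps (not real libvips masks); the true sum is 1.0
coeffs = [0.007, -0.067, 0.270, 0.890, -0.120, 0.020]

for bits in (6, 12):
    q = [quantize(c, bits) for c in coeffs]
    print(bits, sum(q))  # drift from 1.0 shrinks as precision grows
```

At 6 fractional bits the sum is visibly off 1.0; at 12 bits it is essentially exact, which matches the C path looking clean while the low-precision vector path shows moire.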
It would be nice if it were merged/fixed in 8.9. Maybe it's wiser to fix it in 8.10 (or a patch release); that allows us to test it more and notify some people (e.g. libvips/libvips#659). Regarding the patch, it still produces a small pixel shift within reducev. Test case:

vips reducev input/Landscape_1.jpg output-patch/lanczos3/Landscape_1.jpg[strip,Q=85] 5.624992969 --kernel lanczos3 --centre
vips reducev input/Landscape_2.jpg temp.v 5.624992969 --kernel lanczos3 --centre
vips flip temp.v output-patch/lanczos3/Landscape_2.jpg[strip,Q=85] horizontal
vips reducev input/Landscape_3.jpg temp.v 5.624992969 --kernel lanczos3 --centre
vips rot temp.v output-patch/lanczos3/Landscape_3.jpg[strip,Q=85] d180
vips reducev input/Landscape_4.jpg temp.v 5.624992969 --kernel lanczos3 --centre
vips rot temp.v temp2.v d180
vips flip temp2.v output-patch/lanczos3/Landscape_4.jpg[strip,Q=85] horizontal
vips rot input/Landscape_5.jpg temp.v d270
vips reducev temp.v temp2.v 5.624992969 --kernel lanczos3 --centre
vips flip temp2.v output-patch/lanczos3/Landscape_5.jpg[strip,Q=85] vertical
vips rot input/Landscape_6.jpg temp.v d90
vips reducev temp.v output-patch/lanczos3/Landscape_6.jpg[strip,Q=85] 5.624992969 --kernel lanczos3 --centre
vips rot input/Landscape_7.jpg temp.v d90
vips reducev temp.v temp2.v 5.624992969 --kernel lanczos3 --centre
vips flip temp2.v output-patch/lanczos3/Landscape_7.jpg[strip,Q=85] vertical
vips rot input/Landscape_8.jpg temp.v d270
vips reducev temp.v output-patch/lanczos3/Landscape_8.jpg[strip,Q=85] 5.624992969 --kernel lanczos3 --centre
rm {temp,temp2}.v

That's the reason I described it as a partial patch and didn't open a PR yet. If you want, I can open a draft PR for easier reviewing. |
I wonder if it could be related to the extra vector path in reducev? Anyway, let's defer this to 8.10 and perhaps try to improve |
I got confused, so here are those timings in a table to make comparisons a little easier. I converted times to milliseconds, and called the three operations box (for a simple box filter) convolution (for the convolution-style adaptive filter) and combined (box then convolution).
So the puzzling one is 9.4x reduce being three times slower than 8x reduce. I'll have a look. For fun, I tried your benchmark on this machine with threading enabled.
|
Oh, found it, 8x reductions will hit the vector path, 9.4x is a little too large and it falls back to C. |
It hasn't gone completely. If you look closely, the red border is more intense than in the Pillow version. This could be due to different edge handling and is probably OK (Pillow considers pixels outside the image as nonexistent, while another approach is to extend the edge pixels outside the image, so they get more weight). But if you look at the right edge, it is far narrower, and such a difference shouldn't happen.
I'm sorry for chiming in, but in my opinion this is too big a change for a minor release.
In Pillow-SIMD the accumulator consists of 1 bit for the sign, 1 bit of overflow headroom above the 255 value, and 8 bits for the value (9 bits in total, not counting the sign), plus up to 22 bits for precision. Coefficients are 16 bits and always under 1.0, so 1 bit is for the sign and all the other bits are for precision. The exact precision is calculated dynamically, based on the maximum coefficient value and the data type limits (the maximum coefficient shouldn't overflow INT16, and the accumulator shouldn't overflow INT32). You can find this in the normalize_coeffs_8bpc function. With such an approach there is no need for a separate non-vectorized version: the accumulator is large enough to handle very large reductions. I hope it helps.
Just keep in mind this is not SIMD version :D |
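The dynamic precision choice described above can be roughly reconstructed like this (a sketch of the idea, not the actual normalize_coeffs_8bpc code):

```python
def coeff_precision(max_coeff: float,
                    coeff_bits: int = 16,
                    acc_bits: int = 32,
                    value_bits: int = 8) -> int:
    """Pick the largest number of fractional bits such that the biggest
    coefficient still fits a signed coeff_bits integer, and the
    accumulator (sign + overflow bit + value_bits + precision) still
    fits a signed acc_bits integer."""
    prec = 0
    while (max_coeff * (1 << (prec + 1)) < (1 << (coeff_bits - 1))
           and (prec + 1) + value_bits + 2 <= acc_bits):
        prec += 1
    return prec

print(coeff_precision(0.9))    # limited by the int16 coefficient: 15 bits
print(coeff_precision(0.001))  # limited by the int32 accumulator: 22 bits
```

For large coefficients the int16 limit dominates; for tiny coefficients (very large reductions) the accumulator limit kicks in at the 22 bits mentioned above.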
By the way, it is not clear to me how |
Yes, libvips copies edge pixels outwards, so you would expect the red to be a little stronger. Thank you for the notes on Pillow precision. Orc only has 16 x 16 -> 32; there's no 32 x 32 -> top half of 64 (I don't think), so we are forced to use a 16-bit accumulator. libvips uses the major version to mean API, so it changes very rarely. The minor version number changes with each six-monthly release, so 8.9.0 would be a good point to revise the resamplers. The only issue is that we've had 8.9 on the edge of going out for weeks and it's painful to consider a further delay, especially since we don't have a complete and tested patch. We'll leave it for 8.10 in the summer. 8.9 adds true streaming, you might be interested: https://libvips.github.io/libvips/2019/12/11/What's-new-in-8.9.html (vips_shrink is documented at https://libvips.github.io/libvips/API/current/libvips-resample.html#vips-shrink). Yes, I know you plan to work your SIMD magic as well! I'm sure there will be a further impressive speed-up. libvips currently spends ~70% of CPU during resize in one line of C in |
Sorry, I thought 8.9 was already released. |
Upcoming release:
|
Oh that's very nice, you've done pillow-7 SIMD already? To defend libvips a little, its box filter and kernel filter downsample implementations have extra complexity (and hence lower speed) because they need to be able to thread, and because they need to be able to work on image streams. They add extra caches and locks, and they work in lines. This test times a single operation in isolation (of course, that's the focus of your work) and so it measures the downside of this extra complexity, but does not measure the upside. If you time libvips and pillow-simd end-to-end, then libvips will often be faster and use less memory, for example:

#!/usr/bin/env python3
import sys
from PIL import Image
input_filename = sys.argv[1]
output_filename = sys.argv[2]
scale = float(sys.argv[3])
im = Image.open(input_filename)
im = im.resize((round(im.width / scale), round(im.height / scale)),
Image.LANCZOS, reducing_gap=2.0)
im.save(output_filename)

I see:
(I'm not sure why pillow-7 prints "seen error" here.) I think pillow and libvips are complementary in many ways. Most libvips development work has gone into the streaming and threading system, whereas pillow seems to focus more on the implementation of each operation. If we put your SIMD resize code inside libvips, it'd go even faster. |
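For end-to-end comparisons like this, it helps to time the whole load + resize + save pipeline rather than a single operation. A minimal, library-agnostic timing harness might look like:

```python
import time

def bench(fn, *args, repeats: int = 5, **kwargs) -> float:
    """Return the best wall-clock time (in seconds) over several runs of
    fn(*args, **kwargs), so the whole pipeline is timed end to end and
    one-off warm-up costs are discounted."""
    best = float("inf")
    for _ in range(repeats):
        t0 = time.perf_counter()
        fn(*args, **kwargs)
        best = min(best, time.perf_counter() - t0)
    return best
```

Wrapping the Pillow script above in a function (and a pyvips equivalent alongside it; both function names here are hypothetical) and passing each to bench() would compare the two libraries including their load and save costs.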
I haven't found such a string in the Pillow sources, nor on Google in conjunction with Python, time, or libpng. Really weird thing ) |
PNG compression has unpredictable complexity which depends on default settings. This is not a good choice for benchmarking. |
Huh, weird, I see it on two Ubuntu 19.10 systems. You're right, it's not pillow-7. I tried running in gdb and putting a break on printf, but the string didn't appear, oddly. It's clearly unimportant. |
Indeed, the red border is more intense. I tried a different way of edge handling (considering pixels outside the image as black) and saw a small difference in the intensity of the border(s). See:

Pillow reduce() vs VIPS shrink()
Pillow resize() vs VIPS reduce()
Pillow resize() vs VIPS resize()
The border narrowness issue is due to libvips/libvips#1512 (I think, since only the right and bottom borders are too narrow).
That was my first thought too, but disabling the vector path didn't help. I also tried to roll back the round-to-nearest behavior within the mask index:

  const int py = (int) Y;
- const int sy = Y * VIPS_TRANSFORM_SCALE * 2;
- const int siy = sy & (VIPS_TRANSFORM_SCALE * 2 - 1);
- const int ty = (siy + 1) >> 1;
+ const int sy = Y * VIPS_TRANSFORM_SCALE;
+ const int ty = sy & (VIPS_TRANSFORM_SCALE - 1);
  const int *cyo = reducev->matrixo[ty];

(commit libvips/libvips@7fd672f might be relevant) but that didn't work either (I didn't see any visual difference). |
This just mixes black into the edge pixels, which is not correct. If you are curious about how this works in Pillow, we need to dig into how resizing works. Resizing is a convolution of the source pixels with some coefficient distribution. The sum of the coefficients should always be 1.0. For the edge pixels, some coefficients lie outside of the image. We have some options, including:
|
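To make those options concrete, here is an illustrative sketch (neither library's actual code) of two common ways to resolve convolution taps that fall outside the image: dropping the outside taps and renormalizing (outside pixels treated as nonexistent, Pillow-style), versus clamping taps onto the nearest edge pixel (edge-extend style, so edge pixels get extra weight):

```python
def edge_weights(coeffs, offsets, left, width, mode="renormalize"):
    """Resolve convolution taps that fall outside [0, width).
    'renormalize': drop outside taps and rescale the rest to sum to 1.
    'clamp': move outside taps onto the nearest edge pixel.
    Illustrative only; returns {pixel_index: weight}."""
    taps = {}
    for c, off in zip(coeffs, offsets):
        x = left + off
        if mode == "clamp":
            x = min(max(x, 0), width - 1)
        elif not 0 <= x < width:
            continue  # drop taps outside the image
        taps[x] = taps.get(x, 0.0) + c
    if mode == "renormalize":
        total = sum(taps.values())
        taps = {x: c / total for x, c in taps.items()}
    return taps

# a 3-tap kernel centred on the leftmost pixel of a 10-pixel row
print(edge_weights([0.25, 0.5, 0.25], [-1, 0, 1], 0, 10))
print(edge_weights([0.25, 0.5, 0.25], [-1, 0, 1], 0, 10, mode="clamp"))
```

Both strategies keep the weights summing to 1.0, but clamping puts 0.75 of the weight on the edge pixel versus about 0.67 for renormalizing, which is consistent with the slightly stronger red border seen in the libvips output.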
Ah OK, thanks for the info! Calculating resizing coefficients is quite new to me.
Perhaps we could increase the
I noticed that this constant in |
The pixel shift should be fixed in 8.10 and I just submitted a draft PR for the round-up option in shrink (see libvips/libvips#1769). Analysis of this PR with the above test image can be found in the kleisauke/pyvips-issue-148 repository. |
Hi. It's been some time since @homm posted this issue and his performance numbers. I've rerun his script with the most recent versions of Pillow and PyVIPS and found some unexpected performance issues with PyVIPS. I've reformatted the script's output to better showcase them:
What is odd about those times:
Those numbers are from an M2 MacBook Pro, if it matters. Here is the configuration I've used:
Here is the updated script. |
Thanks for testing @amw -- it might be worth rechecking with libvips/libvips#3618. This is the new highway-backed SIMD code we're hoping to get into 8.15. There are some accuracy and speed improvements. |
With PR libvips/libvips#3618 and this test image: I see this on my AMD Ryzen 9 7900 workstation: $ python pillow-vs-vips.py ../vips-microbench/images/x.jpg
Size: 10000x10000
---------------------------------------------------------------------
| 8x fast | vips 137.48ms ( 947%) | pillow 14.51ms ( 100%) |
| 8x adaptive | vips 133.44ms ( 347%) | pillow 38.44ms ( 100%) |
| 8x slow | vips 26.29ms ( 100%) | pillow 63.46ms ( 241%) |
---------------------------------------------------------------------
| 9.4x fast | vips 167.61ms ( 100%) | pillow ----ms |
| 9.4x adaptive | vips 103.63ms ( 275%) | pillow 37.68ms ( 100%) |
| 9.4x slow | vips 26.47ms ( 100%) | pillow 62.05ms ( 234%) |
---------------------------------------------------------------------
| 16x fast | vips 69.94ms ( 511%) | pillow 13.68ms ( 100%) |
| 16x adaptive | vips 66.75ms ( 410%) | pillow 16.30ms ( 100%) |
| 16x slow | vips 40.28ms ( 100%) | pillow 44.02ms ( 109%) |
---------------------------------------------------------------------
-------------------------------------------------------------------------------------------
| vips 8x | fast 137.41ms ( 555%) | adaptive 130.59ms ( 528%) | slow 24.76ms ( 100%) |
| vips 9.4x | fast 168.84ms ( 625%) | adaptive 102.79ms ( 380%) | slow 27.02ms ( 100%) |
| vips 16x | fast 69.85ms ( 169%) | adaptive 67.31ms ( 163%) | slow 41.38ms ( 100%) |
-------------------------------------------------------------------------------------------
| pillow 8x | fast 14.88ms ( 100%) | adaptive 25.91ms ( 174%) | slow 47.56ms ( 320%) |
| pillow 9.4x | fast ----ms | adaptive 25.53ms ( 100%) | slow 47.10ms ( 184%) |
| pillow 16x | fast 13.19ms ( 100%) | adaptive 15.81ms ( 120%) | slow 42.58ms ( 323%) |
------------------------------------------------------------------------------------------- (that's with Pillow-SIMD installed following the instructions at https://github.com/uploadcare/pillow-simd#installation) So, it seems that @jcupitt Perhaps it might be worth investigating if we could optimize Details--- a/libvips/resample/resize.c
+++ b/libvips/resample/resize.c
@@ -351,7 +351,7 @@ vips_resize_class_init(VipsResizeClass *class)
_("Reducing gap"),
VIPS_ARGUMENT_OPTIONAL_INPUT,
G_STRUCT_OFFSET(VipsResize, gap),
- 0.0, 1000000.0, 2.0);
+ 0.0, 1000000.0, 0.0);
/* We used to let people set the input offset so you could pick centre
* or corner interpolation, but it's not clear this was useful.
@@ -391,7 +391,7 @@ vips_resize_class_init(VipsResizeClass *class)
static void
vips_resize_init(VipsResize *resize)
{
- resize->gap = 2.0;
+ resize->gap = 0.0;
resize->kernel = VIPS_KERNEL_LANCZOS3;
}
@@ -406,15 +406,14 @@ vips_resize_init(VipsResize *resize)
*
* * @vscale: %gdouble vertical scale factor
* * @kernel: #VipsKernel to reduce with
- * * @gap: reducing gap to use (default: 2.0)
+ * * @gap: reducing gap to use (default: 0.0)
*
* Resize an image.
*
* Set @gap to speed up downsizing by having vips_shrink() to shrink
* with a box filter first. The bigger @gap, the closer the result
* to the fair resampling. The smaller @gap, the faster resizing.
- * The default value is 2.0 (very close to fair resampling
- * while still being faster in many cases).
+ * The default value is 0.0 (no optimization).
*
* vips_resize() normally uses #VIPS_KERNEL_LANCZOS3 for the final reduce, you
* can change this with @kernel. Downsizing is done with centre convention. |
Here's an updated version of the benchmark script:

This version modifies the libvips benchmark to use sequential access and integrates the file opening directly within the benchmark method for a fair comparison. Using libvips 8.16.0 with the same test setup and machine as referenced earlier, the results are:

$ python pillow-vs-vips.py x.jpg
Size: 10000x10000
---------------------------------------------------------------------
| 8x fast | vips 278.04ms ( 100%) | pillow 321.09ms ( 115%) |
| 8x adaptive | vips 331.50ms ( 100%) | pillow 330.74ms ( 100%) |
| 8x slow | vips 254.45ms ( 100%) | pillow 367.02ms ( 144%) |
---------------------------------------------------------------------
| 9.4x fast | vips 270.10ms ( 100%) | pillow ----ms |
| 9.4x adaptive | vips 327.11ms ( 100%) | pillow 338.08ms ( 103%) |
| 9.4x slow | vips 253.89ms ( 100%) | pillow 364.61ms ( 144%) |
---------------------------------------------------------------------
| 16x fast | vips 236.71ms ( 100%) | pillow 303.52ms ( 128%) |
| 16x adaptive | vips 268.73ms ( 100%) | pillow 318.99ms ( 119%) |
| 16x slow | vips 288.36ms ( 100%) | pillow 354.38ms ( 123%) |
--------------------------------------------------------------------- Notably, adaptive resizing in libvips (using I've made an attempt to address this by implementing SIMD optimizations on this work-in-progress branch: With these optimizations, I see: $ python pillow-vs-vips.py x.jpg
Size: 10000x10000
---------------------------------------------------------------------
| 8x fast | vips 212.18ms ( 100%) | pillow 316.16ms ( 149%) |
| 8x adaptive | vips 227.78ms ( 100%) | pillow 331.31ms ( 145%) |
| 8x slow | vips 259.51ms ( 100%) | pillow 368.50ms ( 142%) |
---------------------------------------------------------------------
| 9.4x fast | vips 221.57ms ( 100%) | pillow ----ms |
| 9.4x adaptive | vips 227.24ms ( 100%) | pillow 332.57ms ( 146%) |
| 9.4x slow | vips 253.03ms ( 100%) | pillow 356.11ms ( 141%) |
---------------------------------------------------------------------
| 16x fast | vips 199.49ms ( 100%) | pillow 306.95ms ( 154%) |
| 16x adaptive | vips 209.37ms ( 100%) | pillow 311.88ms ( 149%) |
| 16x slow | vips 294.99ms ( 100%) | pillow 347.84ms ( 118%) |
--------------------------------------------------------------------- |
This is an attempt to spot all geometry issues in VIPS resizing at once.
Pillow 7.0 has been released today (no SIMD version yet) and I've implemented a new resizing method called reduce(). It allows very fast reduction of an image by integer factors in both dimensions and can be used as a first step before true-convolution resampling. I know VIPS also works the same way. So, I borrowed this idea and came to share some ideas, test cases, and bugs in VIPS resizing. I hope this will be useful.

At first, let's learn the tools we have.
In Pillow

im.reduce() — fast image shrinking by integer factors. Each result pixel is just the average value of the corresponding source pixels.
im.resize() — convolution-based resampling with adaptive kernel size.
im.resize(reducing_gap=2.0) — 2-step resizing where the first step is im.reduce() and the second step is convolution-based resampling. The intermediate size is not less than two times bigger than the requested size.

In VIPS

im.shrink() — fast image shrinking by integer factors, and an unknown algorithm for fractional scales.
im.reduce() — convolution-based resampling with adaptive kernel size. The documentation says "This will not work well for shrink factors greater than three"; it is not clear to me why.
im.resize() — 2-step resizing where the first step is im.shrink() and the second step is convolution-based resampling with im.reduce(). The documentation says "How much is done by vips_shrink() vs. vips_reduce() varies with the kernel setting", but I'm pretty sure this is equivalent to Pillow's reducing_gap=2.0, at least for the current cases.

The test image
It has the following features:
The test code
Pillow reduce() vs VIPS shrink()
I will use only 8x versions. It is better to compare Images by opening in adjacent browsers tabs and fast switch.
The results are really close. The only noticeable difference is the last column. Indeed, we are scaling by a factor of 8, and 2049 / 8 is 256.125 pixels. So VIPS strips the last column, while Pillow expands the last column of red pixels into the whole last column of the resulting image. At first glance VIPS's behavior looks more correct, since the last pixel should be 0.125 pixels wide in the resulting image, and that is closer to 0 than to 1. But in practice we lose the information from the last column completely for further processing, while in Pillow's version we can still take the impact of the last column into account in further processing.
libjpeg shows the same behavior as Pillow when you use DCT scaling. If you open a 2049×2047 JPEG image with scale = 8, you get a 257×256 image where the last source column is expanded into the whole last column of the resulting image.
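The size rule behind that behavior is just a ceiling division, so a partial trailing block still produces output pixels (my paraphrase of the behavior described above, not libjpeg code):

```python
import math

def dct_scaled_size(width: int, height: int, denom: int = 8) -> tuple[int, int]:
    """Output size of a 1/denom DCT-scaled JPEG decode: each dimension
    rounds up, so a partial trailing block still yields an output pixel."""
    return math.ceil(width / denom), math.ceil(height / denom)

print(dct_scaled_size(2049, 2047))  # (257, 256)
```

This matches the 257×256 result quoted above, and the same round-up rule gives Pillow's reduce() its extra last column, in contrast to VIPS's round-to-nearest.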
Pillow resize() vs VIPS reduce()
There are some problems with the VIPS version. The whole image is shifted by about 0.5 pixels to the bottom right. As a result, the top and left red lines are too bold, while the bottom and right ones are completely lost. This looks like a rounding error at some stage; very likely this is the same issue as in libvips/libvips#703. The Pillow result, like the original image, has a tiny rose border which looks the same on all sides.
Another issue is some additional moire where there shouldn't be any. This looks like a precision error.

Pillow resize(reducing_gap=2.0) vs VIPS resize()
Surprisingly, all of these issues don't affect resizing too much. Both images have moire due to precision loss in the first step, which is OK. The locations of the rings are different, which is also probably OK. But VIPS loses the right red line completely.
How this works in Pillow
The main invention in Pillow is the box argument for resizing. box is an additional set of coordinates in the source image which is used as the pixel source instead of the whole image. At first glance this looks like a crop + resize sequence, but the main advantage is that box is not used for the size calculation; it is only used for the resizing coefficient calculations, which is why the coordinates can be floats.

https://github.com/python-pillow/Pillow/blob/1cecf08d16509c20473766b4cdb7a65169844819/src/PIL/Image.py#L1860-L1873
So, what happens when we need to resize a 2049×2047 image to 256×256 with reducing_gap=2? With reducing_gap, the maximum allowed reduction is 2049 / 256 / 2 = 4.001953125 times for width and 2047 / 256 / 2 = 3.998046875 times for height. The real reduction will be by (4, 3) times (4 for width, 3 for height).

Performance issues
Also, I see some unexpected results for the performance of resizing.
Shrink has comparable times for 8x and 16x, which is OK. But if you look at reduce, it performs almost 4 times faster for 8x reducing than for 16x; this shouldn't happen, I think. Despite this, resize itself works 2 times slower for 8x resizing, which is also very strange: it works even slower than the true-convolution reduce.