Performance: safely improve XYZ to LAB conversion by ~12% #1733

lovell · 2020-07-21T10:46:48Z

This is a safer implementation of #1729. It switches the XYZ to LAB clipping from fmax/fmin maths library calls to the simple ternary operators of VIPS_CLIP, which the compiler is able to optimise heavily, whilst also ensuring that NaN cannot creep through as an index into the cube root lookup tables.

Before:

150,596,456  ???:vips_XYZ2Lab_line [/usr/local/lib/libvips.so.42.12.3]
 46,873,827  ???:vips_col_scRGB2XYZ [/usr/local/lib/libvips.so.42.12.3]
 25,567,542  /build/glibc-5mDdLG/glibc-2.30/math/../sysdeps/x86_64/fpu/s_fmax.S:fmax [/usr/lib/x86_64-linux-gnu/libm-2.30.so]
 25,567,542  /build/glibc-5mDdLG/glibc-2.30/math/../sysdeps/x86_64/fpu/s_fmin.S:fmin [/usr/lib/x86_64-linux-gnu/libm-2.30.so]
 24,180,156  ???:vips_scRGB2XYZ_line [/usr/local/lib/libvips.so.42.12.3]
 21,563,120  /build/glibc-5mDdLG/glibc-2.30/string/../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:__memcpy_avx_unaligned_erms [/usr/lib/x86_64-linux-gnu/libc-2.30.so]
 18,512,200  ???:vips_sRGB2scRGB_gen [/usr/local/lib/libvips.so.42.12.3]

After:

125,021,341  ???:vips_XYZ2Lab_line [/usr/local/lib/libvips.so.42.12.3]
 46,873,827  ???:vips_col_scRGB2XYZ [/usr/local/lib/libvips.so.42.12.3]
 24,180,156  ???:vips_scRGB2XYZ_line [/usr/local/lib/libvips.so.42.12.3]
 21,563,668  /build/glibc-5mDdLG/glibc-2.30/string/../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:__memcpy_avx_unaligned_erms [/usr/lib/x86_64-linux-gnu/libc-2.30.so]
 18,512,200  ???:vips_sRGB2scRGB_gen [/usr/local/lib/libvips.so.42.12.3]

jcupitt · 2020-07-21T12:23:33Z

Ah phew, it's a simpler fix than I thought.

It occurred to me that you could probably speed this up by casting to int first. For example:

#include <stdio.h>
#include <math.h>

int
main( int argc, char **argv )
{
        double d = NAN;
        int i;

        printf( "d = %g\n", d );
        i = (int) d;
        printf( "i = %d\n", i );

        return( 0 );
}

Prints:

$ ./a.out 
d = nan
i = -2147483648

So I think you could just do:

i = VIPS_CLIP( 0, (int) nX, QUANT_ELEMENTS - 2 );
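As a rough sketch of the cast-then-clip idea, the helper below truncates to int first and then clamps the integer into a valid table range. `SKETCH_ELEMENTS`, `SKETCH_CLIP` and `lut_index` are hypothetical names for this example, and the table size is a placeholder rather than the libvips value. The upper bound of size minus 2 is assumed to leave room for reading entry `i + 1` during interpolation. Note that the C standard does not define the result of converting NaN (or a value outside the range of int) to int; the `-2147483648` shown above is what x86 typically produces, so the sketch is only exercised with representable values.

```c
/* Placeholder table size for illustration only. */
#define SKETCH_ELEMENTS 1000

/* Hypothetical ternary clip; V is evaluated more than once, so pass a variable. */
#define SKETCH_CLIP( A, V, B ) \
        (((V) < (A)) ? (A) : (((V) > (B)) ? (B) : (V)))

/* Convert a scaled value to a lookup-table index in [0, SKETCH_ELEMENTS - 2]. */
static int
lut_index( double nX )
{
        /* The cast truncates towards zero; all later comparisons are on ints. */
        int i = (int) nX;

        return( SKETCH_CLIP( 0, i, SKETCH_ELEMENTS - 2 ) );
}
```

Because the comparisons happen on integers, the clip compiles to simple integer compare and select operations, and any value that survives the cast ends up inside the table bounds.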

lovell · 2020-07-21T13:19:20Z

Yes, that's a great idea, I'll update and test accordingly. I'll also try to take a look and see if there's anywhere else this approach could be used.

- VIPS_CLIP is faster than fmin/fmax based library calls
- Cast to int to ensure the cubed root LUT is not referenced by NaN

lovell · 2020-07-21T13:28:13Z

This approach is slightly faster than the original PR.

112,246,144  ???:vips_XYZ2Lab_line [/usr/local/lib/libvips.so.42.12.3]
 46,873,827  ???:vips_col_scRGB2XYZ [/usr/local/lib/libvips.so.42.12.3]
 24,180,156  ???:vips_scRGB2XYZ_line [/usr/local/lib/libvips.so.42.12.3]
 21,560,982  /build/glibc-5mDdLG/glibc-2.30/string/../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:__memcpy_avx_unaligned_erms [/usr/lib/x86_64-linux-gnu/libc-2.30.so]
 18,512,200  ???:vips_sRGB2scRGB_gen [/usr/local/lib/libvips.so.42.12.3]

I think there are a couple of possible improvements relating to this in the scRGB to sRGB conversion - will put these in a separate PR.

jcupitt · 2020-07-21T14:16:06Z

That looks great!

Performance: improve XYZ to LAB conversion by ~15%

3659655

- VIPS_CLIP is faster than fmin/fmax based library calls
- Cast to int to ensure the cubed root LUT is not referenced by NaN

lovell force-pushed the perf-xyz2lab-clip branch from a6c4c15 to 3659655 July 21, 2020 13:23

jcupitt merged commit a6557eb into libvips:master Jul 21, 2020

lovell deleted the perf-xyz2lab-clip branch July 21, 2020 14:45
