Skip to content

Performance: safely improve XYZ to LAB conversion by ~12% #1733

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jul 21, 2020

Conversation

lovell
Copy link
Member

@lovell lovell commented Jul 21, 2020

This is a safer implementation of #1729, which switches the XYZ to LAB clipping from fmax/fmin maths library calls to instead use the simple ternary operators of VIPS_CLIP, which the compiler is able to heavily optimise, whilst also ensuring that NaN cannot creep through as an index into the cubed root lookup tables.

Before:

150,596,456  ???:vips_XYZ2Lab_line [/usr/local/lib/libvips.so.42.12.3]
 46,873,827  ???:vips_col_scRGB2XYZ [/usr/local/lib/libvips.so.42.12.3]
 25,567,542  /build/glibc-5mDdLG/glibc-2.30/math/../sysdeps/x86_64/fpu/s_fmax.S:fmax [/usr/lib/x86_64-linux-gnu/libm-2.30.so]
 25,567,542  /build/glibc-5mDdLG/glibc-2.30/math/../sysdeps/x86_64/fpu/s_fmin.S:fmin [/usr/lib/x86_64-linux-gnu/libm-2.30.so]
 24,180,156  ???:vips_scRGB2XYZ_line [/usr/local/lib/libvips.so.42.12.3]
 21,563,120  /build/glibc-5mDdLG/glibc-2.30/string/../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:__memcpy_avx_unaligned_erms [/usr/lib/x86_64-linux-gnu/libc-2.30.so]
 18,512,200  ???:vips_sRGB2scRGB_gen [/usr/local/lib/libvips.so.42.12.3]

After:

125,021,341  ???:vips_XYZ2Lab_line [/usr/local/lib/libvips.so.42.12.3]
 46,873,827  ???:vips_col_scRGB2XYZ [/usr/local/lib/libvips.so.42.12.3]
 24,180,156  ???:vips_scRGB2XYZ_line [/usr/local/lib/libvips.so.42.12.3]
 21,563,668  /build/glibc-5mDdLG/glibc-2.30/string/../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:__memcpy_avx_unaligned_erms [/usr/lib/x86_64-linux-gnu/libc-2.30.so]
 18,512,200  ???:vips_sRGB2scRGB_gen [/usr/local/lib/libvips.so.42.12.3]

@jcupitt
Copy link
Member

jcupitt commented Jul 21, 2020

Ah phew, it's a simpler fix than I thought.

It occurred to me that you could probably speed this up by casting to int first. For example:

C#include <stdio.h>
#include <math.h>

int
main( int argc, char **argv )
{
        double d = NAN;
        int i;

        printf( "d = %g\n", d );
        i = (int) d;
        printf( "i = %d\n", i );

        return( 0 );
}

Prints:

$ ./a.out 
d = nan
i = -2147483648

So I think you could just do:

i = VIPS_CLIP( 0, (int) nX, QUANT_ELEMENTS - 2 );

@lovell
Copy link
Member Author

lovell commented Jul 21, 2020

Yes, that's a great idea, I'll update and test accordingly. I'll also try to take a look and see if there's anywhere else this approach could be used.

- VIPS_CLIP is faster than fmin/fmax based library calls
- Cast to int to ensure the cubed root LUT is not referenced by NaN
@lovell lovell force-pushed the perf-xyz2lab-clip branch from a6c4c15 to 3659655 Compare July 21, 2020 13:23
@lovell
Copy link
Member Author

lovell commented Jul 21, 2020

This approach is slightly faster than the original PR.

112,246,144  ???:vips_XYZ2Lab_line [/usr/local/lib/libvips.so.42.12.3]
 46,873,827  ???:vips_col_scRGB2XYZ [/usr/local/lib/libvips.so.42.12.3]
 24,180,156  ???:vips_scRGB2XYZ_line [/usr/local/lib/libvips.so.42.12.3]
 21,560,982  /build/glibc-5mDdLG/glibc-2.30/string/../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:__memcpy_avx_unaligned_erms [/usr/lib/x86_64-linux-gnu/libc-2.30.so]
 18,512,200  ???:vips_sRGB2scRGB_gen [/usr/local/lib/libvips.so.42.12.3]

I think there are a couple of possible improvements relating to this in the scRGB to sRGB conversion - will put these in a separate PR.

@jcupitt
Copy link
Member

jcupitt commented Jul 21, 2020

That looks great!

@jcupitt jcupitt merged commit a6557eb into libvips:master Jul 21, 2020
@lovell lovell deleted the perf-xyz2lab-clip branch July 21, 2020 14:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants