Skip to content

Performance: improve XYZ to LAB conversion by ~15% #1729

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jul 20, 2020

Conversation

lovell
Copy link
Member

@lovell lovell commented Jul 19, 2020

This PR switches the XYZ to LAB clipping from fmax/fmin maths library calls to instead use the simple ternary operators of VIPS_CLIP, which the compiler is able to heavily optimise.

Before:

150,596,456  ???:vips_XYZ2Lab_line [/usr/local/lib/libvips.so.42.12.3]
 46,873,827  ???:vips_col_scRGB2XYZ [/usr/local/lib/libvips.so.42.12.3]
 25,567,542  /build/glibc-5mDdLG/glibc-2.30/math/../sysdeps/x86_64/fpu/s_fmax.S:fmax [/usr/lib/x86_64-linux-gnu/libm-2.30.so]
 25,567,542  /build/glibc-5mDdLG/glibc-2.30/math/../sysdeps/x86_64/fpu/s_fmin.S:fmin [/usr/lib/x86_64-linux-gnu/libm-2.30.so]
 24,180,156  ???:vips_scRGB2XYZ_line [/usr/local/lib/libvips.so.42.12.3]
 21,563,120  /build/glibc-5mDdLG/glibc-2.30/string/../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:__memcpy_avx_unaligned_erms [/usr/lib/x86_64-linux-gnu/libc-2.30.so]
 18,512,200  ???:vips_sRGB2scRGB_gen [/usr/local/lib/libvips.so.42.12.3]

After:

116,498,847  ???:vips_XYZ2Lab_line [/usr/local/lib/libvips.so.42.12.3]
 46,873,827  ???:vips_col_scRGB2XYZ [/usr/local/lib/libvips.so.42.12.3]
 24,180,156  ???:vips_scRGB2XYZ_line [/usr/local/lib/libvips.so.42.12.3]
 21,562,840  /build/glibc-5mDdLG/glibc-2.30/string/../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:__memcpy_avx_unaligned_erms [/usr/lib/x86_64-linux-gnu/libc-2.30.so]
 18,512,200  ???:vips_sRGB2scRGB_gen [/usr/local/lib/libvips.so.42.12.3]

@jcupitt jcupitt merged commit ef6ad7f into libvips:master Jul 20, 2020
@lovell lovell deleted the perf-lab-clip branch July 20, 2020 13:35
@jcupitt
Copy link
Member

jcupitt commented Jul 20, 2020

As I remember, VIPS_FCLIP() went in when we added support for __builtin_rint / round / fabs. They all gave a nice speedup and I suppose I just assumed min and max would as well. Interesting!

Perhaps fmin/fmax should be removed? Something to think about for 8.11.

@lovell
Copy link
Member Author

lovell commented Jul 20, 2020

Yes, most of the builtins are very much worth using as you'd expect, so I too was surprised to see such a difference here.

@lovell
Copy link
Member Author

lovell commented Jul 21, 2020

We'll need to add isnan checking to this logic as fmin/fmax implicitly provides this. This will slow things back down by ~5% but is still faster overall. PR on the way shortly.

@jcupitt
Copy link
Member

jcupitt commented Jul 21, 2020

Ooop, sorry! I just reverted this until 8.11. Sure, please make a new PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants