Performance: improve XYZ to LAB conversion by ~15% #1729

lovell · 2020-07-19T20:30:26Z

This PR switches the XYZ to LAB clipping from fmax/fmin maths library calls to instead use the simple ternary operators of VIPS_CLIP, which the compiler is able to heavily optimise.

Before:

150,596,456  ???:vips_XYZ2Lab_line [/usr/local/lib/libvips.so.42.12.3]
 46,873,827  ???:vips_col_scRGB2XYZ [/usr/local/lib/libvips.so.42.12.3]
 25,567,542  /build/glibc-5mDdLG/glibc-2.30/math/../sysdeps/x86_64/fpu/s_fmax.S:fmax [/usr/lib/x86_64-linux-gnu/libm-2.30.so]
 25,567,542  /build/glibc-5mDdLG/glibc-2.30/math/../sysdeps/x86_64/fpu/s_fmin.S:fmin [/usr/lib/x86_64-linux-gnu/libm-2.30.so]
 24,180,156  ???:vips_scRGB2XYZ_line [/usr/local/lib/libvips.so.42.12.3]
 21,563,120  /build/glibc-5mDdLG/glibc-2.30/string/../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:__memcpy_avx_unaligned_erms [/usr/lib/x86_64-linux-gnu/libc-2.30.so]
 18,512,200  ???:vips_sRGB2scRGB_gen [/usr/local/lib/libvips.so.42.12.3]

After:

116,498,847  ???:vips_XYZ2Lab_line [/usr/local/lib/libvips.so.42.12.3]
 46,873,827  ???:vips_col_scRGB2XYZ [/usr/local/lib/libvips.so.42.12.3]
 24,180,156  ???:vips_scRGB2XYZ_line [/usr/local/lib/libvips.so.42.12.3]
 21,562,840  /build/glibc-5mDdLG/glibc-2.30/string/../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:__memcpy_avx_unaligned_erms [/usr/lib/x86_64-linux-gnu/libc-2.30.so]
 18,512,200  ???:vips_sRGB2scRGB_gen [/usr/local/lib/libvips.so.42.12.3]

jcupitt · 2020-07-20T13:47:35Z

As I remember, VIPS_FCLIP() went in when we added support for __builtin_rint / round / fabs. They all gave a nice speedup and I suppose I just assumed min and max would as well. Interesting!

Perhaps fmin/fmax should be removed? Something to think about for 8.11.

lovell · 2020-07-20T14:04:20Z

Yes, most of the builtins are very much worth using as you'd expect, so I too was surprised to see such a difference here.

lovell · 2020-07-21T09:56:20Z

We'll need to add isnan checking to this logic as fmin/fmax implicitly provides this. This will slow things back down by ~5% but is still faster overall. PR on the way shortly.

jcupitt · 2020-07-21T10:02:41Z

Ooop, sorry! I just reverted this until 8.11. Sure, please make a new PR.

Improve performance of conversion to LAB by ~15%

4aaa395

lovell added the enhancement label Jul 19, 2020

lovell mentioned this pull request Jul 19, 2020

Performance: improve LAB to LCH conversion by ~30% #1728

Merged

jcupitt merged commit ef6ad7f into libvips:master Jul 20, 2020

lovell deleted the perf-lab-clip branch July 20, 2020 13:35

jcupitt mentioned this pull request Jul 20, 2020

reconsider use of __builtin #1731

Closed

jcupitt mentioned this pull request Jul 21, 2020

Revert "Performance: improve XYZ to LAB conversion by ~15%" #1732

Merged

lovell mentioned this pull request Jul 21, 2020

Performance: safely improve XYZ to LAB conversion by ~12% #1733

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

Performance: improve XYZ to LAB conversion by ~15% #1729

Performance: improve XYZ to LAB conversion by ~15% #1729

Uh oh!

lovell commented Jul 19, 2020

Uh oh!

jcupitt commented Jul 20, 2020

Uh oh!

lovell commented Jul 20, 2020

Uh oh!

lovell commented Jul 21, 2020

Uh oh!

jcupitt commented Jul 21, 2020

Uh oh!

Uh oh!

Uh oh!

Performance: improve XYZ to LAB conversion by ~15% #1729

Performance: improve XYZ to LAB conversion by ~15% #1729

Uh oh!

Conversation

lovell commented Jul 19, 2020

Uh oh!

jcupitt commented Jul 20, 2020

Uh oh!

lovell commented Jul 20, 2020

Uh oh!

lovell commented Jul 21, 2020

Uh oh!

jcupitt commented Jul 21, 2020

Uh oh!

Uh oh!