colour: use floating-point APIs of lcms #3373

kleisauke · 2023-03-04T13:29:52Z

Avoids the need to pack the buffer of floats into lcms's fixed-point formats and ensures full precision internally in lcms.

Resolves: #3150.
Supersedes: #3368.

Note: this PR targets the 8.14 branch.

Avoids the need to pack the buffer of floats into lcms's fixed-point formats and ensures full precision internally in lcms. Resolves: libvips#3150. Supersedes: libvips#3368.

jcupitt

I'm amazed lcms has such different behaviour on this code path.

Well done for finding this Kleis.

jcupitt · 2023-03-05T12:04:06Z

I tested this PR with:

$ vips icc_import input.jpg t1.v --pcs xyz
$ vips icc_export t1.v output.jpg

And it now round-trips correctly, though there's some gamut clipping in the extreme reds, as you'd expect.

Open these in two browser tabs and flip. Before:

After round-trip:

jcupitt · 2023-03-05T12:09:29Z

I benchmarked like this. Before:

$ time vips icc_import input.jpg t1.v --pcs xyz
real	0m0.096s
user	0m0.112s
sys	0m0.076s

After:

$ time vips icc_import input.jpg t1.v --pcs xyz
real	0m0.097s
user	0m0.345s
sys	0m0.090s

So we're maybe 4x more CPU with this change, which is unfortunate :( However, the accuracy improvement is so dramatic, I don't think we have a choice.

The high CPU cost seems alarming, but it's interesting that overall run time is unaffected, since libvips hides the extra processing with extra concurrency (phew!).

lovell · 2023-03-05T20:58:28Z

I've done some quick performance testing/profiling and can confirm the CPU cost (ignoring concurrency) of both approaches is ~4x, with ~30% of the time spent calculating pow (gamma calc?). Perhaps the original cmsFLAGS_NOOPTIMIZE approach is a better/simpler change after all, especially if we can make it conditional somehow e.g. XYZ-only, or maybe via a new argument?

lovell

Having slept on it, I agree this PR is the better solution, especially as it doesn't prevent lcms from internally optimising the icc_transform operation.

kleisauke · 2023-03-06T13:20:32Z

Ah, I was unaware of the additional CPU cost (I only tested overall run time). Let me check if there's another solution.

I remembered that one of the libjxl maintainers said that it did a pow() per pixel, which is probably the reason why libjxl seems to prefer skcms over lcms. This was tracked in issue mm2/Little-CMS#2 and the fast float plugin resolves that (unfortunately GPL-licensed).

jcupitt · 2023-03-06T15:11:01Z

I had a quick go with fast float using this diff to this patch:

diff --git a/libvips/colour/icc_transform.c b/libvips/colour/icc_transform.c
index 31f0c954c..e7ed52d32 100644
--- a/libvips/colour/icc_transform.c
+++ b/libvips/colour/icc_transform.c
@@ -91,6 +91,7 @@
 /* Has to be before VIPS to avoid nameclashes.
  */
 #include <lcms2.h>
+#include <lcms2_fast_float.h>
 
 #include <vips/vips.h>
 
diff --git a/libvips/iofuncs/init.c b/libvips/iofuncs/init.c
index 93f0b17ba..25089d0bc 100644
--- a/libvips/iofuncs/init.c
+++ b/libvips/iofuncs/init.c
@@ -114,6 +114,11 @@
 #include <vips/vips7compat.h>
 #endif
 
+#ifdef HAVE_LCMS2
+#include <lcms2.h>
+#include <lcms2_fast_float.h>
+#endif /*HAVE_LCMS2*/
+
 /* abort() on the first warning or error.
  */
 int vips__fatal = 0;
@@ -469,6 +474,10 @@ vips_init( const char *argv0 )
                return( 0 );
        started = TRUE;
 
+#ifdef HAVE_LCMS2
+       cmsPlugin(cmsFastFloatExtensions());
+#endif /*HAVE_LCMS2*/
+
        /* Try to set a minimum stacksize, default 2mb. We need to do this
         * before any threads start.
         */

But I don't see a speedup :(

$ time vips icc_import input.jpg t1.v --pcs xyz
real	0m0.113s
user	0m0.360s
sys	0m0.085s
$ ldd $(which vips) | grep fast
	liblcms2_fast_float.so => /home/john/vips/lib/liblcms2_fast_float.so (0x00007fd96cbe2000)

I've probably messed up somewhere.

jcupitt · 2023-03-06T15:12:56Z

Heh callgrind shows 50% of CPU spent evaluating tone curves:

I must have missed something.

kleisauke · 2023-03-07T11:47:42Z

You're right, I don't see any speedup here either with the fast float plugin :(

I did some digging and it looks like this is being caused by the OptimizeByResampling builtin optimization in lcms. With this patch:

--- a/src/cmsopt.c
+++ b/src/cmsopt.c
@@ -1803,8 +1803,7 @@ static _cmsOptimizationCollection DefaultOptimization[] = {
 
     { OptimizeByJoiningCurves,            &DefaultOptimization[1] },
     { OptimizeMatrixShaper,               &DefaultOptimization[2] },
-    { OptimizeByComputingLinearization,   &DefaultOptimization[3] },
-    { OptimizeByResampling,               NULL }
+    { OptimizeByComputingLinearization,   NULL },
 };
 
 // The linked list head

I see (click to enlarge):

$ vipsthumbnail php-vips-192.jpg -s 1536x1024 -e srgb -o x.jpg

Original	`8.14`	`8.14` + patch	This PR	Commit kleisauke@`3e93bb0` + patch

And:

$ vipsthumbnail php-vips-169.jpg --linear -s 1920x1080 -o x.jpg

Original	`8.14`	`8.14` + patch	This PR	Commit kleisauke@`3e93bb0` + patch

So, it looks like commit kleisauke@3e93bb0 might also help here, but only when that builtin lcms optimization is disabled.

jcupitt · 2023-03-07T11:52:26Z

Oh very cool! And does changing the lcms optimisation table improve performance significantly?

kleisauke · 2023-03-07T12:41:14Z

It seems that the OptimizeByResampling optimization decreases CPU cost (i.e., disabling it makes the timings more in line with the TYPE_XYZ_FLT API).

For example:
8.14:

$ time vips icc_import php-vips-169.jpg t1.v --pcs xyz

real	0m0.106s
user	0m0.105s
sys	0m0.055s

8.14 + lcms patch:

$ time vips icc_import php-vips-169.jpg t1.v --pcs xyz

real	0m0.122s
user	0m0.247s
sys	0m0.043s

This PR:

$ time vips icc_import php-vips-169.jpg t1.v --pcs xyz

real	0m0.124s
user	0m0.257s
sys	0m0.046s

Commit kleisauke@3e93bb0 + lcms patch:

$ time vips icc_import php-vips-169.jpg t1.v --pcs xyz

real	0m0.123s
user	0m0.267s
sys	0m0.034s

lovell · 2023-03-07T13:09:56Z

Great detective work Kleis. Did you see that the accuracy of the LUTs in OptimizeByResampling is based on nGridPoints which is optionally increased by the cmsFLAGS_HIGHRESPRECALC flag?

kleisauke · 2023-03-07T13:24:47Z

Yes, I also noticed that. I initially tried these two patches:

--- a/libvips/colour/icc_transform.c
+++ b/libvips/colour/icc_transform.c
@@ -444,6 +444,9 @@ vips_icc_build( VipsObject *object )
 	if( icc->black_point_compensation ) 
 		flags |= cmsFLAGS_BLACKPOINTCOMPENSATION;
 
+	if( icc->pcs == VIPS_PCS_XYZ )
+		flags |= cmsFLAGS_HIGHRESPRECALC;
+
 	if( !(icc->trans = cmsCreateTransform( 
 		icc->in_profile, icc->in_icc_format,
 		icc->out_profile, icc->out_icc_format,

--- a/libvips/colour/icc_transform.c
+++ b/libvips/colour/icc_transform.c
@@ -444,6 +444,9 @@ vips_icc_build( VipsObject *object )
 	if( icc->black_point_compensation ) 
 		flags |= cmsFLAGS_BLACKPOINTCOMPENSATION;
 
+	if( icc->pcs == VIPS_PCS_XYZ )
+		flags |= cmsFLAGS_GRIDPOINTS( 49 );
+
 	if( !(icc->trans = cmsCreateTransform( 
 		icc->in_profile, icc->in_icc_format,
 		icc->out_profile, icc->out_icc_format,

But both didn't work, unfortunately.

jcupitt · 2023-03-07T13:27:30Z

I tried removing OptimizeByResampling() but it didn't reduce the number of pow() calls (according to callgrind). It's still doing 3 pow() calls per pixel processed :(

jcupitt · 2023-03-07T13:32:07Z

libvips uses a LUT for tone curves -- 1500 elements (from memory?) then linear piecewise interpolation between them, which is accurate enough, I think.

Maybe we could make our own lcms plugin that does something similar? Or perhaps we need to swap to skcms :(

jcupitt · 2023-03-07T13:39:26Z

Oh heh no we don't, it's much more complex, I'd forgotten.

kleisauke · 2023-03-07T13:51:00Z

IIRC, skcms doesn't support conversion to LUT-based or CMYK profiles, so switching to that is not an option.

jcupitt · 2023-03-07T13:51:53Z

Ah shame.

kleisauke · 2023-03-08T10:07:53Z

Commit 4e16c13 fixes a small color regression with the php-vips-192.jpg testcase.

$ vipsthumbnail php-vips-192.jpg -s 1536x1024 -e srgb -o x.jpg

Original	Before (`caf5aa4`)	After (`4e16c13`)

kleisauke · 2023-03-08T18:45:09Z

It looks like this issue occurs only when exporting (vips_icc_export()) an image from CIE XYZ PCS to device space, this means that vips_icc_import() can still make use of the 16-bit XYZ/LAB APIs. Commit 1a4cdbe incorporates this, PTAL.

Here are some vipsthumbnail timings:

Details

8.14:

$ time vipsthumbnail php-vips-169.jpg --linear -s 1920x1080 -o x.jpg

real	0m0.107s
user	0m0.168s
sys	0m0.024s
$ time vipsthumbnail php-vips-169.jpg -s 1920x1080 -e srgb -o x.jpg

real	0m0.095s
user	0m0.094s
sys	0m0.028s
$ time vipsthumbnail php-vips-192.jpg -s 1536x1024 -e srgb -o x.jpg

real	0m0.099s
user	0m0.121s
sys	0m0.025s
$ time vipsthumbnail php-vips-192.jpg -s 1536x1024 -i srgb -e srgb -o x.jpg

real	0m0.091s
user	0m0.086s
sys	0m0.023s

This PR:

$ time vipsthumbnail php-vips-169.jpg --linear -s 1920x1080 -o x.jpg

real	0m0.132s
user	0m0.344s
sys	0m0.023s
$ time vipsthumbnail php-vips-169.jpg -s 1920x1080 -e srgb -o x.jpg

real	0m0.095s
user	0m0.094s
sys	0m0.029s
$ time vipsthumbnail php-vips-192.jpg -s 1536x1024 -e srgb -o x.jpg

real	0m0.104s
user	0m0.174s
sys	0m0.027s
$ time vipsthumbnail php-vips-192.jpg -s 1536x1024 -i srgb -e srgb -o x.jpg

real	0m0.091s
user	0m0.083s
sys	0m0.026s

So, only processing in --linear space or exporting untagged images (i.e. those without ICC profile or when -i is omitted) are slightly worse in terms of performance.

libvips/colour/icc_transform.c

Ensures compilers may optimise.

kleisauke added 2 commits March 4, 2023 14:19

colour: use floating-point APIs of lcms

c223ba8

Avoids the need to pack the buffer of floats into lcms's fixed-point formats and ensures full precision internally in lcms. Resolves: libvips#3150. Supersedes: libvips#3368.

colour: avoid temporary buffers with LAB PCS

caf5aa4

kleisauke mentioned this pull request Mar 4, 2023

colour: use cmsFLAGS_NOOPTIMIZE to keep all precision #3368

Closed

jcupitt approved these changes Mar 5, 2023

View reviewed changes

lovell approved these changes Mar 6, 2023

View reviewed changes

colour: fix D50 / D65 calculations

4e16c13

kleisauke added 2 commits March 8, 2023 19:37

colour: use floating-point APIs only for icc_export()

1a4cdbe

Merge branch '8.14'

2f94627

lovell reviewed Mar 8, 2023

View reviewed changes

libvips/colour/icc_transform.c Show resolved Hide resolved

libvips/colour/icc_transform.c Outdated Show resolved Hide resolved

Inline calculations

91070f0

Ensures compilers may optimise.

lovell approved these changes Mar 9, 2023

View reviewed changes

kleisauke merged commit 2903876 into libvips:8.14 Mar 13, 2023

kleisauke deleted the lcms-simplify branch March 13, 2023 10:36

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

colour: use floating-point APIs of lcms #3373

colour: use floating-point APIs of lcms #3373

kleisauke commented Mar 4, 2023

jcupitt left a comment

jcupitt commented Mar 5, 2023

jcupitt commented Mar 5, 2023 •

edited

Loading

lovell commented Mar 5, 2023

lovell left a comment

kleisauke commented Mar 6, 2023

jcupitt commented Mar 6, 2023

jcupitt commented Mar 6, 2023

kleisauke commented Mar 7, 2023 •

edited

Loading

jcupitt commented Mar 7, 2023

kleisauke commented Mar 7, 2023

lovell commented Mar 7, 2023

kleisauke commented Mar 7, 2023

jcupitt commented Mar 7, 2023

jcupitt commented Mar 7, 2023

jcupitt commented Mar 7, 2023

kleisauke commented Mar 7, 2023

jcupitt commented Mar 7, 2023

kleisauke commented Mar 8, 2023

kleisauke commented Mar 8, 2023

colour: use floating-point APIs of lcms #3373

colour: use floating-point APIs of lcms #3373

Conversation

kleisauke commented Mar 4, 2023

jcupitt left a comment

Choose a reason for hiding this comment

jcupitt commented Mar 5, 2023

jcupitt commented Mar 5, 2023 • edited Loading

lovell commented Mar 5, 2023

lovell left a comment

Choose a reason for hiding this comment

kleisauke commented Mar 6, 2023

jcupitt commented Mar 6, 2023

jcupitt commented Mar 6, 2023

kleisauke commented Mar 7, 2023 • edited Loading

jcupitt commented Mar 7, 2023

kleisauke commented Mar 7, 2023

lovell commented Mar 7, 2023

kleisauke commented Mar 7, 2023

jcupitt commented Mar 7, 2023

jcupitt commented Mar 7, 2023

jcupitt commented Mar 7, 2023

kleisauke commented Mar 7, 2023

jcupitt commented Mar 7, 2023

kleisauke commented Mar 8, 2023

kleisauke commented Mar 8, 2023

jcupitt commented Mar 5, 2023 •

edited

Loading

kleisauke commented Mar 7, 2023 •

edited

Loading