
Add jxl chunked save #4174

Merged
merged 46 commits into from May 8, 2025

Conversation

jcupitt
Member

@jcupitt jcupitt commented Sep 29, 2024

This PR adds support for the new libjxl chunked save API.

We were preparing a whole frame of pixels and then writing it in one shot, which is obviously not great for large images. This adds a new save path using JxlEncoderAddChunkedFrame(). With a 30k x 30k sample image I see:

$ /usr/bin/time -f %M:%e vips copy st-francis.jpg x.jxl
5719092:586.74
$ /usr/bin/time -f %M:%e vips copy st-francis.jpg x.jxl[chunked]
854360:573.48

It's very slightly faster, since it can overlap JPEG decode with JXL encode, and it needs 7x less memory.
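For reference, this is roughly how the chunked input fits together: libjxl pulls pixel rectangles on demand through a JxlChunkedFrameInputSource, so the caller never has to materialise the whole frame. A minimal sketch, not this PR's actual code -- TileSource, generate_tile(), tiles and frame_settings are hypothetical stand-ins for the libvips pipeline state:

#include <stdlib.h>
#include <stdint.h>
#include <jxl/encode.h>

typedef struct {
    uint32_t bands;
    size_t sizeof_pixel;
    /* ... handle to the source image ... */
} TileSource;

/* Hypothetical helper: render just the requested rectangle. */
void *generate_tile(TileSource *tiles,
    size_t xpos, size_t ypos, size_t xsize, size_t ysize);

static void
tile_pixel_format(void *opaque, JxlPixelFormat *format)
{
    TileSource *tiles = (TileSource *) opaque;

    format->num_channels = tiles->bands;
    format->data_type = JXL_TYPE_UINT8;
    format->endianness = JXL_NATIVE_ENDIAN;
    format->align = 0;
}

static const void *
tile_data_at(void *opaque, size_t xpos, size_t ypos,
    size_t xsize, size_t ysize, size_t *row_offset)
{
    TileSource *tiles = (TileSource *) opaque;
    void *buf = generate_tile(tiles, xpos, ypos, xsize, ysize);

    /* Bytes between the starts of successive rows in buf. */
    *row_offset = xsize * tiles->sizeof_pixel;

    return buf;
}

static void
tile_release(void *opaque, const void *buf)
{
    free((void *) buf);
}

With those in place, the old one-shot JxlEncoderAddImageFrame() call becomes something like:

/* Extra channel callbacks omitted; they're needed once the image
 * has more than four bands.
 */
JxlChunkedFrameInputSource input = {
    .opaque = tiles,
    .get_color_channels_pixel_format = tile_pixel_format,
    .get_color_channel_data_at = tile_data_at,
    .release_buffer = tile_release,
};

if (JxlEncoderAddChunkedFrame(frame_settings, JXL_TRUE, input) !=
    JXL_ENC_SUCCESS)
    return -1;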

This is version 3 of this PR, I think :( but it's quite simple. The NOTES file outlines the earlier (and crazily complicated) schemes I tried.

TODO:

  • this uses the iteration in JxlEncoderAddChunkedFrame() to drive libvips evaluation (and not vips_sink_*()), so it's missing things like the preeval and eval signals -- we should add these
  • the old sequential save path can go once testing is over
  • we'll need to bump the min libjxl version to at least 0.11
  • add chunked load as well
  • add extra channel support (more than 4 bands)

wrote some NOTES on what we need to do next
instead, try to drive libvips from data_at

it should be A LOT simpler
same speed, about half the memory use for a 10k x 10k pixel image

still missing the signals you'd expect for a sink, eg. "preeval", "eval"
etc.
@jcupitt jcupitt marked this pull request as draft September 29, 2024 14:28
Add preeval, eval, posteval, minimise signals.

We have to move vips_image_preeval(), vips_image_eval(), vips_image_posteval()
to the public API.
@jcupitt
Member Author

jcupitt commented Sep 29, 2024

It seems (from #libjxl on Discord) that we'll need to make a threadpool and generate libjxl tiles in parallel ourselves -- we can't rely on _data_at() being called in parallel.

experiment with various caching strategies ...  but it looks like none will
work, since we will always throw away the cache on the original jpegload
on minimise

we're going to have to use the super-complicated sink_disc plan
@jcupitt
Member Author

jcupitt commented Oct 1, 2024

We are doing parallel pixel generation by running a libvips pipeline for each 2k x 2k JXL tile. But this means we emit a minimise signal at the end of each crop, and this drops the cache from the sequential() attached to the jpegload at the end of each tile.
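Each tile render is conceptually just a small pipeline like this (a sketch, not the PR's exact code; render_tile is a hypothetical helper):

#include <vips/vips.h>

/* Render one tile of the source image with its own little pipeline.
 * When tile is unreffed at the end, a "minimise" signal propagates
 * upstream and drops the line cache on the jpegload.
 */
static void *
render_tile(VipsImage *image, int left, int top, int width, int height,
    size_t *length)
{
    VipsImage *tile;
    void *buf;

    if (vips_crop(image, &tile, left, top, width, height, NULL))
        return NULL;

    buf = vips_image_write_to_memory(tile, length);
    g_object_unref(tile);

    return buf;
}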

This PR works, but only by making sequential() persist over minimise. This will cause huge memory use.

We should probably bite the bullet and implement the incredibly complicated jxlsave outlined in NOTES.

@jcupitt
Member Author

jcupitt commented Oct 1, 2024

Ah, a better fix! Don't emit minimise for sub-evaluations, just as we don't emit preeval, eval, etc.
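Something like this guard in the sink teardown (a sketch -- the save struct and its sub_evaluation flag are hypothetical):

/* Only the top-level evaluation emits lifecycle signals; per-tile
 * sub-evaluations stay quiet, so upstream caches survive between
 * tiles.
 */
if (!save->sub_evaluation)
    vips_image_minimise_all(save->in);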

@jcupitt
Member Author

jcupitt commented Oct 5, 2024

Sample JXL HDR image for testing:

https://github.com/immich-app/immich/files/14755028/JPEG.XL.HDR.Test.zip

libvips loads this as ushort and it looks fine; I guess the HDR element is in the transfer function. There's a synthetic profile attached, but it only supports perceptual intent (libvips uses relative by default).

$ vipsheader -f icc-profile-data JPEG_XL_HDR.jxl | base64 -d > hdr.icc
$ file hdr.icc 
hdr.icc: ColorSync color profile 4.4, type jxl, RGB/Lab-mntr device by jxl, 4344 bytes, 1-12-2019, 0xaa027a64eabe3514 MD5 'RGB_D65_DCI_Per_PeQ'

You can import to make a LAB image:

$ vips icc_import JPEG_XL_HDR.jxl x.v --intent perceptual
$ vipsheader x.v
x.v: 6000x4000 float, 3 bands, lab, jxlload

It looks nice:

(screenshot of the imported image omitted)

But I can't see any out-of-range brightnesses:

$ vips stats x.v .csv
-75.852142333984375	94.802780151367188	2672661272.8074112	136308783384.3548	37.120295455658486	22.699366058239452	5923	3900	2379	2770
0	94.802780151367188	1401669949.0775206	89046135071.592117	58.402914544896689	17.301884669527556	4779	3376	2379	2770
-75.852142333984375	76.128402709960938	430761739.44445068	11741448443.999996	17.948405810185445	12.926010766460045	5923	3900	5390	3426
-58.478599548339844	89.867706298828125	840229584.28544009	35521199868.76268	35.009566011893334	15.949303834018592	4781	3346	2876	3245

(the columns are minimum, maximum, sum, sum of squares, mean, deviation, and the x/y positions of the minimum and maximum; the first row is the whole image, then one row per band. The second column in the second row is peak L*, 94.8 here -- I'd expect to see values over 100 for specular highlights)

You can get the LAB->sRGB conversion automatically like this:

$ vips icc_import JPEG_XL_HDR.jxl x.jpg --intent perceptual

Making this image:

(converted sRGB image omitted)

That will use the libvips LAB->sRGB conversion. You could also write:

$ vips icc_transform JPEG_XL_HDR.jxl x.jpg srgb --intent perceptual 

That will convert to sRGB using LCMS and the srgb profile bundled with libvips.

@jcupitt
Member Author

jcupitt commented Oct 5, 2024

We should probably make libvips ICC import fall back to perceptual intent if relative is not supported.

@jcupitt
Member Author

jcupitt commented May 7, 2025

I've been looking into the macOS crashes, and it seems to be g_mutex_clear().

For example, running vips magickload will make a magick7 object, then dispose it without running _build(). This makes this code run:

    ...
    VIPS_FREEF(magick_destroy_exception, magick7->exception);
    g_mutex_clear(&magick7->lock);

And g_mutex_clear() will crash inside pthread_mutex_destroy:

vips_foreign_load_magick7_dispose: 0x600002318700
Process 59255 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x8)
    frame #0: 0x000000018b08e2d0 libsystem_pthread.dylib`pthread_mutex_destroy + 40
libsystem_pthread.dylib`pthread_mutex_destroy:
->  0x18b08e2d0 <+40>: casa   w8, w9, [x19]

magick7->lock has the value:

  lock = {
    p = 0x0000000000000000
    i = ([0] = 0, [1] = 0)
  }

ie. just zero (as you'd expect).

If I move the g_mutex_init() from vips_foreign_load_magick7_build() to vips_foreign_load_magick7_init(), breaking on _dispose() shows that lock now has the value:

  lock = {
    p = 0x000060000000e980
    i = ([0] = 59776, [1] = 24576)
  }

And g_mutex_clear() now runs successfully.

The docs say:

Notice that the GMutex is not initialised to any particular value. Its placement in static storage ensures that it will be initialised to all-zeros, which is appropriate.

Which doesn't seem to be working. I'll try to make a small reproducer and report upstream.

@jcupitt
Member Author

jcupitt commented May 7, 2025

This seems to reproduce it:

// compile with
//  cc mutex.c `pkg-config glib-2.0 --cflags --libs`

#include <stdio.h>
#include <glib.h>

GMutex lock_static;

int
main(void)
{   
    GMutex lock_stack = { 0 };

    printf("clearing lock in static storage ...\n");
    g_mutex_clear(&lock_static);

    printf("clearing lock on stack ...\n");
    g_mutex_clear(&lock_stack);

    return 0;
}

I see:

17:25 $ cc mutex.c `pkg-config glib-2.0 --cflags --libs`
✔ ~/try 
17:27 $ ./a.out
clearing lock in static storage ...
Segmentation fault: 11
✘-SEGV ~/try 

System:

17:27 $ brew info glib
==> glib: stable 2.84.1 (bottled)
17:28 $ uname -a
Darwin cheese-2.local 23.4.0 Darwin Kernel Version 23.4.0: Wed Feb 21 21:45:49 PST 2024; root:xnu-10063.101.15~2/RELEASE_ARM64_T6020 arm64

@jcupitt
Member Author

jcupitt commented May 7, 2025

I made an issue here: https://gitlab.gnome.org/GNOME/glib/-/issues/3682

@jcupitt
Member Author

jcupitt commented May 7, 2025

I found out a bit more: it happens when glib is NOT configured to use the native glib mutex and instead uses the platform POSIX mutex -- it doesn't check for a null pointer before calling g_mutex_impl_free().

It looks like GCond etc. probably have the same issue.

@jcupitt
Member Author

jcupitt commented May 7, 2025

The glib devs clarified the docs: if you have a mutex in static storage, it's ok to not init or clear it. It'll leak with some mutex backends, but since it's in static storage, the leak is bounded and doesn't matter.

For dynamically allocated memory (ie. stack or heap), you must init exactly once and you must clear exactly once. Init to zero is not enough.

I think we're going to have to double-check all our GMutex usage :(

@jcupitt
Member Author

jcupitt commented May 8, 2025

It looks like we must only call g_mutex_clear() once, so it should be called from _finalize(), not from _dispose() :(
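For a GObject, that means: g_mutex_init() exactly once in instance init, g_mutex_clear() exactly once in finalize. A minimal sketch (a hypothetical MyThing object, not the actual magick7 code):

#include <glib-object.h>

typedef struct {
    GObject parent_instance;
    GMutex lock;
} MyThing;

typedef struct {
    GObjectClass parent_class;
} MyThingClass;

G_DEFINE_TYPE(MyThing, my_thing, G_TYPE_OBJECT)

static void
my_thing_finalize(GObject *gobject)
{
    MyThing *thing = (MyThing *) gobject;

    /* finalize runs exactly once, so the mutex is cleared exactly
     * once, matching the single g_mutex_init() in _init(). dispose
     * may run more than once, so the clear can't live there.
     */
    g_mutex_clear(&thing->lock);

    G_OBJECT_CLASS(my_thing_parent_class)->finalize(gobject);
}

static void
my_thing_class_init(MyThingClass *klass)
{
    G_OBJECT_CLASS(klass)->finalize = my_thing_finalize;
}

static void
my_thing_init(MyThing *thing)
{
    /* Instance init always runs, even if _build() never does, so
     * the mutex is guaranteed to be initialised before finalize.
     */
    g_mutex_init(&thing->lock);
}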

@kleisauke
Member

Thanks for clarifying this upstream and fixing this!

It looks like we must only call g_mutex_clear() once, so it should be called from _finalize(), not from _dispose() :(

Ah, bummer. This reminds me of this suboptimal approach:

if (fits->line)
    g_mutex_clear(&fits->lock);

@jcupitt
Member Author

jcupitt commented May 8, 2025

Yes, I saw that haha. Safe, but a little ugly.

@jcupitt jcupitt marked this pull request as ready for review May 8, 2025 14:09
@jcupitt
Member Author

jcupitt commented May 8, 2025

I've not revised jxlload or added support for more than four channels, but let's try to merge; this PR has taken far too long.

We can do another round of work in a follow-up.

Comment on lines 969 to 988
/* Tinkering with output transforms. Experiment with this plus
 *
 * https://sneyers.info/hdrtest/
 *
 * and see if we can get values out of SDR range
 */

JxlColorEncoding enc = {
    .color_space = JXL_COLOR_SPACE_RGB,
    .white_point = JXL_WHITE_POINT_D65,
    .primaries = JXL_PRIMARIES_SRGB,
    .transfer_function = JXL_TRANSFER_FUNCTION_SRGB,
    .rendering_intent = JXL_RENDERING_INTENT_RELATIVE,
};
if (JxlDecoderSetPreferredColorProfile(jxl->decoder, &enc)) {
    vips_foreign_load_jxl_error(jxl,
        "JxlDecoderSetOutputColorProfile");
    return -1;
}
if (JxlDecoderSetDesiredIntensityTarget(jxl->decoder, 10000)) {
Member

Should we revert this to its original?

Member Author

You're right, let's do the HDR stuff in a follow up PR.

@kleisauke
Member

With libjxl 0.11.1 in release mode and the current version of this PR I see:

$ /usr/bin/time -f %M:%e vips copy st-francis.jpg x.jxl
memory: high-water mark 3.99 GB
5475176:59.19
$ /usr/bin/time -f %M:%e vips copy st-francis.jpg x.jxl[chunked]
memory: high-water mark 1.69 GB
2151556:66.88

FWIW, I see similar results on my AMD Ryzen 9 7900 workstation.

master branch:

$ /usr/bin/time -f %M:%e vips copy ~/st-francis.jpg x.jxl
5533800:50.35

This PR:

$ /usr/bin/time -f %M:%e vips copy st-francis.jpg x.jxl
1877692:53.78

So, ~3x less memory, similar performance and byte-identical images. Nice!

Member

@kleisauke kleisauke left a comment

LGTM! I noticed a couple of unused things, but that can also be cleaned up in a follow-up PR.

@jcupitt jcupitt merged commit 214a23f into master May 8, 2025
14 checks passed
@jcupitt
Member Author

jcupitt commented May 8, 2025

Phew! Thanks for the review!
