Skip to content

Add jxl chunked save #4174

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 30 commits into
base: master
Choose a base branch
from
Draft

Add jxl chunked save #4174

wants to merge 30 commits into from

Conversation

jcupitt
Copy link
Member

@jcupitt jcupitt commented Sep 29, 2024

This PR adds support for the new libjxl chunked save API.

We were preparing a whole frame of pixels then writing in one shot, but that's obviously not great for large images. This adds a new save path using JxlEncoderAddChunkedFrame(). With a 30k x 30k sample image I see:

$ /usr/bin/time -f %M:%e vips copy st-francis.jpg x.jxl
5719092:586.74
$ /usr/bin/time -f %M:%e vips copy st-francis.jpg x.jxl[chunked]
854360:573.48

It's very slightly faster, since it's able to overlap JPEG decode with JXL encode, and needs 7x less memory.

This is version 3 of this PR I think :( and is quite simple. The NOTES file outlines earlier (and crazily complicated) schemes I tried.

TODO:

  • this uses the iteration in JxlEncoderAddChunkedFrame() to drive libvips evaluation (and not vips_sink_*()), so it's missing things like preeval, eval etc. -- we should add these
  • the old sequential save path can go once testing is over
  • we'll need to bump the min libjxl version to at least 0.11
  • add chunked load as well
  • add extra channel support (more than 4 bands)

wrote some NOTES on what we need to do next
instead, try to drive libvips from data_at

it should be A LOT simpler
same speed, about half the memory use for a 10k x 10k pixel image

still missing the signals you'd expect for a sink, eg. "preeval", "eval"
etc.
@jcupitt jcupitt marked this pull request as draft September 29, 2024 14:28
Add preeval, eval, posteval, minimise signals.

We have to move vips_image_preeval(), vips_image_eval(), vips_image_posteval()
to the public API.
@jcupitt
Copy link
Member Author

jcupitt commented Sep 29, 2024

It seems (from #libjxl on discord) that we'll need to make a threadpool and generate libjxl tiles in parallel. We can't rely on _data_at() being executed in parallel.

experiment with various caching strategies ...  but it looks like none will
work, since we will always throw away the cache on the original jpegload
on minimise

we're going to have to use the super-complicated sink_disc plan
@jcupitt
Copy link
Member Author

jcupitt commented Oct 1, 2024

We are doing parallel pixel generation by running a libvips pipeline for each 2k x 2k JXL tile. But this means that we emit a minimise signal at the end of each crop, and this drops the cache from the sequential attached to the jpegload at the end of each tile.

This PR works, but only by making sequential() persist over minimise. This will cause huge memory use.

We should probably bite the bullet and implement the incredibly complicated jxlsave outlined in NOTES.

@jcupitt
Copy link
Member Author

jcupitt commented Oct 1, 2024

Ah, a better fix! Don't emit minimise for sub-evaluations, just as we don't emit preeval, eval, etc.

@jcupitt
Copy link
Member Author

jcupitt commented Oct 5, 2024

Sample JXL HDR image for testing:

https://github.com/immich-app/immich/files/14755028/JPEG.XL.HDR.Test.zip

libvips loads this as a ushort, it looks fine, I guess the HDR element is in the transfer function. There's a synthetic profile attached, but it only supports perceptual intent (libvips is relative by default).

$ vipsheader -f icc-profile-data JPEG_XL_HDR.jxl | base64 -d > hdr.icc
$ file hdr.icc 
hdr.icc: ColorSync color profile 4.4, type jxl, RGB/Lab-mntr device by jxl, 4344 bytes, 1-12-2019, 0xaa027a64eabe3514 MD5 'RGB_D65_DCI_Per_PeQ'

You can import to make a LAB image:

$ vips icc_import JPEG_XL_HDR.jxl x.v --intent perceptual
$ vipsheader x.v
x.v: 6000x4000 float, 3 bands, lab, jxlload

It looks nice:

image

But I can't see any out of range brightnesses:

$ vips stats x.v .csv
-75.852142333984375	94.802780151367188	2672661272.8074112	136308783384.3548	37.120295455658486	22.699366058239452	5923	3900	2379	2770
0	94.802780151367188	1401669949.0775206	89046135071.592117	58.402914544896689	17.301884669527556	4779	3376	2379	2770
-75.852142333984375	76.128402709960938	430761739.44445068	11741448443.999996	17.948405810185445	12.926010766460045	5923	3900	5390	3426
-58.478599548339844	89.867706298828125	840229584.28544009	35521199868.76268	35.009566011893334	15.949303834018592	4781	3346	2876	3245

(the second column in the second row is peak L*, 94.8 here, I'd expect to see values over 100 for specular highlights)

You can get the LAB->sRGB conversion automatically like this:

$ vips icc_import JPEG_XL_HDR.jxl x.jpg --intent perceptual

Making this image:

x

That will use the libvips LAB->sRGB conversion. You could also write:

$ vips icc_transform JPEG_XL_HDR.jxl x.jpg srgb --intent perceptual 

That will convert to sRGB using LCMS and the srgb profile bundled with libvips.

@jcupitt
Copy link
Member Author

jcupitt commented Oct 5, 2024

We'll should probably make libvips ICC import fall back to perceptual if relative is not supported.

@jcupitt
Copy link
Member Author

jcupitt commented Oct 6, 2024

Useful JXL HDR test images:

https://sneyers.info/hdrtest/

@jcupitt
Copy link
Member Author

jcupitt commented Apr 30, 2025

With libjxl 0.11.1 in release mode and the current version of this PR I see:

$ /usr/bin/time -f %M:%e vips copy st-francis.jpg x.jxl
memory: high-water mark 3.99 GB
5475176:59.19
$ /usr/bin/time -f %M:%e vips copy st-francis.jpg x.jxl[chunked]
memory: high-water mark 1.69 GB
2151556:66.88

So it's still better, though less than it used to be thanks to libjxl improvements.

I bump the min libjxl version requirement to 0.11, which rules out a lot of current linuxes.

I'll give it another once-over and see if I can can improve it a bit more.

Copy link
Member

@kleisauke kleisauke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

g_mutex_{init,clear}() still needs to be done for mutexes that are part of dynamically allocated structures.

Co-authored-by: Kleis Auke Wolthuizen <github@kleisauke.nl>
Comment on lines +826 to +830
if (jxl->delay &&
n < jxl->delay_length)
header.duration = jxl->delay[n];
else
header.duration = jxl->gif_delay * 10;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Incorporates the changes from 94502d8 and 2672bb6.

Suggested change
if (jxl->delay &&
n < jxl->delay_length)
header.duration = jxl->delay[n];
else
header.duration = jxl->gif_delay * 10;
if (!jxl->is_animated)
header.duration = 0xffffffff;
else
header.duration =
vips_foreign_save_jxl_get_delay(jxl, n);

*/
memcpy(exif_data + 4, data, length);
if (jxl->info.have_animation) {
JxlFrameHeader header = { 0 };
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! It looks like the memset() in vips_foreign_save_jxl_add_frame() can also be replaced with zero initialization now (= { 0 }).

@kleisauke
Copy link
Member

macOS CI failures looks real, but I couldn't reproduce that locally (on Fedora 42 with libjxl 0.11.1).

@kleisauke
Copy link
Member

  • the old sequential save path can go once testing is over

We could also consider keeping it behind a oneshot flag (like jp2kload), but I'm not sure if that's worth it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants