Move tiff JPEG compression outside the lock and thread it #3679


Merged
merged 19 commits into master on Sep 26, 2023

Conversation

jcupitt
Member

@jcupitt jcupitt commented Sep 22, 2023

At the moment, libvips uses libtiff to write JPEG compressed tiles. Since compression is inside libtiff, it has to be single-threaded.

This PR moves JPEG compression into libvips and uses a threadpool to write JPEG and JP2K compressed tiles in parallel.

git master:

$ time vips copy CMU-1.svs[rgb] x.tif[tile,compression=jpeg,pyramid]
real	0m11.440s
user	1m10.627s
sys	0m4.162s

This PR:

$ time vips copy CMU-1.svs[rgb] x.tif[compression=jpeg,tile,pyramid]
real	0m4.972s
user	1m3.305s
sys	0m7.089s

@jcupitt jcupitt marked this pull request as draft September 22, 2023 13:01
@jcupitt jcupitt changed the title from "Move tiff JPEG compression outside the lock" to "WiP: Move tiff JPEG compression outside the lock" Sep 22, 2023
@euzada

euzada commented Sep 22, 2023

Thank you @jcupitt for starting this PR. If I understand correctly, is it still running on a single thread even after moving it inside libvips?

I failed to compile this version on Ubuntu. There may be some issues compiling jpeg2vips.

Looking forward to testing it.

@jcupitt
Member Author

jcupitt commented Sep 23, 2023

TODO:

  • only do this for tiled TIFFs
  • run a threadpool to write we_compress tiles (see dzsave)
  • support all JPEG TIFF write options eg. subsample mode etc.
  • need to pad edge tiles?
  • error handling
  • check level copy in pyr generation
  • check it still works without HAVE_JPEG
  • do we decompress/recompress on level copy? can't remember
  • all tests should pass
  • changelog note
  • clang-format

@jcupitt
Member Author

jcupitt commented Sep 23, 2023

Thank you @jcupitt for starting this PR. If I understand correctly, is it still running on a single thread even after moving it inside libvips?

You're right, it still needs a threadpool to write each line of tiles. I'll put that in next.
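The plan can be sketched in miniature (illustrative Python, not libvips's actual C implementation; zlib stands in for the JPEG encoder):

```python
# Sketch: compress one line of tiles on a pool of worker threads, then hand
# the results to a single writer. Here the write order follows completion
# order, which is what later makes the output non-deterministic.
import zlib
from concurrent.futures import ThreadPoolExecutor, as_completed

def compress_tile(index, tile):
    # Stand-in for a per-tile JPEG encode done outside the libtiff lock.
    return index, zlib.compress(tile)

def write_tile_line(tiles, write):
    with ThreadPoolExecutor(max_workers=4) as pool:
        futures = [pool.submit(compress_tile, i, t)
                   for i, t in enumerate(tiles)]
        for future in as_completed(futures):  # first finished, first written
            index, data = future.result()
            write(index, data)

written = {}
write_tile_line([bytes([i]) * 4096 for i in range(8)],
                lambda i, d: written.__setitem__(i, d))
```

The single writer keeps libtiff calls serialised while the expensive per-tile compression runs in parallel.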

@jcupitt
Member Author

jcupitt commented Sep 24, 2023

I added threaded write (as well as compression by libvips).

Git master libvips:

$ time vips copy CMU-1.svs[rgb] x.tif[compression=jpeg,tile,pyramid]
real	0m11.408s
user	1m10.671s
sys	0m4.546s

This PR:

$ time vips copy CMU-1.svs[rgb] x.tif[compression=jpeg,tile,pyramid]
real	0m7.385s
user	1m5.646s
sys	0m6.712s

So about 50% faster on this PC, nice!

@euzada

euzada commented Sep 24, 2023

I added threaded write (as well as compression by libvips).

Git master libvips:

$ time vips copy CMU-1.svs[rgb] x.tif[compression=jpeg,tile,pyramid]
real	0m11.408s
user	1m10.671s
sys	0m4.546s

This PR:

$ time vips copy CMU-1.svs[rgb] x.tif[compression=jpeg,tile,pyramid]
real	0m7.385s
user	1m5.646s
sys	0m6.712s

So about 50% faster on this PC, nice!

That's pretty impressive. I am waiting for tomorrow to do some testing; hope I can build it for Windows successfully first. My test will use a very large image where the old version took almost an hour to finish.

@jcupitt
Member Author

jcupitt commented Sep 24, 2023

This is the largest test image I have to hand:

$ time vips copy C2023000589S1-0-1_743eee76-f53b-aca2-4da5-fa97a44d9933_125232.svs[rgb] x.tif[tile,compression=jpeg,pyramid,tile-width=256,tile-height=256] --vips-progress --vips-leak
vips temp-2: 211213 x 87581 pixels, 32 threads, 128 x 128 tiles, 640 lines in buffer
vips temp-2: done in 86.9s          
vips_threadset_free: peak of 64 threads
memory: high-water mark 1.44 GB

real	1m27.625s
user	12m44.819s
sys	0m47.753s

@jcupitt
Member Author

jcupitt commented Sep 24, 2023

Ah jp2k compression sees more of a speedup. Before:

$ time vips copy CMU-1.svs[rgb] x.tif[compression=jp2k,tile,pyramid]
real	7m40.086s
user	9m17.533s
sys	9m3.294s

After:

$ time vips copy CMU-1.svs[rgb] x.tif[compression=jp2k,tile,pyramid]
real	2m25.948s
user	10m17.321s
sys	18m49.749s

@euzada

euzada commented Sep 24, 2023

Ah jp2k compression sees more of a speedup. Before:

$ time vips copy CMU-1.svs[rgb] x.tif[compression=jp2k,tile,pyramid]
real	7m40.086s
user	9m17.533s
sys	9m3.294s

After:

$ time vips copy CMU-1.svs[rgb] x.tif[compression=jp2k,tile,pyramid]
real	2m25.948s
user	10m17.321s
sys	18m49.749s

That's again a very big improvement: about 70% faster in the case of jp2k and 35% faster for jpeg.

Silly question: what's the advantage of using jp2k over jpeg for tiff compression? The jpeg tiles are very small and, in terms of size, they are probably smaller and faster to load.
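These percentages can be sanity-checked from the quoted `real` times (jp2k comes out nearer 68%):

```python
# Percent reduction in wall-clock time, from the 'real' figures quoted above.
def percent_faster(before_s, after_s):
    return 100.0 * (before_s - after_s) / before_s

jpeg = percent_faster(11.408, 7.385)                     # jpeg: master vs PR
jp2k = percent_faster(7 * 60 + 40.086, 2 * 60 + 25.948)  # jp2k: master vs PR
print(round(jpeg), round(jp2k))  # -> 35 68
```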

@jcupitt
Member Author

jcupitt commented Sep 24, 2023

jp2k doesn't really have any advantages imo, it's just there for compatibility with some deranged software that still uses it.

@jcupitt
Member Author

jcupitt commented Sep 24, 2023

Ah I found another speedup: we can copy the TIFFTAG_JPEGTABLES tag and skip decompress/recompress on level copy.

Git master:

$ time vips copy CMU-1.svs[rgb] x.tif[compression=jpeg,tile,pyramid]
real	0m11.408s
user	1m10.671s
sys	0m4.546s

This PR:

$ time vips copy CMU-1.svs[rgb] x.tif[compression=jpeg,tile,pyramid]
real	0m4.972s
user	1m3.305s
sys	0m7.089s

More than 2x faster now.
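The shortcut can be sketched like this (illustrative Python; zlib and its compression level stand in for the JPEG codec and Q, which is an assumption for the sketch only):

```python
# Sketch of the level-copy shortcut: when source and destination share the
# same JPEG tables (same Q), the compressed tile bytes can be copied
# verbatim during pyramid level copy, skipping decompress/recompress.
import zlib

def copy_tile(src_bytes, src_q, dst_q):
    if src_q == dst_q:
        # Tables match: raw copy, no codec work at all.
        return src_bytes
    # Tables differ: fall back to transcoding the tile.
    return zlib.compress(zlib.decompress(src_bytes), dst_q)
```

Because every pyramid level is written with the same Q, the fast path is taken for all level copies.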

@euzada

euzada commented Sep 24, 2023

Ah I found another speedup: we can copy the TIFFTAG_JPEGTABLES tag and skip decompress/recompress on level copy.

Git master:

$ time vips copy CMU-1.svs[rgb] x.tif[compression=jpeg,tile,pyramid]
real	0m11.408s
user	1m10.671s
sys	0m4.546s

This PR:

$ time vips copy CMU-1.svs[rgb] x.tif[compression=jpeg,tile,pyramid]
real	0m4.972s
user	1m3.305s
sys	0m7.089s

More than 2x faster now.

Is this TIFFTAG_JPEGTABLES only optimized for Q=70, the default value?

@jcupitt
Member Author

jcupitt commented Sep 24, 2023

Is this TIFFTAG_JPEGTABLES only optimized for Q=70, the default value?

It's created for each Q value.
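For background, libjpeg derives the quantization tables embedded in the JPEGTABLES payload from Q by scaling its base tables, so each Q value yields distinct tables. A sketch of libjpeg's `jpeg_quality_scaling()` logic (Python for illustration):

```python
# How libjpeg maps a quality setting Q to a scale factor applied to its base
# quantization tables; each Q therefore produces different tables, and a
# different TIFFTAG_JPEGTABLES payload.
def quality_scaling(quality):
    quality = max(1, min(100, quality))
    if quality < 50:
        return 5000 // quality
    return 200 - quality * 2

def scale_entry(base, quality):
    # Each base table entry is scaled and clamped to the 1..255 range
    # used by baseline JPEG.
    return max(1, min(255, (base * quality_scaling(quality) + 50) // 100))

print(quality_scaling(50), quality_scaling(75))  # -> 100 50
```

Q=50 gives a scale of 100, i.e. the base tables unchanged; higher Q shrinks the entries (finer quantization), lower Q grows them.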

@jcupitt jcupitt changed the title from "WiP: Move tiff JPEG compression outside the lock" to "Move tiff JPEG compression outside the lock and thread it" Sep 24, 2023
@jcupitt jcupitt requested a review from kleisauke September 24, 2023 18:03
@jcupitt jcupitt marked this pull request as ready for review September 24, 2023 18:03
@jcupitt
Member Author

jcupitt commented Sep 24, 2023

OK, all done, I think.

@euzada

euzada commented Sep 24, 2023

OK, all done, I think.

That was very fast for a >50% improvement. Now, to test it, do I need to compile this branch in mxe for windows?

@jcupitt
Member Author

jcupitt commented Sep 24, 2023

do I need to compile this branch in mxe for windows?

No idea, probably! I think it should work.

@kleisauke
Member

do I need to compile this branch in mxe for windows?

I just updated the 8.15-master branch within the build-win64-mxe repository, allowing you to build Windows binaries for this PR using:

$ ./build.sh --ref tiffsave-threaded-jpeg

@euzada

euzada commented Sep 25, 2023

./build.sh --ref tiffsave-threaded-jpeg

Thank you. Built successfully. Now I will test it using my large images.

Update: Thank you @jcupitt for the excellent results; I did not believe it :)

**Before: vips 8.14.3**
Image C:\Users\euzada\Desktop\TA24b2-5X_2023-08-09_8p14p3.tif[tile,compression=jpeg,pyramid,bigtiff] is now saved.
Process size in memory: 5319 MB
save_image took 5956.271 sec to execute!
Elapsed time: 1h39m51s

vipsheader.exe -a C:\Users\euzada\Desktop\TA24b2-5X_2023-08-09_8p14p3.tif
C:\Users\euzada\Desktop\TA24b2-5X_2023-08-09_8p14p3.tif: 323520x315128 uchar, 3 bands, srgb, tiffload
width: 323520
height: 315128
bands: 3
format: uchar
coding: none
interpretation: srgb
xoffset: 0
yoffset: 0
xres: 3.77953
yres: 3.77953
filename: C:\Users\euzada\Desktop\TA24b2-5X_2023-08-09_8p14p3.tif
vips-loader: tiffload
n-pages: 13
resolution-unit: in
orientation: 1

file_size= 6.16GB

**After: vips 8.15.0**
Image C:\Users\euzada\Desktop\TA24b2-5X_2023-08-09_8p15.tif[tile,compression=jpeg,pyramid,bigtiff] is now saved.
Process size in memory: 14822.34 MB
save_image took 797.650 sec to execute!
Elapsed time: 13m37s

vipsheader.exe -a C:\Users\euzada\Desktop\TA24b2-5X_2023-08-09_8p15.tif
C:\Users\euzada\Desktop\TA24b2-5X_2023-08-09_8p15.tif: 323520x315128 uchar, 3 bands, srgb, tiffload
width: 323520
height: 315128
bands: 3
format: uchar
coding: none
interpretation: srgb
xoffset: 0
yoffset: 0
xres: 3.77953
yres: 3.77953
filename: C:\Users\euzada\Desktop\TA24b2-5X_2023-08-09_8p15.tif
vips-loader: tiffload
n-pages: 13
resolution-unit: in
orientation: 1

file_size = 7.67GB

edit:
the difference in file size is probably due to a corrupted file in the case of vips 8.14.3. I was not able to open it using QuPath, while the new file opened with no issue. (Incorrect statement: both files opened in QuPath; the second is simply larger.) So I get an improvement of more than 86% with no issue opening the file. The CPU was close to 100% most of the time, whereas before it averaged close to 10%.

Thank you @jcupitt and @kleisauke for the improvement.

@kleisauke
Member

It looks like this PR would cause non-deterministic output. See for example this test script:

vips copy CMU-1.svs[rgb] x.tif[compression=jpeg,tile,pyramid]

expected_sha256=$(sha256sum "x.tif" | awk '{ print $1 }')
for run in {1..5}; do
  vips copy CMU-1.svs[rgb] x.tif[compression=jpeg,tile,pyramid]
  echo "$expected_sha256 x.tif" | sha256sum --check
done

Git master:

$ ./check-deterministic.sh
x.tif: OK
x.tif: OK
x.tif: OK
x.tif: OK
x.tif: OK

This PR:

$ ./check-deterministic.sh
x.tif: FAILED
sha256sum: WARNING: 1 computed checksum did NOT match
x.tif: FAILED
sha256sum: WARNING: 1 computed checksum did NOT match
x.tif: FAILED
sha256sum: WARNING: 1 computed checksum did NOT match
x.tif: FAILED
sha256sum: WARNING: 1 computed checksum did NOT match
x.tif: FAILED
sha256sum: WARNING: 1 computed checksum did NOT match

@euzada

euzada commented Sep 25, 2023

It looks like this PR would cause non-deterministic output. See for example this test script:

vips copy CMU-1.svs[rgb] x.tif[compression=jpeg,tile,pyramid]

expected_sha256=$(sha256sum "x.tif" | awk '{ print $1 }')
for run in {1..5}; do
  vips copy CMU-1.svs[rgb] x.tif[compression=jpeg,tile,pyramid]
  echo "$expected_sha256 x.tif" | sha256sum --check
done

Git master:

$ ./check-deterministic.sh
x.tif: OK
x.tif: OK
x.tif: OK
x.tif: OK
x.tif: OK

This PR:

$ ./check-deterministic.sh
x.tif: FAILED
sha256sum: WARNING: 1 computed checksum did NOT match
x.tif: FAILED
sha256sum: WARNING: 1 computed checksum did NOT match
x.tif: FAILED
sha256sum: WARNING: 1 computed checksum did NOT match
x.tif: FAILED
sha256sum: WARNING: 1 computed checksum did NOT match
x.tif: FAILED
sha256sum: WARNING: 1 computed checksum did NOT match

Will this test verify the output hash? The size of the output changes from run to run.

@jcupitt
Member Author

jcupitt commented Sep 25, 2023

It looks like this PR would cause non-deterministic output.

Certainly -- it's writing the tiles across a row in parallel, so the tile order is set by which thread finishes first.

We could buffer all the compressed tiles in the row, then sort and write them sequentially, but it would obviously raise memory use.
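The buffer-sort-write idea can be sketched like this (illustrative Python, not the libvips C code; zlib stands in for the JPEG encoder):

```python
# Sketch of gather-sort-write: compress a row of tiles in parallel, buffer
# the (index, bytes) results, then write in index order so the output is
# byte-for-byte deterministic regardless of thread scheduling.
import zlib
from concurrent.futures import ThreadPoolExecutor

def write_row(tiles, write):
    with ThreadPoolExecutor(max_workers=4) as pool:
        futures = [pool.submit(lambda i=i, t=t: (i, zlib.compress(t)))
                   for i, t in enumerate(tiles)]
        results = sorted(f.result() for f in futures)  # restore tile order
    for _, data in results:
        write(data)  # single sequential writer, deterministic order

out = []
write_row([bytes([i]) * 4096 for i in range(8)], out.append)
print([zlib.decompress(d)[0] for d in out])  # -> [0, 1, 2, 3, 4, 5, 6, 7]
```

The memory cost is one row of compressed tiles held until the slowest worker finishes, which for JPEG tiles is usually small relative to the uncompressed row buffer.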

jcupitt and others added 3 commits September 25, 2023 22:04
Co-authored-by: Kleis Auke Wolthuizen <github@kleisauke.nl>
Co-authored-by: Kleis Auke Wolthuizen <github@kleisauke.nl>
Co-authored-by: Kleis Auke Wolthuizen <github@kleisauke.nl>
@jcupitt
Member Author

jcupitt commented Sep 25, 2023

Thinking about it, you're right, non-deterministic output is pretty bad. I'll add gather-sort-write. Hopefully memory use won't increase too much.

@jcupitt
Member Author

jcupitt commented Sep 26, 2023

It should be sorted now.

vips2tiff.c is getting quite big :( but maybe we can move parts of it out later.

Member

@kleisauke kleisauke left a comment

LGTM!

@kleisauke kleisauke merged commit 1131b7d into master Sep 26, 2023
@euzada

euzada commented Sep 26, 2023

LGTM!

Tested it and it works well on Windows. Thank you.

@jcupitt
Member Author

jcupitt commented Sep 27, 2023

Great! Thanks for testing @euzada
