archive: avoid creating entries for each directory #3687

Merged 1 commit on Sep 26, 2023

kleisauke
Member

Creating an entry for each directory is not necessary; the ZIP format maintains a hierarchical structure.


On a somewhat tangential note, I noticed that dzsave produces non-deterministic ZIP output if concurrency is > 1.

# Ensure `vips__get_iso8601()` (i.e. the date in `scan-properties.xml` / `vips-properties.xml`) is deterministic
export LD_PRELOAD=/usr/lib64/faketime/libfaketime.so.1
export FAKETIME=$(date +"%Y-%m-%d %H:%M:%S")

vips black x.jpg 1000 1000
vips dzsave x.jpg x.szi --overlap 0 # --vips-concurrency=1

expected_sha256=$(sha256sum "x.szi" | awk '{ print $1 }')
for run in {1..5}; do
  vips dzsave x.jpg x.szi --overlap 0 # --vips-concurrency=1
  echo "$expected_sha256 x.szi" | sha256sum --check
done

$ ./check-deterministic.sh
x.szi: FAILED
sha256sum: WARNING: 1 computed checksum did NOT match
x.szi: FAILED
sha256sum: WARNING: 1 computed checksum did NOT match
x.szi: FAILED
sha256sum: WARNING: 1 computed checksum did NOT match
x.szi: FAILED
sha256sum: WARNING: 1 computed checksum did NOT match
x.szi: FAILED
sha256sum: WARNING: 1 computed checksum did NOT match

It looks like we need to write those ZIP entries in a sorted manner.

zipinfo diff
@ -1,25 +1,25 @@
-Archive:  x.szi
+Archive:  x-sorted.szi
 Zip file size: 34925 bytes, number of entries: 32
 -rw-rw-r--  1.0 unx     1082 bX stor 80-Jan-01 00:00 ./x/x_files/10/0_0.jpeg
--rw-rw-r--  1.0 unx     1034 bX stor 80-Jan-01 00:00 ./x/x_files/10/3_0.jpeg
 -rw-rw-r--  1.0 unx     1082 bX stor 80-Jan-01 00:00 ./x/x_files/10/1_0.jpeg
 -rw-rw-r--  1.0 unx     1082 bX stor 80-Jan-01 00:00 ./x/x_files/10/2_0.jpeg
+-rw-rw-r--  1.0 unx     1034 bX stor 80-Jan-01 00:00 ./x/x_files/10/3_0.jpeg
 -rw-rw-r--  1.0 unx     1082 bX stor 80-Jan-01 00:00 ./x/x_files/10/0_1.jpeg
 -rw-rw-r--  1.0 unx     1082 bX stor 80-Jan-01 00:00 ./x/x_files/10/1_1.jpeg
--rw-rw-r--  1.0 unx     1034 bX stor 80-Jan-01 00:00 ./x/x_files/10/3_1.jpeg
 -rw-rw-r--  1.0 unx     1082 bX stor 80-Jan-01 00:00 ./x/x_files/10/2_1.jpeg
--rw-rw-r--  1.0 unx     1058 bX stor 80-Jan-01 00:00 ./x/x_files/9/1_0.jpeg
+-rw-rw-r--  1.0 unx     1034 bX stor 80-Jan-01 00:00 ./x/x_files/10/3_1.jpeg
 -rw-rw-r--  1.0 unx     1082 bX stor 80-Jan-01 00:00 ./x/x_files/9/0_0.jpeg
+-rw-rw-r--  1.0 unx     1058 bX stor 80-Jan-01 00:00 ./x/x_files/9/1_0.jpeg
 -rw-rw-r--  1.0 unx     1082 bX stor 80-Jan-01 00:00 ./x/x_files/10/0_2.jpeg
+-rw-rw-r--  1.0 unx     1082 bX stor 80-Jan-01 00:00 ./x/x_files/10/1_2.jpeg
 -rw-rw-r--  1.0 unx     1082 bX stor 80-Jan-01 00:00 ./x/x_files/10/2_2.jpeg
 -rw-rw-r--  1.0 unx     1034 bX stor 80-Jan-01 00:00 ./x/x_files/10/3_2.jpeg
--rw-rw-r--  1.0 unx     1082 bX stor 80-Jan-01 00:00 ./x/x_files/10/1_2.jpeg
--rw-rw-r--  1.0 unx     1034 bX stor 80-Jan-01 00:00 ./x/x_files/10/1_3.jpeg
 -rw-rw-r--  1.0 unx     1034 bX stor 80-Jan-01 00:00 ./x/x_files/10/0_3.jpeg
--rw-rw-r--  1.0 unx      989 bX stor 80-Jan-01 00:00 ./x/x_files/10/3_3.jpeg
+-rw-rw-r--  1.0 unx     1034 bX stor 80-Jan-01 00:00 ./x/x_files/10/1_3.jpeg
 -rw-rw-r--  1.0 unx     1034 bX stor 80-Jan-01 00:00 ./x/x_files/10/2_3.jpeg
--rw-rw-r--  1.0 unx     1035 bX stor 80-Jan-01 00:00 ./x/x_files/9/1_1.jpeg
+-rw-rw-r--  1.0 unx      989 bX stor 80-Jan-01 00:00 ./x/x_files/10/3_3.jpeg
 -rw-rw-r--  1.0 unx     1058 bX stor 80-Jan-01 00:00 ./x/x_files/9/0_1.jpeg
+-rw-rw-r--  1.0 unx     1035 bX stor 80-Jan-01 00:00 ./x/x_files/9/1_1.jpeg
 -rw-rw-r--  1.0 unx     1082 bX stor 80-Jan-01 00:00 ./x/x_files/8/0_0.jpeg
 -rw-rw-r--  1.0 unx      506 bX stor 80-Jan-01 00:00 ./x/x_files/7/0_0.jpeg
 -rw-rw-r--  1.0 unx      362 bX stor 80-Jan-01 00:00 ./x/x_files/6/0_0.jpeg
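
A comparison like the zipinfo diff above can also be scripted, to tell apart "same entries, different order" from a genuine content difference. This is only a minimal sketch: `classify_listings` is a hypothetical helper (not part of libvips), and `a.txt` / `b.txt` are assumed to hold one entry name per line, for example as produced by `zipinfo -1`.

```shell
# classify_listings A B
# Hypothetical helper: compare two files that each hold one ZIP entry
# name per line, in the order the entries are stored in the archive.
classify_listings() {
  if cmp -s "$1" "$2"; then
    # Byte-identical listings: same names in the same order.
    echo "same order"
  elif [ "$(LC_ALL=C sort "$1")" = "$(LC_ALL=C sort "$2")" ]; then
    # Same set of names once order is ignored.
    echo "same entries, different order"
  else
    echo "different entries"
  fi
}

# Possible usage (assumes Info-ZIP's zipinfo is installed):
#   zipinfo -1 x.szi > a.txt
#   vips dzsave x.jpg x.szi --overlap 0
#   zipinfo -1 x.szi > b.txt
#   classify_listings a.txt b.txt
```

If the result is "same entries, different order", the checksum mismatch is explained by write order alone.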

@jcupitt
Member

jcupitt commented Sep 26, 2023

Oh, nice!

Yes, I'll make a PR to buffer compressed tiles, then sort and write at the end of the row.
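
The buffer-then-sort idea can be illustrated at the level of entry names. This is a rough sketch, not the libvips implementation: `flush_row` is a hypothetical helper, and it assumes each buffered row holds tiles from a single layer, named `LAYER/X_Y.jpeg` as in the listing above.

```shell
# flush_row
# Hypothetical helper: read the tile names of one finished row from stdin,
# in whatever order the worker threads completed them, and print them
# sorted by the numeric X coordinate so the write order is reproducible.
flush_row() {
  # -t/ splits "10/3_0.jpeg" into "10" and "3_0.jpeg";
  # -k2,2n sorts on the leading number of the second field (X).
  LC_ALL=C sort -t/ -k2,2n
}

# Example: tiles of row 0 arriving out of order.
printf '10/3_0.jpeg\n10/0_0.jpeg\n10/2_0.jpeg\n10/1_0.jpeg\n' | flush_row
```

Because every row is emitted in the same order regardless of which thread finished first, repeated runs would produce the same entry sequence.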

@kleisauke
Member Author

> Yes, I'll make a PR to buffer compressed tiles, then sort and write at the end of the row.

Great. I'm not sure if this was also an issue with the old libgsf-based saver.

@kleisauke kleisauke merged commit 0819ecc into libvips:master Sep 26, 2023
@kleisauke kleisauke deleted the archive-avoid-dir-entries branch September 26, 2023 18:20
@jcupitt
Member

jcupitt commented Sep 26, 2023

> I'm not sure if this was also an issue with the old libgsf-based saver.

The non-determinism only came in with parallel compression; the old saver was strictly serial.
