
shred: use 4K block #7886



Closed (1 commit)


alexs-sh (Contributor) commented May 5, 2025

About

Issue #7870

This commit changes the default block size for the shred utility to make it more consistent with the GNU coreutils version.

Before

echo a > foo && cargo run -q --features shred shred -v -n1 foo && du -b foo
shred: foo: pass 1/1 (random)...
65536	foo

After

echo a > foo && cargo run -q --features shred shred -v -n1 foo && du -b foo
shred: foo: pass 1/1 (random)...
4096	foo

Expected

echo a > foo && shred -v -n1 foo && du -b foo
shred: foo: pass 1/1 (random)...
4096	foo
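The Before/After/Expected outputs are consistent with shred padding the file up to a whole number of blocks (GNU shred does this unless --exact is given). A minimal Rust sketch of that rounding; `padded_size` is a hypothetical helper for illustration, not the uutils code:

```rust
/// Round `file_size` up to the next multiple of `block_size`
/// (hypothetical helper; a 0-byte file stays at 0 bytes).
fn padded_size(file_size: u64, block_size: u64) -> u64 {
    file_size.div_ceil(block_size) * block_size
}

fn main() {
    // `echo a > foo` creates a 2-byte file ("a\n").
    let size = 2;
    println!("64K block: {}", padded_size(size, 65536)); // 65536, as in "Before"
    println!("4K block:  {}", padded_size(size, 4096)); // 4096, as in "After"/"Expected"
}
```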

Please note: GNU shred uses fstat together with helper macros to determine the block size, so the block size is not a constant and may vary across platforms; the most common sizes are 512 bytes and 4 KB. Here, I started by changing the constant and collecting test results and observations. However, replicating GNU shred's behavior exactly would require deeper changes. Please share your thoughts on which approach is preferable: simply changing the constant or detecting the block size at runtime.
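A runtime alternative could read the preferred I/O size (the st_blksize field that fstat reports) from the open file's metadata. The sketch below is Unix-only and uses the standard library's `std::os::unix::fs::MetadataExt::blksize`; the name `preferred_block_size` and the 512-byte fallback for a reported value of 0 are assumptions for illustration, not what GNU shred or uutils actually do:

```rust
use std::fs::File;
use std::io;
use std::os::unix::fs::MetadataExt;

/// Assumed fallback when the file system reports a block size of 0.
const FALLBACK_BLOCK_SIZE: u64 = 512;

/// Return the preferred I/O block size (st_blksize) of an open file.
fn preferred_block_size(file: &File) -> io::Result<u64> {
    let blksize = file.metadata()?.blksize();
    Ok(if blksize == 0 { FALLBACK_BLOCK_SIZE } else { blksize })
}

fn main() -> io::Result<()> {
    // Create a scratch file just to have something to query.
    let path = std::env::temp_dir().join("shred_blksize_demo");
    let file = File::create(&path)?;
    println!("preferred block size: {} bytes", preferred_block_size(&file)?);
    std::fs::remove_file(&path)?;
    Ok(())
}
```

The value comes from the file system holding the file, so two files on different mounts can report different sizes, which is the platform variation described above.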

Thank you


github-actions bot commented May 5, 2025

GNU testsuite comparison:

Skip an intermittent issue tests/misc/stdbuf (fails in this run but passes in the 'main' branch)
Skip an intermittent issue tests/timeout/timeout (fails in this run but passes in the 'main' branch)

sylvestre (Contributor) commented

Did you check whether it has an impact on performance? Thanks

alexs-sh (Contributor, Author) commented May 10, 2025

@sylvestre Hello. I ran a brief test of shred on two files: one smaller than the block size (1 byte) and one larger (1 MB). I used hyperfine to measure the mean run time and its standard deviation.

hyperfine -r 10000 './target/release/shred -v -n1 foo'

I recreated the file before each test.

On a 1-byte file:

  • the 4K and 64K (65536-byte) blocks show the same time
  • performance is unchanged because the number of write calls is the same for any block size (11 calls), and each call writes exactly one block of data.

1.4 ms ± 0.3 ms - uutils shred [128 bytes]
1.4 ms ± 0.3 ms - uutils shred [512 bytes]
1.4 ms ± 0.3 ms - uutils shred [2048 bytes]
1.4 ms ± 0.3 ms - uutils shred [4096 bytes]
1.4 ms ± 0.3 ms - uutils shred [16384 bytes]
1.4 ms ± 0.3 ms - uutils shred [65536 bytes]
1.3 ms ± 0.3 ms - GNU shred

On a 1 MB file:

  • performance decreases with smaller block sizes: the 4K block is ~20% slower than the 64K block
  • GNU shred is significantly faster: the uutils version is ~20% slower with a 64K block and nearly 50% slower with a 4K block

3.4 ms ± 0.5 ms - uutils shred [2048 bytes]
2.8 ms ± 0.4 ms - uutils shred [4096 bytes]
2.4 ms ± 0.4 ms - uutils shred [16384 bytes]
2.3 ms ± 0.4 ms - uutils shred [65536 bytes]
1.9 ms ± 0.4 ms - GNU shred
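The percentages in the bullets above can be checked against the mean times in the table. A quick Rust calculation; it ignores the ±0.4 ms spread, so the figures are only rough:

```rust
/// Relative slowdown of `time_ms` compared with `baseline_ms`, in percent.
fn slowdown_pct(time_ms: f64, baseline_ms: f64) -> f64 {
    (time_ms / baseline_ms - 1.0) * 100.0
}

fn main() {
    // Mean times (ms) for the 1 MB file, copied from the table.
    let (uu_4k, uu_64k, gnu) = (2.8, 2.3, 1.9);
    println!("4K vs 64K:  {:.0}%", slowdown_pct(uu_4k, uu_64k)); // 22%
    println!("64K vs GNU: {:.0}%", slowdown_pct(uu_64k, gnu)); // 21%
    println!("4K vs GNU:  {:.0}%", slowdown_pct(uu_4k, gnu)); // 47%
}
```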

With the 1 MB file there are many more write syscalls (266 with 4K blocks vs. 26 with 64K blocks), which likely explains the noticeable slowdown.
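Assuming "1 MB" means 1 MiB (1,048,576 bytes) and one write call per block, the data writes alone come to 256 with 4K blocks and 16 with 64K blocks. The reported totals (266, 26, and 11 for the 1-byte file, which needs a single block) are each exactly 10 higher, which is consistent with a fixed number of additional write calls per run; the thread does not say what those are. A small Rust check of that arithmetic, where `data_writes` is a hypothetical helper:

```rust
/// Data write calls for one pass, assuming one call per block and a size
/// padded up to a whole number of blocks (hypothetical model).
fn data_writes(file_size: u64, block_size: u64) -> u64 {
    file_size.div_ceil(block_size)
}

fn main() {
    const MIB: u64 = 1 << 20;
    // (file size, block size, total write calls reported in the thread)
    let cases = [(1, 4096, 11), (MIB, 4096, 266), (MIB, 65536, 26)];
    for (size, block, observed) in cases {
        let data = data_writes(size, block);
        println!("{size} B / {block} B blocks: {data} data writes, {} other", observed - data);
    }
}
```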

Please let me know if more detailed measurements are needed or if different data collection methods should be used.

Thank you

alexs-sh (Contributor, Author) commented

I'm sure it's also possible to avoid the differences from GNU shred another way: instead of changing the buffer size, we could change the number of bytes overwritten in do_pass. Something like:

        let size = if exact { bytes_left } else { std::cmp::min(4096, BLOCK_SIZE) };

In this case, the behavior would be the same as with GNU shred, but without the slowdown caused by the extra write calls. What do you think?
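The alignment idea can be sketched as a simulated pass that keeps the 64 KiB buffer for bulk writes but pads only up to the next 4 KiB boundary. This is a hypothetical model (`do_pass`, `ALIGN`, and the zero-filled buffer are stand-ins, not the uutils code). Taken literally, `std::cmp::min(4096, BLOCK_SIZE)` is always 4096, so if `bytes_left` can exceed 4 KiB the tail would need rounding up to a multiple of 4096 rather than capping; this sketch rounds the whole size up instead:

```rust
use std::io::{self, Write};

/// Buffer size used for bulk writes (64 KiB, as in the "Before" output).
const BLOCK_SIZE: usize = 64 * 1024;
/// Boundary the padded size is rounded up to (4 KiB, matching GNU's output).
const ALIGN: u64 = 4096;

/// Hypothetical pass: overwrite `file_size` bytes, rounded up to `ALIGN`
/// unless `exact`, using writes of at most BLOCK_SIZE bytes.
/// Returns the number of write_all calls issued.
fn do_pass<W: Write>(out: &mut W, file_size: u64, exact: bool) -> io::Result<u64> {
    let total = if exact { file_size } else { file_size.div_ceil(ALIGN) * ALIGN };
    let buf = vec![0u8; BLOCK_SIZE]; // stand-in for the random/pattern data
    let mut left = total;
    let mut calls = 0;
    while left > 0 {
        let n = left.min(BLOCK_SIZE as u64) as usize;
        out.write_all(&buf[..n])?;
        left -= n as u64;
        calls += 1;
    }
    Ok(calls)
}

fn main() -> io::Result<()> {
    for size in [2u64, 5000, 1 << 20] {
        let mut sink: Vec<u8> = Vec::new();
        let calls = do_pass(&mut sink, size, false)?;
        println!("{size} B -> {} B written in {calls} write(s)", sink.len());
    }
    Ok(())
}
```

Under these assumptions a 1 MiB file still takes 16 bulk writes, while a 2-byte file is padded to 4096 bytes, as in the GNU output.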

alexs-sh (Contributor, Author) commented May 11, 2025

Here is a PR with alignment changes, but without any modification to the block size.

alexs-sh (Contributor, Author) commented

I'm going to close this PR, as it results in a noticeable slowdown. Fixing the differences between the GNU and uutils versions by adjusting alignment seems to be a better approach.

alexs-sh closed this May 11, 2025
alexs-sh deleted the 7870-shred-4K-block-size branch May 21, 2025 15:27