wc: Perf gains with the bytecount crate. #7495

Merged Mar 20, 2025 (1 commit)

@karlmcdowall commented Mar 19, 2025

Fixes #7494
Improve performance of the wc app.

  • Use the bytecount::num_chars API to count UTF-8 characters in a file.
  • Enable runtime-dispatch-simd feature in the bytecount crate.
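
For readers unfamiliar with the counting rule behind this change, here is a minimal scalar sketch in plain Rust. It assumes the usual approach of counting every byte that is not a UTF-8 continuation byte (`0b10xx_xxxx`). The function name `count_utf8_chars` is hypothetical and is not part of wc or bytecount; the PR itself calls `bytecount::num_chars`, which uses SIMD where available when the `runtime-dispatch-simd` feature is enabled.

```rust
/// Hypothetical scalar illustration, not the bytecount implementation.
/// Counts UTF-8 code points by counting every byte that is NOT a
/// continuation byte. Continuation bytes have the bit pattern 0b10xx_xxxx,
/// so each code point contributes exactly one non-continuation byte.
fn count_utf8_chars(bytes: &[u8]) -> usize {
    bytes.iter().filter(|&&b| (b & 0xC0) != 0x80).count()
}
```

Because the rule looks at each byte in isolation, the count is additive across buffer boundaries: a multi-byte character split between two reads is still counted once, by its leading byte. For valid UTF-8 input the result should match `str::chars().count()`.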

@karlmcdowall
Contributor Author

Benchmarking results are in the issue tracking this change.

@sylvestre
Contributor

Nice! Could you please briefly document your changes in https://github.com/uutils/coreutils/blob/main/src/uu/wc/BENCHMARKING.md?

GNU testsuite comparison:

Congrats! The gnu test misc/stdbuf.log is no longer failing!

Issue uutils#7494
Improve performance of the wc app.
 - Use the bytecount::num_chars API to count UTF-8 characters in a file.
 - Enable runtime-dispatch-simd feature in the bytecount crate.
@karlmcdowall
Contributor Author

> Nice! Could you please briefly document your changes in https://github.com/uutils/coreutils/blob/main/src/uu/wc/BENCHMARKING.md?

I've added a little note to highlight that we use the bytecount crate to count UTF-8 characters as well as newlines.
Let me know if you think anything else is necessary.
Thanks.
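
As a rough illustration of counting both newlines and UTF-8 characters in one pass over a file, here is a sketch in plain Rust. The helper `count_lines_and_chars` and its 64 KiB buffer size are assumptions for the example, not code from wc. The two `filter(...).count()` lines are scalar stand-ins for `bytecount::count(chunk, b'\n')` and `bytecount::num_chars(chunk)`.

```rust
use std::io::{self, Read};

/// Hypothetical helper: returns (newline count, UTF-8 code point count)
/// for everything readable from `reader`, processed in fixed-size chunks.
fn count_lines_and_chars<R: Read>(mut reader: R) -> io::Result<(usize, usize)> {
    let mut buf = [0u8; 64 * 1024]; // buffer size chosen arbitrarily
    let mut lines = 0;
    let mut chars = 0;
    loop {
        let n = match reader.read(&mut buf) {
            Ok(0) => break, // end of input
            Ok(n) => n,
            // Retry reads interrupted by a signal.
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        };
        let chunk = &buf[..n];
        // Scalar stand-in for bytecount::count(chunk, b'\n').
        lines += chunk.iter().filter(|&&b| b == b'\n').count();
        // Scalar stand-in for bytecount::num_chars(chunk).
        chars += chunk.iter().filter(|&&b| (b & 0xC0) != 0x80).count();
    }
    Ok((lines, chars))
}
```

Both counts depend only on individual bytes, so neither needs any state carried between chunks.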

GNU testsuite comparison:

Congrats! The gnu test misc/stdbuf.log is no longer failing!
Congrats! The gnu test timeout/timeout.log is no longer failing!

@sylvestre
Contributor

With:
hyperfine --export-markdown a.md -L wc /usr/bin/wc,"./target/release/coreutils.prev wc","./target/release/coreutils wc" '{wc} -l ./odyssey256.txt'

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|:---|---:|---:|---:|---:|
| `/usr/bin/wc -l ./odyssey256.txt` | 25.7 ± 2.6 | 15.4 | 33.5 | 1.00 |
| `./target/release/coreutils.prev wc -l ./odyssey256.txt` | 28.9 ± 6.6 | 19.1 | 41.8 | 1.12 ± 0.28 |
| `./target/release/coreutils wc -l ./odyssey256.txt` | 26.1 ± 4.3 | 15.4 | 32.9 | 1.01 ± 0.20 |

and
hyperfine --export-markdown a.md -L wc /usr/bin/wc,"./target/release/coreutils.prev wc","./target/release/coreutils wc" '{wc} -m ./odyssey256.txt'

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|:---|---:|---:|---:|---:|
| `/usr/bin/wc -m ./odyssey256.txt` | 1.022 ± 0.029 | 0.998 | 1.094 | 40.36 ± 7.45 |
| `./target/release/coreutils.prev wc -m ./odyssey256.txt` | 0.066 ± 0.014 | 0.049 | 0.089 | 2.61 ± 0.74 |
| `./target/release/coreutils wc -m ./odyssey256.txt` | 0.025 ± 0.005 | 0.015 | 0.033 | 1.00 |

kudos!

@sylvestre sylvestre merged commit 187d3e5 into uutils:main Mar 20, 2025
68 checks passed
Linked issue: wc: Performance improvements with bytecount