wc: Speed optimization #7934
Improves performance by about 4% on large files.
bytecount uses vector operations to speed up line counting. At least on x86 with AVX2 support, the vectors are 256 bits (32 bytes) wide, and operations are much faster if the data is aligned. Saves about 4% of total run time, matching GNU wc's performance.
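To illustrate the alignment idea, here is a minimal sketch of a read buffer forced onto a 32-byte boundary. The names `AlignedBuffer` and `ALIGNED_BUF_SIZE` are hypothetical and chosen for this example; the actual change in this PR may be structured differently.

```rust
/// Hypothetical buffer size for this sketch: 256 KiB.
const ALIGNED_BUF_SIZE: usize = 256 * 1024;

/// Byte buffer whose address is always a multiple of 32 bytes,
/// the width of one 256-bit AVX2 register.
#[repr(align(32))]
struct AlignedBuffer {
    data: [u8; ALIGNED_BUF_SIZE],
}

impl AlignedBuffer {
    /// Heap-allocate a zeroed buffer; `Box` honors the type's alignment.
    fn new_boxed() -> Box<Self> {
        Box::new(AlignedBuffer {
            data: [0u8; ALIGNED_BUF_SIZE],
        })
    }

    /// True if the first byte sits on a 32-byte boundary.
    fn is_32_byte_aligned(&self) -> bool {
        self.data.as_ptr() as usize % 32 == 0
    }
}
```

A plain `[u8; N]` or `Vec<u8>` only guarantees 1-byte alignment for its element type, so wrapping the array in a `#[repr(align(32))]` struct is one way to make the alignment explicit rather than relying on what the allocator happens to return.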
GNU testsuite comparison:
Nice, same result on my machine, which is a 7040 series Ryzen laptop chip.
well done!
It's actually a lot better than what I see. Interesting! Thanks for testing.
previously the coreutils implementation was about 10x faster on my m1 mac, now it's about 13x faster!
I wonder how that is possible :) Is wc the BSD Apple implementation?
yes, i am using the default apple implementation
@willshuttleworth oh nice, thanks! I'm not completely sure about the aarch64 code, but I see some loading operations on 8x16x4, which sound like 512-bit/64-bytes (the underlying registers are 128-bit wide though). Mind trying to see if 64-byte alignment improves performance?
(you can Thanks!
@drinkcat i'm not seeing a difference with 64 byte alignment:
Neat, thanks for trying!
Fixes #7929.
wc: Align buffer to 32-byte boundary
bytecount uses vector operations to speed up line counting.
At least on x86 with AVX2 support, the vectors are 256 bits (32 bytes) wide,
and operations are much faster if the data is aligned.
Saves about 4% of total run time, matching GNU wc's performance.
wc: Increase buffer size to 256kb
Improves performance by about 4% on large files.
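As a rough sketch of how the buffer size enters the picture, the loop below reads input in 256 KiB chunks and counts newlines in each chunk. The names `READ_BUF_SIZE` and `count_lines` are hypothetical, and the plain iterator count is a stand-in for the vectorized `bytecount::count` call, used here only to keep the example free of external crates.

```rust
use std::io::{self, Read};

/// Hypothetical chunk size for this sketch: 256 KiB.
const READ_BUF_SIZE: usize = 256 * 1024;

/// Count b'\n' bytes in `reader`, reading one large chunk at a time.
fn count_lines<R: Read>(mut reader: R) -> io::Result<usize> {
    let mut buf = vec![0u8; READ_BUF_SIZE];
    let mut lines = 0;
    loop {
        match reader.read(&mut buf) {
            // End of input: return the running total.
            Ok(0) => return Ok(lines),
            // Only the first `n` bytes of the buffer are valid.
            Ok(n) => lines += buf[..n].iter().filter(|&&b| b == b'\n').count(),
            // Retry reads interrupted by a signal.
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        }
    }
}
```

A larger buffer means fewer `read` calls for the same amount of data, which is one plausible reason the change helps mostly on large files.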
This gets us close to, or better than, GNU's version:
And on the 1brc dataset from the original report: