-
-
Notifications
You must be signed in to change notification settings - Fork 1.5k
cksum: take out regex pattern matching #7580
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
GNU testsuite comparison:
|
of course, perf and file size are key! |
bbbbd00
to
b1a19f7
Compare
GNU testsuite comparison:
|
b1a19f7
to
8f41040
Compare
I think this is ready for review.
|
GNU testsuite comparison:
|
looks great! bravo Dunno if it is a regression from this change but it seems that the fuzzer isn't happy:
|
I cannot find your fuzzer error in the latest test, was it in a previous test ? The CICD / MinRustV issue was my fault, I accidentally swapped variables |
The fuzzer discrepancy is not caused by this change. This PR seems only affect |
Very good changes overall 👍
|
let (filename, checksum) = ByteSliceExt::split_once(after_paren, b") = ") | ||
.or_else(|| ByteSliceExt::split_once(after_paren, b")= "))?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This won't work for files called like 'foo) = '
, because the .position()
used in ByteSliceExt is short-circuiting, when we'd like it to be greedy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, I did not think of this case. If I implement rsplit_once
that iterates from right to left, that would solve this, no?
I'll add a test case.
Also, I think there is a second issue here.
MD5(filename)= fds65dsf46as5df4d6f54asd
is valid openssl syntax.
MD5 (filename) = fds65dsf46as5df4d6f54asd
is valid posix syntax.
MD5(filename) = fds65dsf46as5df4d6f54asd
is invalid syntax I think but would match here. I'll add a test case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess that'll work. I'd add a test where there is a ' ) = '
in the middle of the digest as well to see if the error message is the one expected.
I am looking into it none the less, but independently of this PR. |
GNU testsuite comparison:
|
5178fc0
to
80e5b2f
Compare
GNU testsuite comparison:
|
failures:
---- test_who::test_boot stdout ----
run: who --version
uutils-tests-info: version for the reference coreutil 'who' is higher than expected; expected: 8.3, found: 9.4
run: who -b
run: /home/runner/work/coreutils/coreutils/target/x86_64-unknown-linux-gnu/debug/coreutils who -b
thread 'test_who::test_boot' panicked at tests/by-util/test_who.rs:37:39:
assertion failed: `(left == right)`
Diff < left / right > :
> system boot Mar 27 13:00
>
stack backtrace:
0: rust_begin_unwind
at /rustc/4d91de4e48198da2e33413efdcd9cd2cc0c46688/library/std/src/panicking.rs:692:5
1: core::panicking::panic_fmt
at /rustc/4d91de4e48198da2e33413efdcd9cd2cc0c46688/library/core/src/panicking.rs:75:14
2: tests::common::util::CmdResult::stdout_is
at ./tests/common/util.rs:481:9
3: tests::test_who::test_boot
at ./tests/by-util/test_who.rs:37:9
4: tests::test_who::test_boot::{{closure}}
at ./tests/by-util/test_who.rs:33:15
5: core::ops::function::FnOnce::call_once
at /rustc/4d91de4e48198da2e33413efdcd9cd2cc0c46688/library/core/src/ops/function.rs:250:5
6: core::ops::function::FnOnce::call_once
at /rustc/4d91de4e48198da2e33413efdcd9cd2cc0c46688/library/core/src/ops/function.rs:250:5
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
failures:
test_who::test_boot
test result: FAILED. 3836 passed; 1 failed; 54 ignored; 0 measured; 0 filtered out; finished in 92.27s I am pretty confident I did not cause this CICD/build error. |
Could you please improve the stack ? |
The utf-8 was a regression on my part, the original regex handled filenames as bytes, my first rewrite did not.
maybe a fourth:
What do you think ? |
The fourth one is not needed IMO, the 3 others are good 👍 |
647adc2
to
fca3971
Compare
Is this ok? |
GNU testsuite comparison:
|
removes dependency on the regex crate for LineFormat detection and parsing, resulting in a faster and lighter cksum binary.
…dd tests fixes this non-regex implementation's flaw with file_names containing the separator's pattern: - replaces left-to-right greedy separator match with right-to-left one. - added bugfix tests fixes secondary bug: positive match on hybrid posix-openssl format adds secondary bugfix tests Co-authored-by: Dorian Péron <72708393+RenjiSann@users.noreply.github.com>
fca3971
to
09a9dc7
Compare
I'm wondering what's wrong with the SELinux test |
GNU testsuite comparison:
|
After reading #7504, i looked in https://github.com/uutils/coreutils-tracking/blob/main/individual-size-results/ and saw that
cksum
had a big jump in size as well. I identified the culprit: bringing in the regex crate.After looking a bit into regex for bytes, I tried removing the regex calls altogether and i came up with this branch.
The binary size is much smaller:
1.4MB
vs3.1MB
.The binary is much faster (?!) tested on a 162500 lines long file of b2sum 50% tagged and 50%untagged checksum :
And passes the gnu tests:
as well as:
If this is of value, please let me know. Then I would still have some cleanup to do.
Any remarks/thoughts are very welcome.