-
-
Notifications
You must be signed in to change notification settings - Fork 1.5k
split: fix bug with large arguments to -C #7128
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
GNU testsuite comparison:
|
5e5209b
to
03c2e53
Compare
GNU testsuite comparison:
|
Fix the behavior of `split -C` when given large arguments. Before this commit, bytes were being greedily assigned to a chunk too aggressively, leading to a situation where a split happened in the middle of a line even though the entire line could have fit within the next chunk. This was appearing for large arguments to `-C` and long lines that extended beyond the size of the read buffer. This commit fixes the behavior. Fixes uutils#7026
03c2e53
to
071e72f
Compare
GNU testsuite comparison:
|
let line1 = format!("{:131070}\n", ""); | ||
let line2 = format!("{:1}\n", ""); | ||
let line3 = format!("{:131071}\n", ""); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi there!
Over on our side in the Rust compiler, we're considering changing the width
and precision
formatting fields to a u16, meaning these lines will no longer compile as the max width will be {:65535}
. See rust-lang/rust#136932 for details.
These two lines are the only ones we found on all of crates.io and github that would trigger the new compiler error.
I'd love to hear your thoughts on the proposed compiler change, and what you think of changing these lines here to " ".repeat(131070)
instead.
Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hello! Thanks for letting us know. We can definitely just change the code to " ".repeat(...)
, or something similar. There's no need to use a very large width in the format string in this test case. I'll make a pull request to change that and link it here.
As for the compiler change, it seems reasonable to me, as it seems very unlikely for applications to need such an extreme formatting behavior. (The only reason I did it here is because I was trying to mimic an existing test case that used the printf
command from GNU coreutils.) And there is always a workaround by explicitly writing the space characters directly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Avoid an extremely long format width specifier in test case `test_long_lines`. The Rust compiler is planning an upcoming change to restrict the maximum width that can be specified to 65535, so this change defends against future limitations in the compiler. For more information, see <uutils#7128 (comment)>.
Avoid an extremely long format width specifier in test case `test_long_lines`. The Rust compiler is planning an upcoming change to restrict the maximum width that can be specified to 65535, so this change defends against future limitations in the compiler. For more information, see <uutils#7128 (comment)>.
Fix the behavior of
split -C
when given large arguments. Before this commit, bytes were being greedily assigned to a chunk too aggressively, leading to a situation where a split happened in the middle of a line even though the entire line could have fit within the next chunk. This was appearing for large arguments to-C
and long lines that extended beyond the size of the read buffer. This commit fixes the behavior.I couldn't figure out how to make this work with the existing
LineBytesWriter
and instead wrote the code as a plain function that loops over each line, writing it if it fits in the chunk or otherwise moving to the next chunk. There may be a way to get it working with theLineBytesWriter
, I just couldn't figure it out. The new code will probably be less efficient.Fixes #7026