tr: fix high memory use, possible heap exhaustion #8435
Conversation
Somehow test_cp is failing with an invalid file descriptor in cp, in the l10n tests. 😕
I think we may want a larger buffer size, as it could be beneficial for performance, say 4 MB at a time, but I just went with the GNU buffer size for now.
Please rerun the l10n test; I wonder if there is a race condition with pipe_in of some sort.
GNU testsuite comparison:
Yeah, since you're at it, a larger buffer seems worth it... Mind doing a few simple benchmarks?
Thanks! The change looks fine and we can merge it first, but if you want to do more benchmarking/optimizations, here are some ideas:
-    let filtered = buf.iter().filter_map(|&c| translator.translate(c));
+    let filtered = buf[..length]
+        .iter()
+        .filter_map(|&c| translator.translate(c));
     output_buf.extend(filtered);
Maybe for a follow-up, but if you want to do benchmarking (and fine-tuning) here, I wonder if we can just directly write_all the filtered bytes instead of collecting them into output_buf first? (But then... you might want to restore a buffered writer on stdout.)
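To make the suggestion concrete, here is a minimal sketch of the idea: it is not the actual uutils code, and `translate` is a hypothetical stand-in for `translator.translate`. Each translated byte is streamed straight to the writer instead of being collected into `output_buf`, with a `BufWriter` restored around stdout so the per-byte writes stay cheap:

```rust
use std::io::{self, BufWriter, Read, Write};

// Hypothetical stand-in for the real translator logic.
fn translate(c: u8) -> Option<u8> {
    Some(c.to_ascii_uppercase())
}

// Stream translated bytes directly to `out` instead of collecting them
// into an intermediate `output_buf` first.
fn translate_chunk(chunk: &[u8], out: &mut impl Write) -> io::Result<()> {
    for &c in chunk {
        if let Some(t) = translate(c) {
            // One small write per byte: only reasonable if `out` buffers.
            out.write_all(&[t])?;
        }
    }
    Ok(())
}

fn main() -> io::Result<()> {
    let mut stdin = io::stdin().lock();
    // Restore a buffered writer on stdout, as suggested above.
    let mut out = BufWriter::new(io::stdout().lock());
    let mut buf = [0u8; 8192];
    loop {
        let n = stdin.read(&mut buf)?;
        if n == 0 {
            break;
        }
        translate_chunk(&buf[..n], &mut out)?;
    }
    out.flush()
}
```

Whether this beats extending `output_buf` and writing once per chunk is exactly the kind of question the benchmarking mentioned above would answer.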
I think we should deal with performance separately; this is a surprisingly big issue:
$ hyperfine 'gnutr a b < zeroes'
Benchmark 1: gnutr a b < zeroes
Time (mean ± σ): 558.3 ms ± 80.6 ms [User: 356.6 ms, System: 201.5 ms]
Range (min … max): 528.3 ms … 787.5 ms 10 runs
$ hyperfine './target/release/tr a b < zeroes'
Benchmark 1: ./target/release/tr a b < zeroes
Current estimate: 5.681 s (still running)
(main is at around 6.8 s)
You should run hyperfine this way:
hyperfine 'gnutr a b < zeroes' './target/release/tr a b < zeroes'
You get much better output.
Ouch. Yes, sounds good; it would be great if you could file an issue for the performance problem. (I didn't look at our code for tr, but we use some vector instructions in wc to find newline chars that may also be helpful here.)
Performance bug filed as #8439.
Read the input into a statically sized buffer (8192 bytes, matching GNU) instead of reading until the end of the line, since reading until the end of the line in a file with no newline at all would pull the entire file into memory. Confusingly, GNU tr seems to write the 8192 bytes in two chunks of 1024 and 7168 bytes, but I can't figure out why it would do that; I don't see any line buffering in GNU tr.
Bug-Ubuntu: https://launchpad.net/bugs/2119520
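For illustration, a simplified sketch of the fixed-size-read idea (not the actual uutils implementation; the translation step is omitted and each chunk is just echoed through):

```rust
use std::io::{self, Read, Write};

// Matches the GNU tr buffer size mentioned above.
const BUF_SIZE: usize = 8192;

fn main() -> io::Result<()> {
    let mut stdin = io::stdin().lock();
    let mut stdout = io::stdout().lock();
    // Fixed-size buffer: memory use is bounded regardless of line length,
    // unlike `read_until(b'\n', &mut vec)`, which grows until a newline shows up.
    let mut buf = [0u8; BUF_SIZE];
    loop {
        let n = stdin.read(&mut buf)?;
        if n == 0 {
            break; // EOF
        }
        // A real implementation would translate buf[..n] here.
        stdout.write_all(&buf[..n])?;
    }
    Ok(())
}
```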
The stdin we transform is already buffered (using 8192-byte buffers as of the previous commit), so avoid buffering our output needlessly. This effectively changes the code to write complete lines immediately; for example, in `( echo a; sleep 1 ) | tr a b` we now see

read(0, "a\n", 8192) = 2
write(1, "b\n", 2) = 2
read(0, "", 8192) = 0

instead of

read(0, "a\n", 8192) = 2
read(0, "", 8192) = 0
write(1, "b\n", 2) = 2

which matches the GNU coreutils behavior.
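A rough sketch of the resulting ordering (again not the actual uutils code; `translate` is a placeholder mapping 'a' to 'b' as in the example): with no BufWriter around stdout, each chunk is written as soon as it is read, so the `write(1, ...)` happens before the EOF `read`, as in the trace above:

```rust
use std::io::{self, Read, Write};

// Placeholder for the real translator ('a' -> 'b', as in the example).
fn translate(c: u8) -> u8 {
    if c == b'a' { b'b' } else { c }
}

fn run(mut input: impl Read, mut output: impl Write) -> io::Result<()> {
    let mut buf = [0u8; 8192];
    loop {
        let n = input.read(&mut buf)?;
        if n == 0 {
            return Ok(()); // EOF: nothing is buffered, so nothing is left to flush
        }
        let chunk: Vec<u8> = buf[..n].iter().map(|&c| translate(c)).collect();
        // Written immediately, once per read, instead of being held in a
        // BufWriter until the stream ends.
        output.write_all(&chunk)?;
    }
}

fn main() -> io::Result<()> {
    run(io::stdin().lock(), io::stdout().lock())
}
```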
GNU testsuite comparison:
The first commit changes the code to use a fixed-size buffer to avoid heap exhaustion when reading a file with no newlines.
The second commit removes the buffering on stdout, since we already buffer our reads on stdin (we read in 8192-byte chunks now). This was needed to get identical observed buffering behavior. It's not entirely exact, in that GNU writes a first chunk of 1024 bytes and then the remaining 7168 bytes when running on my file created with `truncate file -s 96G` (to simulate heap exhaustion).
We could add a test case in the form of `ulimit -v 128M` and generating a file of, say, 256M, but I'm not sure how meaningful it is.