
tr: fix high memory use, possible heap exhaustion #8435


Merged
merged 2 commits into uutils:main on Aug 6, 2025

Conversation

julian-klode (Contributor)

First commit changes the code to use a fixed-size buffer, to avoid heap exhaustion when reading a file with no newlines.

Second commit removes the buffering on stdout, since we already buffer our reads on stdin (we now read in 8192-byte chunks). This is needed to match GNU's observed buffering behavior. The match is not exact: GNU writes a first chunk of 1024 bytes and then the remaining 7168 bytes when running on my file created with truncate -s 96G (to simulate heap exhaustion).
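
A rough sketch of the approach (not the PR's exact code; the translate function here is a hypothetical stand-in for the tool's translator):

    use std::io::{self, Read, Write};

    // Read into a fixed 8192-byte buffer, matching GNU tr, so memory use
    // stays bounded even when the input contains no newline at all.
    fn run(
        input: &mut impl Read,
        output: &mut impl Write,
        translate: impl Fn(u8) -> u8,
    ) -> io::Result<()> {
        let mut buf = [0u8; 8192];
        loop {
            let n = input.read(&mut buf)?;
            if n == 0 {
                break; // EOF
            }
            let translated: Vec<u8> = buf[..n].iter().map(|&c| translate(c)).collect();
            // stdout is not wrapped in a BufWriter: each chunk is written
            // out as soon as it has been read and translated.
            output.write_all(&translated)?;
        }
        Ok(())
    }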

We could add a test case that sets ulimit -v 128M and generates a file of, say, 256M, but I'm not sure how meaningful that would be.

@julian-klode (Contributor, Author)

Somehow test_cp is failing with an invalid file descriptor in cp, in the l10n tests. 😕

@julian-klode (Contributor, Author)

I think we may want a larger buffer size (say, 4 MB at a time), as it could be beneficial for performance, but I just went with GNU's buffer size for now.

@julian-klode (Contributor, Author)

Please rerun the l10n test; I wonder if there is a race condition with pipe_in of some sort.


github-actions bot commented Aug 5, 2025

GNU testsuite comparison:

Skipping an intermittent issue tests/timeout/timeout (passes in this run but fails in the 'main' branch)

@drinkcat (Collaborator) commented Aug 6, 2025

> I think we may want a larger buffer size (say, 4 MB at a time), as it could be beneficial for performance, but I just went with GNU's buffer size for now.

Yeah, since you're at it, a larger buffer seems worth it... Mind doing a few simple benchmarks with hyperfine?

@drinkcat (Collaborator) left a comment


Thanks! The change looks fine and we can merge it first, but if you want to do more benchmarking/optimization, here are some ideas.

    -let filtered = buf.iter().filter_map(|&c| translator.translate(c));
    +let filtered = buf[..length]
    +    .iter()
    +    .filter_map(|&c| translator.translate(c));
     output_buf.extend(filtered);
Collaborator

Maybe for a follow-up, but if you want to do benchmarking (and fine-tuning) here: I wonder if we can just write_all the filtered bytes directly, instead of collecting the data in output_buf first?

(but then... you might want to restore a buffered writer on stdout)
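
A minimal sketch of that idea, assuming stdout gets wrapped in a BufWriter again as suggested (names are hypothetical, not the PR's code):

    use std::io::{self, Write};

    // Stream translated bytes straight into a buffered writer instead of
    // collecting them into an intermediate Vec first.
    fn write_translated(
        buf: &[u8],
        translate: impl Fn(u8) -> Option<u8>,
        out: &mut impl Write,
    ) -> io::Result<()> {
        for &c in buf {
            if let Some(t) = translate(c) {
                // With a BufWriter this is just a copy into its internal
                // buffer; actual write syscalls happen in large flushes.
                out.write_all(&[t])?;
            }
        }
        Ok(())
    }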

@julian-klode (Contributor, Author)

I think we should deal with performance separately; this is a surprisingly big issue:

    $ hyperfine 'gnutr a b < zeroes'
    Benchmark 1: gnutr a b < zeroes
      Time (mean ± σ):     558.3 ms ±  80.6 ms    [User: 356.6 ms, System: 201.5 ms]
      Range (min … max):   528.3 ms … 787.5 ms    10 runs

    $ hyperfine './target/release/tr a b < zeroes'
    Benchmark 1: ./target/release/tr a b < zeroes
      ⠴ Current estimate: 5.681 s

(main is at around 6.8 s)

Contributor

You should run hyperfine this way:

    hyperfine 'gnutr a b < zeroes' './target/release/tr a b < zeroes'

You get much better output (a direct comparison).

Collaborator

Ouch. Yes, sounds good; it would be great if you could file an issue for the performance problem.

(I didn't look at our code for tr, but we use some vector instructions in wc to find newline chars, which may also be helpful here.)
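
A hypothetical illustration of how that could carry over to tr, assuming the memchr crate (which provides SIMD-accelerated byte search): for a single-byte translation like tr a b, search for the byte and bulk-copy the untouched spans:

    use memchr::memchr;

    // Sketch only, not the PR's code: translate occurrences of `from` to
    // `to`, copying the unchanged spans between matches wholesale.
    fn translate_single(buf: &[u8], from: u8, to: u8, out: &mut Vec<u8>) {
        let mut rest = buf;
        while let Some(i) = memchr(from, rest) {
            out.extend_from_slice(&rest[..i]); // bulk-copy unchanged prefix
            out.push(to);                      // translate the matching byte
            rest = &rest[i + 1..];
        }
        out.extend_from_slice(rest);           // tail with no matches
    }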

@julian-klode (Contributor, Author)

Performance bug tracked in #8439.

Read the input into a statically sized buffer (8192 bytes, matching
GNU) instead of reading until the end of the line: reading until
the end of the line in a file with no newline would result in
reading the entire file into memory.

Confusingly, GNU tr seems to write the 8192 bytes in two chunks
of 1024 and 7168 bytes, but I can't figure out why it would do
that; I don't see any line buffering in GNU tr.

Bug-Ubuntu: https://launchpad.net/bugs/2119520

The stdin we transform is already buffered (using 8192-byte reads
as of the previous commit), so avoid buffering our output
needlessly.

This effectively changes the code to write complete lines
immediately. For example, in

`( echo a; sleep 1 ) | tr a b`

we receive

    read(0, "a\n", 8192)                    = 2
    write(1, "b\n", 2)                      = 2
    read(0, "", 8192)                       = 0

instead of

    read(0, "a\n", 8192)                    = 2
    read(0, "", 8192)                       = 0
    write(1, "b\n", 2)                      = 2

which matches the GNU coreutils behavior.

github-actions bot commented Aug 6, 2025

GNU testsuite comparison:

Skip an intermittent issue tests/misc/tee (fails in this run but passes in the 'main' branch)
Skipping an intermittent issue tests/timeout/timeout (passes in this run but fails in the 'main' branch)

@sylvestre sylvestre merged commit 76bb429 into uutils:main Aug 6, 2025
88 of 90 checks passed