Skip to content

more: constant memory initialization overhead #7765

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 10 commits into
base: main
Choose a base branch
from

Conversation

aaron-ang
Copy link
Contributor

@aaron-ang aaron-ang commented Apr 16, 2025

close #6397

enum InputType {
	File(BufReader<File>),
	Stdin(Stdin),
}

This PR improves uu_more by achieving constant memory overhead during initialization. Previously (#7680), we read the entire file once and store byte positions. We now do so incrementally using the following techniques:

  1. We handle files and pipes together using the InputType enum shown above. Stdin already implements BufRead and we need the underlying File type to get the file size for displaying the status.
  2. When initializing the Pager object, we scan the only up to the start line (options.from_line). We store the lines read in lines, and the byte positions in cumulative_line_sizes to display the file navigation status.
  3. If pattern matching is required, we scan from the initialized start line to find first line containing the pattern. Note that from_line takes precedence over pattern which is in line with more. This means that if the specified pattern only exists before from_line, it will not be found.
  4. The file will be read on-demand without seeking and the newly read lines are appended toself.lines. This means we scan the entire file at most once.
  5. When drawing lines, we check that self.upper_mark + self.content_rows have been read into memory. Then, we iterate over the relevant indexes in self.lines to print.

@aaron-ang
Copy link
Contributor Author

waiting for #7680 to be merged

Copy link

GNU testsuite comparison:

Skipping an intermittent issue tests/misc/stdbuf (passes in this run but fails in the 'main' branch)

Copy link

GNU testsuite comparison:

Skip an intermittent issue tests/misc/stdbuf (fails in this run but passes in the 'main' branch)

Copy link

GNU testsuite comparison:

Skip an intermittent issue tests/misc/stdbuf (fails in this run but passes in the 'main' branch)
Skipping an intermittent issue tests/tail/inotify-dir-recreate (passes in this run but fails in the 'main' branch)
Skipping an intermittent issue tests/timeout/timeout (passes in this run but fails in the 'main' branch)

@@ -27,7 +27,10 @@ use uucore::{format_usage, help_about, help_usage};

const ABOUT: &str = help_about!("more.md");
const USAGE: &str = help_usage!("more.md");
const BELL: &str = "\x07";
const BELL: char = '\x07';
const MULTI_FILE_TOP_PROMPT: &str = "\r::::::::::::::\n\r{}\n\r::::::::::::::\n";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as a bit cryptic, could you please add a comment for this? thanks

Copy link
Contributor Author

@aaron-ang aaron-ang Apr 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do @sylvestre. Please note that this PR is still WIP and not ready for full review. I'm awaiting my previous PR to be merged. Once that is done, I can better compare changes to main.

I will also do a writeup on the changes made, benchmarking, and add comments in the code.

Copy link

GNU testsuite comparison:

Skipping an intermittent issue tests/tail/inotify-dir-recreate (passes in this run but fails in the 'main' branch)

@aaron-ang
Copy link
Contributor Author

aaron-ang commented Apr 20, 2025

@tertsdiepraam @sylvestre here is how I plan to refactor more in broad strokes

  1. Create a new terminal screen (similar to less). On exit, restore previous screen. This can be easily done with crossterm::terminal::{EnterAlternateScreen, LeaveAlternateScreen}.
  2. Fixing options: I realized some are labelled wrongly.
  3. Instead of storing line byte offset, we incrementally read and store file data into lines. This means reverting back to the original implementation. We replace the line_count attribute with file byte size and an eof flag in the paging logic.

Would be great to hear your feedback!

Copy link

GNU testsuite comparison:

GNU test failed: tests/misc/tee. tests/misc/tee is passing on 'main'. Maybe you have to rebase?

@aaron-ang
Copy link
Contributor Author

aaron-ang commented Apr 21, 2025

Benchmark

< /dev/urandom base64 | fold -w 120 | head -c 100M > bigfile
< /dev/urandom base64 | fold -w 120 | head -c 1M > smallfile
cargo build --release
# file
/usr/bin/time ./target/release/coreutils more bigfile 
/usr/bin/time ./target/release/coreutils more smallfile
# pipe
/usr/bin/time bash -c 'cat bigfile | ./target/release/coreutils more'
/usr/bin/time bash -c 'cat smallfile | ./target/release/coreutils more'

# in `main`
# bigfile: 0.24user 0.08system 0:01.34elapsed 24%CPU (0avgtext+0avgdata 131416maxresident)k 0inputs+0outputs (0major+31438minor)pagefaults 0swaps
# smallfile: 0.00user 0.00system 0:01.57elapsed 0%CPU (0avgtext+0avgdata 8820maxresident)k 0inputs+0outputs (0major+827minor)pagefaults 0swaps
# bigfile(pipe): 0.25user 0.14system 0:01.64elapsed 24%CPU (0avgtext+0avgdata 130852maxresident)k 8inputs+0outputs (0major+31788minor)pagefaults 0swaps
# smallfile(pipe): 0.00user 0.01system 0:01.52elapsed 1%CPU (0avgtext+0avgdata 8444maxresident)k 0inputs+0outputs (1major+1190minor)pagefaults 0swaps

# in `more-mem-constant`
# bigfile: 0.00user 0.00system 0:01.18elapsed 0%CPU (0avgtext+0avgdata 7664maxresident)k 0inputs+0outputs (0major+504minor)pagefaults 0swaps
# smallfile: 0.00user 0.00system 0:00.82elapsed 0%CPU (0avgtext+0avgdata 7692maxresident)k 0inputs+0outputs (0major+506minor)pagefaults 0swaps
# bigfile(pipe): 0.00user 0.00system 0:01.26elapsed 0%CPU (0avgtext+0avgdata 7432maxresident)k 0inputs+0outputs (0major+867minor)pagefaults 0swaps
# smallfile(pipe): 0.00user 0.00system 0:00.94elapsed 0%CPU (0avgtext+0avgdata 7424maxresident)k 0inputs+0outputs (0major+866minor)pagefaults 0swaps

# system
# bigfile: 0.00user 0.00system 0:00.95elapsed 0%CPU (0avgtext+0avgdata 2508maxresident)k 0inputs+0outputs (0major+136minor)pagefaults 0swaps
# smallfile: 0.00user 0.00system 0:01.15elapsed 0%CPU (0avgtext+0avgdata 2508maxresident)k 0inputs+0outputs (0major+136minor)pagefaults 0swaps
# bigfile(pipe): 0.00user 0.00system 0:01.11elapsed 0%CPU (0avgtext+0avgdata 3584maxresident)k 0inputs+0outputs (0major+494minor)pagefaults 0swaps
# smallfile(pipe): 0.00user 0.00system 0:01.10elapsed 0%CPU (0avgtext+0avgdata 3524maxresident)k 0inputs+0outputs (0major+492minor)pagefaults 0swaps

Greatly reduced time to first paint and memory usage (17x) for large files. Verified that the overhead is constant with respect to input size.

Copy link

GNU testsuite comparison:

Skipping an intermittent issue tests/misc/stdbuf (passes in this run but fails in the 'main' branch)

Copy link

GNU testsuite comparison:

GNU test failed: tests/misc/tee. tests/misc/tee is passing on 'main'. Maybe you have to rebase?

Copy link

GNU testsuite comparison:

Skip an intermittent issue tests/timeout/timeout (fails in this run but passes in the 'main' branch)
Skipping an intermittent issue tests/misc/stdbuf (passes in this run but fails in the 'main' branch)

@aaron-ang aaron-ang marked this pull request as ready for review April 22, 2025 00:40
Copy link

GNU testsuite comparison:

Skip an intermittent issue tests/misc/usage_vs_getopt (fails in this run but passes in the 'main' branch)

Comment on lines -108 to 103
// Disable raw mode before exiting if a panic occurs
set_hook(Box::new(|panic_info| {
terminal::disable_raw_mode().unwrap();
print!("\r");
println!("{panic_info}");
}));
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

logic for modifying the terminal has been moved from uumain to more.

Copy link

GNU testsuite comparison:

GNU test failed: tests/misc/tee. tests/misc/tee is passing on 'main'. Maybe you have to rebase?
Skip an intermittent issue tests/misc/stdbuf (fails in this run but passes in the 'main' branch)
Skip an intermittent issue tests/timeout/timeout (fails in this run but passes in the 'main' branch)

@aaron-ang aaron-ang requested a review from sylvestre April 22, 2025 03:44
@aaron-ang
Copy link
Contributor Author

@tertsdiepraam could you please review this PR?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

excessive memory usage in more
2 participants