Skip to content

more: keep only screen lines in mem #7680

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Apr 17, 2025

Conversation

aaron-ang
Copy link
Contributor

@aaron-ang aaron-ang commented Apr 7, 2025

fix(partially) #6397.

This PR improves uu_more by storing only the lines to be drawn in memory.

  1. We pass in a file object that implements BufRead + Seek into more() .
  2. When initializing the Pager object, we scan the entire file once and store the byte positions of each line for faster subsequent file scans. If pattern matching is required, we save the first line containing the pattern.
  3. When drawing lines, we first read only self.content_rows no. lines into memory (in load_visible_lines) before writing to stdout.

@aaron-ang aaron-ang force-pushed the more-reduce-mem branch 2 times, most recently from ba51b22 to 328da32 Compare April 7, 2025 01:59
@aaron-ang aaron-ang changed the title more: store file on screen more: keep only screen lines in mem Apr 7, 2025
@aaron-ang aaron-ang marked this pull request as ready for review April 7, 2025 05:36
@aaron-ang
Copy link
Contributor Author

I will squash the commits once the PR is good to merge.

Copy link

github-actions bot commented Apr 7, 2025

GNU testsuite comparison:

Skip an intermittent issue tests/timeout/timeout (fails in this run but passes in the 'main' branch)

Copy link

github-actions bot commented Apr 7, 2025

GNU testsuite comparison:

Skip an intermittent issue tests/misc/stdbuf (fails in this run but passes in the 'main' branch)

Copy link

github-actions bot commented Apr 7, 2025

GNU testsuite comparison:

Skip an intermittent issue tests/timeout/timeout (fails in this run but passes in the 'main' branch)
Skipping an intermittent issue tests/misc/stdbuf (passes in this run but fails in the 'main' branch)

Copy link

github-actions bot commented Apr 7, 2025

GNU testsuite comparison:

Skipping an intermittent issue tests/misc/stdbuf (passes in this run but fails in the 'main' branch)
Congrats! The gnu test tests/misc/tee is no longer failing!

Copy link
Member

@tertsdiepraam tertsdiepraam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh lovely!

When initializing the Pager object, we scan the entire file once and store the byte positions of each line for faster subsequent file scans.

I wonder whether that is beneficial? It might be nice to just have a really fast startup time. Maybe we could make it so the positions are stored when they are first read, but then we store them?

I'd love to see some benchmarks about the memory usage by the way! I think they'll be really good.

} else {
let mut buff = String::new();
stdin().read_to_string(&mut buff).unwrap();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should open an issue for this too. This shouldn't be necessary I think. For now, though, we can absolutely keep it.

Copy link
Contributor Author

@aaron-ang aaron-ang Apr 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A buffer is used (with io::Cursor) for now because I need a way seek the contents of stdin.

Copy link

github-actions bot commented Apr 7, 2025

GNU testsuite comparison:

Skip an intermittent issue tests/timeout/timeout (fails in this run but passes in the 'main' branch)
Skipping an intermittent issue tests/misc/stdbuf (passes in this run but fails in the 'main' branch)

1 similar comment
Copy link

github-actions bot commented Apr 7, 2025

GNU testsuite comparison:

Skip an intermittent issue tests/timeout/timeout (fails in this run but passes in the 'main' branch)
Skipping an intermittent issue tests/misc/stdbuf (passes in this run but fails in the 'main' branch)

@aaron-ang
Copy link
Contributor Author

aaron-ang commented Apr 7, 2025

Oh lovely!

When initializing the Pager object, we scan the entire file once and store the byte positions of each line for faster subsequent file scans.

I wonder whether that is beneficial? It might be nice to just have a really fast startup time. Maybe we could make it so the positions are stored when they are first read, but then we store them?

I'd love to see some benchmarks about the memory usage by the way! I think they'll be really good.

Thank you for the comments. Evidently, this PR needs a lot more work. I will incorporate the changes and update the PR description.

I want to bring up a point that refactoring the current implementation is especially tricky with squeezed lines. I initially used a deque to only update the front and back of the visible lines on every line up/down to reduce I/O and computation. However, I got stuck trying to integrate this logic with squeezed lines. Therefore, I am reverting to clearing all lines on every display operation. We incur more I/O cost but hopefully the OS cache can minimize this. Lmk if this approach sounds reasonable.

@aaron-ang
Copy link
Contributor Author

rebased

@aaron-ang
Copy link
Contributor Author

aaron-ang commented Apr 7, 2025

Benchmark

# Create random 100MB file with ASCII characters
< /dev/urandom base64 | fold -w 120 | head -c 100M > bigfile
cargo build --release
/usr/bin/time ./target/release/coreutils more bigfile

# in `main`
# 0.26user 0.10system 0:01.11elapsed 32%CPU (0avgtext+0avgdata 130872maxresident)k 0inputs+0outputs (0major+31431minor)pagefaults 0swaps

# in `more-reduce-mem`
# 0.06user 0.02system 0:00.91elapsed 9%CPU (0avgtext+0avgdata 18068maxresident)k 0inputs+0outputs (0major+3179minor)pagefaults 0swaps

# system more
# 0.00user 0.00system 0:00.75elapsed 0%CPU (0avgtext+0avgdata 2528maxresident)k 0inputs+0outputs (0major+133minor)pagefaults 0swaps

Big improvement in terms of first paint time (~7x) as well as memory usage (~4x), but still lagging behind default.

@aaron-ang
Copy link
Contributor Author

aaron-ang commented Apr 8, 2025

The GNU implementation seems to have constant memory overhead, probably because it doesn't read the entire file on initialization. However, this is necessary in our implementation to get line_count which is used in many parts of the code.

Furthermore, with pipes, the GNU implementation doesn't allow backwards operations and doesn't print the percentage viewed (%). Our implementation differs in this regard. We will likely need a related struct or method to handle this type altogether.

Modifying this will need significant refactoring again, so it is best addressed in another PR.

Copy link

github-actions bot commented Apr 8, 2025

GNU testsuite comparison:

Skipping an intermittent issue tests/misc/stdbuf (passes in this run but fails in the 'main' branch)

Copy link

github-actions bot commented Apr 9, 2025

GNU testsuite comparison:

Skip an intermittent issue tests/timeout/timeout (fails in this run but passes in the 'main' branch)
Skipping an intermittent issue tests/misc/stdbuf (passes in this run but fails in the 'main' branch)

@aaron-ang
Copy link
Contributor Author

rebased

Copy link

GNU testsuite comparison:

Skipping an intermittent issue tests/timeout/timeout (passes in this run but fails in the 'main' branch)

@aaron-ang
Copy link
Contributor Author

hi @tertsdiepraam is this PR good to merge, or are we looking to further improve performance?

@sylvestre sylvestre requested a review from Copilot April 14, 2025 20:59
Copy link

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot reviewed 1 out of 1 changed files in this pull request and generated no comments.

Copy link
Member

@tertsdiepraam tertsdiepraam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a merge conflict, but otherwise this looks ready to be merged! Great work!

@aaron-ang
Copy link
Contributor Author

There's a merge conflict, but otherwise this looks ready to be merged! Great work!

Thank you @tertsdiepraam. I've rebased to fix the merge conflict.

Copy link

GNU testsuite comparison:

Skipping an intermittent issue tests/timeout/timeout (passes in this run but fails in the 'main' branch)

@aaron-ang aaron-ang requested a review from Copilot April 16, 2025 04:36
Copy link

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot reviewed 1 out of 1 changed files in this pull request and generated no comments.

@aaron-ang
Copy link
Contributor Author

Hi @sylvestre, could you merge if it looks good? Thanks!

@tertsdiepraam tertsdiepraam merged commit da351d2 into uutils:main Apr 17, 2025
70 checks passed
@tertsdiepraam
Copy link
Member

Thanks @aaron-ang!

@aaron-ang aaron-ang deleted the more-reduce-mem branch April 17, 2025 21:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants