-
-
Notifications
You must be signed in to change notification settings - Fork 1.5k
more: keep only screen lines in mem #7680
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
ba51b22
to
328da32
Compare
328da32
to
08ac28e
Compare
I will squash the commits once the PR is good to merge. |
GNU testsuite comparison:
|
GNU testsuite comparison:
|
GNU testsuite comparison:
|
GNU testsuite comparison:
|
8cf70d1
to
360e660
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh lovely!
When initializing the Pager object, we scan the entire file once and store the byte positions of each line for faster subsequent file scans.
I wonder whether that is beneficial? It might be nice to just have a really fast startup time. Maybe we could make it so the positions are stored when they are first read, but then we store them?
I'd love to see some benchmarks about the memory usage by the way! I think they'll be really good.
src/uu/more/src/more.rs
Outdated
} else { | ||
let mut buff = String::new(); | ||
stdin().read_to_string(&mut buff).unwrap(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should open an issue for this too. This shouldn't be necessary I think. For now, though, we can absolutely keep it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A buffer is used (with io::Cursor) for now because I need a way seek the contents of stdin.
GNU testsuite comparison:
|
1 similar comment
GNU testsuite comparison:
|
Thank you for the comments. Evidently, this PR needs a lot more work. I will incorporate the changes and update the PR description. I want to bring up a point that refactoring the current implementation is especially tricky with squeezed lines. I initially used a deque to only update the front and back of the visible lines on every line up/down to reduce I/O and computation. However, I got stuck trying to integrate this logic with squeezed lines. Therefore, I am reverting to clearing all lines on every display operation. We incur more I/O cost but hopefully the OS cache can minimize this. Lmk if this approach sounds reasonable. |
71e963c
to
22e4b0b
Compare
rebased |
Benchmark # Create random 100MB file with ASCII characters
< /dev/urandom base64 | fold -w 120 | head -c 100M > bigfile
cargo build --release
/usr/bin/time ./target/release/coreutils more bigfile
# in `main`
# 0.26user 0.10system 0:01.11elapsed 32%CPU (0avgtext+0avgdata 130872maxresident)k 0inputs+0outputs (0major+31431minor)pagefaults 0swaps
# in `more-reduce-mem`
# 0.06user 0.02system 0:00.91elapsed 9%CPU (0avgtext+0avgdata 18068maxresident)k 0inputs+0outputs (0major+3179minor)pagefaults 0swaps
# system more
# 0.00user 0.00system 0:00.75elapsed 0%CPU (0avgtext+0avgdata 2528maxresident)k 0inputs+0outputs (0major+133minor)pagefaults 0swaps Big improvement in terms of first paint time (~7x) as well as memory usage (~4x), but still lagging behind default. |
The GNU implementation seems to have constant memory overhead, probably because it doesn't read the entire file on initialization. However, this is necessary in our implementation to get Furthermore, with pipes, the GNU implementation doesn't allow backwards operations and doesn't print the percentage viewed (%). Our implementation differs in this regard. We will likely need a related struct or method to handle this type altogether. Modifying this will need significant refactoring again, so it is best addressed in another PR. |
GNU testsuite comparison:
|
GNU testsuite comparison:
|
7b52e9d
to
6cd31db
Compare
rebased |
GNU testsuite comparison:
|
hi @tertsdiepraam is this PR good to merge, or are we looking to further improve performance? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Copilot reviewed 1 out of 1 changed files in this pull request and generated no comments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a merge conflict, but otherwise this looks ready to be merged! Great work!
6cd31db
to
35c5094
Compare
Thank you @tertsdiepraam. I've rebased to fix the merge conflict. |
35c5094
to
2e744af
Compare
GNU testsuite comparison:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Copilot reviewed 1 out of 1 changed files in this pull request and generated no comments.
Hi @sylvestre, could you merge if it looks good? Thanks! |
Thanks @aaron-ang! |
fix(partially) #6397.
This PR improves
uu_more
by storing only the lines to be drawn in memory.BufRead + Seek
intomore()
.Pager
object, we scan the entire file once and store the byte positions of each line for faster subsequent file scans. If pattern matching is required, we save the first line containing the pattern.self.content_rows
no. lines into memory (inload_visible_lines
) before writing tostdout
.