Skip to content

tail: fix stdin redirect (#3842) #3845

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Sep 7, 2022
Merged

Conversation

jhscheer
Copy link
Contributor

This fixes a bug where calling tail - < file.txt would result
in invoking unbounded_tail().
However, it is a stdin redirect to a seekable regular file and
therefore bounded_tail should be invoked as if tail file.txt had
been called.

fix #3842

@sylvestre
Copy link
Contributor

cool, bravo
what is the result of hyperfine with your patch ?

@jhscheer
Copy link
Contributor Author

cool, bravo what is the result of hyperfine with your patch ?

Thanks.
I posted the results here #3842 (comment)

@jhscheer
Copy link
Contributor Author

jhscheer commented Aug 19, 2022

Bummer, tests/tail-2/start-middle.sh is failing.

$ cat tests/tail-2/start-middle.sh
(echo 1; echo 2) > k || framework_failure_
sh -c 'read x; tail' < k > out || fail=1
cat <<EOF > exp
2
EOF
compare exp out || fail=1

This is tricky. Now we tail the the whole file ("1\n2\n"), but we're expected to only tail "2\n".

@tertsdiepraam Do you have an idea for this issue?

@jhscheer
Copy link
Contributor Author

Okay I figured out how to handle this issue.
If the fd points to a regular file, we cannot always assume that the file starts at the beginning.
Instead we need to call File::seek(SeekFrom::Current(0) first to get the current position and then tail from there.
I'm working on this.

@jhscheer
Copy link
Contributor Author

This is ready now.
I added a fix and a test for the issue with gnu/tests/tail-2/start-middle.sh.

Benchmarks for comparison:

Summary
  'cat 505MB.txt | tail -n 10000' ran
    1.22 ± 0.11 times faster than 'cat 505MB.txt | tail -n 100000'
    1.47 ± 0.09 times faster than 'cat 505MB.txt | tail -n 1000000'
    1.70 ± 0.21 times faster than 'cat 505MB.txt | target/release/tail -n 10000'
    2.12 ± 0.20 times faster than 'cat 505MB.txt | target/release/tail -n 100000'
    4.89 ± 0.29 times faster than 'cat 505MB.txt | target/release/tail -n 1000000'

Summary
  'tail -n 10000 505MB.txt' ran
    1.51 ± 0.43 times faster than 'target/release/tail -n 10000 505MB.txt'
    5.74 ± 1.19 times faster than 'tail -n 100000 505MB.txt'
    9.37 ± 2.23 times faster than 'target/release/tail -n 100000 505MB.txt'
   47.04 ± 9.29 times faster than 'tail -n 1000000 505MB.txt'
   60.93 ± 13.18 times faster than 'target/release/tail -n 1000000 505MB.txt'

Summary
  'tail -n 10000 < 505MB.txt' ran
    1.41 ± 0.40 times faster than 'target/release/tail -n 10000 < 505MB.txt'
    4.27 ± 1.05 times faster than 'tail -n 100000 < 505MB.txt'
    6.99 ± 1.71 times faster than 'target/release/tail -n 100000 < 505MB.txt'
   30.48 ± 7.43 times faster than 'tail -n 1000000 < 505MB.txt'
   42.79 ± 10.05 times faster than 'target/release/tail -n 1000000 < 505MB.txt'

I don't think the failing tests are related to this PR.

    test_chgrp::basic_succeeds
    test_chgrp::test_permission_denied
    test_chgrp::test_subdir_permission_denied``

@sylvestre
Copy link
Contributor

I opened #3890 for the chgrp issue

Well done for the change. Any idea how we could be even closer to GNU's in term of perf ?

@jhscheer
Copy link
Contributor Author

Any idea how we could be even closer to GNU's in term of perf ?

I think that's what @Joining7943 is working on here: #3874

This fixes a bug where calling `tail - < file.txt` would result
in invoking `unbounded_tail()`.
However, it is a stdin redirect to a seekable regular file and
therefore `bounded_tail` should be invoked as if `tail file.txt` had
been called.
Previously, if stdin redirect pointed to a regular file,
tailing started at the beginning of the file. However,
tailing needs to start at the current position because this
is expected by tests/tail-2/start-middle.sh.

This fixes the issue by taking the current offset into account
while going backwards through the stdin redirected file.
@tertsdiepraam tertsdiepraam merged commit 987479d into uutils:main Sep 7, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

uu-tail performance drop when reading from stdin
3 participants