`tail`: improve performance when stdin is piped #3874

Joining7943 · 2022-08-23T21:44:27Z

This PR intends to fix the performance issues described in #3842. Besides the problems with the -n {+,-}{values} option mentioned there, the -c {+,-}{values} option was also very slow when reading large files, which this PR addresses as well. Basically, the solution reads fixed-size blocks of bytes and processes them in chunks instead of reading line by line.
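
To illustrate the chunked approach described above, here is a minimal sketch for the `-c -N` case (keep the last N bytes). It is not the code from this PR: `CHUNK_SIZE`, its value, and `last_bytes` are hypothetical names chosen for the example, and the real implementation in `src/uu/tail/src/chunks.rs` may be organized differently.

```rust
use std::collections::VecDeque;
use std::io::{self, Read};

/// Hypothetical chunk size; the value used by the PR is not stated here.
const CHUNK_SIZE: usize = 8 * 1024;

/// Return the last `n` bytes of `reader`, reading it in fixed-size chunks.
/// (Illustrative helper, not part of uutils.)
fn last_bytes<R: Read>(mut reader: R, n: usize) -> io::Result<Vec<u8>> {
    let mut chunks: VecDeque<Vec<u8>> = VecDeque::new();
    let mut kept = 0usize; // total number of bytes currently held in `chunks`
    let mut buf = [0u8; CHUNK_SIZE];

    loop {
        let read = match reader.read(&mut buf) {
            Ok(0) => break, // end of input
            Ok(read) => read,
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        };
        chunks.push_back(buf[..read].to_vec());
        kept += read;

        // Drop whole chunks from the front as long as the rest still covers `n` bytes.
        while let Some(front) = chunks.front() {
            if kept - front.len() >= n {
                kept -= front.len();
                chunks.pop_front();
            } else {
                break;
            }
        }
    }

    // Concatenate what is left and trim the surplus at the start of the oldest chunk.
    let all: Vec<u8> = chunks.into_iter().flatten().collect();
    let start = all.len().saturating_sub(n);
    Ok(all[start..].to_vec())
}
```

Called as `last_bytes(std::io::stdin().lock(), 10)`, this would collect the last 10 bytes of piped input. During the read loop it holds fewer than `n + CHUNK_SIZE` bytes, regardless of how large the input is.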

So far the performance is comparable to GNU's tail: within roughly 5% in the benchmarks below.

Latest benchmarks

❯ hyperfine --warmup 3 --output pipe -L prg tail,target/release/tail -L values 10,1000,100000,10000000 -L option c,n 'cat tests/fixtures/tail/random_ascii_505MB.BIG | {prg} -{option} +{values} -'
Benchmark 1: cat tests/fixtures/tail/random_ascii_505MB.BIG | tail -c +10 -
  Time (mean ± σ):     304.0 ms ±   3.3 ms    [User: 59.3 ms, System: 428.5 ms]
  Range (min … max):   299.6 ms … 310.8 ms    10 runs

Benchmark 2: cat tests/fixtures/tail/random_ascii_505MB.BIG | target/release/tail -c +10 -
  Time (mean ± σ):     318.1 ms ±   3.9 ms    [User: 73.2 ms, System: 430.4 ms]
  Range (min … max):   314.2 ms … 328.3 ms    10 runs

Benchmark 3: cat tests/fixtures/tail/random_ascii_505MB.BIG | tail -c +1000 -
  Time (mean ± σ):     307.4 ms ±   5.0 ms    [User: 57.7 ms, System: 432.5 ms]
  Range (min … max):   302.5 ms … 317.1 ms    10 runs

Benchmark 4: cat tests/fixtures/tail/random_ascii_505MB.BIG | target/release/tail -c +1000 -
  Time (mean ± σ):     317.0 ms ±   2.3 ms    [User: 76.2 ms, System: 425.5 ms]
  Range (min … max):   313.7 ms … 321.0 ms    10 runs

Benchmark 5: cat tests/fixtures/tail/random_ascii_505MB.BIG | tail -c +100000 -
  Time (mean ± σ):     310.2 ms ±  14.1 ms    [User: 60.6 ms, System: 427.9 ms]
  Range (min … max):   301.1 ms … 344.6 ms    10 runs

  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet PC without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.

Benchmark 6: cat tests/fixtures/tail/random_ascii_505MB.BIG | target/release/tail -c +100000 -
  Time (mean ± σ):     316.0 ms ±   2.6 ms    [User: 61.9 ms, System: 438.7 ms]
  Range (min … max):   312.8 ms … 319.7 ms    10 runs

Benchmark 7: cat tests/fixtures/tail/random_ascii_505MB.BIG | tail -c +10000000 -
  Time (mean ± σ):     300.4 ms ±   1.2 ms    [User: 66.9 ms, System: 416.3 ms]
  Range (min … max):   298.5 ms … 302.3 ms    10 runs

Benchmark 8: cat tests/fixtures/tail/random_ascii_505MB.BIG | target/release/tail -c +10000000 -
  Time (mean ± σ):     316.4 ms ±   3.1 ms    [User: 65.6 ms, System: 434.3 ms]
  Range (min … max):   311.2 ms … 322.2 ms    10 runs

Benchmark 9: cat tests/fixtures/tail/random_ascii_505MB.BIG | tail -n +10 -
  Time (mean ± σ):     312.5 ms ±  17.0 ms    [User: 66.3 ms, System: 426.3 ms]
  Range (min … max):   301.8 ms … 352.7 ms    10 runs

  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet PC without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.

Benchmark 10: cat tests/fixtures/tail/random_ascii_505MB.BIG | target/release/tail -n +10 -
  Time (mean ± σ):     317.7 ms ±   2.0 ms    [User: 62.7 ms, System: 440.2 ms]
  Range (min … max):   314.5 ms … 321.6 ms    10 runs

Benchmark 11: cat tests/fixtures/tail/random_ascii_505MB.BIG | tail -n +1000 -
  Time (mean ± σ):     306.8 ms ±   7.5 ms    [User: 56.3 ms, System: 436.3 ms]
  Range (min … max):   301.3 ms … 326.6 ms    10 runs

  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet PC without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.

Benchmark 12: cat tests/fixtures/tail/random_ascii_505MB.BIG | target/release/tail -n +1000 -
  Time (mean ± σ):     320.9 ms ±  12.2 ms    [User: 70.9 ms, System: 431.6 ms]
  Range (min … max):   314.3 ms … 355.3 ms    10 runs

  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet PC without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.

Benchmark 13: cat tests/fixtures/tail/random_ascii_505MB.BIG | tail -n +100000 -
  Time (mean ± σ):     300.4 ms ±   1.6 ms    [User: 60.8 ms, System: 422.5 ms]
  Range (min … max):   298.2 ms … 302.5 ms    10 runs

Benchmark 14: cat tests/fixtures/tail/random_ascii_505MB.BIG | target/release/tail -n +100000 -
  Time (mean ± σ):     313.9 ms ±   2.9 ms    [User: 71.0 ms, System: 427.7 ms]
  Range (min … max):   310.0 ms … 318.1 ms    10 runs

Benchmark 15: cat tests/fixtures/tail/random_ascii_505MB.BIG | tail -n +10000000 -
  Time (mean ± σ):     205.2 ms ±   2.3 ms    [User: 50.8 ms, System: 297.4 ms]
  Range (min … max):   201.8 ms … 209.8 ms    14 runs

Benchmark 16: cat tests/fixtures/tail/random_ascii_505MB.BIG | target/release/tail -n +10000000 -
  Time (mean ± σ):     203.3 ms ±   5.5 ms    [User: 77.9 ms, System: 291.6 ms]
  Range (min … max):   198.1 ms … 217.2 ms    14 runs

Summary
  'cat tests/fixtures/tail/random_ascii_505MB.BIG | target/release/tail -n +10000000 -' ran
    1.01 ± 0.03 times faster than 'cat tests/fixtures/tail/random_ascii_505MB.BIG | tail -n +10000000 -'
    1.48 ± 0.04 times faster than 'cat tests/fixtures/tail/random_ascii_505MB.BIG | tail -n +100000 -'
    1.48 ± 0.04 times faster than 'cat tests/fixtures/tail/random_ascii_505MB.BIG | tail -c +10000000 -'
    1.50 ± 0.04 times faster than 'cat tests/fixtures/tail/random_ascii_505MB.BIG | tail -c +10 -'
    1.51 ± 0.05 times faster than 'cat tests/fixtures/tail/random_ascii_505MB.BIG | tail -n +1000 -'
    1.51 ± 0.05 times faster than 'cat tests/fixtures/tail/random_ascii_505MB.BIG | tail -c +1000 -'
    1.53 ± 0.08 times faster than 'cat tests/fixtures/tail/random_ascii_505MB.BIG | tail -c +100000 -'
    1.54 ± 0.09 times faster than 'cat tests/fixtures/tail/random_ascii_505MB.BIG | tail -n +10 -'
    1.54 ± 0.04 times faster than 'cat tests/fixtures/tail/random_ascii_505MB.BIG | target/release/tail -n +100000 -'
    1.55 ± 0.04 times faster than 'cat tests/fixtures/tail/random_ascii_505MB.BIG | target/release/tail -c +100000 -'
    1.56 ± 0.04 times faster than 'cat tests/fixtures/tail/random_ascii_505MB.BIG | target/release/tail -c +10000000 -'
    1.56 ± 0.04 times faster than 'cat tests/fixtures/tail/random_ascii_505MB.BIG | target/release/tail -c +1000 -'
    1.56 ± 0.04 times faster than 'cat tests/fixtures/tail/random_ascii_505MB.BIG | target/release/tail -n +10 -'
    1.56 ± 0.05 times faster than 'cat tests/fixtures/tail/random_ascii_505MB.BIG | target/release/tail -c +10 -'
    1.58 ± 0.07 times faster than 'cat tests/fixtures/tail/random_ascii_505MB.BIG | target/release/tail -n +1000 -'

I have already added some unit tests and integration tests to ensure basic correctness. However, this is still a draft, and I'm working on more tests and documentation.

Fixes #3842

tertsdiepraam · 2022-08-24T08:25:24Z

Cool! It doesn't currently build on the MinRustV job, could you look into that? Also can we test this without those giant files with random noise? Maybe we can generate that data during the test?
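
One way to generate such data during a test, as a rough sketch only: the helper below builds reproducible printable-ASCII input from a seed using a small xorshift generator. `XorShift64` and `random_ascii` are hypothetical names for illustration; this is not the `random.rs` library that was later added to `tests/common`, and it is not suitable for anything security-related.

```rust
/// Hypothetical helper for illustration: a tiny xorshift64 pseudo-random generator.
struct XorShift64(u64);

impl XorShift64 {
    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
}

/// Build `len` bytes of printable ASCII, with a newline on average every 80 characters.
/// The same `seed` always yields the same string, so test failures are reproducible.
fn random_ascii(seed: u64, len: usize) -> String {
    // xorshift must not start from 0, so a zero seed is mapped to a fixed non-zero constant.
    let mut rng = XorShift64(if seed == 0 { 0x9E37_79B9_7F4A_7C15 } else { seed });
    (0..len)
        .map(|_| {
            let r = rng.next_u64();
            if r % 80 == 0 {
                '\n'
            } else {
                // Map into the printable range ' ' (32) ..= '~' (126).
                (b' ' + ((r >> 8) % 95) as u8) as char
            }
        })
        .collect()
}
```

A test could then pipe `random_ascii(42, 1_000_000)` into tail and compare the output against a slice of the same string, without keeping a large fixture file in the repository.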

src/uu/tail/Cargo.toml

tertsdiepraam

Whoops, selected the wrong option :)

Joining7943 · 2022-08-24T12:39:33Z

@tertsdiepraam Thanks for your feedback. I'll have a look into it as soon as possible.

tertsdiepraam

Great work on the tests! I have commented 2 nits. It looks good to me, although I'm currently reading it all on my phone, so I can't really do a full review. It feels a bit verbose in places, but I think we can merge this and simplify it later if the need arises. Excellent work!

.gitignore

src/uu/tail/src/chunks.rs

Joining7943 · 2022-08-24T23:15:40Z

Thanks :)

Joining7943 · 2022-08-25T00:43:49Z

I fixed the failing test on 32-bit systems by removing it. I don't think checking memory limits for command-line arguments makes much sense, and GNU's tail doesn't check either.

I still don't know why all the Windows tests are failing. All tests with piped input fail as soon as some output on stdout is expected, and it seems that on Windows tail produces no output at all. I don't have a Windows system around right now. Maybe you have an idea?

Joining7943 · 2022-08-26T18:51:15Z

83930ae will fix the errors on windows. It won't fix #3845 and I don't know how this fix interferes with that pr.

Joining7943 · 2022-09-03T12:13:41Z

Any idea why some tests related to uu_chgrp are failing all of a sudden?

sylvestre · 2022-09-03T12:16:06Z

Dunno but I opened this bug:
#3890

tertsdiepraam · 2022-09-07T17:15:17Z

Looks like we're hitting a todo!() on mac/nightly. Could you check whether this has anything to do with your changes?

 ---- test_tail::pipe_tests::test_pipe_when_lines_option_given_input_size_is_equal_to_buffer_size stdout ----
current_directory_resolved: 
run: /Users/runner/work/coreutils/coreutils/target/debug/coreutils tail -n +0
thread 'test_tail::pipe_tests::test_pipe_when_lines_option_given_input_size_is_equal_to_buffer_size' panicked at 'Command was expected to succeed.
stdout = 
 stderr = thread 'main' panicked at 'not yet implemented', src/uu/tail/src/tail.rs:432:17
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
', tests/common/util.rs:176:9

Joining7943 · 2022-09-07T17:44:08Z

This doesn't have to do with my changes directly. In tail.rs I only changed code in unbounded_tail(). I guess it has something to do with the tests I added, which exercise these command-line options more frequently. @jhscheer did some background checks and I added my findings to issue #3895. See also here: #3874 (comment). To summarize so far: it happens randomly, it happens more frequently on macOS than on Linux, and it has something to do with the test system. I already switched off some of the tests on macOS, but the -n +0 and -c +0 options are also affected. I'll also switch off the tests for those command-line options on macOS until it is fixed.

Joining7943 · 2022-09-07T18:18:06Z

I decided to switch off the pipe tests on macOS entirely, since values other than +0 for the -c or -n option are also affected at random.

Joining7943 · 2022-09-08T22:08:38Z

Besides trying to fix #3910, I would like to try something else and ignore write errors to stdin in the tests. I re-added macOS to the pipe tests.
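
A rough sketch of what ignoring such write errors can look like. The function name `write_ignoring_broken_pipe` is hypothetical and this is not the code in tests/common/util.rs. Narrowing the ignored errors to `BrokenPipe` is an assumption made for the example; it is the case that arises when the child process exits before reading all of its stdin.

```rust
use std::io::{self, Write};

/// Write `data` to `sink`, treating a broken pipe as success.
/// (Hypothetical helper; any other I/O error is still returned to the caller.)
fn write_ignoring_broken_pipe<W: Write>(sink: &mut W, data: &[u8]) -> io::Result<()> {
    match sink.write_all(data) {
        // The reader closed its end early, e.g. because it needed no more input.
        Err(e) if e.kind() == io::ErrorKind::BrokenPipe => Ok(()),
        other => other,
    }
}
```

In a test harness, `sink` would typically be the `ChildStdin` handle of the spawned command, so that a tail invocation which exits early does not fail the test through the writer side.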

Joining7943 · 2022-09-09T11:04:31Z

The tests look promising, no broken pipe errors anymore. The android test is failing but I don't see a relation to this pr. @tertsdiepraam Do you? I had concerns that the input isn't transferred through the pipe completely in case of a write error in util, but looks like this isn't the case. From my side this pr is safe to merge. Do you have any concerns?

tertsdiepraam · 2022-09-09T11:28:57Z

The Android failure is probably this: #3911. I'll do a final read-through of the PR.

tertsdiepraam

Looks all good now!

sylvestre · 2022-09-09T11:39:49Z

arf :(
you didn't squash the 34 commits :(

Joining7943 · 2022-09-09T11:47:03Z

@sylvestre sorry :( I didn't want to force push until everything's alright. I should have written something ...

sylvestre · 2022-09-09T11:49:07Z

no worries, I am taking care of that

Joining7943 · 2022-09-09T11:49:53Z

uff. ok

sylvestre · 2022-09-09T11:53:35Z

Push here:
2658f8a

What I had done:
$ git reset --hard b39f523
$ git push origin main -f
(after giving me permissions)
$ wget https://patch-diff.githubusercontent.com/raw/uutils/coreutils/pull/3874.diff
$ git commit
(and I just removed my permissions)

Joining7943 · 2022-09-09T12:05:32Z

thanks and sorry again

sylvestre · 2022-09-09T12:06:39Z

no worries :)

tertsdiepraam · 2022-09-09T12:11:10Z

Whoops, sorry! @sylvestre

sylvestre · 2022-09-09T12:13:59Z

no problem :)
in case you don't know, github allows you to do that directly in the interface :)

tertsdiepraam · 2022-09-09T12:15:18Z

Yeah I just forgot to check the commits :)

Joining7943 · 2022-09-09T12:57:46Z

@sylvestre yeah thanks but I don't trust such interfaces ;) I'm using git almost only on the command line.

Joining7943 force-pushed the tail-rewrite-piped-input-handling branch from c6f494b to 7b35143 on August 23, 2022 21:51

tertsdiepraam approved these changes Aug 24, 2022

src/uu/tail/Cargo.toml (outdated, resolved)

tertsdiepraam requested changes Aug 24, 2022

Joining7943 added 5 commits August 24, 2022 15:07

tail: improve performance of piped stdin

fa51fe8

Rewrite handling of stdin when it is piped and read input in chunks. Fixes uutils#3842

gitignore: ignore big test fixtures which are used only locally

fc1b940

tail: Fix spelling and imports

68e0131

tail: Fix incompatibilities with rust 1.56.1

247fd87

Fixes temporary value dropped while borrowed

tail: Revert changes to spacing around "=" in Cargo.toml

a6c6435

Joining7943 force-pushed the tail-rewrite-piped-input-handling branch from 7b35143 to d6c561d on August 24, 2022 19:10

Joining7943 added 3 commits August 24, 2022 21:17

tests/common: Add library "random.rs" to generate random strings

6cbccee

tests/tail: Restructure tests to use the new random.rs library

e07198c

fixtures/tail: Remove random_ascii_* test fixtures

d63b98c

Joining7943 force-pushed the tail-rewrite-piped-input-handling branch from d6c561d to d63b98c on August 24, 2022 19:17

tail: Fix failing tests because of -Zpanic_abort_tests but unwind used in unit tests

25a5cb3

tertsdiepraam reviewed Aug 24, 2022

.gitignore (outdated, resolved)

src/uu/tail/src/chunks.rs (resolved)

tests/tail: Remove test for insufficient memory on 32-bit systems

f1e568f

Since we're processing piped stdin in chunks now (fa51fe8) this limit doesn't apply anymore.

Joining7943 added 2 commits August 25, 2022 02:46

Revert "gitignore: ignore big test fixtures which are used only locally"

0667d0e

This reverts commit fc1b940.

tail: Fix piped input on windows not recognized as tailable

83930ae

Joining7943 added 7 commits August 27, 2022 03:59

Merge branch 'main' into tail-rewrite-piped-input-handling

1cce6b3

Revert "tail: Fix piped input on windows not recognized as tailable"

38bc35b

This reverts commit 83930ae.

tests/tail: Do not execute piped input tests on windows and android targets

284ef3b

See also uutils#3881

tail/chunks: Adjust BytesChunk.from_chunk() to behave like LinesChunk.from_chunk()

76677fe

tail/chunks: Fix missing return in in BytesChunk::from_chunk and LinesChunk::from_chunk

980ce07

tail/chunks: Add documentation for BytesChunk, LinesChunk etc.

17de1a2

tail/chunks: Refactor LinesChunk to use a BytesChunk as backing buffer

4397df7

tail: Fix rust error in documentation

6b65f0e

Joining7943 mentioned this pull request Sep 3, 2022

tail: tests with piped input are failing randomly with broken pipe on macos when no output is expected #3895

Closed

Merge branch 'main' into tail-rewrite-piped-input-handling

6e160b8

tests/tail: Switch off pipe tests on macos because of random failures

e46f89d

See also uutils#3895

Joining7943 added 3 commits September 7, 2022 21:55

Merge branch 'main' into tail-rewrite-piped-input-handling

0b8aedc

tests/tail: Switch pipe tests on macos back on

997bdc1

tests/tail: ignore stdin write errors

f0319df

tertsdiepraam approved these changes Sep 9, 2022

tertsdiepraam merged commit 86f9b3e into uutils:main Sep 9, 2022

Joining7943 deleted the tail-rewrite-piped-input-handling branch September 9, 2022 12:08

sylvestre mentioned this pull request Sep 9, 2022

Recent regression on test_tail::test_stdin_default stdout #3916

Closed
