Skip to content

seq: Add a print_seq fast path function for integer and positive increments #7564

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 3 commits into from

Conversation

drinkcat
Copy link
Contributor

A lot of custom logic, we basically do arithmetic on character arrays, but this comes at with huge performance gains.

Unlike coreutils seq, we do this for all positive increments (because why not), and we don't fall back to slow path if the last parameter is in scientific notation.

Fixes #7482


I'm not too sure what I think about this ,-) This is a lot of added code, but this comes with huge performance gains, around 17x in some cases (and also, it was a fun puzzle, and I learnt quite a few new Rust things ,-P), and we are now competitive with GNU seq:

$ cargo build -r -p uu_seq && taskset -c 0 hyperfine --warmup 3 -L seq target/release/seq,./seq-main,seq "{seq} 1000000"
Benchmark 1: target/release/seq 1000000
  Time (mean ± σ):      10.6 ms ±   1.1 ms    [User: 9.9 ms, System: 0.7 ms]
  Range (min … max):    10.0 ms …  22.4 ms    249 runs
 
  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.
 
Benchmark 2: ./seq-main 1000000
  Time (mean ± σ):     165.7 ms ±   5.4 ms    [User: 163.5 ms, System: 0.8 ms]
  Range (min … max):   160.8 ms … 182.3 ms    18 runs
 
Benchmark 3: seq 1000000
  Time (mean ± σ):       9.3 ms ±   0.1 ms    [User: 9.0 ms, System: 0.5 ms]
  Range (min … max):     8.8 ms …   9.5 ms    275 runs
 
Summary
  seq 1000000 ran
    1.14 ± 0.12 times faster than target/release/seq 1000000
   17.83 ± 0.60 times faster than ./seq-main 1000000

We're less conservative than GNU, so this path also activates for ranges likes 1 123 100000000 (the GNU manual says they do something special below 200 increment, but I suspect that value is lower) and 1e11, as long as the desired precision is still zero (a.k.a. integers).

Still a draft, probably want to add something to BENCHMARKING.md, a few more tests, and see if I can extract a bit more performance...

Copy link

GNU testsuite comparison:

Skipping an intermittent issue tests/timeout/timeout (passes in this run but fails in the 'main' branch)

Copy link

github-actions bot commented Apr 3, 2025

GNU testsuite comparison:

Skipping an intermittent issue tests/timeout/timeout (passes in this run but fails in the 'main' branch)

@drinkcat drinkcat force-pushed the seq-perf branch 2 times, most recently from 29bafc5 to 01816c7 Compare April 3, 2025 17:31
Copy link

github-actions bot commented Apr 3, 2025

GNU testsuite comparison:

Skipping an intermittent issue tests/timeout/timeout (passes in this run but fails in the 'main' branch)

@drinkcat
Copy link
Contributor Author

drinkcat commented Apr 3, 2025

Scrapped a few more percents:

cargo build -r -p uu_seq && taskset -c 0 hyperfine --warmup 3 -L seq target/release/seq,./seq-main,seq "{seq} 1000000"
Benchmark 1: target/release/seq 1000000
  Time (mean ± σ):       5.8 ms ±   0.0 ms    [User: 5.4 ms, System: 0.5 ms]
  Range (min … max):     5.7 ms …   6.1 ms    442 runs
  
Benchmark 2: ./seq-main 1000000
  Time (mean ± σ):      96.7 ms ±   2.5 ms    [User: 95.6 ms, System: 0.5 ms]
  Range (min … max):    95.6 ms … 110.0 ms    30 runs
  
Benchmark 3: seq 1000000
  Time (mean ± σ):       5.4 ms ±   0.0 ms    [User: 5.1 ms, System: 0.4 ms]
  Range (min … max):     5.3 ms …   5.6 ms    473 runs
  
Summary
  seq 1000000 ran
    1.08 ± 0.01 times faster than target/release/seq 1000000
   18.04 ± 0.48 times faster than ./seq-main 1000000

Copy link

github-actions bot commented Apr 3, 2025

GNU testsuite comparison:

Skipping an intermittent issue tests/timeout/timeout (passes in this run but fails in the 'main' branch)

@drinkcat
Copy link
Contributor Author

drinkcat commented Apr 3, 2025

(Also added support for constant width printing, it's easy to add)

Copy link

github-actions bot commented Apr 3, 2025

GNU testsuite comparison:

Skipping an intermittent issue tests/timeout/timeout (passes in this run but fails in the 'main' branch)

Copy link
Member

@tertsdiepraam tertsdiepraam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent! Just some small comments

// Clippy wants to use `if let Some(...) = ...` to avoid is_some/unwrap combination, but that's
// not possible within an "if" test with multiple sub-expressions.
#[allow(clippy::unnecessary_unwrap)]
if fast_allowed && first_bui.is_some() && increment_u64.is_some() && last_bui.is_some() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe you could do

if fast_allowed {
    // Test if we can use fast code path.
    // First try to convert the range to BigUint (u64 for the increment).
    let (first_bui, increment_u64, last_bui) = (
        first.to_biguint(),
        increment.to_biguint().and_then(|x| x.to_u64()),
        last.to_biguint(),
    );
    if let (Some(first_bui), Some(increment_u64), Some(last_bui)) = (first_bui, increment_u64, last_bui) {
        ...
    }
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh! That's better indeed. Thanks!

Comment on lines 257 to 260
// Fast code path increment function.
// Add inc to the string val[start..end]. This operates on ASCII digits, assuming
// val and inc are well formed.
// Returns the new value for start.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can make this a docstring, which makes it also show up in language servers and things like that.

Suggested change
// Fast code path increment function.
// Add inc to the string val[start..end]. This operates on ASCII digits, assuming
// val and inc are well formed.
// Returns the new value for start.
/// Fast code path increment function.
///
/// Add inc to the string val[start..end]. This operates on ASCII digits, assuming
/// val and inc are well formed.
/// Returns the new value for start.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, thanks!

Copy link

github-actions bot commented Apr 4, 2025

GNU testsuite comparison:

Skip an intermittent issue tests/timeout/timeout (fails in this run but passes in the 'main' branch)
Skipping an intermittent issue tests/misc/stdbuf (passes in this run but fails in the 'main' branch)

…ements

A lot of custom logic, we basically do arithmetic on character
arrays, but this comes at with huge performance gains.

Unlike coreutils `seq`, we do this for all positive increments
(because why not), and we do not fall back to slow path if
the last parameter is in scientific notation.

Also, add some tests for empty separator, as that may catch
some corner cases.
It is actually quite easy to implement, we just start with
a padded number and increment as usual.
@drinkcat
Copy link
Contributor Author

I feel a bit better about this, seeing that some of the extra code can be reused in #7782. We could directly merge that PR and drop this one, either way is fine.

Copy link

GNU testsuite comparison:

Skipping an intermittent issue tests/timeout/timeout (passes in this run but fails in the 'main' branch)

@drinkcat
Copy link
Contributor Author

Merged as part of #7782!

@drinkcat drinkcat closed this Apr 22, 2025
@drinkcat drinkcat deleted the seq-perf branch April 22, 2025 16:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

seq performance is very poor, compared with GNU seq, when passed positive integer values
2 participants