
seq performance is very poor, compared with GNU seq, when passed positive integer values #7482


Open
drinkcat opened this issue Mar 18, 2025 · 7 comments

@drinkcat
Contributor

GNU seq is a lot faster when passed integer parameters, without format or fixed-width options.

hyperfine -L seq seq,./target/release/seq "{seq} 0 1 1000000"
Benchmark 1: seq 0 1 1000000
  Time (mean ± σ):      10.8 ms ±   1.7 ms    [User: 9.9 ms, System: 0.9 ms]
  Range (min … max):     5.2 ms …  14.1 ms    265 runs
 
Benchmark 2: ./target/release/seq 0 1 1000000
  Time (mean ± σ):     289.8 ms ±   3.7 ms    [User: 226.3 ms, System: 63.0 ms]
  Range (min … max):   286.7 ms … 297.6 ms    10 runs
 
Summary
  seq 0 1 1000000 ran
   26.80 ± 4.21 times faster than ./target/release/seq 0 1 1000000

I suspect there is some very targeted optimization going on for this most common use case, as the gap is not nearly as bad when a format is specified (or -w), or when some of the values are negative:

$ hyperfine -L seq seq,./target/release/seq "{seq} -f"%f" 0 1 1000000"
Benchmark 1: seq -f%f 0 1 1000000
  Time (mean ± σ):     193.5 ms ±   3.8 ms    [User: 192.6 ms, System: 0.6 ms]
  Range (min … max):   190.0 ms … 204.1 ms    14 runs
 
Benchmark 2: ./target/release/seq -f%f 0 1 1000000
  Time (mean ± σ):     313.6 ms ±   8.5 ms    [User: 256.7 ms, System: 56.3 ms]
  Range (min … max):   305.9 ms … 336.2 ms    10 runs
 
Summary
  seq -f%f 0 1 1000000 ran
    1.62 ± 0.05 times faster than ./target/release/seq -f%f 0 1 1000000
$ hyperfine -L seq seq,./target/release/seq "{seq} -w 0 1 1000000"
Benchmark 1: seq -w 0 1 1000000
  Time (mean ± σ):     168.3 ms ±   4.6 ms    [User: 167.5 ms, System: 0.5 ms]
  Range (min … max):   165.0 ms … 182.7 ms    16 runs
 
Benchmark 2: ./target/release/seq -w 0 1 1000000
  Time (mean ± σ):     325.5 ms ±   3.6 ms    [User: 268.8 ms, System: 55.9 ms]
  Range (min … max):   321.6 ms … 333.3 ms    10 runs
 
Summary
  seq -w 0 1 1000000 ran
    1.93 ± 0.06 times faster than ./target/release/seq -w 0 1 1000000
$ hyperfine -L seq seq,./target/release/seq "{seq} -100 1 1000000"
Benchmark 1: seq -100 1 1000000
  Time (mean ± σ):     159.0 ms ±   2.9 ms    [User: 157.9 ms, System: 0.9 ms]
  Range (min … max):   155.7 ms … 167.7 ms    17 runs
 
Benchmark 2: ./target/release/seq -100 1 1000000
  Time (mean ± σ):     289.1 ms ±   1.9 ms    [User: 225.0 ms, System: 63.5 ms]
  Range (min … max):   287.4 ms … 293.8 ms    10 runs
 
Summary
  seq -100 1 1000000 ran
    1.82 ± 0.04 times faster than ./target/release/seq -100 1 1000000
@mjguzik

mjguzik commented Mar 18, 2025

The GNU variant issues buffered writes in 4K chunks; this variant issues a write for every number.

I also had a quick look with perf top to explain the greatly increased user time:

10.24% seq [.] uu_seq::print_seq
10.05% seq [.] num_bigint::biguint::convert::to_radix_le
6.18% [kernel] [k] entry_SYSCALL_64
6.16% seq [.] <uu_seq::extendedbigdecimal::ExtendedBigDecimal as core::fmt::Display>::fmt
5.66% libc.so.6 [.] _int_free
5.37% libc.so.6 [.] malloc
4.78% seq [.] core::fmt::write
4.06% libc.so.6 [.] __GI___libc_write

I'm not a Rust programmer, so I have no idea how to redo this in this language. :-> Hopefully I saved someone a little bit of digging.
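
For illustration, a minimal sketch in Rust of the difference (hypothetical code, not the actual uutils print_seq): Rust's stdout is line-buffered, so a plain per-number write ends up as roughly one write(2) syscall per line, while wrapping it in a BufWriter batches the output into larger chunks, similar to what GNU does.

use std::io::{self, BufWriter, Write};

fn main() -> io::Result<()> {
    let stdout = io::stdout();

    // Line-buffered: each number becomes its own write(2) syscall,
    // so a million numbers cost roughly a million syscalls.
    // let mut out = stdout.lock();

    // Buffered: output accumulates in an 8 KiB buffer (BufWriter's default)
    // and is flushed to the kernel in large chunks.
    let mut out = BufWriter::new(stdout.lock());

    for i in 0..=1_000_000u64 {
        writeln!(out, "{i}")?;
    }
    out.flush()
}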

@sylvestre
Contributor

the perf profile: https://share.firefox.dev/3DM2Bus

@drinkcat
Contributor Author

@mjguzik great catch! I wanted to cover the integer case with this issue, but you're totally right, and it does help reduce system time in all cases.

Just adding this in print_seq makes uutils faster than GNU; it's so simple that I'll just make a PR:

let mut stream = BufWriter::new(stdout);
taskset -c 0 hyperfine -L seq seq,target/release/seq "{seq} -w 1000000"
Benchmark 1: seq -w 1000000
  Time (mean ± σ):     165.6 ms ±   0.4 ms    [User: 164.7 ms, System: 0.6 ms]
  Range (min … max):   165.1 ms … 166.4 ms    17 runs
 
Benchmark 2: target/release/seq -w 1000000
  Time (mean ± σ):     132.8 ms ±   2.8 ms    [User: 131.4 ms, System: 0.6 ms]
  Range (min … max):   131.6 ms … 145.0 ms    22 runs
 
Summary
  target/release/seq -w 1000000 ran
    1.25 ± 0.03 times faster than seq -w 1000000

BUT... only when a format is specified! There is still something else going on when printing integers. I tried casting everything down to u64 and simplifying the loop:

while value <= last {
    stream.write_all(separator.as_bytes())?;
    stream.write_all(value.to_string().as_bytes())?;
    value += increment;
}

We're still quite a bit slower:

 taskset -c 0 hyperfine -L seq seq,target/release/seq "{seq} 1000000"
Benchmark 1: seq 1000000
  Time (mean ± σ):       5.5 ms ±   0.6 ms    [User: 5.2 ms, System: 0.4 ms]
  Range (min … max):     5.4 ms …  17.6 ms    461 runs
 
Benchmark 2: target/release/seq 1000000
  Time (mean ± σ):      18.9 ms ±   0.2 ms    [User: 18.5 ms, System: 0.5 ms]
  Range (min … max):    18.4 ms …  19.6 ms    152 runs
 
Summary
  seq 1000000 ran
    3.46 ± 0.36 times faster than target/release/seq 1000000

I think there's some clever caching of previously converted values (prefix matching?!), because my simplified version is actually faster for some increments:

taskset -c 0 hyperfine -L seq seq,target/release/seq "{seq} 0 999 100000000"
Benchmark 1: seq 0 999 100000000
  Time (mean ± σ):      19.5 ms ±   1.1 ms    [User: 18.9 ms, System: 0.5 ms]
  Range (min … max):    19.1 ms …  32.7 ms    143 runs
 
Benchmark 2: target/release/seq 0 999 100000000
  Time (mean ± σ):       2.5 ms ±   0.1 ms    [User: 2.1 ms, System: 0.5 ms]
  Range (min … max):     2.4 ms …   2.8 ms    858 runs
 
Summary
  target/release/seq 0 999 100000000 ran
    7.82 ± 0.49 times faster than seq 0 999 100000000
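
For reference, a minimal sketch of the kind of fast path GNU appears to use for plain non-negative integers, and one plausible explanation for the "clever caching" suspected above: keep the current value as an ASCII decimal buffer and increment it in place, so most iterations only touch the last digit instead of doing a full integer-to-string conversion. print_seq_fast is a hypothetical helper for illustration (assuming an increment of 1 and a newline separator), not the uutils or GNU code.

use std::io::{self, BufWriter, Write};

fn print_seq_fast(first: u64, last: u64) -> io::Result<()> {
    let stdout = io::stdout();
    let mut out = BufWriter::new(stdout.lock());

    // Current value as ASCII digits, most significant first.
    let mut digits = first.to_string().into_bytes();

    for value in first..=last {
        out.write_all(&digits)?;
        out.write_all(b"\n")?;
        if value == last {
            break; // don't increment past `last`
        }
        // Add 1 directly to the decimal string, carrying from the right.
        let mut i = digits.len();
        loop {
            if i == 0 {
                digits.insert(0, b'1'); // carry out of the top digit: "999" -> "1000"
                break;
            }
            i -= 1;
            if digits[i] == b'9' {
                digits[i] = b'0'; // keep carrying
            } else {
                digits[i] += 1;
                break;
            }
        }
    }
    out.flush()
}

fn main() -> io::Result<()> {
    print_seq_fast(0, 1_000_000)
}

Combined with buffered output, this is roughly why GNU can emit a million small integers in a few milliseconds.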

@maandree

maandree commented Mar 19, 2025

GNU seq is really buggy; it's pointless to compare the performance. I ran "{seq} 0 1 1000000" on uutils 0.0.29, GNU, and my own implementation in C (nothing fancy, just ASCII-encoded decimal numbers and no optimisation tricks; -f and -s unsupported), and found the following mean ± σ's:
GNU: 6.0 ms ± 0.1 ms
uutils: 560.7 ms ± 10.0 ms
mine: 30.4 ms ± 0.7 ms.

With -w I got 258.3 ms, 564.5 ms, and 22.9 ms. Similar with -100 instead of 0.

seq 18446744073709551617 -1 18446744073709551614: 431.0 µs with incorrect output, 1.9 ms, and 478.7 µs.
Similar with seq -99999999999999999999999999999999 -1 -100000000000000000000000000000001 except GNU starts at -100000000000000000000545460846592 and gets stuck there and repeats it indefinitely.
Similar with seq 57896044618658097711785492504343953926634992332820282019728792003956564819969 -1 57896044618658097711785492504343953926634992332820282019728792003956564819966 except GNU starts at 57896044618658097711785492504343953926634992332820282019728792003956564819968 and gets stuck there and repeats it indefinitely.
seq 0 -1 -1000000: 252.7 ms, 556.0 ms, 32.1 ms.
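
For context on those boundary values: 2^64 = 18446744073709551616, so 18446744073709551617 is the first integer that no longer fits in an unsigned 64-bit word, and the ~10^32 and ~2^255 examples are far outside any native integer type; that is presumably where implementations that shortcut to machine integers or doubles start to misbehave. A quick check, purely illustrative:

fn main() {
    // One past u64::MAX: cannot be represented as a 64-bit unsigned integer.
    assert!("18446744073709551617".parse::<u64>().is_err());
    // It still fits in a u128 (or an arbitrary-precision integer).
    assert_eq!(
        "18446744073709551617".parse::<u128>().unwrap(),
        (1u128 << 64) + 1
    );
}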

@drinkcat
Contributor Author

Err, seq with reasonable integer parameters does work correctly, right? (e.g. seq 0 1 1000000) I don't think we should just throw in the towel because it is buggy in some specific cases (e.g. with extremely large numbers).

Since a positive integer increment is a very common usage (probably the most common usage of seq?), it does seem worth it to optimize the performance... Being 30x slower is a bit too much.

@maandree

maandree commented Mar 19, 2025

No, you shouldn't throw in the towel; that's why I gave you comparisons against a non-buggy implementation. uutils's implementation is clearly horribly slow, but don't compare it against GNU, because their performance is probably a result of their bugs, and I see no reason you would want to introduce those same bugs just for performance.

I don't know what reasonable is. It does work flawlessly with non-negative integers as far as I can tell, but negatives and non-integers are a challenge for it. It does seem to work with smaller negative integers, but when they get larger it gets stuck, skips values, or even starts at something completely different than specified. Apart from test scripts, I don't know of any reason you would use seq(1), and in test scripts, why large positive numbers but not large negative numbers?

My implementation was consistently 5 to 6 times slower than GNU in the cases where GNU performed well, so if you get close to that I would say you have something reasonable. I don't really see any point in optimizing seq(1) as much as possible; it's probably not going to pay off in the real world in any noticeable way. But if you do want to optimize as much as possible, don't count on being able to get on par with GNU.

@drinkcat
Contributor Author

drinkcat commented Apr 3, 2025

Implemented a fast path in #7564. It's quite a bit of added code, but it does provide significant performance benefits and gets within 10% of the GNU implementation in those cases (and handily beats GNU seq in cases where our fast path activates but GNU's does not).
