
seq performance is very poor, compared with GNU seq, when passed positive integer values #7482


Open
drinkcat opened this issue Mar 18, 2025 · 7 comments

@drinkcat
Contributor

GNU seq is a lot faster when passed integer parameters, without format or fixed-width options.

hyperfine -L seq seq,./target/release/seq "{seq} 0 1 1000000"
Benchmark 1: seq 0 1 1000000
  Time (mean ± σ):      10.8 ms ±   1.7 ms    [User: 9.9 ms, System: 0.9 ms]
  Range (min … max):     5.2 ms …  14.1 ms    265 runs
 
Benchmark 2: ./target/release/seq 0 1 1000000
  Time (mean ± σ):     289.8 ms ±   3.7 ms    [User: 226.3 ms, System: 63.0 ms]
  Range (min … max):   286.7 ms … 297.6 ms    10 runs
 
Summary
  seq 0 1 1000000 ran
   26.80 ± 4.21 times faster than ./target/release/seq 0 1 1000000

I suspect there is some very targeted optimization going on for this most common use case, as the gap is not nearly as bad when a format is specified (or -w), or when some of the values are negative:

$ hyperfine -L seq seq,./target/release/seq "{seq} -f"%f" 0 1 1000000"
Benchmark 1: seq -f%f 0 1 1000000
  Time (mean ± σ):     193.5 ms ±   3.8 ms    [User: 192.6 ms, System: 0.6 ms]
  Range (min … max):   190.0 ms … 204.1 ms    14 runs
 
Benchmark 2: ./target/release/seq -f%f 0 1 1000000
  Time (mean ± σ):     313.6 ms ±   8.5 ms    [User: 256.7 ms, System: 56.3 ms]
  Range (min … max):   305.9 ms … 336.2 ms    10 runs
 
Summary
  seq -f%f 0 1 1000000 ran
    1.62 ± 0.05 times faster than ./target/release/seq -f%f 0 1 1000000
$ hyperfine -L seq seq,./target/release/seq "{seq} -w 0 1 1000000"
Benchmark 1: seq -w 0 1 1000000
  Time (mean ± σ):     168.3 ms ±   4.6 ms    [User: 167.5 ms, System: 0.5 ms]
  Range (min … max):   165.0 ms … 182.7 ms    16 runs
 
Benchmark 2: ./target/release/seq -w 0 1 1000000
  Time (mean ± σ):     325.5 ms ±   3.6 ms    [User: 268.8 ms, System: 55.9 ms]
  Range (min … max):   321.6 ms … 333.3 ms    10 runs
 
Summary
  seq -w 0 1 1000000 ran
    1.93 ± 0.06 times faster than ./target/release/seq -w 0 1 1000000
$ hyperfine -L seq seq,./target/release/seq "{seq} -100 1 1000000"
Benchmark 1: seq -100 1 1000000
  Time (mean ± σ):     159.0 ms ±   2.9 ms    [User: 157.9 ms, System: 0.9 ms]
  Range (min … max):   155.7 ms … 167.7 ms    17 runs
 
Benchmark 2: ./target/release/seq -100 1 1000000
  Time (mean ± σ):     289.1 ms ±   1.9 ms    [User: 225.0 ms, System: 63.5 ms]
  Range (min … max):   287.4 ms … 293.8 ms    10 runs
 
Summary
  seq -100 1 1000000 ran
    1.82 ± 0.04 times faster than ./target/release/seq -100 1 1000000
@mjguzik

mjguzik commented Mar 18, 2025

The GNU variant issues buffered writes in 4K chunks; this variant issues a write for every number.

I also had a quick look with perf top to explain the greatly increased user time:

10.24% seq [.] uu_seq::print_seq
10.05% seq [.] num_bigint::biguint::convert::to_radix_le
6.18% [kernel] [k] entry_SYSCALL_64
6.16% seq [.] <uu_seq::extendedbigdecimal::ExtendedBigDecimal as core::fmt::Display>::fmt
5.66% libc.so.6 [.] _int_free
5.37% libc.so.6 [.] malloc
4.78% seq [.] core::fmt::write
4.06% libc.so.6 [.] __GI___libc_write

I'm not a Rust programmer, so I have no idea how to redo this in this language. :-> Hopefully I saved someone a little bit of digging.
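
For illustration, a minimal sketch in Rust of the difference (hypothetical code, not the actual uutils print_seq): Rust's stdout is line-buffered, so a plain per-number write ends up as roughly one write(2) syscall per line, while wrapping it in a BufWriter batches the output into larger chunks, similar to what GNU does.

use std::io::{self, BufWriter, Write};

fn main() -> io::Result<()> {
    let stdout = io::stdout();

    // Line-buffered: each number becomes its own write(2) syscall,
    // so a million numbers cost roughly a million syscalls.
    // let mut out = stdout.lock();

    // Buffered: output accumulates in an 8 KiB buffer (BufWriter's default)
    // and is flushed to the kernel in large chunks.
    let mut out = BufWriter::new(stdout.lock());

    for i in 0..=1_000_000u64 {
        writeln!(out, "{i}")?;
    }
    out.flush()
}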

@sylvestre
Contributor

the perf profile: https://share.firefox.dev/3DM2Bus

@drinkcat
Contributor Author

@mjguzik great catch! I wanted to cover the integer case with this issue, but you're totally right, and it does help reduce system time in all cases.

Just adding this in print_seq makes uutils faster than GNU; it's so simple that I'll just make a PR:

let mut stream = BufWriter::new(stdout);
taskset -c 0 hyperfine -L seq seq,target/release/seq "{seq} -w 1000000"
Benchmark 1: seq -w 1000000
  Time (mean ± σ):     165.6 ms ±   0.4 ms    [User: 164.7 ms, System: 0.6 ms]
  Range (min … max):   165.1 ms … 166.4 ms    17 runs
 
Benchmark 2: target/release/seq -w 1000000
  Time (mean ± σ):     132.8 ms ±   2.8 ms    [User: 131.4 ms, System: 0.6 ms]
  Range (min … max):   131.6 ms … 145.0 ms    22 runs
 
Summary
  target/release/seq -w 1000000 ran
    1.25 ± 0.03 times faster than seq -w 1000000

BUT... only when a format is specified! There is still something else going on when printing integers. I tried casting everything down to u64 and simplifying the loop:

while value <= last {
    stream.write_all(separator.as_bytes())?;
    stream.write_all(value.to_string().as_bytes())?;
    value += increment;
}

We're still quite a bit slower:

 taskset -c 0 hyperfine -L seq seq,target/release/seq "{seq} 1000000"
Benchmark 1: seq 1000000
  Time (mean ± σ):       5.5 ms ±   0.6 ms    [User: 5.2 ms, System: 0.4 ms]
  Range (min … max):     5.4 ms …  17.6 ms    461 runs
 
Benchmark 2: target/release/seq 1000000
  Time (mean ± σ):      18.9 ms ±   0.2 ms    [User: 18.5 ms, System: 0.5 ms]
  Range (min … max):    18.4 ms …  19.6 ms    152 runs
 
Summary
  seq 1000000 ran
    3.46 ± 0.36 times faster than target/release/seq 1000000

I think there's some clever caching of previously converted values (prefix matching?!), because my simplified version is actually faster for some increments:

taskset -c 0 hyperfine -L seq seq,target/release/seq "{seq} 0 999 100000000"
Benchmark 1: seq 0 999 100000000
  Time (mean ± σ):      19.5 ms ±   1.1 ms    [User: 18.9 ms, System: 0.5 ms]
  Range (min … max):    19.1 ms …  32.7 ms    143 runs
 
Benchmark 2: target/release/seq 0 999 100000000
  Time (mean ± σ):       2.5 ms ±   0.1 ms    [User: 2.1 ms, System: 0.5 ms]
  Range (min … max):     2.4 ms …   2.8 ms    858 runs
 
Summary
  target/release/seq 0 999 100000000 ran
    7.82 ± 0.49 times faster than seq 0 999 100000000
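
For reference, a minimal sketch of the kind of fast path GNU appears to use for plain non-negative integers, and one plausible explanation for the "clever caching" suspected above: keep the current value as an ASCII decimal buffer and increment it in place, so most iterations only touch the last digit instead of doing a full integer-to-string conversion. print_seq_fast is a hypothetical helper for illustration (assuming an increment of 1 and a newline separator), not the uutils or GNU code.

use std::io::{self, BufWriter, Write};

fn print_seq_fast(first: u64, last: u64) -> io::Result<()> {
    let stdout = io::stdout();
    let mut out = BufWriter::new(stdout.lock());

    // Current value as ASCII digits, most significant first.
    let mut digits = first.to_string().into_bytes();

    for value in first..=last {
        out.write_all(&digits)?;
        out.write_all(b"\n")?;
        if value == last {
            break; // don't increment past `last`
        }
        // Add 1 directly to the decimal string, carrying from the right.
        let mut i = digits.len();
        loop {
            if i == 0 {
                digits.insert(0, b'1'); // carry out of the top digit: "999" -> "1000"
                break;
            }
            i -= 1;
            if digits[i] == b'9' {
                digits[i] = b'0'; // keep carrying
            } else {
                digits[i] += 1;
                break;
            }
        }
    }
    out.flush()
}

fn main() -> io::Result<()> {
    print_seq_fast(0, 1_000_000)
}

Combined with buffered output, this is roughly why GNU can emit a million small integers in a few milliseconds.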

@maandree

maandree commented Mar 19, 2025

GNU seq is really buggy; it's pointless to compare the performance. I ran "{seq} 0 1 1000000" on uutils 0.0.29, GNU, and my own implementation in C (nothing fancy, just ASCII-encoded decimal numbers and no optimisation tricks; -f and -s unsupported), and found the following mean ± σ's:
GNU: 6.0 ms ± 0.1 ms
uutils: 560.7 ms ± 10.0 ms
mine: 30.4 ms ± 0.7 ms.

With -w I got 258.3 ms, 564.5 ms, and 22.9 ms. Similar with -100 instead of 0.

seq 18446744073709551617 -1 18446744073709551614: 431.0 µs with incorrect output, 1.9 ms, and 478.7 µs.
Similar with seq -99999999999999999999999999999999 -1 -100000000000000000000000000000001 except GNU starts at -100000000000000000000545460846592 and gets stuck there and repeats it indefinitely.
Similar with seq 57896044618658097711785492504343953926634992332820282019728792003956564819969 -1 57896044618658097711785492504343953926634992332820282019728792003956564819966 except GNU starts at 57896044618658097711785492504343953926634992332820282019728792003956564819968 and gets stuck there and repeats it indefinitely.
seq 0 -1 -1000000: 252.7 ms, 556.0 ms, 32.1 ms.
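
For context on those boundary values: 2^64 = 18446744073709551616, so 18446744073709551617 is the first integer that no longer fits in an unsigned 64-bit word, and the ~10^32 and ~2^255 examples are far outside any native integer type; that is presumably where implementations that shortcut to machine integers or doubles start to misbehave. A quick check, purely illustrative:

fn main() {
    // One past u64::MAX: cannot be represented as a 64-bit unsigned integer.
    assert!("18446744073709551617".parse::<u64>().is_err());
    // It still fits in a u128 (or an arbitrary-precision integer).
    assert_eq!(
        "18446744073709551617".parse::<u128>().unwrap(),
        (1u128 << 64) + 1
    );
}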

@drinkcat
Contributor Author

Err, seq with reasonable integer parameters does work correctly, right? (e.g. seq 0 1 1000000) I don't think we should just throw in the towel because it is buggy in some specific cases (e.g. with extremely large numbers).

Since a positive integer increment is a very common usage (probably the most common usage of seq?), it does seem worth it to optimize the performance... Being 30x slower is a bit too much.

@maandree

maandree commented Mar 19, 2025

No, you shouldn't throw in the towel; that's why I gave you comparisons against a non-buggy implementation. uutils's implementation is clearly horribly slow, but don't compare it against GNU, because their performance is probably a result of their bugs, and I see no reason you would want to introduce those same bugs just for performance.

I don't know what reasonable is. It does work flawlessly with non-negative integers as far as I can tell, but negatives and non-integers are a challenge for it. It does seem to work with smaller negative integers, but when they get larger it gets stuck, skips values, or even starts at something completely different than specified. Apart from test scripts, I don't know of any reason you would use seq(1), and in test scripts, why large positive numbers but not large negative numbers?

My implementation was consistently 5 to 6 times slower than GNU in the cases where GNU performed well, so if you get close to that I would say you have something reasonable. I don't really see any point in optimizing seq(1) as much as possible; it's probably not going to pay off in the real world in any noticeable way. But if you do want to optimize as much as possible, don't count on being able to get on par with GNU.

@drinkcat
Contributor Author

drinkcat commented Apr 3, 2025

Implemented a fast path in #7564. It's quite a bit of added code, but it does provide significant performance benefits and gets within 10% of the GNU implementation in those cases (and handily beats GNU seq in cases where our fast path activates but GNU's does not).
