expr: Refactor AST evaluation to avoid stack overflow #7388

LouisDISPA · 2025-03-03T23:15:39Z

Overview

This pull request addresses a stack overflow issue in the expr utility when handling deeply nested expressions.

Changes

Replaced recursive tree traversal with stack-based approach
Add identifiers for AST nodes
Use a BTreeMap with node IDs for the result stack

Testing

The implementation includes a new test case that verifies the ability to handle large expressions that would previously cause stack overflow.

Notes

A simpler solution would be to use the decurse crate.

github-actions · 2025-03-04T06:35:44Z

GNU testsuite comparison:

Skipping an intermittent issue tests/misc/stdbuf (passes in this run but fails in the 'main' branch)
Skipping an intermittent issue tests/timeout/timeout (passes in this run but fails in the 'main' branch)

sylvestre · 2025-03-04T07:11:05Z

Windows isn't happy :)
Did you look at the performance impact of this change? Thanks

tests/by-util/test_expr.rs

LouisDISPA · 2025-03-05T10:57:27Z

Windows isn't happy :) Did you look at the performance impact of this change? Thanks

The algorithm is just wrong sorry, I wanted to avoid adding an id on the nodes but it doesn't work.
For the windows issue I suspect it's related to the 8191 characters limit.

I'll push fixes and update the PR description.

github-actions · 2025-03-05T13:59:01Z

GNU testsuite comparison:

Skip an intermittent issue tests/timeout/timeout (fails in this run but passes in the 'main' branch)

github-actions · 2025-03-05T14:24:26Z

GNU testsuite comparison:

Skip an intermittent issue tests/timeout/timeout (fails in this run but passes in the 'main' branch)

LouisDISPA · 2025-03-10T22:48:16Z

I did some quick benchmarking by simulating the long input here are the results:

export N=10000
echo $(seq -s" + " $N)1 > tmp.txt
hyperfine -n expr "expr $(cat tmp.txt)" -n main_uu_expr "./main_uu_expr $(cat tmp.txt)" -n new_uu_expr "./new_uu_expr $(cat tmp.txt)" -N --warmup 300 -i

Command	Mean [ms]	Min [ms]	Max [ms]	Relative
`expr`	3.3 ± 0.6	2.8	13.8	1.00
`main_uu_expr`	5.7 ± 0.4	5.1	7.4	1.73 ± 0.31
`new_uu_expr`	5.6 ± 0.3	5.1	7.3	1.70 ± 0.31

And with the input that the current implementation on the main branch can't handle:

export N=30000
echo $(seq -s" + " $N)1 > tmp.txt
hyperfine -n expr "expr $(cat tmp.txt)" -n main_uu_expr "./main_uu_expr $(cat tmp.txt)" -n new_uu_expr "./new_uu_expr $(cat tmp.txt)" -N --warmup 300 -i

Command	Mean [ms]	Min [ms]	Max [ms]	Relative
`expr`	7.1 ± 0.4	6.6	11.0	1.00
`main_uu_expr (ignoring error)`	10.1 ± 0.4	9.6	11.9	1.43 ± 0.10
`new_uu_expr`	14.3 ± 0.5	13.4	16.1	2.03 ± 0.14

Tested on a MacBook Air M1 16Go macos 14.5

github-actions · 2025-03-11T07:49:40Z

GNU testsuite comparison:

Skipping an intermittent issue tests/misc/stdbuf (passes in this run but fails in the 'main' branch)

RenjiSann · 2025-03-11T09:37:13Z

What does the " (ignoring error)" mean in your second table ?

LouisDISPA · 2025-03-11T09:48:28Z

What does the " (ignoring error)" mean in your second table ?

It's relate to the -i option of hyperfine:

  -i, --ignore-failure
          Ignore non-zero exit codes of the benchmarked programs

The command runs but the exit code is a failure because of the stack overflow.
I find interesting to see the benchmark of it, it should be mostly the parsing time with a bit of function calls until the overflow.

RenjiSann · 2025-03-11T12:05:21Z

It sounds strange to me, because when I tried, the segfault happened instantly, not after 10 seconds 🤔

LouisDISPA · 2025-03-11T21:37:33Z

It sounds strange to me, because when I tried, the segfault happened instantly, not after 10 seconds 🤔

Yes the benchmarks are in milliseconds. For the case when there is no stack overflow, it looks like my implementation is on par with the current one. I can do more benchmarks with different inputs if you want, but it's a bit hard to have a good benchmarks. When you go under the 1ms it's not reliable.

RenjiSann · 2025-03-12T09:25:19Z

src/uu/expr/src/syntax_tree.rs

@@ -5,6 +5,11 @@

 // spell-checker:ignore (ToDO) ints paren prec multibytes

+use std::{
+    collections::BTreeMap,
+    sync::atomic::{AtomicU16, Ordering},


Is there a specific reason for using atomics here ?
I'm not very familiar with it, I thought it was only useful in threaded environments.

There is no way for the compiler to assert that the function fn get_next_id() -> u16 isn't used by another thread. The simplest solution is to use an atomic value.

Another solution would be to use a thread local variable but it would be a bit more verbose. With thread locals every thread has a unique counter and it would break the code if the parser becomes multi-threaded.

if the parser becomes multi-threaded

I don't think this is likely to happen.
After all, this is "just" a simple CLI arithmetic expression evaluator.
Making it multi-threaded seems a bit overkill to me, like if you really need that level of performance, use a dedicated tool, rather than a command from the coreutils.

In the single-threaded case, using an atomic add is just gonna make the process a tiny bit slower for no real reason.

Yes make sense, I moved the implementation to use thread locals.

RenjiSann · 2025-03-12T09:27:41Z

Yes the benchmarks are in milliseconds

Duh, didn't see it 😭

Anyway, this Pr is about fixing a SEGFAULT, not improving the performance, so it's OK if you manage to keep the same performance without immproving it.

RenjiSann

Nothing to say on the logic part.
I'd rather remove the use of atomics though, as the tool is not likely to get multi-threading support.

Apart from that, I think you could stash your commits in only 2 (changes to the code, and to the test), so we keep the main branch clean from implementation details.

Good job !

Test a stack overflow that was happening on linux for long inputs.

LouisDISPA · 2025-03-12T23:06:35Z

Nothing to say on the logic part. I'd rather remove the use of atomics though, as the tool is not likely to get multi-threading support.

Apart from that, I think you could stash your commits in only 2 (changes to the code, and to the test), so we keep the main branch clean from implementation details.

Good job !

Moved the ID generation to thread locals
Squashed the PR into 2 commits (test, impl)

Fix a stack overflow happening on long inputs

github-actions · 2025-03-13T00:01:45Z

GNU testsuite comparison:

GNU test failed: tests/misc/tee. tests/misc/tee is passing on 'main'. Maybe you have to rebase?
Skip an intermittent issue tests/timeout/timeout (fails in this run but passes in the 'main' branch)

github-actions · 2025-03-13T03:07:32Z

GNU testsuite comparison:

GNU test failed: tests/misc/tee. tests/misc/tee is passing on 'main'. Maybe you have to rebase?
Skip an intermittent issue tests/timeout/timeout (fails in this run but passes in the 'main' branch)

github-actions · 2025-03-13T10:00:14Z

GNU testsuite comparison:

GNU test failed: tests/misc/tee. tests/misc/tee is passing on 'main'. Maybe you have to rebase?
Skip an intermittent issue tests/timeout/timeout (fails in this run but passes in the 'main' branch)

sylvestre force-pushed the main branch from b2b628e to 307ca58 Compare March 4, 2025 09:27

RenjiSann reviewed Mar 4, 2025

View reviewed changes

tests/by-util/test_expr.rs Show resolved Hide resolved

LouisDISPA force-pushed the main branch 4 times, most recently from 5a1751b to 2b35b5b Compare March 5, 2025 12:50

LouisDISPA force-pushed the main branch from 0133c9f to eebd0d6 Compare March 10, 2025 22:48

RenjiSann reviewed Mar 12, 2025

View reviewed changes

RenjiSann requested changes Mar 12, 2025

View reviewed changes

expr: Add a long input test

b954960

Test a stack overflow that was happening on linux for long inputs.

LouisDISPA force-pushed the main branch from eebd0d6 to 066d7c2 Compare March 12, 2025 23:02

LouisDISPA requested a review from RenjiSann March 12, 2025 23:06

LouisDISPA force-pushed the main branch from 066d7c2 to 6ae8e82 Compare March 12, 2025 23:48

expr: Refactor evaluation to be interative instead of recursive

6440c95

Fix a stack overflow happening on long inputs

LouisDISPA force-pushed the main branch from 6ae8e82 to 6440c95 Compare March 12, 2025 23:55

RenjiSann approved these changes Mar 13, 2025

View reviewed changes

RenjiSann merged commit 56c3553 into uutils:main Mar 13, 2025
65 of 66 checks passed

BrewTestBot mentioned this pull request May 24, 2025

uutils-coreutils 0.1.0 Homebrew/homebrew-core#224645

Merged

moonfruit mentioned this pull request May 26, 2025

uutils-selected 0.1.0 moonfruit/homebrew-tap#243

Closed

Uh oh!

expr: Refactor AST evaluation to avoid stack overflow #7388

expr: Refactor AST evaluation to avoid stack overflow #7388

Uh oh!

Conversation

LouisDISPA commented Mar 3, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Overview

Changes

Testing

Notes

Uh oh!

github-actions bot commented Mar 4, 2025

Uh oh!

sylvestre commented Mar 4, 2025

Uh oh!

Uh oh!

LouisDISPA commented Mar 5, 2025

Uh oh!

github-actions bot commented Mar 5, 2025

Uh oh!

github-actions bot commented Mar 5, 2025

Uh oh!

LouisDISPA commented Mar 10, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

github-actions bot commented Mar 11, 2025

Uh oh!

RenjiSann commented Mar 11, 2025

Uh oh!

LouisDISPA commented Mar 11, 2025

Uh oh!

RenjiSann commented Mar 11, 2025

Uh oh!

LouisDISPA commented Mar 11, 2025

Uh oh!

RenjiSann Mar 12, 2025

Choose a reason for hiding this comment

Uh oh!

LouisDISPA Mar 12, 2025

Choose a reason for hiding this comment

Uh oh!

RenjiSann Mar 12, 2025

Choose a reason for hiding this comment

Uh oh!

LouisDISPA Mar 12, 2025

Choose a reason for hiding this comment

Uh oh!

RenjiSann commented Mar 12, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

RenjiSann left a comment

Choose a reason for hiding this comment

Uh oh!

LouisDISPA commented Mar 12, 2025

Uh oh!

github-actions bot commented Mar 13, 2025

Uh oh!

github-actions bot commented Mar 13, 2025

Uh oh!

github-actions bot commented Mar 13, 2025

Uh oh!

Uh oh!

Uh oh!

LouisDISPA commented Mar 3, 2025 •

edited

Loading

LouisDISPA commented Mar 10, 2025 •

edited

Loading

RenjiSann commented Mar 12, 2025 •

edited

Loading