Skip to content

expr: Refactor AST evaluation to avoid stack overflow #7388

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Mar 13, 2025

Conversation

LouisDISPA
Copy link
Contributor

@LouisDISPA LouisDISPA commented Mar 3, 2025

Overview

This pull request addresses a stack overflow issue in the expr utility when handling deeply nested expressions.

Fixes #7338

Changes

  • Replaced recursive tree traversal with stack-based approach
  • Add identifiers for AST nodes
  • Use a BTreeMap with node IDs for the result stack

Testing

The implementation includes a new test case that verifies the ability to handle large expressions that would previously cause stack overflow.

Notes

A simpler solution would be to use the decurse crate.

Copy link

github-actions bot commented Mar 4, 2025

GNU testsuite comparison:

Skipping an intermittent issue tests/misc/stdbuf (passes in this run but fails in the 'main' branch)
Skipping an intermittent issue tests/timeout/timeout (passes in this run but fails in the 'main' branch)

@sylvestre
Copy link
Contributor

Windows isn't happy :)
Did you look at the performance impact of this change? Thanks

@LouisDISPA
Copy link
Contributor Author

Windows isn't happy :) Did you look at the performance impact of this change? Thanks

The algorithm is just wrong sorry, I wanted to avoid adding an id on the nodes but it doesn't work.
For the windows issue I suspect it's related to the 8191 characters limit.

I'll push fixes and update the PR description.

@LouisDISPA LouisDISPA force-pushed the main branch 4 times, most recently from 5a1751b to 2b35b5b Compare March 5, 2025 12:50
Copy link

github-actions bot commented Mar 5, 2025

GNU testsuite comparison:

Skip an intermittent issue tests/timeout/timeout (fails in this run but passes in the 'main' branch)

1 similar comment
Copy link

github-actions bot commented Mar 5, 2025

GNU testsuite comparison:

Skip an intermittent issue tests/timeout/timeout (fails in this run but passes in the 'main' branch)

@LouisDISPA
Copy link
Contributor Author

LouisDISPA commented Mar 10, 2025

I did some quick benchmarking by simulating the long input here are the results:

export N=10000
echo $(seq -s" + " $N)1 > tmp.txt
hyperfine -n expr "expr $(cat tmp.txt)" -n main_uu_expr "./main_uu_expr $(cat tmp.txt)" -n new_uu_expr "./new_uu_expr $(cat tmp.txt)" -N --warmup 300 -i
Command Mean [ms] Min [ms] Max [ms] Relative
expr 3.3 ± 0.6 2.8 13.8 1.00
main_uu_expr 5.7 ± 0.4 5.1 7.4 1.73 ± 0.31
new_uu_expr 5.6 ± 0.3 5.1 7.3 1.70 ± 0.31

And with the input that the current implementation on the main branch can't handle:

export N=30000
echo $(seq -s" + " $N)1 > tmp.txt
hyperfine -n expr "expr $(cat tmp.txt)" -n main_uu_expr "./main_uu_expr $(cat tmp.txt)" -n new_uu_expr "./new_uu_expr $(cat tmp.txt)" -N --warmup 300 -i
Command Mean [ms] Min [ms] Max [ms] Relative
expr 7.1 ± 0.4 6.6 11.0 1.00
main_uu_expr (ignoring error) 10.1 ± 0.4 9.6 11.9 1.43 ± 0.10
new_uu_expr 14.3 ± 0.5 13.4 16.1 2.03 ± 0.14

Tested on a MacBook Air M1 16Go macos 14.5

Copy link

GNU testsuite comparison:

Skipping an intermittent issue tests/misc/stdbuf (passes in this run but fails in the 'main' branch)

@RenjiSann
Copy link
Collaborator

What does the " (ignoring error)" mean in your second table ?

@LouisDISPA
Copy link
Contributor Author

What does the " (ignoring error)" mean in your second table ?

It's relate to the -i option of hyperfine:

  -i, --ignore-failure
          Ignore non-zero exit codes of the benchmarked programs

The command runs but the exit code is a failure because of the stack overflow.
I find interesting to see the benchmark of it, it should be mostly the parsing time with a bit of function calls until the overflow.

@RenjiSann
Copy link
Collaborator

It sounds strange to me, because when I tried, the segfault happened instantly, not after 10 seconds 🤔

@LouisDISPA
Copy link
Contributor Author

It sounds strange to me, because when I tried, the segfault happened instantly, not after 10 seconds 🤔

Yes the benchmarks are in milliseconds. For the case when there is no stack overflow, it looks like my implementation is on par with the current one. I can do more benchmarks with different inputs if you want, but it's a bit hard to have a good benchmarks. When you go under the 1ms it's not reliable.

@@ -5,6 +5,11 @@

// spell-checker:ignore (ToDO) ints paren prec multibytes

use std::{
collections::BTreeMap,
sync::atomic::{AtomicU16, Ordering},
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a specific reason for using atomics here ?
I'm not very familiar with it, I thought it was only useful in threaded environments.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no way for the compiler to assert that the function fn get_next_id() -> u16 isn't used by another thread. The simplest solution is to use an atomic value.

Another solution would be to use a thread local variable but it would be a bit more verbose. With thread locals every thread has a unique counter and it would break the code if the parser becomes multi-threaded.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if the parser becomes multi-threaded

I don't think this is likely to happen.
After all, this is "just" a simple CLI arithmetic expression evaluator.
Making it multi-threaded seems a bit overkill to me, like if you really need that level of performance, use a dedicated tool, rather than a command from the coreutils.

In the single-threaded case, using an atomic add is just gonna make the process a tiny bit slower for no real reason.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes make sense, I moved the implementation to use thread locals.

@RenjiSann
Copy link
Collaborator

RenjiSann commented Mar 12, 2025

Yes the benchmarks are in milliseconds

Duh, didn't see it 😭

Anyway, this Pr is about fixing a SEGFAULT, not improving the performance, so it's OK if you manage to keep the same performance without immproving it.

Copy link
Collaborator

@RenjiSann RenjiSann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nothing to say on the logic part.
I'd rather remove the use of atomics though, as the tool is not likely to get multi-threading support.

Apart from that, I think you could stash your commits in only 2 (changes to the code, and to the test), so we keep the main branch clean from implementation details.

Good job !

Test a stack overflow that was happening on linux for long inputs.
@LouisDISPA
Copy link
Contributor Author

Nothing to say on the logic part. I'd rather remove the use of atomics though, as the tool is not likely to get multi-threading support.

Apart from that, I think you could stash your commits in only 2 (changes to the code, and to the test), so we keep the main branch clean from implementation details.

Good job !

  • Moved the ID generation to thread locals
  • Squashed the PR into 2 commits (test, impl)

Fix a stack overflow happening on long inputs
Copy link

GNU testsuite comparison:

GNU test failed: tests/misc/tee. tests/misc/tee is passing on 'main'. Maybe you have to rebase?
Skip an intermittent issue tests/timeout/timeout (fails in this run but passes in the 'main' branch)

Copy link

GNU testsuite comparison:

GNU test failed: tests/misc/tee. tests/misc/tee is passing on 'main'. Maybe you have to rebase?
Skip an intermittent issue tests/timeout/timeout (fails in this run but passes in the 'main' branch)

1 similar comment
Copy link

GNU testsuite comparison:

GNU test failed: tests/misc/tee. tests/misc/tee is passing on 'main'. Maybe you have to rebase?
Skip an intermittent issue tests/timeout/timeout (fails in this run but passes in the 'main' branch)

@RenjiSann RenjiSann merged commit 56c3553 into uutils:main Mar 13, 2025
65 of 66 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

expr: Segmentation fault on big inputs
3 participants