-
Notifications
You must be signed in to change notification settings - Fork 1.7k
Actions: Improve Bash parsing performance on command and string interpolations #19701
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Actions: Improve Bash parsing performance on command and string interpolations #19701
Conversation
Extract helper predicates for `$(...)` command interpolation and backtick-quoted commands. Add some doc comments and meaningful variable names.
In the Bash parser, we compute a mostly-unique ID for each command substitution within a shell script block. Commands are then ranked and referred to individually. Avoid a performance bottleneck by ranking commands by their ID, not by their source text. I think this was the original intent of the code. Ranking by their original text ends up evaluating multiple possible orderings, which is slow on workflows that contain multiple complex command substitutions.
Add some doc comments and meaningful variable names.
In the Bash parser, we compute a mostly-unique ID for each quoted string within a shell script block. Quoted strings are then ranked and referred to individually. Avoid a performance bottleneck by ranking quoted strings by their ID, not by their source text. I think this was the original intent of the code. Ranking by their original text ends up evaluating multiple possible orderings, which is slow on workflows that contain multiple complex quoted strings, such as JSON payloads.
Anonymised version of a customer report that led to performance bottlenecks in Bash parsing. No results are expected from both query and library tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR refactors the Bash parser in CodeQL to improve performance by using unique IDs for command substitutions and quoted strings, and adds a stress-test workflow.
- Refactor command substitution predicates for clearer naming and ID-based ranking
- Refactor quoted string predicates similarly and rank by ID
- Add a new workflow test (
interpolation.yml
) to exercise complex interpolations
Reviewed Changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.
File | Description |
---|---|
actions/ql/test/query-tests/Security/CWE-094/.github/workflows/interpolation.yml | New workflow to stress-test Bash interpolation parsing |
actions/ql/lib/codeql/actions/Bash.qll | Refactored predicates and ranking logic for commands and strings |
Comments suppressed due to low confidence (3)
actions/ql/lib/codeql/actions/Bash.qll:38
- [nitpick] The TODO comment indicates trimming could be folded into the regex. Consider updating the regex to capture only the inner command or removing the extra trim step to simplify maintenance.
.regexpReplaceAll("^\\$\\(", "")
actions/ql/lib/codeql/actions/Bash.qll:140
- [nitpick] The parameter name
i
is ambiguous here. Renaming it to something likerankIndex
would make its role clearer in the ranking logic.
private predicate rankedQuotedStringReplacements(int i, string quotedString, string quotedStringId) {
actions/ql/lib/codeql/actions/Bash.qll:104
- [nitpick] This complex regex is used directly in the predicate. Extracting it into a well-named regex constant would improve readability and reuse.
quotedStr = this.cmdSubstitutedLineProducer(lineIndex).regexpFind("\"((?:[^\"\\]|\\.)*)\"", occurrenceIndex, occurrenceOffset) and
3ec0624
to
e48a7da
Compare
DCA results look fine. No changes to alerts, and timings are comparable. Note the sample size is small (just 5). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
See linked internal issues for details. This set of changes should improve the performance of the Actions Bash script parsing and make it slightly more readable.
The bottlenecks:
Key changes:
Not changed:
Tagging @alexet @asgerf for having provided advice in this area, and @cklin to consider if we want this in the upcoming release.