Actions: Improve Bash parsing performance on command and string interpolations #19701

adityasharad · 2025-06-09T16:40:23Z

See linked internal issues for details. This set of changes should improve the performance of the Actions Bash script parsing and make it slightly more readable.

The bottlenecks:

In the Bash parser, we compute a mostly-unique ID for each command substitution and each quoted string within a shell script block.
Commands and quoted strings are then ranked and referred to individually by their rank. This allows us to reference command/string expressions individually within a shell script block, and so compute an ordered set of Bash statements in the block.
However, ranking by their original text ends up evaluating multiple possible orderings, which is slow on workflows that contain multiple complex command substitutions. Customers provided examples of repeated commands containing JSON payloads. For command substitutions this didn't affect performance too badly, but the quoted string computation blew up the size of the database string pool and led to out-of-disk errors during query evaluation.

Key changes:

Refactor the logic around identifying Bash command substitutions, to use more intuitive predicate and variable names.
Change the logic around ranking Bash commands. Rank commands by their ID, not by their source text. I think this was the original intent of the code.
Refactor the logic around identifying Bash quoted strings, to use more intuitive predicate and variable names.
Change the logic around ranking Bash quoted strings. Rank quoted strings by their ID, not by their source text. I think this was the original intent of the code.
Add a test case (with no expected results) similar to the customer reports, to stress-test this in future.

Not changed:

I have not attempted a more thorough rewrite, although I think we could greatly simplify this by using datatypes for the parsed commands and strings, rather than constructing string IDs for them.

Tagging @alexet @asgerf for having provided advice in this area, and @cklin to consider if we want this in the upcoming release.

Extract helper predicates for `$(...)` command interpolation and backtick-quoted commands. Add some doc comments and meaningful variable names.

In the Bash parser, we compute a mostly-unique ID for each command substitution within a shell script block. Commands are then ranked and referred to individually. Avoid a performance bottleneck by ranking commands by their ID, not by their source text. I think this was the original intent of the code. Ranking by their original text ends up evaluating multiple possible orderings, which is slow on workflows that contain multiple complex command substitutions.

Add some doc comments and meaningful variable names.

In the Bash parser, we compute a mostly-unique ID for each quoted string within a shell script block. Quoted strings are then ranked and referred to individually. Avoid a performance bottleneck by ranking quoted strings by their ID, not by their source text. I think this was the original intent of the code. Ranking by their original text ends up evaluating multiple possible orderings, which is slow on workflows that contain multiple complex quoted strings, such as JSON payloads.

Anonymised version of a customer report that led to performance bottlenecks in Bash parsing. No results are expected from both query and library tests.

Copilot

Pull Request Overview

This PR refactors the Bash parser in CodeQL to improve performance by using unique IDs for command substitutions and quoted strings, and adds a stress-test workflow.

Refactor command substitution predicates for clearer naming and ID-based ranking
Refactor quoted string predicates similarly and rank by ID
Add a new workflow test (interpolation.yml) to exercise complex interpolations

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

File	Description
actions/ql/test/query-tests/Security/CWE-094/.github/workflows/interpolation.yml	New workflow to stress-test Bash interpolation parsing
actions/ql/lib/codeql/actions/Bash.qll	Refactored predicates and ranking logic for commands and strings

Comments suppressed due to low confidence (3)

actions/ql/lib/codeql/actions/Bash.qll:38

[nitpick] The TODO comment indicates trimming could be folded into the regex. Consider updating the regex to capture only the inner command or removing the extra trim step to simplify maintenance.

.regexpReplaceAll("^\\$\\(", "")

actions/ql/lib/codeql/actions/Bash.qll:140

[nitpick] The parameter name i is ambiguous here. Renaming it to something like rankIndex would make its role clearer in the ranking logic.

private predicate rankedQuotedStringReplacements(int i, string quotedString, string quotedStringId) {

actions/ql/lib/codeql/actions/Bash.qll:104

[nitpick] This complex regex is used directly in the predicate. Extracting it into a well-named regex constant would improve readability and reuse.

quotedStr = this.cmdSubstitutedLineProducer(lineIndex).regexpFind("\"((?:[^\"\\]|\\.)*)\"", occurrenceIndex, occurrenceOffset) and

adityasharad · 2025-06-10T04:41:15Z

DCA results look fine. No changes to alerts, and timings are comparable. Note the sample size is small (just 5).

asgerf

LGTM

adityasharad added 5 commits June 9, 2025 08:37

Actions: Refactor logic for identifying command substitution

39e710e

Extract helper predicates for `$(...)` command interpolation and backtick-quoted commands. Add some doc comments and meaningful variable names.

Actions: Refactor logic for identifying quoted strings

fbe11cf

Add some doc comments and meaningful variable names.

Actions: Add stress test for complex command and string interpolation

2ecbecb

Anonymised version of a customer report that led to performance bottlenecks in Bash parsing. No results are expected from both query and library tests.

Copilot AI review requested due to automatic review settings June 9, 2025 16:40

adityasharad requested a review from a team as a code owner June 9, 2025 16:40

github-actions bot added the Actions Analysis of GitHub Actions label Jun 9, 2025

Copilot AI reviewed Jun 9, 2025

View reviewed changes

github-actions bot added the documentation label Jun 9, 2025

Actions: Add change note for Bash parsing fixes

e48a7da

adityasharad force-pushed the actions/bash-parsing-ranking-performance branch from 3ec0624 to e48a7da Compare June 9, 2025 16:56

asgerf approved these changes Jun 10, 2025

View reviewed changes

adityasharad merged commit d659d40 into github:main Jun 10, 2025
13 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Actions: Improve Bash parsing performance on command and string interpolations #19701

Actions: Improve Bash parsing performance on command and string interpolations #19701

Uh oh!

adityasharad commented Jun 9, 2025

Uh oh!

Copilot AI left a comment

Uh oh!

adityasharad commented Jun 10, 2025

Uh oh!

asgerf left a comment

Uh oh!

Uh oh!

Uh oh!

Actions: Improve Bash parsing performance on command and string interpolations #19701

Actions: Improve Bash parsing performance on command and string interpolations #19701

Uh oh!

Conversation

adityasharad commented Jun 9, 2025

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull Request Overview

Reviewed Changes

Uh oh!

adityasharad commented Jun 10, 2025

Uh oh!

asgerf left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!