Skip to content

Actions: Diff-informed queries: phase 3 (non-trivial locations) #20072

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
Aug 15, 2025

Conversation

d10c
Copy link
Contributor

@d10c d10c commented Jul 17, 2025

This PR enables diff-informed mode on queries that select a location other than dataflow source or sink. This entails adding a non-trivial location override that returns the locations that are actually selected.

Prior work includes PRs like #19663, #19759, and #19817. This PR uses the same patch script as those PRs to find candidate queries to convert to diff-enabled. This is the final step in mass-enabling diff-informed queries on all the languages.

Commit-by-commit reviewing is recommended.

  • I have split the commits that add/modify tests from the ones that enable/disable diff-informed queries.
  • If the commit modifies a .qll file, in the commit message I've included links to the queries that depend on that .qll for easier reviewing.
  • Feel free to delegate parts of the review to others who may be more specialized in certain languages.

@github-actions github-actions bot added the Actions Analysis of GitHub Actions label Jul 17, 2025
@d10c d10c added the no-change-note-required This PR does not need a change note label Jul 17, 2025
@d10c d10c requested a review from michaelnebel July 17, 2025 13:31
@d10c d10c marked this pull request as ready for review July 17, 2025 13:31
@Copilot Copilot AI review requested due to automatic review settings July 17, 2025 13:31
@d10c d10c requested a review from a team as a code owner July 17, 2025 13:31
Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR enables diff-informed mode on security queries in the Actions language by adding location override methods to return the actual selected locations rather than just dataflow sources or sinks. This represents the final phase of mass-enabling diff-informed queries across all languages.

  • Adds observeDiffInformedIncrementalMode(), getASelectedSourceLocation(), and getASelectedSinkLocation() methods to multiple security query configurations
  • Incorporates existing "where clause" logic from corresponding Critical.ql files into the location selection methods
  • Adds necessary imports for ControlChecks and other dependencies to support the new functionality

Reviewed Changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated no comments.

Show a summary per file
File Description
EnvVarInjectionQuery.qll Adds diff-informed mode support with complex sink location logic for environment variable injection detection
EnvPathInjectionQuery.qll Enables diff-informed mode with PATH environment variable specific location selection
CommandInjectionQuery.qll Implements diff-informed mode for command injection detection with privileged context checks
CodeInjectionQuery.qll Adds diff-informed support with logic for both code injection and cache poisoning scenarios
ArtifactPoisoningQuery.qll Enables diff-informed mode for artifact poisoning detection with control check validation
ArgumentInjectionQuery.qll Implements diff-informed mode for argument injection detection with event-based location selection

Copy link
Contributor

@michaelnebel michaelnebel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!
Added some questions and suggestions.
As far as I can tell from the implementation in the shared data flow library, it is ok that the getASelected[Source|Sink]Location predicates are over-approximations in the sense that they are allowed to "map" more source/sink nodes to locations than might be used by the query.

Location getASelectedSinkLocation(DataFlow::Node sink) {
result = sink.getLocation()
or
exists(Event event | result = event.getLocation() |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In cases where we copy actual business logic into the location selection predicates, we should consider to re-factor the code to enable sharing and discoverability. If someone changes the logic of the query, then it will be easy to forget to check whether location selection predicates need to be changed as well.

Maybe consider introducing a helper predicate (above the config module)

Event getRelevantEvent(DataFlow::Node node) {
  inPrivilegedContext(node.asExpr(), result) and
  not exists(ControlCheck check | check.protects(node.asExpr(), result, "argument-injection"))
}

Then it is possible to write the where clause of the query as

ArgumentInjectionFlow::flowPath(source, sink) and
event = getRelevantEvent(sink.getNode())

and then one can declare

Location getASelectedSinkLocation(DataFlow::Node sink) {
  result = sink.getLocation()
  or
  result = getRelevantEvent(sink).getLocation()
}


Location getASelectedSourceLocation(DataFlow::Node source) { none() }

Location getASelectedSinkLocation(DataFlow::Node sink) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment as above.

Comment on lines 49 to 60
exists(Event event, RemoteFlowSource source | result = event.getLocation() |
inPrivilegedContext(sink.asExpr(), event) and
isSource(source) and
source.getEventName() = event.getName() and
not exists(ControlCheck check | check.protects(sink.asExpr(), event, "code-injection")) and
// exclude cases where the sink is a JS script and the expression uses toJson
not exists(UsesStep script |
script.getCallee() = "actions/github-script" and
script.getArgumentExpr("script") = sink.asExpr() and
exists(getAToJsonReferenceExpression(sink.asExpr().(Expression).getExpression(), _))
)
)
Copy link
Contributor

@michaelnebel michaelnebel Aug 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't fully understand this. The part of the where clauses this appears to mirror is

  inPrivilegedContext(sink.getNode().asExpr(), event) and
  source.getNode().(RemoteFlowSource).getEventName() = event.getName() and
  not exists(ControlCheck check | check.protects(sink.getNode().asExpr(), event, "code-injection")) and
  // exclude cases where the sink is a JS script and the expression uses toJson
  not exists(UsesStep script |
    script.getCallee() = "actions/github-script" and
    script.getArgumentExpr("script") = sink.getNode().asExpr() and
    exists(getAToJsonReferenceExpression(sink.getNode().asExpr().(Expression).getExpression(), _))
  )

and it looks like maybe we should perceive the location of an event as being related to the source node location (instead of the sink)?

Location getASelectedSourceLocation(DataFlow::Node source) {
  exists(Event event | result = event.getLocation() |
    source.(RemoteFlowSource).getEventName() = event.getName()
  )
}

This yields an over-approximation of the mapping of locations from source nodes.

Copy link
Contributor Author

@d10c d10c Aug 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only change I made from the original where-clause is to replace source.getNode().(RemoteFlowSource) with a RemoteFlowSource source where isSource(source). In this case event is related to both source and sink, so whichever one we use to determine event it will be an over-approximation. There are more ties to sink, so I chose to put it in getASelectedSinkLocation. I could also do it the way you suggest, but I don't know enough about the query to know which one will yield a tighter approximation.

Copy link
Contributor

@michaelnebel michaelnebel Aug 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, yes, then I understand. In that case, I am a bit worried that the current implementation might yield an under-approximation (also note that I am not familiar with the actions queries or libraries). In the where clause in the query, only the sources where there is flow to the sink are used under the negation. If we consider all remote flow sources (for each sink - no matter if there is flow) we might risk filtering out more than we want.
The two options I see are

  • Use the getASelectedSourceLocation as above.
  • Use the inPrivilegedContext predicate in getASelectedSinkLocation to get event(s) and use their locations as possible selected sink locations (and remove all the negation stuff).

I don't know which of the two that will yield the tightest approximation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since in the ArgumentInjection and ArtifactPoisoning case above we are fine with having sink under negation, that should be fine in this case too, right? Or should I avoid the negation in those cases too?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In those cases, the negation should be fine as the negation is on the sink node itself.
The reason why I am worried about the negation on sources is because it basically says something like
\forall sources. \not p(source, ...)
where the sources are not limited to a specific set of sources that satisfies flow(source, sink)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, so if we ignore all the sources, and only limit it to the clauses regarding sink, then I guess it should be fine.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that is the best (over)approximation I can come up with. Alternatively, the query should not be diff-informed.

event.isExternallyTriggerable() and
// the checkout is not controlled by an access check
isSource(source) and
not exists(ControlCheck check | check.protects(source.asExpr(), event, "code-injection")) and
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't look equivalent to the where clause in the query as "any" source is used (and not only the one from the flowpath), which could produce a smaller set of locations as the source is used under a negation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. I suppose getting rid of the not exists(…) would be fine then? In which case, we would get an over-approximation.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that seems plausible!

result = sink.getLocation()
or
// where clause from CommandInjectionCritical.ql
exists(Event event | result = event.getLocation() |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe re-factor to avoid code duplication.

// where clause from EnvPathInjectionCritical.ql
exists(Event event, RemoteFlowSource source | result = event.getLocation() |
inPrivilegedContext(sink.asExpr(), event) and
isSource(source) and
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment wrt. using "any" source and not only those from flow - we might get an under-approximation?

// where clause from EnvVarInjectionCritical.ql
exists(Event event, RemoteFlowSource source | result = event.getLocation() |
inPrivilegedContext(sink.asExpr(), event) and
isSource(source) and
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment wrt. under-approximation.

d10c added 6 commits August 15, 2025 11:10
@d10c d10c force-pushed the d10c/diff-informed-phase-3-actions branch from 3f931cd to 126d24a Compare August 15, 2025 09:18
@d10c
Copy link
Contributor Author

d10c commented Aug 15, 2025

Addressed the review comments in the latest force-push. Starting a DCA run.

Copy link
Contributor

@michaelnebel michaelnebel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent!

@d10c d10c merged commit bb9daa0 into github:main Aug 15, 2025
14 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Actions Analysis of GitHub Actions no-change-note-required This PR does not need a change note
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants