-
Notifications
You must be signed in to change notification settings - Fork 1.8k
Actions: Diff-informed queries: phase 3 (non-trivial locations) #20072
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR enables diff-informed mode on security queries in the Actions language by adding location override methods to return the actual selected locations rather than just dataflow sources or sinks. This represents the final phase of mass-enabling diff-informed queries across all languages.
- Adds
observeDiffInformedIncrementalMode()
,getASelectedSourceLocation()
, andgetASelectedSinkLocation()
methods to multiple security query configurations - Incorporates existing "where clause" logic from corresponding Critical.ql files into the location selection methods
- Adds necessary imports for ControlChecks and other dependencies to support the new functionality
Reviewed Changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated no comments.
Show a summary per file
File | Description |
---|---|
EnvVarInjectionQuery.qll | Adds diff-informed mode support with complex sink location logic for environment variable injection detection |
EnvPathInjectionQuery.qll | Enables diff-informed mode with PATH environment variable specific location selection |
CommandInjectionQuery.qll | Implements diff-informed mode for command injection detection with privileged context checks |
CodeInjectionQuery.qll | Adds diff-informed support with logic for both code injection and cache poisoning scenarios |
ArtifactPoisoningQuery.qll | Enables diff-informed mode for artifact poisoning detection with control check validation |
ArgumentInjectionQuery.qll | Implements diff-informed mode for argument injection detection with event-based location selection |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you!
Added some questions and suggestions.
As far as I can tell from the implementation in the shared data flow library, it is ok that the getASelected[Source|Sink]Location
predicates are over-approximations in the sense that they are allowed to "map" more source/sink nodes to locations than might be used by the query.
Location getASelectedSinkLocation(DataFlow::Node sink) { | ||
result = sink.getLocation() | ||
or | ||
exists(Event event | result = event.getLocation() | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In cases where we copy actual business logic into the location selection predicates, we should consider to re-factor the code to enable sharing and discoverability. If someone changes the logic of the query, then it will be easy to forget to check whether location selection predicates need to be changed as well.
Maybe consider introducing a helper predicate (above the config module)
Event getRelevantEvent(DataFlow::Node node) {
inPrivilegedContext(node.asExpr(), result) and
not exists(ControlCheck check | check.protects(node.asExpr(), result, "argument-injection"))
}
Then it is possible to write the where clause of the query as
ArgumentInjectionFlow::flowPath(source, sink) and
event = getRelevantEvent(sink.getNode())
and then one can declare
Location getASelectedSinkLocation(DataFlow::Node sink) {
result = sink.getLocation()
or
result = getRelevantEvent(sink).getLocation()
}
|
||
Location getASelectedSourceLocation(DataFlow::Node source) { none() } | ||
|
||
Location getASelectedSinkLocation(DataFlow::Node sink) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment as above.
exists(Event event, RemoteFlowSource source | result = event.getLocation() | | ||
inPrivilegedContext(sink.asExpr(), event) and | ||
isSource(source) and | ||
source.getEventName() = event.getName() and | ||
not exists(ControlCheck check | check.protects(sink.asExpr(), event, "code-injection")) and | ||
// exclude cases where the sink is a JS script and the expression uses toJson | ||
not exists(UsesStep script | | ||
script.getCallee() = "actions/github-script" and | ||
script.getArgumentExpr("script") = sink.asExpr() and | ||
exists(getAToJsonReferenceExpression(sink.asExpr().(Expression).getExpression(), _)) | ||
) | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't fully understand this. The part of the where clauses this appears to mirror is
inPrivilegedContext(sink.getNode().asExpr(), event) and
source.getNode().(RemoteFlowSource).getEventName() = event.getName() and
not exists(ControlCheck check | check.protects(sink.getNode().asExpr(), event, "code-injection")) and
// exclude cases where the sink is a JS script and the expression uses toJson
not exists(UsesStep script |
script.getCallee() = "actions/github-script" and
script.getArgumentExpr("script") = sink.getNode().asExpr() and
exists(getAToJsonReferenceExpression(sink.getNode().asExpr().(Expression).getExpression(), _))
)
and it looks like maybe we should perceive the location of an event as being related to the source node location (instead of the sink)?
Location getASelectedSourceLocation(DataFlow::Node source) {
exists(Event event | result = event.getLocation() |
source.(RemoteFlowSource).getEventName() = event.getName()
)
}
This yields an over-approximation of the mapping of locations from source nodes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The only change I made from the original where-clause is to replace source.getNode().(RemoteFlowSource)
with a RemoteFlowSource source
where isSource(source)
. In this case event
is related to both source
and sink
, so whichever one we use to determine event
it will be an over-approximation. There are more ties to sink
, so I chose to put it in getASelectedSinkLocation
. I could also do it the way you suggest, but I don't know enough about the query to know which one will yield a tighter approximation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, yes, then I understand. In that case, I am a bit worried that the current implementation might yield an under-approximation (also note that I am not familiar with the actions queries or libraries). In the where clause in the query, only the sources where there is flow to the sink are used under the negation. If we consider all remote flow sources (for each sink - no matter if there is flow) we might risk filtering out more than we want.
The two options I see are
- Use the
getASelectedSourceLocation
as above. - Use the
inPrivilegedContext
predicate ingetASelectedSinkLocation
to get event(s) and use their locations as possible selected sink locations (and remove all the negation stuff).
I don't know which of the two that will yield the tightest approximation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since in the ArgumentInjection and ArtifactPoisoning case above we are fine with having sink
under negation, that should be fine in this case too, right? Or should I avoid the negation in those cases too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In those cases, the negation should be fine as the negation is on the sink node itself.
The reason why I am worried about the negation on sources is because it basically says something like
\forall sources. \not p(source, ...)
where the sources are not limited to a specific set of sources that satisfies flow(source, sink)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, so if we ignore all the sources, and only limit it to the clauses regarding sink, then I guess it should be fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that is the best (over)approximation I can come up with. Alternatively, the query should not be diff-informed.
event.isExternallyTriggerable() and | ||
// the checkout is not controlled by an access check | ||
isSource(source) and | ||
not exists(ControlCheck check | check.protects(source.asExpr(), event, "code-injection")) and |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't look equivalent to the where clause in the query as "any" source is used (and not only the one from the flowpath), which could produce a smaller set of locations as the source is used under a negation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. I suppose getting rid of the not exists(…)
would be fine then? In which case, we would get an over-approximation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that seems plausible!
result = sink.getLocation() | ||
or | ||
// where clause from CommandInjectionCritical.ql | ||
exists(Event event | result = event.getLocation() | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe re-factor to avoid code duplication.
// where clause from EnvPathInjectionCritical.ql | ||
exists(Event event, RemoteFlowSource source | result = event.getLocation() | | ||
inPrivilegedContext(sink.asExpr(), event) and | ||
isSource(source) and |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment wrt. using "any" source and not only those from flow - we might get an under-approximation?
// where clause from EnvVarInjectionCritical.ql | ||
exists(Event event, RemoteFlowSource source | result = event.getLocation() | | ||
inPrivilegedContext(sink.asExpr(), event) and | ||
isSource(source) and |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment wrt. under-approximation.
3f931cd
to
126d24a
Compare
Addressed the review comments in the latest force-push. Starting a DCA run. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Excellent!
This PR enables diff-informed mode on queries that select a location other than dataflow source or sink. This entails adding a non-trivial location override that returns the locations that are actually selected.
Prior work includes PRs like #19663, #19759, and #19817. This PR uses the same patch script as those PRs to find candidate queries to convert to diff-enabled. This is the final step in mass-enabling diff-informed queries on all the languages.
Commit-by-commit reviewing is recommended.