How to write a cross-function isAdditionalFlowStep while preserving context sensitive dataflow. #19308
That's an interesting question. I think with |
So how does CodeQL implement its context-sensitive feature? Can I just "tell" CodeQL that this step is a call? As far as I know, |
Roughly, CodeQL performs local dataflow within function bodies and, in addition, has a call graph. When performing global dataflow between functions, it uses the call graph to resolve function calls and jump from one function body to another while keeping track of the context. I don't know enough of the internals to explain how the context is kept. Normally one just instantiates the dataflow library with a local flow predicate and a call graph, and lets the library work its "magic". One thing you could try is to extend the call graph so that the dataflow library knows how to resolve the
Yes, that is why I had my doubts. I think it could work if you want to implement some special handling of a couple of calls. Like you said, I don't think it would work for the general case. |
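To make "keeping track of the context" concrete, here is a toy Python model of the idea. It is only a sketch, not how CodeQL implements it: the graph, the node names (`x`, `y`, `p`, `ret`, `a1`, `a2`) and the call-site labels (`c1`, `c2`) are all made up for illustration. It models an identity function `f(p)` called as `a1 = f(x)` and `a2 = f(y)`, and compares reachability with and without matching each return edge to the call site that was entered.

```python
from collections import deque

# Hypothetical flow graph: node -> list of (target, edge kind, call site).
EDGES = {
    "x": [("p", "call", "c1")],      # argument x enters f at call site c1
    "y": [("p", "call", "c2")],      # argument y enters f at call site c2
    "p": [("ret", "local", None)],   # local flow inside f: parameter to return value
    "ret": [("a1", "return", "c1"),  # return value flows back to call site c1
            ("a2", "return", "c2")], # ... and to call site c2
}


def reachable(source, context_sensitive):
    """Return all nodes reachable from `source` in the toy graph."""
    start = (source, ())  # state = (node, stack of entered call sites)
    seen = {start}
    queue = deque([start])
    while queue:
        node, stack = queue.popleft()
        for target, kind, site in EDGES.get(node, []):
            new_stack = stack
            if context_sensitive:
                if kind == "call":
                    new_stack = stack + (site,)
                elif kind == "return":
                    # Only return to the call site we came from, if any.
                    if stack and stack[-1] != site:
                        continue
                    new_stack = stack[:-1]
            state = (target, new_stack)
            if state not in seen:
                seen.add(state)
                queue.append(state)
    return {node for node, _ in seen}
```

Without context, flow from `x` reaches both `a1` and `a2`, because the return edges are followed blindly. With the call-site stack, it reaches only `a1`. This toy version does not bound the stack, so it would need a depth limit before being used on recursive graphs.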
Thank you! I managed to implement context-sensitive taint tracking by extending

```ql
import python
import semmle.python.ApiGraphs
import semmle.python.dataflow.new.DataFlow
import semmle.python.dataflow.new.TaintTracking
import semmle.python.dataflow.new.internal.DataFlowDispatch
import semmle.python.dataflow.new.internal.DataFlowPublic

module TConfig implements DataFlow::ConfigSig {
  // Source: the return value of the builtin `open`.
  predicate isSource(DataFlow::Node source) {
    source = API::builtin("open").getReturn().asSource()
  }

  // Sink: the first argument of `json.dump`.
  predicate isSink(DataFlow::Node sink) {
    sink = API::moduleImport("json").getMember("dump").getACall().getArg(0)
  }
}

// Treats `mock_rpc_call("name", ...)` as a call to the function `name`
// defined in the same file.
class MyRpcCall extends ExtractedDataFlowCall {
  CallNode call;
  Function target;
  CallType type;

  MyRpcCall() {
    this = TPotentialLibraryCall(call) and
    call.getLocation().getFile() = target.getLocation().getFile() and
    exists(FunctionObject rpc_func |
      call.getFunction() = rpc_func.theCallable().getAReference() and
      rpc_func.getName() = "mock_rpc_call" and
      target.getName() = call.getArg(0).getNode().(StringLiteral).getS() and
      type = CallTypePlainFunction()
    )
  }

  override string toString() { result = call.getNode().toString() }

  override ControlFlowNode getNode() { result = call }

  override Scope getScope() { result = call.getScope() }

  override DataFlowCallable getCallable() { result.(DataFlowFunction).getScope() = target }

  override ArgumentNode getArgument(ArgumentPosition apos) {
    getCallArg(call, target, type, result, apos) and not apos.isPositional(_)
    or
    // Argument 0 of the call is the target's name, so positional
    // arguments are shifted by one.
    exists(int index |
      apos.isPositional(index) and
      result.asCfgNode() = call.getArg(index + 1)
    )
  }

  CallType getCallType() { result = type }
}

module TFlow = TaintTracking::Global<TConfig>;

import TFlow::PathGraph

from TFlow::PathNode source, TFlow::PathNode sink
where TFlow::flowPath(source, sink)
select source.getNode(), source, sink, "$@", sink.getNode().getLocation(), "test"
```

But one more thing: When I "construct" the class |
I think that is fine for your experiment. I don't think there is currently a recommended API for extending the call graph. |
Are there any side effects if I use

```diff
 // =============================================================================
 // DataFlowCall
 // =============================================================================
 newtype TDataFlowCall =
   TNormalCall(CallNode call, Function target, CallType type) { resolveCall(call, target, type) } or
   /** A call to the generated function inside a comprehension */
   TComprehensionCall(Comp c) or
   TPotentialLibraryCall(CallNode call) or
   /** A synthesized call inside a summarized callable */
   TSummaryCall(
     FlowSummaryImpl::Public::SummarizedCallable c, FlowSummaryImpl::Private::SummaryNode receiver
   ) {
     FlowSummaryImpl::Private::summaryCallbackRange(c, receiver)
-  }
+  } or
+  TUserExtensions()
```
I wonder if the right approach here would be to properly model |
For example:
This works as expected. json_file only has a flow to json.dump(json_obj, out).
But when it comes to
It will return 4 flows, and 3 of them are context-insensitive. json_file will have flows to both json.dump(json_obj, out) and json.dump(json_obj2, out).
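For readers without the original snippets, the following is a hypothetical Python target file that reproduces the shape of the problem. It is not the code from the issue. The names `parse`, `RPC_TARGETS` and `run` are invented for illustration, and `mock_rpc_call` is modeled only loosely on the name used in the query in this thread. The same callee is reached from two call sites, one with data derived from `open()` and one without, so a context-sensitive analysis would be expected to report only the first `json.dump`.

```python
import io
import json


def parse(f):
    # Callee that is only ever reached indirectly, via mock_rpc_call.
    return json.load(f)


# Hypothetical registry standing in for a real RPC layer.
RPC_TARGETS = {"parse": parse}


def mock_rpc_call(name, *args):
    # Argument 0 names the target; the remaining arguments are forwarded.
    return RPC_TARGETS[name](*args)


def run(path, out):
    with open(path) as json_file:                     # source: result of open()
        json_obj = mock_rpc_call("parse", json_file)  # call site 1: tainted input
    clean = io.StringIO('{"b": 2}')
    json_obj2 = mock_rpc_call("parse", clean)         # call site 2: untainted input
    json.dump(json_obj, out)    # a flow from json_file is expected here
    json.dump(json_obj2, out)   # a flow here would be a context-insensitive result
    return json_obj, json_obj2
```

If the cross-function step is modeled as a plain additional flow step, the return value of `parse` may flow back to both call sites, which would match the extra flows described above. Whether a particular query actually reports them depends on how the step and the call graph are modeled.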