Skip to content

C++: Use the new dataflow library in cpp/missing-check-scanf #12818

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Apr 17, 2023

Conversation

MathiasVP
Copy link
Contributor

This PR rewrites the cpp/missing-check-scanf query to use the new use-use based dataflow library.

When we introduced the query #10163 it was deemed too difficult to use dataflow for the query since we couldn't properly mark one scanf as being a sanitizer of another call to scanf. However, this is now completely trivial with use-use flow. That makes the query a perfect fit for dataflow now 🎉

I've also changed the query to only raise an alert if a variable isn't initialized prior to the scanf call. This removes a very large number of results, and I think this is for the better.

I'm running a difference query on MRVA now to see the impact of this change.

@github-actions github-actions bot added the C++ label Apr 13, 2023
@MathiasVP MathiasVP force-pushed the dataflow-for-missing-scanf-qery branch from 3d8d3f6 to bef7b00 Compare April 13, 2023 13:15
@MathiasVP MathiasVP force-pushed the dataflow-for-missing-scanf-qery branch from bef7b00 to 0db05fe Compare April 13, 2023 13:51
@MathiasVP MathiasVP marked this pull request as ready for review April 13, 2023 16:42
@MathiasVP MathiasVP requested a review from a team as a code owner April 13, 2023 16:42
@MathiasVP MathiasVP added the no-change-note-required This PR does not need a change note label Apr 13, 2023
@geoffw0
Copy link
Contributor

geoffw0 commented Apr 17, 2023

The DCA run claims there are 10,258 fixed alerts for cpp/missing-check-scanf. I don't remember the query ever having that many results! Only 32 are in source files though...

@MathiasVP
Copy link
Contributor Author

The DCA run claims there are 10,258 fixed alerts for cpp/missing-check-scanf. I don't remember the query ever having that many results! Only 32 are in source files though...

Indeed. It's also by far the query with the most number of results on CodeQL-data. I think one of the problems is that the original query raised an issue on any use of the variable that was identified as being problematic. However, by disabling flow out of sinks (i.e., using isSanitizerOut) we only raise an alert on the first use of the variable. This should hopefully provide a better UX.

@geoffw0
Copy link
Contributor

geoffw0 commented Apr 17, 2023

I'm having difficulty interpreting the DCA query differences to be honest, I'll see if things look a bit clearer on MRVA instead.

Copy link
Contributor

@geoffw0 geoffw0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

QL looks fine.

On my (small) MRVA run I saw a reduction in duplicate results along the lines of:

scanf(..., x);
a = x; // result
b = x; // duplicate result

👍

This PR rewrites the cpp/missing-check-scanf query to use the new use-use based dataflow library.

I'm a bit disappointed that there is still so much manually written dataflow (the fwdFlow and revFlow predicates in particular) that the data flow libraries are apparently unable to do for us. I had hoped to see the whole query reduced to one or two standard looking dataflow configs.

I guess we don't actually want interprocedural dataflow in this case?

* Holds if `instr` is control-flow reachable in 0 or more steps from
* a call to a `scanf`-like function.
*/
/** Holds if `n` reaches an argument to a call to a `scanf`-like function. */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/** Holds if `n` reaches an argument to a call to a `scanf`-like function. */
/** Holds if `n` reaches an argument to a call to a `scanf`-like function. */

@MathiasVP
Copy link
Contributor Author

We could convert it to interprocedural dataflow quite easily, yeah. That would get rid of those forwards and backwards localflow predicates. I just wasn't sure if we wanted to do that.

@geoffw0
Copy link
Contributor

geoffw0 commented Apr 17, 2023

I don't think interprocedural results will be interesting. I think this is a mistake people make locally and interprocedural results will more likely be a source of false positives.

I guess I'd look to use DataFlow::localFlow for a local flow problem ... but then that doesn't support sanitizers (or flow paths).

@MathiasVP
Copy link
Contributor Author

I guess I'd look to use DataFlow::localFlow for a local flow problem ... but then that doesn't support sanitizers (or flow paths).

Exactly. This PR does use Dataflow::localFlowStep, but performs the transitive closure manually to get sanitization

@MathiasVP MathiasVP merged commit d975ceb into github:main Apr 17, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C++ no-change-note-required This PR does not need a change note
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants