-
Notifications
You must be signed in to change notification settings - Fork 1.8k
C++: Use range analysis in Overflow.qll #5667
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
C++: Use range analysis in Overflow.qll #5667
Conversation
af69f4f
to
d145799
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice!
The "Check change note" check failed, reminding us that this PR should have a brief change note, explaining that there should be fewer false positive results in the three queries you mentioned. |
Fixed in aa52585. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changes and test changes LGTM.
I assume convertedExprMightOverflowNegatively
is cached and there's a minimal effect on performance? I see about a 10% slowdown locally but that's in the context of a cleared cache.
I tried these on a samate snapshot and we lose some results:
cpp/uncontrolled-arithmetic
2709 -> 2080 resultscpp/arithmetic-with-extreme-values
1770 -> 1606 resultscpp/tainted-arithmetic
4281 -> 3067 results
These appear to be [mostly?] results in 'bad' cases, i.e. they're cases we should flag as far as SAMATE is concerned. This might be a price we're prepared to pay if we can actually use the queries in code scanning, but it would be good to know what's going on here.
I was confused by that as well. Looking at the results for As an example of this, consider this one which we flag on main as both an overflow and an underflow: void CWE190_Integer_Overflow__char_rand_multiply_01_bad()
{
char data;
data = ' ';
/* POTENTIAL FLAW: Use a random value */
data = (char)RAND32();
if(data > 0) /* ensure we won't have an underflow */
{
/* POTENTIAL FLAW: if (data*2) > CHAR_MAX, this will overflow */
char result = data * 2;
printHexCharLine(result);
}
} and with this PR we flag it as a potential overflow only. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I noticed that we sometimes flag the same result as both a possible underflow and an overflow. However, by adding range analysis we exclude the false result and keep the one that's actually relevant.
Good point. I'm struggling to confirm that's what's happening though. I'll have another look in a bit, failing that I'll run the old SAMATE job on this PR and we'll have a clear answer tomorrow.
Co-authored-by: Geoffrey White <40627776+geoffw0@users.noreply.github.com>
SimpleRangeAnalysis doesn't seem to cope with some of the multiplications very well. Take this one:
It isn't considered multiplication by a constant, and the |
We ruled out general multiplication in range analysis because it involves too many recursive calls (no negative recursion, fortunately). See here for the discussion regarding this. You're right in that this is a problem. I actually thought |
I don't think we'd quite need a general case, just the case of positive number * positive number multiplication. This would use
Sounds good to me. |
… range analysis fails to deduce a bound.
More or less, yes. It leads to non-linear recursion (which is tolerable but a bit slow) and sometimes an explosion in the number of candidate bounds. Because of how abstract interpretation in QL works (the only lattice is the powerset lattice), we store candidate bounds until the end of the main range-analysis recursion, even if they're no longer relevant. |
I'm happy with the PR as it is now (results are fine), but I'd prefer to have a more principled solution using #5678 (if that works out). |
I agree. I think it's fine to merge this one as-is, though. How about I create a backlog issue that says something along the lines of "Fixup bb447d7 if we end up merging #5678"? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
(Part of https://github.com/github/codeql-c-analysis-team/issues/272.)
Three of the queries in
Security/CEW/CWE-190
use thesemmle.code.cpp.security.Overflow
library to detect missing overflow/underflow guards. However, that library doesn't use the range analysis library to rule out non-interesting operations. This seems like low-hanging fruit.This PR adds range analysis to this library. This should improve the false-positivity rate of the queries:
cpp/tainted-arithmetic
cpp/uncontrolled-arithmetic
cpp/arithmetic-with-extreme-values