-
Notifications
You must be signed in to change notification settings - Fork 1.7k
C++: New Query: missing return-value check for scanf-like functions #10163
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
94c43c0
76ef779
c62ae3b
69911d4
ca162a4
170d12b
6158ee1
5c894ae
d8800c0
e39229d
a6a30b3
ad56274
2bd866c
02772ed
7d24d96
e10042b
ce1e4ad
0729e42
38f185b
f5a30c7
f956999
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,17 @@ | ||
{ | ||
int i, j, r; | ||
|
||
r = scanf("%d %d", &i, &j); | ||
|
||
use(i); // BAD: i is not guarded | ||
|
||
if (r >= 1) { | ||
use(i); // GOOD: i is guarded correctly | ||
use(j); // BAD: j is guarded incorrectly | ||
} | ||
|
||
if (r != 2) | ||
return; | ||
|
||
use(j); // GOOD: j is guarded correctly | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,51 @@ | ||
<!DOCTYPE qhelp PUBLIC | ||
"-//Semmle//qhelp//EN" | ||
"qhelp.dtd"> | ||
<qhelp> | ||
|
||
|
||
<overview> | ||
<p> | ||
This query finds calls of <tt>scanf</tt>-like functions with missing or | ||
improper return-value checking. | ||
</p> | ||
<p> | ||
Specifically, the query flags uses of variables that may have been modified by | ||
<tt>scanf</tt> and subsequently are used without being guarded by a correct | ||
return-value check. A proper check is one that ensures that the corresponding | ||
<tt>scanf</tt> has returned (at least) a certain minimum constant. | ||
</p> | ||
<p> | ||
Functions in the <tt>scanf</tt> family return either EOF (a negative value) | ||
in case of IO failure, or the number of items successfully read from the | ||
input. Consequently, a simple check that the return value is truthy (nonzero) | ||
is not enough. | ||
</p> | ||
<warning> | ||
This query has medium precision because, in the current implementation, it | ||
takes a strict stance on unguarded uses of output variables, and flags them | ||
as problematic even if they have already been initialized. | ||
</warning> | ||
</overview> | ||
|
||
<recommendation> | ||
<p> | ||
Ensure that all subsequent uses of <tt>scanf</tt> output arguments occur in a | ||
branch of an <tt>if</tt> statement (or similar), in which it is known that the | ||
corresponding <tt>scanf</tt> call has in fact read all possible items from its | ||
input. This can be done by comparing the return value to a numerical constant. | ||
</p> | ||
</recommendation> | ||
|
||
<example> | ||
<p>This example shows different ways of guarding a <tt>scanf</tt> output: | ||
</p> | ||
<sample src="MissingCheckScanf.cpp" /> | ||
</example> | ||
|
||
<references> | ||
<li>SEI CERT C++ Coding Standard: <a href="https://wiki.sei.cmu.edu/confluence/display/cplusplus/ERR62-CPP.+Detect+errors+when+converting+a+string+to+a+number">ERR62-CPP. Detect errors when converting a string to a number</a>.</li> | ||
<li>SEI CERT C Coding Standard: <a href="https://wiki.sei.cmu.edu/confluence/display/c/ERR33-C.+Detect+and+handle+standard+library+errors">ERR33-C. Detect and handle standard library errors</a>.</li> | ||
<li>cppreference.com: <a href="https://en.cppreference.com/w/c/io/fscanf">scanf, fscanf, sscanf, scanf_s, fscanf_s, sscanf_s</a>.</li> | ||
</references> | ||
</qhelp> |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,122 @@ | ||
/** | ||
* @name Missing return-value check for a 'scanf'-like function | ||
* @description Failing to check that a call to 'scanf' actually writes to an | ||
* output variable can lead to unexpected behavior at reading time. | ||
* @kind problem | ||
* @problem.severity warning | ||
* @security-severity 7.5 | ||
* @precision medium | ||
* @id cpp/missing-check-scanf | ||
* @tags security | ||
* correctness | ||
* external/cwe/cwe-252 | ||
* external/cwe/cwe-253 | ||
*/ | ||
|
||
import cpp | ||
import semmle.code.cpp.commons.Scanf | ||
import semmle.code.cpp.controlflow.Guards | ||
import semmle.code.cpp.dataflow.DataFlow | ||
import semmle.code.cpp.ir.IR | ||
import semmle.code.cpp.ir.ValueNumbering | ||
|
||
/** An expression appearing as an output argument to a `scanf`-like call */ | ||
class ScanfOutput extends Expr { | ||
ScanfFunctionCall call; | ||
int varargIndex; | ||
Instruction instr; | ||
ValueNumber valNum; | ||
|
||
ScanfOutput() { | ||
this = call.getOutputArgument(varargIndex).getFullyConverted() and | ||
instr.getConvertedResultExpression() = this and | ||
valueNumber(instr) = valNum | ||
} | ||
|
||
ScanfFunctionCall getCall() { result = call } | ||
|
||
/** | ||
* Returns the smallest possible `scanf` return value that would indicate | ||
* success in writing this output argument. | ||
*/ | ||
int getMinimumGuardConstant() { | ||
result = | ||
varargIndex + 1 - | ||
count(ScanfFormatLiteral f, int n | | ||
// Special case: %n writes to an argument without reading any input. | ||
// It does not increase the count returned by `scanf`. | ||
n <= varargIndex and f.getUse() = call and f.getConversionChar(n) = "n" | ||
) | ||
} | ||
|
||
predicate hasGuardedAccess(Access e, boolean isGuarded) { | ||
e = this.getAnAccess() and | ||
if | ||
exists(int value, int minGuard | minGuard = this.getMinimumGuardConstant() | | ||
e.getBasicBlock() = blockGuardedBy(value, "==", call) and minGuard <= value | ||
or | ||
e.getBasicBlock() = blockGuardedBy(value, "<", call) and minGuard - 1 <= value | ||
or | ||
e.getBasicBlock() = blockGuardedBy(value, "<=", call) and minGuard <= value | ||
) | ||
then isGuarded = true | ||
else isGuarded = false | ||
} | ||
|
||
/** | ||
* Get a subsequent access of the same underlying storage, | ||
* but before it gets reset or reused in another `scanf` call. | ||
*/ | ||
Access getAnAccess() { | ||
exists(Instruction dst | | ||
this.bigStep() = dst and | ||
dst.getAst() = result and | ||
valueNumber(dst) = valNum | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Could we move this conjunct into There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. That would be tricky, because while we do want the final Accesses to have the same ValueNumber, we don't want all intermediate instructions to have to have the same ValueNumber too, because then some crucial paths will be severed. Currently, There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Actually, on second thought, maybe There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Ah, right. Yeah, I do see what you mean here. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Yes, I think that's a good idea. Speaking of optimizations: Currently, you're big step is defined something like: predicate bigStep(Instruction from, Instruction to) { source(from) and smallStep*(from, to) and sink(to) } The The trick is to define a unary predicate to avoid producing a lot of unnecessary tuples: // Holds if `i` can be reached starting at a source.
predicate reachedFromSource(Instruction i) {
source(i)
or
exists(Instruction mid | reachedFromSource(mid) | step(mid, i))
}
// Holds if `i` can be reached starting at a source, and can eventually reach a sink.
predicate reachesSink(Instruction i) {
reachedFromSource(i) and
(
sink(i)
or
exists(Instruction mid | reachesSink(mid) | step(i, mid))
)
} Then, you can use this predicate to define a smaller step relation: predicate prunedSmallStep(Instruction from, Instruction to) {
reachesSink(from) and reachesSink(to) and smallStep(from, to)
} and finally, you can rewrite your predicate bigStep(Instruction from, Instruction to) { source(from) and prunedSmallStep*(from, to) and sink(to) } There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I have saved the best for last, implementing your proposed optimizations ^. Here are the before-and-after MRVA runs... Let's see how much of a difference it makes :) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Hmm... Nope, I'm calling it. It's not worth it. I've had to inline some predicates to avoid the runaway you see in the "After", and still it's not a significant difference. This is my WIP, so it could be improved in the future: 7cc0d71 P.S. MRVA top 100 run on that WIP code: https://github.com/github/semmle-code/actions/runs/2929372003 There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The fact that the new code is not timing out on opencv suggest that's it's a worthwhile improvement to make. I've left a comment on your branch. We can also discuss them tomorrow if you want :) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yeah, adding the So along with removing the There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Yep, I think I'm still not completely sure how to read the MRVA run output. I assume the 50 minutes of runtime on |
||
) | ||
} | ||
|
||
private Instruction bigStep() { | ||
result = this.smallStep(instr) | ||
or | ||
exists(Instruction i | i = this.bigStep() | result = this.smallStep(i)) | ||
} | ||
|
||
private Instruction smallStep(Instruction i) { | ||
instr.getASuccessor*() = i and | ||
i.getASuccessor() = result and | ||
not this.isBarrier(result) | ||
} | ||
|
||
private predicate isBarrier(Instruction i) { | ||
valueNumber(i) = valNum and | ||
exists(Expr e | i.getAst() = e | | ||
i = any(StoreInstruction s).getDestinationAddress() | ||
or | ||
[e, e.getParent().(AddressOfExpr)] instanceof ScanfOutput | ||
) | ||
} | ||
} | ||
|
||
/** Returns a block guarded by the assertion of `value op call` */ | ||
BasicBlock blockGuardedBy(int value, string op, ScanfFunctionCall call) { | ||
exists(GuardCondition g, Expr left, Expr right | | ||
right = g.getAChild() and | ||
value = left.getValue().toInt() and | ||
DataFlow::localExprFlow(call, right) | ||
| | ||
g.ensuresEq(left, right, 0, result, true) and op = "==" | ||
or | ||
g.ensuresLt(left, right, 0, result, true) and op = "<" | ||
or | ||
g.ensuresLt(left, right, 1, result, true) and op = "<=" | ||
) | ||
} | ||
|
||
from ScanfOutput output, ScanfFunctionCall call, Access access | ||
where | ||
output.getCall() = call and | ||
output.hasGuardedAccess(access, false) | ||
select access, | ||
"$@ is read here, but may not have been written. " + | ||
"It should be guarded by a check that the $@ returns at least " + | ||
output.getMinimumGuardConstant() + ".", access, access.toString(), call, call.toString() |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,4 @@ | ||
--- | ||
category: newQuery | ||
--- | ||
* Added a new medium-precision query, `cpp/missing-check-scanf`, which detects `scanf` output variables that are used without a proper return-value check to see that they were actually written. A variation of this query was originally contributed as an [experimental query by @ihsinme](https://github.com/github/codeql/pull/8246). |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,19 @@ | ||
| test.cpp:30:7:30:7 | i | $@ is read here, but may not have been written. It should be guarded by a check that the $@ returns at least 1. | test.cpp:30:7:30:7 | i | i | test.cpp:29:3:29:7 | call to scanf | call to scanf | | ||
| test.cpp:46:7:46:7 | i | $@ is read here, but may not have been written. It should be guarded by a check that the $@ returns at least 1. | test.cpp:46:7:46:7 | i | i | test.cpp:45:3:45:7 | call to scanf | call to scanf | | ||
| test.cpp:63:7:63:7 | i | $@ is read here, but may not have been written. It should be guarded by a check that the $@ returns at least 1. | test.cpp:63:7:63:7 | i | i | test.cpp:62:3:62:7 | call to scanf | call to scanf | | ||
| test.cpp:75:7:75:7 | i | $@ is read here, but may not have been written. It should be guarded by a check that the $@ returns at least 1. | test.cpp:75:7:75:7 | i | i | test.cpp:74:3:74:7 | call to scanf | call to scanf | | ||
| test.cpp:87:7:87:7 | i | $@ is read here, but may not have been written. It should be guarded by a check that the $@ returns at least 1. | test.cpp:87:7:87:7 | i | i | test.cpp:86:3:86:8 | call to fscanf | call to fscanf | | ||
| test.cpp:94:7:94:7 | i | $@ is read here, but may not have been written. It should be guarded by a check that the $@ returns at least 1. | test.cpp:94:7:94:7 | i | i | test.cpp:93:3:93:8 | call to sscanf | call to sscanf | | ||
| test.cpp:143:8:143:8 | i | $@ is read here, but may not have been written. It should be guarded by a check that the $@ returns at least 1. | test.cpp:143:8:143:8 | i | i | test.cpp:141:7:141:11 | call to scanf | call to scanf | | ||
| test.cpp:152:8:152:8 | i | $@ is read here, but may not have been written. It should be guarded by a check that the $@ returns at least 1. | test.cpp:152:8:152:8 | i | i | test.cpp:150:7:150:11 | call to scanf | call to scanf | | ||
| test.cpp:184:8:184:8 | i | $@ is read here, but may not have been written. It should be guarded by a check that the $@ returns at least 1. | test.cpp:184:8:184:8 | i | i | test.cpp:183:7:183:11 | call to scanf | call to scanf | | ||
| test.cpp:203:8:203:8 | j | $@ is read here, but may not have been written. It should be guarded by a check that the $@ returns at least 2. | test.cpp:203:8:203:8 | j | j | test.cpp:200:7:200:11 | call to scanf | call to scanf | | ||
| test.cpp:227:9:227:9 | d | $@ is read here, but may not have been written. It should be guarded by a check that the $@ returns at least 2. | test.cpp:227:9:227:9 | d | d | test.cpp:225:25:225:29 | call to scanf | call to scanf | | ||
| test.cpp:231:9:231:9 | d | $@ is read here, but may not have been written. It should be guarded by a check that the $@ returns at least 2. | test.cpp:231:9:231:9 | d | d | test.cpp:229:14:229:18 | call to scanf | call to scanf | | ||
| test.cpp:243:7:243:7 | i | $@ is read here, but may not have been written. It should be guarded by a check that the $@ returns at least 1. | test.cpp:243:7:243:7 | i | i | test.cpp:242:3:242:7 | call to scanf | call to scanf | | ||
| test.cpp:251:7:251:7 | i | $@ is read here, but may not have been written. It should be guarded by a check that the $@ returns at least 1. | test.cpp:251:7:251:7 | i | i | test.cpp:250:3:250:7 | call to scanf | call to scanf | | ||
| test.cpp:259:7:259:7 | i | $@ is read here, but may not have been written. It should be guarded by a check that the $@ returns at least 1. | test.cpp:259:7:259:7 | i | i | test.cpp:258:3:258:7 | call to scanf | call to scanf | | ||
| test.cpp:271:7:271:7 | i | $@ is read here, but may not have been written. It should be guarded by a check that the $@ returns at least 1. | test.cpp:271:7:271:7 | i | i | test.cpp:270:3:270:7 | call to scanf | call to scanf | | ||
| test.cpp:281:8:281:12 | ptr_i | $@ is read here, but may not have been written. It should be guarded by a check that the $@ returns at least 1. | test.cpp:281:8:281:12 | ptr_i | ptr_i | test.cpp:280:3:280:7 | call to scanf | call to scanf | | ||
| test.cpp:289:7:289:7 | i | $@ is read here, but may not have been written. It should be guarded by a check that the $@ returns at least 1. | test.cpp:289:7:289:7 | i | i | test.cpp:288:3:288:7 | call to scanf | call to scanf | | ||
| test.cpp:383:25:383:25 | u | $@ is read here, but may not have been written. It should be guarded by a check that the $@ returns at least 1. | test.cpp:383:25:383:25 | u | u | test.cpp:382:6:382:11 | call to sscanf | call to sscanf | |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
Critical/MissingCheckScanf.ql |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is great attention to detail.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! I think there are even more special cases, that I'm parking for a follow-up issue for now (namely, the *_s variants of scanf introduced in C11).