-
-
Notifications
You must be signed in to change notification settings - Fork 31.8k
bpo-29505: Add fuzzing for re.compile, re.load and csv.reader #14255
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
7425844
to
bbe54f6
Compare
Modules/_xxtestfuzz/fuzzer.c
Outdated
} | ||
/* Use the first byte as a uint8_t specifying the index of the | ||
regex to use */ | ||
uint8_t idx = ((uint8_t*) data)[0]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is unsafe. Safer would be:
uint8_t idx = (uint8_t)(data[0]);
(at any rate, this is defined). But if you want an unsigned type that is guaranteed to hold all of the values of a char, then you want unsigned char
instead of uint8_t
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you, unsigned char
makes way more sense, done.
Out of curiosity, why is accessing the element and then casting safer than casting the pointer and then accessing the element?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
overall, looks good. just some pedantry around NULL checks after CPython API calls even if you think they're unlikely to be taken.
Modules/_xxtestfuzz/fuzzer.c
Outdated
|
||
/* Some random patterns used to test re.match. | ||
Be careful not to add catostraphically slow regexes here, we want to | ||
excercise the matchign code without causing timeouts.*/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
matching
Modules/_xxtestfuzz/fuzzer.c
Outdated
rv |= _run_fuzz(data, size, fuzz_json_loads); | ||
#endif | ||
#if !defined(_Py_FUZZ_ONE) || defined(_Py_FUZZ_fuzz_sre_compile) | ||
/* Impore sre_compile.compile and sre.error */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Import
Modules/_xxtestfuzz/fuzzer.c
Outdated
#if !defined(_Py_FUZZ_ONE) || defined(_Py_FUZZ_fuzz_sre_match) | ||
/* Precompile all the regex patterns on the first run for faster fuzzing */ | ||
if (compiled_patterns == NULL) { | ||
PyObject* re_module = PyImport_ImportModule("re"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
being pedantic: check re_module for NULL.
Modules/_xxtestfuzz/fuzzer.c
Outdated
#if !defined(_Py_FUZZ_ONE) || defined(_Py_FUZZ_fuzz_sre_compile) | ||
/* Impore sre_compile.compile and sre.error */ | ||
if (sre_compile_method == NULL) { | ||
PyObject* sre_compile_module = PyImport_ImportModule("sre_compile"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pedantic: check sre_compile_module for NULL
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the same goes for all of your GetAttrString return values.
Modules/_xxtestfuzz/fuzzer.c
Outdated
PyObject* sre_compile_module = PyImport_ImportModule("sre_compile"); | ||
sre_compile_method = PyObject_GetAttrString(sre_compile_module, "compile"); | ||
|
||
PyObject* sre_constants = PyImport_ImportModule("sre_constants"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NULL check
Modules/_xxtestfuzz/fuzzer.c
Outdated
#if !defined(_Py_FUZZ_ONE) || defined(_Py_FUZZ_fuzz_csv_reader) | ||
/* Import csv and csv.Error */ | ||
if (csv_module == NULL) { | ||
csv_module = PyImport_ImportModule("csv"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NULL checks
sre_compile_method, pattern_bytes, flags_obj, NULL); | ||
/* Ignore ValueError as the fuzzer will more than likely | ||
generate some invalid combination of flags */ | ||
if (compiled == NULL && PyErr_ExceptionMatches(PyExc_ValueError)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this series if if's with PyErr_ExceptionMatches calls seems somewhat redundant (the comments for each match type are useful). It can also call PyErr_ExceptionMatches after PyErr_Clear has been called as these are sequential if's not else if's. perhaps write it as:
if (compiled == NULL) {
if (PyErr_ExceptionMatches(xxx) ||
PyErr_ExceptionMatches(yyy) ||
...) {
}
}
with the comments interspersed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, the comments were primarily why I went with this style but I'll see if I can put them between the ORs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, kinda iffy about the formatting. Got any suggestions?
if (PyErr_ExceptionMatches(xxx) ||
PyErr_ExceptionMatches(yyy)
) {
PyErr_Clear();
}
}
and
if (PyErr_ExceptionMatches(xxx) ||
PyErr_ExceptionMatches(yyy)) {
PyErr_Clear();
}
}
both look really confusing, so I went with some alternative style.
Thanks @ammaraskar for the PR, and @gpshead for merging it 🌮🎉.. I'm working now to backport this PR to: 3.7, 3.8. |
…ythonGH-14255) Add more fuzz testing for re.compile, re.load and csv.reader (cherry picked from commit 5cbbbd7) Co-authored-by: Ammar Askar <ammar@ammaraskar.com>
GH-14478 is a backport of this pull request to the 3.8 branch. |
GH-14479 is a backport of this pull request to the 3.7 branch. |
…ythonGH-14255) Add more fuzz testing for re.compile, re.load and csv.reader (cherry picked from commit 5cbbbd7) Co-authored-by: Ammar Askar <ammar@ammaraskar.com>
…ythonGH-14255) Add more fuzz testing for re.compile, re.load and csv.reader
…ythonGH-14255) Add more fuzz testing for re.compile, re.load and csv.reader
@gpshead This should finish up the original bpo ticket, adds fuzzing for all the high-risk functions likely to be exposed to user input. Please let me know if I missed anything you intended to have fuzzed in the original ticket. Current list:
float(x)
int(x)
str(x)
json.loads(x)
re.compile(x)
re.match(..., x)
csv.reader(x)
I would have liked to combine the
re.compile
andre.match
tests but what ended up happening was we would generate some catastrophically slow regex like^(A+)*B
and then"A" * 25
will cause a timeout.https://bugs.python.org/issue29505