bpo-29505: Add fuzzing for re.compile, re.load and csv.reader #14255

ammaraskar · 2019-06-20T03:48:55Z

@gpshead This should finish up the original bpo ticket: it adds fuzzing for all the high-risk functions likely to be exposed to user input. Please let me know if I missed anything you intended to have fuzzed in the original ticket. Current list:

float(x)
int(x)
str(x)
json.loads(x)
re.compile(x)
re.match(..., x)
csv.reader(x)

I would have liked to combine the re.compile and re.match tests, but the fuzzer would end up generating a catastrophically slow regex such as ^(A+)*B, and matching it against "A" * 25 would then cause a timeout.

https://bugs.python.org/issue29505

mangrisano · 2019-06-20T10:38:54Z

/cc @serhiy-storchaka @ezio-melotti

drj11 · 2019-06-26T08:05:34Z

Modules/_xxtestfuzz/fuzzer.c

+    }
+    /* Use the first byte as a uint8_t specifying the index of the
+       regex to use */
+    uint8_t idx = ((uint8_t*) data)[0];


This is unsafe. Safer would be:

uint8_t idx = (uint8_t)(data[0]);

(at any rate, this is defined). But if you want an unsigned type that is guaranteed to hold all of the values of a char, then you want unsigned char instead of uint8_t.

ammaraskar · 2019-06-26

Thank you, unsigned char makes way more sense, done.

Out of curiosity, why is accessing the element and then casting safer than casting the pointer and then accessing the element?
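To make the two forms concrete, here is a minimal sketch, not the code from the PR. `example_patterns`, `pick_pattern_index`, and the modulo step are assumptions made for illustration. Reading `data[0]` and then converting it is an ordinary value conversion that is defined for every `char` value, and `unsigned char` always exists and can hold every byte value. Casting the pointer to `uint8_t *` and dereferencing it is only guaranteed to be a valid aliasing access if `uint8_t` happens to be a character type, which holds on mainstream platforms but is not promised by the C standard.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical pattern table, standing in for the fuzzer's regex list. */
static const char *const example_patterns[] = {
    "abc",
    "[0-9]+",
    "^a|b$",
};
#define NUM_EXAMPLE_PATTERNS \
    (sizeof(example_patterns) / sizeof(example_patterns[0]))

/* Pick an index into example_patterns from the first input byte.
   The caller must pass at least one byte. */
static size_t pick_pattern_index(const char *data, size_t size)
{
    assert(data != NULL && size >= 1);
    /* Access the char first, then convert the value. No pointer cast is
       involved, so this does not depend on how uint8_t is defined. */
    unsigned char first = (unsigned char) data[0];
    /* Assumption: reduce modulo the table size so any byte is a valid index. */
    return (size_t) first % NUM_EXAMPLE_PATTERNS;
}
```

With three patterns, a first byte of 0x04 selects index 1 and a first byte of 0xFF selects index 0, regardless of whether plain `char` is signed on the platform.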

gpshead

overall, looks good. just some pedantry around NULL checks after CPython API calls even if you think they're unlikely to be taken.

gpshead · 2019-06-28T05:24:04Z

Modules/_xxtestfuzz/fuzzer.c

+
+/* Some random patterns used to test re.match.
+   Be careful not to add catastrophically slow regexes here, we want to
+   exercise the matching code without causing timeouts.*/


gpshead · 2019-06-28T05:26:26Z

Modules/_xxtestfuzz/fuzzer.c

    rv |= _run_fuzz(data, size, fuzz_json_loads);
+#endif
+#if !defined(_Py_FUZZ_ONE) || defined(_Py_FUZZ_fuzz_sre_compile)
+    /* Import sre_compile.compile and sre.error */


gpshead · 2019-06-28T05:29:13Z

Modules/_xxtestfuzz/fuzzer.c

+#if !defined(_Py_FUZZ_ONE) || defined(_Py_FUZZ_fuzz_sre_match)
+    /* Precompile all the regex patterns on the first run for faster fuzzing */
+    if (compiled_patterns == NULL) {
+        PyObject* re_module = PyImport_ImportModule("re");


being pedantic: check re_module for NULL.

gpshead · 2019-06-28T05:29:31Z

Modules/_xxtestfuzz/fuzzer.c

+#if !defined(_Py_FUZZ_ONE) || defined(_Py_FUZZ_fuzz_sre_compile)
+    /* Import sre_compile.compile and sre.error */
+    if (sre_compile_method == NULL) {
+        PyObject* sre_compile_module = PyImport_ImportModule("sre_compile");


pedantic: check sre_compile_module for NULL

the same goes for all of your GetAttrString return values.

gpshead · 2019-06-28T05:29:43Z

Modules/_xxtestfuzz/fuzzer.c

+        PyObject* sre_compile_module = PyImport_ImportModule("sre_compile");
+        sre_compile_method = PyObject_GetAttrString(sre_compile_module, "compile");
+
+        PyObject* sre_constants = PyImport_ImportModule("sre_constants");


gpshead · 2019-06-28T05:31:14Z

Modules/_xxtestfuzz/fuzzer.c

+#if !defined(_Py_FUZZ_ONE) || defined(_Py_FUZZ_fuzz_csv_reader)
+    /* Import csv and csv.Error */
+    if (csv_module == NULL) {
+        csv_module = PyImport_ImportModule("csv");


NULL checks

gpshead · 2019-06-28T09:43:09Z

Modules/_xxtestfuzz/fuzzer.c

+        sre_compile_method, pattern_bytes, flags_obj, NULL);
+    /* Ignore ValueError as the fuzzer will more than likely
+       generate some invalid combination of flags */
+    if (compiled == NULL && PyErr_ExceptionMatches(PyExc_ValueError)) {


this series of if's with PyErr_ExceptionMatches calls seems somewhat redundant (the comments for each match type are useful). It can also call PyErr_ExceptionMatches after PyErr_Clear has been called, as these are sequential if's, not else if's. perhaps write it as:

if (compiled == NULL) {
    if (PyErr_ExceptionMatches(xxx) ||
        PyErr_ExceptionMatches(yyy) || ...) {
    }
}

with the comments interspersed.

ammaraskar · 2019-06-28

Agreed, the comments were primarily why I went with this style but I'll see if I can put them between the ORs.

ammaraskar · 2019-06-28

Done, kinda iffy about the formatting. Got any suggestions?

if (PyErr_ExceptionMatches(xxx) || PyErr_ExceptionMatches(yyy) ) { PyErr_Clear(); } }

and

if (PyErr_ExceptionMatches(xxx) || PyErr_ExceptionMatches(yyy)) { PyErr_Clear(); } }

both look really confusing, so I went with some alternative style.

miss-islington · 2019-06-30T05:54:46Z

Thanks @ammaraskar for the PR, and @gpshead for merging it 🌮🎉.. I'm working now to backport this PR to: 3.7, 3.8.
🐍🍒⛏🤖

bedevere-bot · 2019-06-30T05:54:55Z

GH-14478 is a backport of this pull request to the 3.8 branch.

bedevere-bot · 2019-06-30T05:55:01Z

GH-14479 is a backport of this pull request to the 3.7 branch.

the-knights-who-say-ni added the CLA signed label Jun 20, 2019

bedevere-bot added the awaiting review label Jun 20, 2019

gpshead self-assigned this Jun 20, 2019

gpshead self-requested a review June 20, 2019 03:56

bpo-29505: Add fuzzing for re.compile, re.load and csv.reader

bbe54f6

ammaraskar force-pushed the fuzz_everything branch from 7425844 to bbe54f6 Compare June 20, 2019 04:05

drj11 reviewed Jun 26, 2019

Use unsigned char for indexing

f94cb1e

gpshead reviewed Jun 28, 2019

ammaraskar and others added 2 commits June 28, 2019 05:21

Clean up initialization code to handle all errors

7d0e0e2

return 0 on init_json_loads error.

664161e

gpshead reviewed Jun 28, 2019

Collect up conditionals for some exceptions

9f70de8

gpshead added skip news needs backport to 3.7 labels Jun 30, 2019

gpshead merged commit 5cbbbd7 into python:master Jun 30, 2019

bedevere-bot removed the awaiting review label Jun 30, 2019

bedevere-bot removed the needs backport to 3.8 label Jun 30, 2019

bedevere-bot removed the needs backport to 3.7 label Jun 30, 2019

lisroach pushed a commit to lisroach/cpython that referenced this pull request Sep 10, 2019

bpo-29505: Add more fuzzing for re.compile, re.load and csv.reader (p…

84c996c

…ythonGH-14255) Add more fuzz testing for re.compile, re.load and csv.reader

DinoV pushed a commit to DinoV/cpython that referenced this pull request Jan 14, 2020

bpo-29505: Add more fuzzing for re.compile, re.load and csv.reader (p…

2ad6759

…ythonGH-14255) Add more fuzz testing for re.compile, re.load and csv.reader

gpshead mentioned this pull request Apr 10, 2022

Submit the re, json, csv, & struct modules to oss-fuzz testing #73691

Closed
