Skip to content

bpo-29505: Add fuzzing for re.compile, re.load and csv.reader #14255

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Jun 30, 2019

Conversation

ammaraskar
Copy link
Member

@ammaraskar ammaraskar commented Jun 20, 2019

@gpshead This should finish up the original bpo ticket, adds fuzzing for all the high-risk functions likely to be exposed to user input. Please let me know if I missed anything you intended to have fuzzed in the original ticket. Current list:

  • float(x)
  • int(x)
  • str(x)
  • json.loads(x)
  • re.compile(x)
  • re.match(..., x)
  • csv.reader(x)

I would have liked to combine the re.compile and re.match tests but what ended up happening was we would generate some catastrophically slow regex like ^(A+)*B and then "A" * 25 will cause a timeout.

https://bugs.python.org/issue29505

@mangrisano
Copy link
Contributor

/cc @serhiy-storchaka @ezio-melotti

}
/* Use the first byte as a uint8_t specifying the index of the
regex to use */
uint8_t idx = ((uint8_t*) data)[0];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is unsafe. Safer would be:

uint8_t idx = (uint8_t)(data[0]);

(at any rate, this is defined). But if you want an unsigned type that is guaranteed to hold all of the values of a char, then you want unsigned char instead of uint8_t.

Copy link
Member Author

@ammaraskar ammaraskar Jun 26, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, unsigned char makes way more sense, done.

Out of curiosity, why is accessing the element and then casting safer than casting the pointer and then accessing the element?

Copy link
Member

@gpshead gpshead left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

overall, looks good. just some pedantry around NULL checks after CPython API calls even if you think they're unlikely to be taken.


/* Some random patterns used to test re.match.
Be careful not to add catostraphically slow regexes here, we want to
excercise the matchign code without causing timeouts.*/
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

matching

rv |= _run_fuzz(data, size, fuzz_json_loads);
#endif
#if !defined(_Py_FUZZ_ONE) || defined(_Py_FUZZ_fuzz_sre_compile)
/* Impore sre_compile.compile and sre.error */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Import

#if !defined(_Py_FUZZ_ONE) || defined(_Py_FUZZ_fuzz_sre_match)
/* Precompile all the regex patterns on the first run for faster fuzzing */
if (compiled_patterns == NULL) {
PyObject* re_module = PyImport_ImportModule("re");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

being pedantic: check re_module for NULL.

#if !defined(_Py_FUZZ_ONE) || defined(_Py_FUZZ_fuzz_sre_compile)
/* Impore sre_compile.compile and sre.error */
if (sre_compile_method == NULL) {
PyObject* sre_compile_module = PyImport_ImportModule("sre_compile");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pedantic: check sre_compile_module for NULL

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the same goes for all of your GetAttrString return values.

PyObject* sre_compile_module = PyImport_ImportModule("sre_compile");
sre_compile_method = PyObject_GetAttrString(sre_compile_module, "compile");

PyObject* sre_constants = PyImport_ImportModule("sre_constants");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NULL check

#if !defined(_Py_FUZZ_ONE) || defined(_Py_FUZZ_fuzz_csv_reader)
/* Import csv and csv.Error */
if (csv_module == NULL) {
csv_module = PyImport_ImportModule("csv");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NULL checks

sre_compile_method, pattern_bytes, flags_obj, NULL);
/* Ignore ValueError as the fuzzer will more than likely
generate some invalid combination of flags */
if (compiled == NULL && PyErr_ExceptionMatches(PyExc_ValueError)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this series if if's with PyErr_ExceptionMatches calls seems somewhat redundant (the comments for each match type are useful). It can also call PyErr_ExceptionMatches after PyErr_Clear has been called as these are sequential if's not else if's. perhaps write it as:

if (compiled == NULL) {
  if (PyErr_ExceptionMatches(xxx) ||
      PyErr_ExceptionMatches(yyy) ||
      ...) {
  }
}

with the comments interspersed.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, the comments were primarily why I went with this style but I'll see if I can put them between the ORs.

Copy link
Member Author

@ammaraskar ammaraskar Jun 28, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, kinda iffy about the formatting. Got any suggestions?

if (PyErr_ExceptionMatches(xxx) ||
    PyErr_ExceptionMatches(yyy)
    ) {
    PyErr_Clear();
    }
}

and

if (PyErr_ExceptionMatches(xxx) ||
    PyErr_ExceptionMatches(yyy)) {
    PyErr_Clear();
    }
}

both look really confusing, so I went with some alternative style.

@miss-islington
Copy link
Contributor

Thanks @ammaraskar for the PR, and @gpshead for merging it 🌮🎉.. I'm working now to backport this PR to: 3.7, 3.8.
🐍🍒⛏🤖

miss-islington pushed a commit to miss-islington/cpython that referenced this pull request Jun 30, 2019
…ythonGH-14255)

Add more fuzz testing for re.compile, re.load and csv.reader
(cherry picked from commit 5cbbbd7)

Co-authored-by: Ammar Askar <ammar@ammaraskar.com>
@bedevere-bot
Copy link

GH-14478 is a backport of this pull request to the 3.8 branch.

@bedevere-bot
Copy link

GH-14479 is a backport of this pull request to the 3.7 branch.

miss-islington pushed a commit to miss-islington/cpython that referenced this pull request Jun 30, 2019
…ythonGH-14255)

Add more fuzz testing for re.compile, re.load and csv.reader
(cherry picked from commit 5cbbbd7)

Co-authored-by: Ammar Askar <ammar@ammaraskar.com>
miss-islington added a commit that referenced this pull request Jun 30, 2019
…H-14255)

Add more fuzz testing for re.compile, re.load and csv.reader
(cherry picked from commit 5cbbbd7)

Co-authored-by: Ammar Askar <ammar@ammaraskar.com>
lisroach pushed a commit to lisroach/cpython that referenced this pull request Sep 10, 2019
…ythonGH-14255)

Add more fuzz testing for re.compile, re.load and csv.reader
DinoV pushed a commit to DinoV/cpython that referenced this pull request Jan 14, 2020
…ythonGH-14255)

Add more fuzz testing for re.compile, re.load and csv.reader
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants