Add a fuzz target for _elementtree.XMLParser._parse_whole
#111477
Conversation
These files were copied from `Lib/test/xmltestdata` in this same repo.
It looks like there is a CIFuzz CI job, but it is failing to get a seed corpus, even though I've added one to the source tree. I wonder if I'm missing a step to ensure that the corpus is used?
It also doesn't look like the provided dictionary file for the new fuzz target is being used in the CIFuzz job... |
Why 60 files? |
Oh, it looks like I made a typo in the corpus and dictionary pathnames (" |
So it's actually just 3 existing files that were changed. I suppose the seed corpus could be shrunk down if that's objectionable. |
No worries, I was just curious (few PRs touch that many files so I thought there might have been a mistake). |
Have you run this locally and verified that it makes progress / reaches new functions / gains coverage? |
Yes! Since it has a relatively small starting corpus, it gets new coverage and hits new functions very quickly. In my own non-oss-fuzz setup, I have a corpus of some 15k inputs that would get regularly minimized and pruned. You can see in the CIFuzz GitHub Actions job (look at the
Overall, this makes sense to me. There are a couple of minor edits to make in the C code that really only matter for pedantic correctness in the unlikely MemoryError situation during initialization.
Not related to this PR, but I'm curious: what is |
Also another question: was it considered to write the fuzzers as Python code and execute them using the |
LGTM modulo @gpshead's comments. |
[off-topic-ish]
This PR is for the set of oss-fuzz runs that we maintain directly (_xxtestfuzz), which is configured in https://github.com/google/oss-fuzz/tree/master/projects/cpython3 (via #73691). There is a second Python fuzzing project, not in our repo, that uses a different approach from a later contributor. It is also linked from the above issue, can be found at https://github.com/guidovranken/python-library-fuzzers, and is configured to run via https://github.com/google/oss-fuzz/tree/master/projects/python3-libraries. (It tends to write things directly in Python.) |
Yes, it looks like something along those lines. The plumbing is a bit weird. All the fuzz targets in the cpython tree are defined in that one The entire file also gets compiled into a Python module at |
Indeed, a different approach to fuzz-testing Python would be to write the fuzz targets in normal Python code. Guido Vranken's repo above takes that approach. I also experimented with that years ago when I was first doing fuzzing of CPython. The advantages of writing fuzz targets in Python, when trying to fuzz CPython itself:
The disadvantages of writing fuzz targets in Python, when trying to fuzz CPython itself:
…11477) * Add a fuzzer for `_elementtree.XMLParser._parse_whole`
This PR adds a new fuzz target for the `_elementtree.XMLParser._parse_whole` method, which is implemented in C. The fuzz target merely attempts to parse the given input data using that method, checking that it does not crash. No higher-level properties are checked.

A dictionary of XML-relevant tokens is included, as is a seed corpus based on the XML files from `Lib/test/xmltestdata`.
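
For readers who want to poke at the same code path without building the fuzzers, here is a rough Python analogue of what the description says the target does. It is only a sketch, not the C code from this PR: the function name `parse_whole_does_not_crash` is made up for illustration, and the `feed`/`close` fallback is an assumption added so the snippet still runs on builds without the C accelerator (where `_parse_whole` is not available).

```python
import io
import xml.etree.ElementTree as ET


def parse_whole_does_not_crash(data: bytes) -> bool:
    """Hypothetical helper: return True if `data` parsed, False if rejected.

    Like the fuzz target described above, it checks no higher-level
    property; an ordinary Python exception is treated as an uninteresting
    outcome, and only a hard crash of the interpreter would be a finding.
    """
    parser = ET.XMLParser()
    try:
        if hasattr(parser, "_parse_whole"):
            # C accelerator in use: call the private method on a file-like
            # object wrapping the raw input bytes.
            parser._parse_whole(io.BytesIO(data))
        else:
            # Assumption for portability: the pure-Python parser has no
            # _parse_whole, so fall back to the public feed/close API.
            parser.feed(data)
            parser.close()
    except Exception:
        # Malformed XML is expected for most fuzz inputs; ignore it.
        return False
    return True


if __name__ == "__main__":
    for sample in (b"<a><b/></a>", b"<a>", b""):
        print(sample, parse_whole_does_not_crash(sample))
```

Because `_parse_whole` is a private method, its presence and behaviour may differ between Python versions, which is why the sketch probes for it with `hasattr` instead of assuming it exists.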