-
-
Notifications
You must be signed in to change notification settings - Fork 31.8k
bpo-29505: Add fuzz tests for float(str), int(str), unicode(str) #2878
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This is an easy place to start, and these functions are probably safe. Please review with it in mind that I want to add more fuzz tests later. While the fuzz tests are included in CPython and compiled / tested on a very basic level inside CPython itself, the actual fuzzing happens as part of oss-fuzz (https://github.com/google/oss-fuzz). The reason to include the tests in CPython is to make sure that they're maintained as part of the CPython project, especially when (as some will) they use internal implementation details in the test. (This will be necessary sometimes because e.g. the fuzz test should never enter Python's interpreter loop, whereas some APIs only expose themselves publicly as Python functions.) This particular set of changes is part of testing Python's builtins, tracked internally at Google by b/37562550.
… occurs. I'm still not happy with how many times I need to repeat the fuzz test name. This can go wrong way too easily. :/
It's possible there's a way to compile this with clang and then link it with clang++, but I don't know how to do that. Specifically, compilation fails with this: $CC $CFLAGS \ -D _Py_FUZZ_ONE -D _Py_FUZZ_$fuzz_test \ -Wno-unused-function \ $($OUT/bin/python3-config --cflags) -g -O1 \ $FUZZ_DIR/fuzzer.c -o $OUT/$fuzz_test -lFuzzingEngine \ $($OUT/bin/python3-config --ldflags) With errors like: /usr/local/bin/../include/c++/v1/new:234: undefined reference to `operator delete(void*)' But it works just fine with this: $CXX $CXXFLAGS \ -D _Py_FUZZ_ONE -D _Py_FUZZ_$fuzz_test \ -Wno-unused-function \ $($OUT/bin/python3-config --cflags) -g -O1 \ $FUZZ_DIR/fuzzer.c -o $OUT/$fuzz_test -lFuzzingEngine \ $($OUT/bin/python3-config --ldflags) Presumably there are ways to do this in C, so for expediency I'm doing it in C++ right now, with the minimal possible C++ changes (extern "C"), so that if I need to change it back to C in code review, it won't be too hard.
(i.e. just to get increased confidence we won't immediately crash on fuzzing.) I'm using s# because I'd like to minimize the diff between Python 2 and 3.
Friendly ping to @gpshead who I believe has more background than is average (e.g. they filed the original issue), and might be useful as a code reviewer. (Not actually sure if that's how this works, but I still think it'd be good to get their input.) |
Modules/_fuzz/fuzzer.cpp
Outdated
// only return 0.) | ||
extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) { | ||
if (!Py_IsInitialized()) { | ||
// LLVMFuzzerTestOneInput is called repeatedly from the same process, with |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is LLVMFuzzerInitialize
not usable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK so, for the record: I didn't know about LLVMFuzzerInitialize
/ it wasn't documented where I was looking.
I googled it to see how dumb I was, and truthfully I still can't find where it's documented, but I did find this thread:
https://codereview.chromium.org/2900373002
To quote it:
This function should only be used if argv is needed, otherwise libfuzzer
best practice is to just use static initialization in
LLVMFuzzerTestOneInput [1].
So actually, I think this is reasonable: we would use a static variable to guard the initialization, but actually, it's useful to do the Py_IsInitialized
check instead, because then we can call it from CPython in the unit test without double-initializing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually I didn't even notice the link to docs.
http://llvm.org/docs/LibFuzzer.html#startup-initialization
So the official word is:
Alternatively, you may define an optional init function and it will receive the program arguments that you can read and modify. Do this only if you really need to access argv/argc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 -- I wonder where we file a bug on libFuzzer to document this, the OpenSSL fuzzers use LLVMFuzzerInitialize
in precisely the way this counsels against.
Modules/_fuzz/fuzzer.cpp
Outdated
|
||
// Run fuzzer and abort on failure. | ||
static int _run_fuzz(const uint8_t *data, size_t size, int(*fuzzer)(const char* , size_t)) { | ||
int rv = fuzzer("", 0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume this should be calling it with (data, size)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yikes. Worst kind of refactoring error. Thanks for the catch! (Fixed in latest commit.)
Modules/_fuzz/fuzzer.cpp
Outdated
|
||
int rv = 0; | ||
|
||
#define _Py_FUZZ_YES(test_name) (defined(_Py_FUZZ_##test_name) || !defined(_Py_FUZZ_ONE)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason not to make these multiple C files that define their fuzz
function and then #include
the common bits?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think I understand the suggestion. Care to elaborate a bit?
It might be worth looking at commit d112252 to see how test discovery was when I first hacked something together. I couldn't find any solution I liked, but I'm no C expert. (In Python this would be no sweat!)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My suggestion was to have one file per fuzzer, and each of which defines:
static int pyfuzz(const char* data, size_t size) {
// ...
}
#include "fuzz_template.c"
Where fuzz_template.c
defines LLVMFuzzerTestOneInput
and calls pyfuzz
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These won't link together due to ODR violation -- so, the reason I didn't do it that way was so that I could run the code at test-time. (Which leaves your other question below...)
(Screwed up my refactoring when I moved the fuzz definition to CPython.)
Modules/_fuzz/_fuzzmodule.c
Outdated
@@ -0,0 +1,52 @@ | |||
#include <Python.h> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we really need this module? It's only usage seems to be for the tests, and it looks like it has as much code as the fuzzers themselves!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, but this would be like 2 lines in Cython. It's just the typical boilerplate of extension modules, nothing to write home about, right?
We don't need it. The thing we need is to make sure that the fuzz tests are maintained and don't break. That is, they need to be in CI for Python, not just run in the fuzz suite in oss-fuzz.
For most ways it could break, it would be enough just to build them, but not run them. I think it is a good idea to also run them. Although there's a cost: the macro silliness to make them runnable individually and collectively probably makes it easy to screw up, so I'm not happy about it. (Every fuzz test name is repeated like four times.) The strategy you mentioned above would make it less error prone, at the cost of losing the ability to call them from a unit test like this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To me what I see in this code is pretty standard CPython API calls. That raises the question of if there is a good reason to check these in and keep them working as part of CPython or if this code should be a stand alone module available on pypi that people could use to launch fuzz tests of any Python interpreter it is built with. The answer to that question likely comes from answering how oss-fuzz the fuzz testing service works. If it monitors our github repo and reruns fuzz tests as things change in cpython, including these here may make sense. I think a comment describing the situation and why this exists here with links to relevant oss-fuzz documentation (either in here or in the _fuzz/README.rst) would help that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added a little to README.rst, but to summarize and expand here:
You're correct that oss-fuzz regularly pulls from CPython and runs the fuzz tests locally.
oss-fuzz prefers to have the tests be a tested part of the upstream project so that they don't break. Certainly not all of these fuzz tests are using the public API (e.g._Py_HashBytes), and there will be more in the future. Those are more likely to they might break in the future -- having them here ensures that breaking them would break the tests, and they will always be good to run, hopefully.
We could keep the public-API-using fuzz tests separate in oss-fuzz, but they wouldn't really like that, and that actually splits up the tests in a confusing way -- some would be here, some would be in the oss-fuzz repo. I think they're better off all being here.
Lib/test/test_fuzz.py
Outdated
@@ -0,0 +1,20 @@ | |||
import unittest | |||
from test import support |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see below if needed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Deleted support import.)
Lib/test/test_fuzz.py
Outdated
_fuzz.run(b"1") | ||
|
||
if __name__ == "__main__": | ||
support.run_unittest(TestFuzz) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will the usual unittest.main(verbosity=<select>)
not work?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I cribbed this from an existing test, (I assumed CPython had some weird test framework), but I see I must've picked the wrong test to crib from. :)
Fixed.
Travis says the I can't find the compilation error on that page. Is there somewhere else I should be looking? Similar with gcc: https://travis-ci.org/python/cpython/jobs/257543387 |
That is the page where I would have expected something. |
Modules/_fuzz/fuzzer.cpp
Outdated
@@ -0,0 +1,118 @@ | |||
// A fuzz test for CPython. | |||
// | |||
// Unusually for CPython, this is written in C++ for the benefit of linking with |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Unusually"? This would be literally the first C++ code shipped by CPython that ran inside the interpreter. (The existing CPP files shipped by CPython are all for Windows support, and all run outside the interpreter.) It's my understanding that CPython has zero C++ in it very much on purpose. I understand why you use C++ here, but I'd want to get BDFL blessing before I merged this file.
Also, a quick perusal of the documentation for libfuzzer suggests a strong dependency on Clang. CPython supports at least two other compilers--gcc and Microsoft Visual C++--and supports at least one major platform where Clang is not a supported compiler (Windows). Can you confirm that the build process and test suite will degrade gracefully when compiling with gcc or MSVC++?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would be literally the first C++ code shipped by CPython that ran inside the interpreter.
The only reason it even "ships" is because I want to run it in tests. Is there a way to include it in test builds but exclude it from the release, or does it not make much difference?
I understand why you use C++ here
To be clear: I actually don't understand myself why I use C++ here. Isn't there a way to run the compiler on a C file and link it with something else that uses C++? My experience with running the C/C++ compilers is minimal, and I am pretty sure we can use C here, somehow.
I definitely don't want this PR to come across as "we must use C++", but rather -- I did it in C++ to get it working so I could mail something out, but I'd really appreciate learning how to do it in C instead. (Most of my time in this PR was actually spent getting it building with oss-fuzz's environment, sadly.)
See commit cb9cdc0 for more details on what fails to build and why.
Also, a quick perusal of the documentation for libfuzzer suggests a strong dependency on Clang. CPython supports at least two other compilers--gcc and Microsoft Visual C++--and supports at least one major platform where Clang is not a supported compiler (Windows). Can you confirm that the build process and test suite will degrade gracefully when compiling with gcc or MSVC++?
I can do one better: nothing about this actually depends on libFuzzer, so this should compile and run on every platform (that has C++ support).
Of course, adding a dependency on C++ support to pass tests is, per above, probably not acceptable. My hope is we can get it working in C, otherwise I will make the test fail gracefully in the absence of a working C++ compiler.
Some background might help: tl;dr is that to support fuzzing / libFuzzer, all you need to do is export a specific function, LLVMFuzzerTestOneInput
. libFuzzer actually provides a main()
which calls that function for fuzz tests. We aren't doing that here though, we're just defining and testing LLVMFuzzerTestOneInput
.
In other words, to write a fuzzer, you define LLVMFuzzerTestOneInput
in foo.cpp
and then use clang++ foo.cpp -lfuzzer
to compile the fuzz test to an executable binary. If you want to test that LLVMFuzzerTestOneInput
is remotely valid in your continuous integration, separately from running it in a fuzz test, you can call it from your unit tests.
So the only relationship this code has with libFuzzer is that it exports a function that libFuzzer expects to exist. It doesn't actually depend on libFuzzer, and isn't built against libFuzzer within Python continuous integration. It works on all platforms you'd expect it to -- it's just a C++ file that calls the regular CPython API. It is only built against libFuzzer in the oss-fuzz project, which will actually check out a copy of cpython at the master revision and compile the fuzz tests against libFuzzer, and run them.
And to go full circle, this is the core of why it was written in C++ instead of C -- if I compile a C file with "-lfuzzer" it completely blows up because libFuzzer still depends on libstdc++ and I don't know how to compile with C but still link with C++ stuff. (Or something like that. I'm a C++/C noob, I use blaze/bazel for everything.)
I hope that gives enough background that this makes a little more sense. And I'll try to write this up a bit in README.rst once I get back to my work computer tomorrow!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a small note, but it's probably insufficient. Fortunately, we do link to the oss-fuzz docs if people are confused.
Lib/test/test_fuzz.py
Outdated
@@ -0,0 +1,19 @@ | |||
import unittest | |||
import _fuzz |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This hard dependency on the _fuzz module suggests that this file simply won't work unless _fuzz is available, and AFAICT it won't be available when compiling with gcc or MSVC++. You need to make sure these tests are skipped when the _fuzz module is unavailable.
My hypothesis right now for why the test is broken in Travis is that the |
Ah, no, the compile error is still there, just hard to find because ctrl+f doesn't seem to work correctly with travis/chrome (???) I suspect the error is this one:
I don't know why that works on my system. @kcc do you have any advice on how to run a fuzzer written in C, while linking to libFuzzer? See commit cb9cdc0. |
[sorry for delay, missed the message] Using a C target with libFuzzer is as simple as using a C++ target. |
I'll figure out how to do whatever it is @kcc said. (I guess I run |
of course it will, we have lots of C-only fuzz targets. |
(Despite not understanding how to compile C code.)
I should amend: indeed it did work, as of the latest commit in the PR to oss-fuzz. (With the catch that I had no idea how to build .o files at first, apparently there's some |
How do I add a "skip news" label? Not seeing anything in the github UI like it says in https://help.github.com/articles/applying-labels-to-issues-and-pull-requests/ |
Since it's now required in setup.py, we know that the test will run on linux. On Windows we don't really care if it runs or not -- the fuzz tests are on a linux machine as far as I'm aware -- and I don't have a windows box to mess with Visual Studio files in. Big thanks to zware on #python-dev for walking me through this.
This seems to cause the coverage build to crash; see here. I've confirmed the crash on Gentoo Linux with gcc 5.4.0. |
I can't reproduce. What do we do next? |
…ryError on some machines. (But not mine, so I can't debug. If anyone can debug, I'm sure this would be better off fixed than ignored...) See discussion on python#2878
Since I can't reproduce the failure, which means I can't debug it, I blacklisted it instead. |
Modules/_fuzz/fuzzer.c
Outdated
@@ -0,0 +1,118 @@ | |||
/* A fuzz test for CPython. | |||
|
|||
Unusually for CPython, this is written in C++ for the benefit of linking with |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Scanning though this, it looks like C to me (as the filename suggests).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops. Fixed.
setup.py
Outdated
@@ -715,6 +715,12 @@ def detect_modules(self): | |||
# syslog daemon interface | |||
exts.append( Extension('syslog', ['syslogmodule.c']) ) | |||
|
|||
# Fuzz tests. | |||
exts.append( Extension( | |||
'_fuzz', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What prevents the _fuzz extension module from being installed and shipped with CPython? It seems like something needed only at build + test time in order to prevent bitrot of code that is primarily used outside of our own test suite. Not something to be distributed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nothing. I don't think it should be distributed with Python, but I don't know how to stop that while still keeping it built for tests. Any suggestions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure the _fuzz
module and it's corresponding test_fuzz.py add any real value. I expect you could make fuzzer.c
be compile-only (with no attempt to link it into the main binary) with a Makefile.pre.in modification which should be good enough. At that point it could live in the Programs/
directory next to with _testembed.c
with which it seems more akin to instead of under Modules/
A test that calling the fuzz function with a trivial input in our own test suite not crashing is not so important as the oss-fuzz project by its very nature is going to be doing that so we'd be notified. That we at least keep the code compiling is pretty good.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean, we can do this. It isn't ideal / isn't what oss-fuzz would like us to do. See e.g. this doc: https://github.com/google/oss-fuzz/blob/master/docs/ideal_integration.md#regression-testing
AIUI the oss-fuzz project doesn't want to be in the game of reporting broken tests, only of reporting actual security issues. (Unfortunately, I'm already deviating fairly substantially from what's ideal, but I'd like to be as close as possible).
(In particular, I think it is adding value: we should separate fuzz tests from unit tests / smoke tests so that reporting is easier and breakages less common.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alright, talking to @ncoghlan at the dev. sprint this week, we've got other modules that are built and effectively "shipped" but are really used for testing/development only. so this isn't the first, second, or even third such thing. _test_capi
is one of them. xxlimited
is another. so i'm going to propose _xx_test_fuzz as the name to "Use all the naming idioms!" . Python distribution creators are expected to realize that isn't needed in the main interpreter runtime package.
... let me rename things and look at this in whole again from that point of view.
Modules/_fuzz/fuzzer.c
Outdated
|
||
/* Fuzz PyLong_FromUnicodeObject as a proxy for int(str). */ | ||
static int fuzz_builtin_int(const char* data, size_t size) { | ||
int base = _Py_HashBytes(data, size) % 36; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comment on what you are doing here. it looks like choosing a random base value in the 0..35 range based on the random input data. why that specific range?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Uh. That range is a bug! It should be [0] union [2..36], not [0] union [2..35] >_<
Fixed and added comments.
Modules/_fuzz/fuzzer.c
Outdated
} | ||
|
||
PyObject* s = PyUnicode_FromStringAndSize(data, size); | ||
if (PyErr_Occurred()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would ordinarily check if (s == NULL) rather than blindly calling PyErr_Occurred() on these.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
Modules/_fuzz/_fuzzmodule.c
Outdated
@@ -0,0 +1,52 @@ | |||
#include <Python.h> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To me what I see in this code is pretty standard CPython API calls. That raises the question of if there is a good reason to check these in and keep them working as part of CPython or if this code should be a stand alone module available on pypi that people could use to launch fuzz tests of any Python interpreter it is built with. The answer to that question likely comes from answering how oss-fuzz the fuzz testing service works. If it monitors our github repo and reruns fuzz tests as things change in cpython, including these here may make sense. I think a comment describing the situation and why this exists here with links to relevant oss-fuzz documentation (either in here or in the _fuzz/README.rst) would help that.
I think the AppVeyor failure is spurious (I haven't really changed anything to make subprocess fail, and it passed before). Is there a way to rerun? |
Testing it resulted in a few cleanups such as PY_SSIZE_T_CLEAN being required to avoid from crashing in an opt build.
That was just hiding the actual error in the first place that happened due to PY_SSIZE_T_CLEAN not being there.
while those tests are running (I expect them to pass), given I've renamed the path I assume a config update on the oss-fuzz side would be needed if this went in as needed so it could find the |
Thanks for checking in with Nick and committing those changes! Yes, I can update the oss-fuzz PR to use the new directory name. (It won't be merged until after this PR is merged). I'm unsure about commit 43620ae, the reason I skipped it was that @zware reported that it crashed coverage, and I couldn't reproduce, but I could skip it. It's a hack, but without some way of reliably reproducing test failures... maybe someone else who can reproduce it can fix the test. |
It crashed the Travis coverage check, but was unrelated to coverage (I reproduced on my own machine with just |
yeah, i expect it was related to the actual bug I fixed. without PY_SSIZE_T_CLEAN defined, the upper half of the size_t was garbage. When it was zero by chance (common on debug builds) things passed, otherwise they crashed as that length was nonsense. :) |
This is an easy place to start, and these functions are probably safe. Please review with it in mind that I want to add more fuzz tests later.
While the fuzz tests are included in CPython and compiled / tested on a very basic level inside CPython itself, the actual fuzzing happens as part of oss-fuzz (https://github.com/google/oss-fuzz). The reason to include the tests in CPython is to make sure that they're maintained as part of the CPython project, especially when (as some will) they use internal implementation details in the test, and can be broken by changes to CPython.
(This will be necessary sometimes because e.g. the fuzz test should never enter Python's interpreter loop, whereas some APIs only expose themselves publicly as Python functions.)
The corresponding oss-fuzz PR adding CPython to oss-fuzz is google/oss-fuzz#731
This particular set of changes is part of testing Python's builtins, tracked internally at Google by b/37562550.
BTW, I think the code here is pretty questionable, and am not happy with how complex it is to add a new fuzz test. I'd appreciate input on how to make this friendlier. I'm also new to CPython and C/C++ development as a whole and suspect I've done everything wildly wrong. (In particular, I think the fuzzer should be written in C, but can't figure out how to make that compile -- see commit cb9cdc0).
https://bugs.python.org/issue29505