-
-
Notifications
You must be signed in to change notification settings - Fork 10.8k
ENH: structured datatype safety checks #5548
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
dd549f5
to
2322a79
Compare
2322a79
to
ab604c4
Compare
8c5736f
to
73e4afe
Compare
All right, I think this is done. I'd appreciate any comments or suggestions! |
count = 1 | ||
for dim in dtype.shape: | ||
count *= dim | ||
fields.extend((typ,off + dtype.base.itemsize*j) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
space after comma
Thanks for putting this together, @ahaldane. There has obviously been a lot of hard thinking going on! I will go on reviewing later, just two things for now, aside from all the style nitpicks:
|
Thanks for looking over it @jaimefrio. I think I've fixed the style issues, and I also made a few other small changes. I made it a second commit for clarity on what's changing. I'll look into adding some extra tests, and I agree it might be possible to make the overlap checks smarter. I'll also ping the list. |
/* check that we are not reinterpreting memory containing Objects */ | ||
_numpy_internal = PyImport_ImportModule("numpy.core._internal"); | ||
if (_numpy_internal == NULL) { | ||
Py_DECREF(newtype); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We only indent more than 4 spaces if the if
condition spans more than 1 line.
00a31fe
to
432ddcd
Compare
Static allocation is now in, using #5940. It looks like I don't need to set the error string if npy_cache_pyfunc fails, since it will already be set. If that's fine, then I think this is done. |
f3903e3
to
958bf63
Compare
Previously views of structured arrays containing objects were completely disabled. This commit adds more lenient check for whether an object-array view is allowed, and adds similar checks to getfield/setfield Fixes numpy#2346. Fixes numpy#3256. Fixes numpy#2599. Fixes numpy#3253. Fixes numpy#3286. Fixes numpy#5762.
958bf63
to
e2cd6fa
Compare
@jaimefrio My concerns have been taken care of. |
It shall then be merged, @charris ;-) |
ENH: structured datatype safety checks
Thanks, @ahaldane! |
Thanks to both of you too! |
Because of slowdowns caused by the view safety checks introduced in numpy#5548 they are removed here for 1.10. The plan is to reintroduce a better version of the checks in 1.11.
Because of slowdowns caused by the view safety checks introduced in numpy#5548 they are removed here for 1.10. The plan is to reintroduce a better version of the checks in 1.11.
Because of slowdowns caused by the view safety checks introduced in numpy#5548 they are removed here for 1.10. The plan is to reintroduce a better version of the checks in 1.11.
This PR tries to solve some issues related structured datatypes, especially view/getfield/setfield. Previously views of structured arrays containing objects were completely disabled. This commit adds more lenient check for whether an object-array view is allowed, and adds similar checks to getfield/setfield
(This PR has been edited around 3/4/15. I originally planned other changes which I will submit as separate PRs)
I believe this PR solves issues #2599, #3256, #3286, #2346, #3253. Issue #5081 is also now partially fixed, but requires further thought (please take a look at the example there). It also fixes the new scipy issue scipy/scipy#4589.
Only one test needed modification to pass:
test_objectarray_setfield
checks thatsetfield
with objects is disallowed, and expects aRuntimeError
but I give itTypeError
.Summary
Previously views of structured arrays containing objects were completely disabled, which was a problem since views are needed in a few important cases such as when indexing multiple fields of a structured array, or in
__getitem__
in record arrays.With help from @jaimefrio I now have functions that carefully check whether a view into array would interpret non-Object memory as an object, or vice versa, and allow the view if there is no problem. Big thanks to Jaime for help, code, and explanations on this. I think the checks are compatible with his PR #5508. I check that 1) No memory that is not an object will be interpreted as a object, 2) No memory containing an object will be interpreted as an arbitrary type. Both cases can cause segfaults.
I also added similar checks to
PyArray_GetField
andPyArray_SetField
. Previsouly, the user could view arbitrary data as a python Object, and it was fairly easy to make cases where a segfault would happen.Note, however, that there are still "unsafe" operations (wrt Objects) in numpy which I hope to examine in future PRs, when I have time.
How to keep track of objects
There is a design choice here worth discussing. As part of the check for objects when taking a view, we need to go though the new dtypes's fields and make sure none of them overlap with memory containing an object in the old dtype (unless the field is also an object). Consider this potential view:
The problem would be that writing to v2arr will overwrite the Object in basearr. We need to prevent this, but the challenge is that
varr
doesn't "know" that there is an object at position 8. This raises the question: How do we figure out where the objects are in memory to prevent them from being viewed?The strategy I have taken in this PR is to only allow views into fields "known" to the parent array. In the example above, the
v2arr
view would be disallowed, and only views ofvarr
into offsets 0 and 16 would be allowed. (specifically, views into bytes 0-8 and 16-24). The downside is that many views are no longer "reversible", since once you take a view that "forgets" what was at an offset, you can never view the memory at that offset again in further views. While this constraint only really has to apply to object arrays (as tested withNPY_ITEM_HASOBJECT
, which I think is inherited in all views) I think for consistency it should apply to all structured arrays.. That is, all 'partial' views (views that have less fields than the original array) would be irreversible.An alternate strategy is to too look up all the object positions in the "base" object of the array,
varr.base
in this case. I assume that the "base" array (if it is an array) contains the list of all object offsets, which I use to check for view overlaps. I implemented this in a separate branch (partial version here), but it was difficult becausearr.base
does not always lead to an array object. For example it may be a buffer, in which case I disallow views containing object types, but that's not a big problem because users aren't allowed to create object arrays from buffers anyway. Another problem is (void) scalar structured types, as they do not list their "base" object even after being converted to a 0-d array. To fix this I made void scalar types' base attribute point to the original array. However, there are still some cases where I cannot get to the original array - for examplenditer
buffers the source array, which causes about 5 unit tests to fail, which I have not solved yet (but I could try). The advantage would be that views are reversible.The irreversible view strategy seems better because it touches less parts, has fewer edge cases, and seems to me less likely to go wrong. It also works right now. I also have not imagined any scenarios where inability to reverse a view would be a big problem, and no related unit tests fail.
Note that even if the 'base array' strategy is not good, perhaps it's still a good idea to make void scalars "base" attribute point to the original array, as another PR? I admit I am not totally clear on the true purposes of "base" objects.
In this PR I also made some small tweaks to recarrays: I don't think they need to implement their own
view
function as a result of my last PRs.