BUG: Fix reference counting for subarrays containing objects #12650
Conversation
*fromdata = d->fromdata;
char *bufferin = d->bufferin, *bufferout = d->bufferout;
PyArray_StridedUnaryOp *_strided_to_strided = PyArray_GetStridedCopyFn(
        0, dst_stride, sizeof(PyObject *), sizeof(PyObject *));
This is the actual change/hack. Instead of just using memset to initialize the buffer, copy the data from the original array into the buffer, so that decref'ing can occur. I could imagine going somewhere earlier to a slow direct cast loop, which calls the cast function for every single element instead of attempting to speed it up by working in chunks (frankly, I doubt chunking helps for objects that need casting anyway).
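(To illustrate the failure mode being fixed, here is a minimal Python sketch; the dtype, shapes, and marker object are illustrative, and whether a given assignment hits the buffered transfer path depends on internals. With the missing decref, the final refcount would be too high:)

```python
import sys
import numpy as np

marker = object()
dt = np.dtype([('f', object, (2,))])   # illustrative: object subarray in a struct

before = sys.getrefcount(marker)
arr = np.zeros(2, dtype=dt)
arr['f'] = marker                      # 2 elements x 2 subarray slots hold marker
assert sys.getrefcount(marker) - before == 4

arr[0] = arr[1]                        # copying must decref the old items in arr[0]
del arr
assert sys.getrefcount(marker) == before   # nothing may leak
```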
@seberg - it might have helped to separate it out, though looking at it again and bothering to look up PyArray_Item_INCREF, I see that what you had made sense already.
My comments are mostly about whether there is really a need for a fast path - objects are generally slow, and it just adds complexity.
numpy/core/src/multiarray/refcount.c
Outdated
}
else {
    for (i = 0; i < size; i++){
        PyArray_Item_INCREF(data + i * inner_elsize,
OK, I think I finally understand what this is actually doing - it just checks whether any part is an object and then increases its refcount. It might not be bad to write a comment stating that, so that the next person looking at the code doesn't have to look up what PyArray_Item_INCREF actually does - the name is not too helpful here.
Separately, the check for object comes very early on; I don't think there is much to gain from a fast path, especially since anything to do with objects is slow anyway.
I was not 100% sure where you were going. Added some documentation in a new commit though.
numpy/core/src/multiarray/refcount.c
Outdated
size = descr->elsize / inner_elsize;

if (descr->subarray->base->type_num == NPY_OBJECT) {
    /* Fast path for only-objects */
As above, not clear to me this is worth the extra complexity.
@@ -258,6 +308,10 @@ _fillobject(char *optr, PyObject *obj, PyArray_Descr *dtype)
        Py_XDECREF(arr);
    }
}
if (dtype->type_num == NPY_OBJECT) {
    Py_XINCREF(obj);
Surely obj cannot be NULL here?
@seberg do you want to change this?
@mattip, I don't mind changing this. I am just not sure how much we implicitly support this, since NULL is also always checked, e.g. in the ufunc loops that you worked on with the exception chaining.
But I am pretty ambivalent about it. If you or anyone else thinks we should just move away from it, let's do that. I do not really see the point of it.
numpy/core/src/multiarray/refcount.c
Outdated
if (dtype->subarray->base->type_num == NPY_OBJECT) {
    /* fast path when subarray only includes objects */
    for (i = 0; i < size; i++){
        Py_XINCREF(obj);
Same - surely obj cannot be NULL (I realize you probably copied this from above, but it seems very odd).
I think you are right in principle. But IIRC quite a few, maybe even all, internal functions actually accept interpreting NULL as None for numpy arrays. In practice I do not think numpy itself will ever fill an array with NULL, though, so the question is whether some extensions might actually rely on being able to fill arrays with NULL.
numpy/core/src/multiarray/refcount.c
Outdated
NPY_COPY_PYOBJECT_PTR(optr, &obj);
optr += sizeof(obj);
if (dtype->subarray->base->type_num == NPY_OBJECT) {
    /* fast path when subarray only includes objects */
Here I'm a bit less sure, but the slow path would seem fast enough.
I timed it in this one and it made quite a big difference. On the other hand, probably almost nobody actually uses these features, and as soon as you do anything else, the overall speed is probably so slow that it simply does not matter.
So I am fine with just removing the fast paths; all of this code is pretty much only used during creation and removal anyway.
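(For reference, a micro-benchmark along these lines would exercise the path in question - an illustrative sketch only, not the measurement actually run here; the dtype and sizes are made up:)

```python
import timeit
import numpy as np

dt = np.dtype([('f', object, (8,))])   # illustrative subarray-of-objects dtype

# creating and destroying zero-filled arrays exercises the object fill,
# INCREF, and DECREF paths under discussion
t = timeit.timeit(lambda: np.zeros(10000, dtype=dt), number=20)
print(t)
```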
+1 from me on removing this fast path.
LGTM. I looked through the other functions calling … For the object fast-path, I'm happy leaving it in, though I don't think it's really needed. People using object arrays are used to them being slow already.
numpy/core/src/multiarray/refcount.c
Outdated
size = descr->elsize / inner_elsize;

if (descr->subarray->base->type_num == NPY_OBJECT) {
    /* Fast path for only-objects */
This path isn't used in the PyDataType_HASFIELDS case - I don't see why we should optimize for it here either.
OK, I will remove the fast paths again then. @ahaldane, thanks for looking through it. I am actually only half sure what that memsetting is good for, but I guess it is probably also used for setting VOID types only partially, or some such. So I guess this will be roughly right, except that I should make the actual copying-over code nicer (or maybe I can find a DECREF'ing copy function from buffer to output array to replace the current one).
PyArray_StridedUnaryOp *_strided_to_strided = PyArray_GetStridedCopyFn(
        0, dst_stride, sizeof(PyObject *), sizeof(PyObject *));

for(;;) {
I think that

for (;;) {
    if (x) {
        y();
    }
    else {
        z();
        break;
    }
}

is clearer as

while (x) {
    y();
}
z();
Pushed a different approach, which uses a function that DECREF's the original data upon copy. Better than the first hack, I think, but it is slightly different, and it may be that the first approach was actually slightly better as such. Frankly, that code is still confusing me; I think I may have to draw it out to really understand all the nesting and branching.
numpy/core/tests/test_dtype.py
Outdated
# Test assumes that 0, 1, and None are singletons.
for func, obj in zip([np.empty, np.ones, np.zeros], [None, one, zero]):
    if obj is None:
        # None test is flaky:
Might be good to elaborate a bit on this comment: is it numpy or python that is flaky with None?
Oops, a leftover from some earlier version; I think the test runs fine now.
Maybe I should also refer to None through a local name here; I got a feeling that helped for the others. I was wondering if it might have to do with the actual internal code python compiles, too...
Doesn't help. It all seems to come down to trickier things; replacing the assert with a print, for example, makes it run fine for me.
Frankly, I am not quite sure whether the other tests (one and zero) are reliable or not. They seem to be, but...
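(As an aside, a small sketch of why refcount-based checks on None are fragile - purely illustrative:)

```python
import sys

# None is used pervasively by the interpreter and by test frameworks, so
# its refcount can change between two measurements for reasons unrelated
# to the code under test; 0 and 1 are also singletons (in CPython) but
# see far less incidental traffic
before = sys.getrefcount(None)
refs = [None] * 10                 # deliberately take ten references
after = sys.getrefcount(None)
print(after - before)              # 10 expected, but background noise can interfere
```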
Yikes...
More constructively: it would seem fine to just work with objects where we know nothing else can be increasing counts; maybe you are fighting some pytest internals...
The thing is that the function there is actually only used for `np.empty` and `np.zeros`, so just having zeros is fine. However, maybe I can just add 100 of them to a list and allow for some variation; we have done that in other places. Although if these tests run fine on all python versions, I guess it is good enough for now (and the None test is not important).

Actually, there is a test missing that I still need to add. One should use `np.repeat`, which is one of two functions explicitly using `PyArray_INCREF`.

Will probably get to it tomorrow. The interesting part is mostly whether you think the changes in `dtype_transfer.c` are good enough, or whether I should unravel the logic a bit more. It does feel a bit like it may be a band-aid solution, and I do not like band-aid solutions. (PS: Sorry for mailing/putting it all here, but my connection to github is seriously slow right now.)
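(For the missing `np.repeat` check, a sketch of the kind of test meant here - the dtype, names, and counts are illustrative, not the test as eventually written:)

```python
import sys
import numpy as np

marker = object()
dt = np.dtype([('f', object, (2,))])

arr = np.empty(3, dtype=dt)
arr['f'] = marker                    # 3 elements x 2 object slots

before = sys.getrefcount(marker)
rep = np.repeat(arr, 2)              # repeat uses PyArray_INCREF internally
# the new array holds 6 elements x 2 slots = 12 additional references
assert sys.getrefcount(marker) - before == 12

del rep
assert sys.getrefcount(marker) == before
```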
Oh man. I added tests, including for advanced indexing, and even more bugs spawned...
Definitely seems to be one of those "let me use this left-over hour to fix this, and oh no, it is a week later" kind of bugs... Maybe move the new tests to a new issue? Or is it also in this …?
Well, I could probably split out the … The stuff in advanced indexing likely cooks down to changes in …
OK, found that one, hehe. Pretty obvious once you look at the right spot.
OK, made the tests much more extensive and fixed that new bug. It is still the second version, using a different copy function for the copy from the buffer. If I think about it, it doesn't matter much whether the fix is there or later in the same code path, I guess. The only thing that might in principle be nicer (without major digging into it) would probably be to take some very slow path early on.
-    if (swap && PyArray_DESCR(arr)->subarray != NULL) {
+    if (PyDataType_REFCHK(PyArray_DESCR(arr)) ||
+            /* Cannot have objects without fields or subarrays */
+            (swap && PyArray_DESCR(arr)->subarray != NULL)) {
This is wrong - the code below refers to descr->subarray, which was previously guaranteed not to be null, but no longer is. Why is a reference check needed here?
I think perhaps you want (PyDataType_REFCHK(PyArray_DESCR(arr)) || swap) && PyArray_DESCR(arr)->subarray != NULL, which is better written as

// TODO: This doesn't seem safe for user-defined types. Is the optimization really worth it?
npy_bool skip_subarray_copy = !PyDataType_REFCHK(PyArray_DESCR(arr)) && !swap;
if (PyArray_DESCR(arr)->subarray != NULL && !skip_subarray_copy) {
Oh, I moved the checks around. It is correct because of the early return in the previous if, but it does read strangely like this...
Correct in the sense of the initial bug, that is. But you are right, that was wrong. Also, the subarray can itself be of VOID type again, and then you, again, need copyswap.
Added both with comments; I think it should be right now.
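(To make the nested-VOID case concrete, a hypothetical dtype that requires the recursive copyswap - illustrative only, not a dtype from this PR:)

```python
import numpy as np

# a subarray whose base is itself a structured (VOID) dtype; swapping the
# byte order of such elements needs the recursive copyswap machinery, a
# plain memcpy of the subarray block is not enough
inner = np.dtype([('x', '>i4'), ('y', '>f8')])
dt = np.dtype([('sub', inner, (3,))])

arr = np.zeros(2, dtype=dt)
swapped = arr.astype(dt.newbyteorder())   # may exercise the copyswap path
```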
I still think the way I wrote it makes more sense...
@@ -258,6 +307,10 @@ _fillobject(char *optr, PyObject *obj, PyArray_Descr *dtype)
        Py_XDECREF(arr);
Out of scope, but - line 295 will set an exception if an integer larger than 2**64 is passed in, which will probably lead to a SystemError.
Frankly, I hope that this is only ever used for 0 or None; unfortunately, it is public API.
numpy/core/tests/test_dtype.py
Outdated
singleton = object()
one = 1
zero = 0
for dt, pat, count in self.iter_struct_object_dtypes(singleton):
It would be great to use pytest.mark.parametrize here, rather than putting the for loop in the test. You might want to add a fourth label item to the tuple, so that you can do:

@pytest.mark.parametrize(['dt', 'pat', 'count', 'label'], iter_struct_object_dtypes(), ids=lambda t: t[3])
def test_structured_object_create_delete(dt, pat, count, label):
    # don't actually use label here, pytest will have already handled it.
numpy/core/tests/test_dtype.py
Outdated
if obj is None:
    # None seems flaky due to python behaviour; None is
    # probably simply used on too many occasions.
    continue
With the above suggestion, this becomes pytest.skip("None is flaky")
numpy/core/tests/test_dtype.py
Outdated
shapes = [(3,), (3, 2), (3, 2)]
indices = [([0, 2],), ([0, 2], slice(None)), ([0, 2], [1])]
items_changed = [2, 4, 2]
Probably more helpful to parametrize again:

@the_parametrize_I_used_in_the_comment_above
@pytest.mark.parametrize(['shape', 'indices', 'items_changed'], [
    ((3,), ([0, 2],), 2),
    ((3, 2), ([0, 2], slice(None)), 4),
    ((3, 2), ([0, 2], [1]), 2),
])
def test_structured_object_indexing(shape, indices, items_changed, dt, pat, count, label):
numpy/core/tests/test_dtype.py
Outdated
after_zero = sys.getrefcount(zero)
after_one = sys.getrefcount(one)
assert before_zero - after_zero == count * changed
assert after_one - before_one == count * changed
might be neat to introduce a helper object for this:

class reference_counter:
    def __init__(self, obj):
        self._obj = obj
        self._initial_count = sys.getrefcount(obj)

    @property
    def change(self):
        return sys.getrefcount(self._obj) - self._initial_count

zero_counter = reference_counter(zero)
...
assert zero_counter.change == count * changed
Yeah, possibly. Anyway, I did most (though not all) of the parameterization, but didn't make much use of it yet. If you have more suggestions I can fix it up a bit further, but I probably won't keep going otherwise.
Maybe I should at least add a test for the non-contiguous subarray thing. But it is getting out of scope to start fixing that as well, and it is getting late today.
(force-pushed 78b057c to 4bfa42c)
Added a test for the sparse-field fix, at least, which also got in. Plus one known failure, in gh-12686.
sparse_dtype = np.dtype([('a', {'names': ['ab'], 'formats': ['f'],
                                'offsets': [4]}, (2, 3))])

@pytest.mark.xfail(reason="inaccessible data is changed see gh-12686.")
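(A sketch of what such a sparse-field test checks - the assertion details are illustrative, not the exact test added here:)

```python
import numpy as np

sparse_dtype = np.dtype([('a', {'names': ['ab'], 'formats': ['f'],
                                'offsets': [4]}, (2, 3))])

arr = np.zeros(2, dtype=sparse_dtype)
raw_before = arr.tobytes()
arr[0] = arr[1]                  # should only write the 'ab' bytes
# the padding bytes (offsets 0-3 of each inner struct) must stay untouched;
# gh-12686 tracks the case where normal assignment still changes them
assert arr.tobytes() == raw_before
```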
Have to think about it. This test is good, but it spams valgrind, because uninitialized buffers are copied around. Maybe for now that is fine. In the longer run, if we want to continue with such efforts here, we could add valgrind detection to the tests, which would give an option to skip it.
Could add a @pytest.mark.confuses_valgrind custom mark, which would run the test by default, but running as valgrind python runtest.py -- -m "not confuses_valgrind" would skip it.
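(A sketch of how such a custom mark could be registered - a hypothetical conftest.py addition, not code from this PR:)

```python
# conftest.py (hypothetical)

def pytest_configure(config):
    # register the mark so pytest does not warn about unknown markers
    config.addinivalue_line(
        "markers",
        "valgrind_error: test is expected to produce errors under valgrind")
```

Tests carrying the mark can then be deselected with -m "not valgrind_error" when running under valgrind.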
Added pytest.mark.valgrind_error (it is a true error, not just confusion) and rebased.
That has also been my experience with adding tests :)

LGTM
Especially object dtypes need reference counting, so memcpy cannot be used. Subarrays that include fields also contain data which should not be overwritten.
When not using the dtype transfer function, the buffers are initialized to NULL and then just copied in full. Thus, at no point does decref'ing happen on the data originally in the array.
Tests initialization, copying, and repeating (incrementing), as well as advanced indexing (which currently uses copyswap) and assignment.
Assigning to subarray fields should not change the output in between the fields. This currently still happens for normal assignment, but not for advanced indexing. The second test is marked as `pytest.mark.valgrind_error` to make it clear that it is expected to show errors in valgrind, and to give at least the theoretical possibility of skipping it during testing.
(force-pushed 855a788 to 9963a9b)
Can be squash-merged (makes backporting easier) once tests pass.

Thanks @seberg
BUG: Fix reference counting for subarrays containing objects (#12650)
* BUG: Fix dtype object subarrays for INCREF/DECREF and zeros/empty init
* BUG: VOID_copyswap cannot fast path nontrivial subarrays
* BUG: Copying subarrays with objects must decref original objects
* TST: Test subarray field reference counting
* TST: Test sparse subarray field assignments
This fixes reference counting for subarrays containing objects. Note that the third commit is currently a WIP, and mostly a proof that fixing up assignment like this avoids losing references. It may well approximate the correct solution, but I am not 100% sure.
This PR contains changes moved from gh-12624.