BUG: Fix reference counting for subarrays containing objects #12650


Merged: 5 commits merged into numpy:master from seberg:struct-object-refcnt on Jan 15, 2019

Conversation

@seberg (Member) commented Jan 2, 2019

This fixes reference counting for subarrays containing objects. Note that the 3rd commit is currently a WIP and mostly a proof of concept that fixing up assignment like this avoids losing references. It may well approximate the correct solution, but I am not 100% sure.

This PR contains changes moved from gh-12624
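
For illustration, a minimal sketch (not part of the PR) of the dtype and the refcount accounting at issue; the field name and counts are made up, and the asserts describe the intended behavior once the fix is in:

import sys
import numpy as np

# A structured dtype whose field is a subarray of objects, the case
# whose reference counting this PR fixes.
dt = np.dtype([('field', object, (2,))])

sentinel = object()                  # fresh object; nothing else holds it
before = sys.getrefcount(sentinel)

arr = np.zeros(3, dtype=dt)
arr['field'] = sentinel              # 3 elements * 2 subarray slots = 6 refs
assert sys.getrefcount(sentinel) - before == 6

del arr                              # all 6 references should be released
assert sys.getrefcount(sentinel) == before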

*fromdata = d->fromdata;
char *bufferin = d->bufferin, *bufferout = d->bufferout;
PyArray_StridedUnaryOp *_strided_to_strided = PyArray_GetStridedCopyFn(
0, dst_stride, sizeof(PyObject *), sizeof(PyObject *));
seberg (Member Author):

This is the actual change/hack. Instead of just using memset to initialize the buffer, copy the data from the original array into the buffer, so that decref'ing can occur. I could imagine switching, somewhere earlier, to a slow direct cast loop that calls the cast function for every single element instead of attempting to speed it up by doing it in chunks (frankly, I doubt chunking helps for objects that need casting anyway).

@mhvk (Contributor) left a comment:

@seberg - good to separate it out, though looking at it again, and bothering to look up PyArray_Item_INCREF, I see that what you had already made sense.

Comments are mostly about whether there is really a need for a fast path - objects are generally slow and it just adds to the complexity.

}
else {
for (i = 0; i < size; i++){
PyArray_Item_INCREF(data + i * inner_elsize,
mhvk (Contributor):

OK, I think I finally understand what this is actually doing - it just checks whether any part is an object and then increases its refcount. It might be good to add a comment stating that, so that the next person looking at the code doesn't have to look up what PyArray_Item_INCREF actually does - the name is not too helpful here.

Separately, the check for object is very early on; I don't think there is much to gain by making a fast path, especially since anything to do with objects is slow anyway.

seberg (Member Author):

I was not 100% sure where you were going. Added some documentation in a new commit though.

size = descr->elsize / inner_elsize;

if (descr->subarray->base->type_num == NPY_OBJECT) {
/* Fast path for only-objects */
mhvk (Contributor):

As above, not clear to me this is worth the extra complexity.

@@ -258,6 +308,10 @@ _fillobject(char *optr, PyObject *obj, PyArray_Descr *dtype)
Py_XDECREF(arr);
}
}
if (dtype->type_num == NPY_OBJECT) {
Py_XINCREF(obj);
mhvk (Contributor):

Surely obj cannot be NULL here?

@mattip (Member):

@seberg do you want to change this?

seberg (Member Author):

@mattip, I don't mind changing this; I am just not sure how much we implicitly support it, since NULL is also always checked, e.g. in the ufunc loops that you worked on with the exception chaining.
But I am pretty ambivalent about it: if you or anyone thinks we should just move away from it, let's do that. I do not really see the point in it.

if (dtype->subarray->base->type_num == NPY_OBJECT) {
/* fast path when subarray only includes objects */
for (i = 0; i < size; i++){
Py_XINCREF(obj);
mhvk (Contributor):

Same, surely obj cannot be NULL (I realize you probably copied from above, but it seems very odd).

seberg (Member Author):

I think you are right in principle. But IIRC quite a few, maybe even all, internal functions actually accept interpreting NULL as None for numpy arrays. In practice I do not think numpy itself will ever fill an array with NULL though, so the question would be if some extensions might actually rely on being able to fill arrays with NULL.
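
(An illustrative aside, not from the thread: the NULL-as-None convention being discussed is visible at the Python level, since numpy zero-fills object buffers and presents NULL entries as None when read back. A sketch, assuming current behavior:)

import numpy as np

a = np.empty(2, dtype=object)   # buffer is zero-filled, i.e. NULL pointers
print(a)                        # [None None]: NULL reads back as None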

NPY_COPY_PYOBJECT_PTR(optr, &obj);
optr += sizeof(obj);
if (dtype->subarray->base->type_num == NPY_OBJECT) {
/* fast path when subarray only includes objects */
mhvk (Contributor):

Here I'm a bit less sure, but the slow path would seem fast enough.

seberg (Member Author):

I timed this one and it made quite a big difference. On the other hand, probably almost nobody actually uses these features, and as soon as you do anything else, everything is probably so slow that it simply does not matter.

So I am good with just removing the fast paths; all of this code is pretty much only used during creation and removal anyway.

Member:

+1 from me on removing this fast path.

@charris charris added this to the 1.16.1 release milestone Jan 2, 2019
@ahaldane (Member) commented Jan 3, 2019

LGTM.

I looked through the other functions calling memset in that file. They all seem to have an alternate 'refcnt' version already, though it's a little confusing to work through all the code-paths. Only the one you found seemed to be missing a refcnt version.

For the object fast-path, I'm happy leaving it in, though I don't think it's really needed. People using object arrays are used to them being slow already.

size = descr->elsize / inner_elsize;

if (descr->subarray->base->type_num == NPY_OBJECT) {
/* Fast path for only-objects */
Member:

This path isn't used in PyDataType_HASFIELDS - I don't see why we should optimize for it here either.

@seberg (Member Author) commented Jan 3, 2019

OK, will remove the fast-paths again then.

@ahaldane, thanks for looking through it. I am actually only half sure what that memsetting is good for, but I guess it is probably also used for setting VOID types only partially, or so.

So I guess this will be roughly right, except that I should make the actual copying-over code nicer (or maybe I can find a DECREF'ing copy function from buffer to output array to replace the current one).

PyArray_StridedUnaryOp *_strided_to_strided = PyArray_GetStridedCopyFn(
0, dst_stride, sizeof(PyObject *), sizeof(PyObject *));

for(;;) {
Member:

I think that

for(;;) {
   if (x) {
       y();
   }
   else {
      z();
   }
}

is clearer as

while (x) {
   y();
}
z();

@seberg (Member Author) commented Jan 4, 2019

Pushed a different approach, which uses a function that DECREFs the original data upon copy. Better than the first hack, I think, but slightly different, and it might even be that the first approach was slightly better as such. Frankly, that code is still confusing me; I think I may have to draw it out to really understand all the nesting and branching.

# Test assumes that 0, 1, and None are singletons.
for func, obj in zip([np.empty, np.ones, np.zeros], [None, one, zero]):
if obj is None:
# None test is flaky:
mhvk (Contributor):

Might be good to elaborate a bit on this comment: is it numpy or python that is flaky with None?

seberg (Member Author):

Oops, a leftover from some earlier version; I think the test runs fine now.

seberg (Member Author):

Maybe I should also refer to None through another name here; I have a feeling it helped for the others. I was wondering if it might have to do with the actual internal code Python compiles, too...

seberg (Member Author):

Doesn't help. It all seems to come down to trickier things; replacing the assert with a print, for example, makes it run fine for me.

Frankly, I am not quite sure whether the other tests (one and zero) are reliable or not. They seem to be, but...

mhvk (Contributor):

Yikes...

mhvk (Contributor):

More constructively: it would seem fine to just work with objects where we know nothing else can be increasing counts; maybe you are fighting some pytest internals...
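
(A minimal sketch of the pattern suggested here, illustrative only: measure refcount deltas on a fresh object that nothing else can touch, rather than on interpreter-wide singletons like None:)

import sys

obj = object()                       # fresh; no other code holds a reference
before = sys.getrefcount(obj)
refs = [obj] * 5                     # take five new references
assert sys.getrefcount(obj) - before == 5
del refs
assert sys.getrefcount(obj) == before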

@seberg (Member Author) commented Jan 4, 2019 via email

@seberg (Member Author) commented Jan 6, 2019

Oh man. I added tests, including for advanced indexing and even more bugs spawned...

@mhvk (Contributor) commented Jan 6, 2019

Definitely seems to be one of those "let me use this left-over hour to fix this, and oh no, it is a week later" kind of bugs... Maybe move the new tests to a new issue? Or is it also in this HAS_SUBARRAY part? (Which seemed somewhat self-contained, hence the suggestion.)

@seberg (Member Author) commented Jan 6, 2019

Well, I could probably split out the refcount.c stuff with more limited tests for now, at least.

The stuff in advanced indexing likely boils down to changes in dtype_transfer.c similar to the second part here.

@seberg (Member Author) commented Jan 6, 2019

OK, found that one, hehe. Pretty obvious once you look at the right spot: copyswap was just utterly broken :)

@seberg (Member Author) commented Jan 6, 2019

OK, made the tests much more extensive and fixed that new bug. It is still the second version, using a different copy function for the copy from the buffer.

If I think about it, it doesn't matter much whether the fix is there or later in the same code path, I guess. The only thing that might in principle be nicer (without looking into it much further) is probably to switch to some very slow path early on.

if (swap && PyArray_DESCR(arr)->subarray != NULL) {
if (PyDataType_REFCHK(PyArray_DESCR(arr)) ||
/* Cannot have objects without fields or subarrays */
(swap && PyArray_DESCR(arr)->subarray != NULL)) {
@eric-wieser (Member) Jan 6, 2019:

This is wrong - the below refers to descr->subarray, which was previously guaranteed not to be null, but is no longer. Why is a reference check needed here?

@eric-wieser (Member) Jan 6, 2019:

I think perhaps you want (PyDataType_REFCHK(PyArray_DESCR(arr)) || swap) && PyArray_DESCR(arr)->subarray != NULL, which is better written as

// TODO: This doesn't seem safe for user-defined types. Is the optimization really worth it?
npy_bool skip_subarray_copy = !PyDataType_REFCHK(PyArray_DESCR(arr)) && !swap;

if (PyArray_DESCR(arr)->subarray != NULL && !skip_subarray_copy) {

seberg (Member Author):

Oh, I moved the checks around. It is correct because of the early return in the previous if, but it reads strangely like this...

seberg (Member Author):

Correct in the sense of the initial bug, but you are right, that was wrong. Also, the subarray can itself be of VOID type again, in which case you again need copyswap.

Added both with comments; I think it should be right now.

Member:

I still think the way I wrote it makes more sense...

@@ -258,6 +307,10 @@ _fillobject(char *optr, PyObject *obj, PyArray_Descr *dtype)
Py_XDECREF(arr);
@eric-wieser (Member) Jan 6, 2019:

Out of scope, but: line 295 will set an exception if an integer larger than 2**64 is passed in, which will probably lead to a SystemError.

seberg (Member Author):

Frankly, I hope this is only ever used with 0 or None; unfortunately, it is public API.

singleton = object()
one = 1
zero = 0
for dt, pat, count in self.iter_struct_object_dtypes(singleton):
@eric-wieser (Member) Jan 6, 2019:

It would be great to use pytest.mark.parametrize here, rather than putting the for loop in the test. You might want to add a fourth label item to the tuple, so that you can do:

@pytest.mark.parametrize(['dt', 'pat', 'count', 'label'], iter_struct_object_dtypes(), ids=lambda t: t[3])
def test_structured_object_create_delete(dt, pat, count, label):
    # don't actually use label here, pytest will have already handled it.

if obj is None:
# None seems flaky due to python behaviour, None's are
# probably simply used in too many occasions.
continue
Member:

With the above suggestion, this becomes pytest.skip("None is flaky")


shapes = [(3,), (3, 2), (3, 2)]
indices = [([0, 2],), ([0, 2], slice(None)), ([0, 2], [1])]
items_changed = [2, 4, 2]
@eric-wieser (Member) Jan 6, 2019:

Probably more helpful to parametrize again.

@the_parametrize_I_used_in_the_comment_above
@pytest.mark.parametrize(['shape', 'indices', 'items_changed'], [
    ((3,), ([0, 2],), 2),
    ((3, 2), ([0, 2], slice(None)), 4),
    ((3, 2), ([0, 2], [1]), 2)
])
def test_structured_object_indexing(shape, indices, items_changed, dt, pat, count, label):

after_zero = sys.getrefcount(zero)
after_one = sys.getrefcount(one)
assert before_zero - after_zero == count * changed
assert after_one - before_one == count * changed
Member:

might be neat to introduce a helper object for this

class reference_counter:
    def __init__(self, obj):
        self._obj = obj
        self._initial_count = sys.getrefcount(obj)

    @property
    def change(self):
        return sys.getrefcount(self._obj) - self._initial_count


zero_counter = reference_counter(zero)
...
assert zero_counter.change == count * changed

seberg (Member Author):

Yeah, possibly. Anyway, I did most (not all) of the parametrization, though I didn't use it much yet. If you have more suggestions, I can fix it up a bit further, but probably won't continue otherwise.

Maybe I should at least add a test for the non-contiguous subarray thing. But it is getting out of scope to start fixing that as well, and it is getting late today, at least.

@seberg seberg force-pushed the struct-object-refcnt branch from 78b057c to 4bfa42c on January 6, 2019 at 23:11
@seberg (Member Author) commented Jan 7, 2019

Added a test at least for the sparse field fix, which also got in, plus one known failure for gh-12686.

sparse_dtype = np.dtype([('a', {'names':['ab'], 'formats':['f'],
'offsets':[4]}, (2, 3))])

@pytest.mark.xfail(reason="inaccessible data is changed see gh-12686.")
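
(For readers puzzling over the dtype quoted above: the inner struct holds one 4-byte float at offset 4, so its itemsize is 8 and bytes 0-3 are padding that assignment must not touch; that padding is the inaccessible data gh-12686 is about. A quick illustrative check:)

import numpy as np

inner = np.dtype({'names': ['ab'], 'formats': ['f'], 'offsets': [4]})
print(inner.itemsize)        # 8: the 'ab' field spans bytes 4..7
print(inner.fields['ab'])    # (dtype('float32'), 4), i.e. offset 4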
seberg (Member Author):

Have to think about it. This test is good, but it spams valgrind because uninitialized buffers are copied around. Maybe for now that is fine. In the longer run, if we want to continue with such effort here, we could add valgrind detection to the tests as an option to skip it.

Member:

Could add a @pytest.mark.confuses_valgrind custom mark, which will run the test by default, but running as valgrind python runtest.py -- -m "not confuses_valgrind" will skip it.

seberg (Member Author):

Added pytest.mark.valgrind_error (it is a true error, not just confusion) and rebased.
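
(For reference, a hypothetical sketch of how such a custom mark could be registered so pytest does not warn about an unknown marker; the hook is standard pytest, and the marker name follows the comment above:)

# conftest.py
def pytest_configure(config):
    config.addinivalue_line(
        "markers",
        "valgrind_error: test is known to produce real errors under valgrind",
    )

Running the suite under valgrind with -m "not valgrind_error" would then deselect those tests.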

@charris (Member) commented Jan 10, 2019

> Oh man. I added tests, including for advanced indexing and even more bugs spawned...

That has also been my experience with adding tests :)

@seberg seberg changed the title WIP,BUG: Fix reference counting for subarrays containing objects BUG: Fix reference counting for subarrays containing objects Jan 11, 2019
@seberg seberg requested review from a team and removed request for a team January 11, 2019 13:16
@mattip (Member) commented Jan 14, 2019

LGTM

@seberg seberg force-pushed the struct-object-refcnt branch from 855a788 to 9963a9b on January 14, 2019 at 19:55
@mattip (Member) commented Jan 14, 2019

Can be squash-merged (makes backporting easier) once tests pass

@mattip mattip merged commit 7885b29 into numpy:master Jan 15, 2019
@mattip (Member) commented Jan 15, 2019

Thanks @seberg

@seberg seberg deleted the struct-object-refcnt branch January 15, 2019 08:28
charris pushed a commit to charris/numpy that referenced this pull request Jan 17, 2019
BUG: Fix reference counting for subarrays containing objects (#12650)

* BUG: Fix dtype object subarrays for INCREF/DECREF and zeros/empty init

* BUG: VOID_copyswap cannot fast path nontrivial subarrays

Especially object dtypes need reference counting, so cannot use
memcpy. Subarrays that include fields also have data which should
not be overwritten.

* BUG: Copying subarrays with objects must decref original objects

When not using the dtype transfer function, the buffers are initialized
to NULL, and then just copied in full. Thus, at no point decref'ing
happens on the data that is originally in the array.

* TST: Test subarray field reference counting

Tests initialization, copying and repeating (incrementing) as
well as advanced indexing (uses copyswap currently) and assignment

* TST: Test sparse subarray field assignments

Assigning to subarray fields should not change the output in
between the fields. This currently still happens for normal
assignment, but not for advanced indexing.

The second test is marked as `pytest.mark.valgrind_error` to make
it clear that it is expected to show errors in valgrind, and to give
at least the theoretical possibility to skip it during testing.
@charris charris removed the 09 - Backport-Candidate PRs tagged should be backported label Jan 17, 2019
@charris charris removed this from the 1.16.1 release milestone Jan 17, 2019