Skip to content

BUG,MAINT: Updateifcopy arrays should always be copied. #9249

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed

Conversation

charris
Copy link
Member

@charris charris commented Jun 13, 2017

Before this, one would have

In [1]: a = np.ones(10)[::2]

In [2]: a.writeable
Out[2]: True

In [3]: b = np.array(a.flat, copy=False)

In [4]: a.writeable
Out[4]: False

because b held a reference to the updateifcopy array a.flat.array(),
keeping it from being deallocated and resetting a to writeable. That was
because the array constructor returned a.flat.array() directly
instead of making a copy. It seems that in normal usage it would be
better to return a copy in these cases and this commit makes that
change.

Closes #9244.

Before this, one would have

    In [1]: a = np.ones(10)[::2]

    In [2]: a.writeable
    Out[2]: True

    In [3]: b = np.array(a.flat, copy=False)

    In [4]: a.writeable
    Out[4]: False

because b held a reference to the updateifcopy array a.flat.__array__(),
keeping it from being deallocated and resetting a to writeable. That was
because the array constructor returned a.flat.__array__() directly
instead of making a copy. It seems that in normal usage it would be
better to return a copy in these cases and this commit makes that
change.

Closes numpy#9244.
@njsmith
Copy link
Member

njsmith commented Jun 13, 2017

So do I understand correctly that what this PR does is make it so that if the array passed into np.array has the UPDATEIFCOPY flag set, then np.array will always return a copy, regardless of whether copy=False is set?

If so, why is this better than making arr.flat.__array__() stop returning UPDATEIFCOPY arrays?

I guess my general concern is that yeah, UPDATEIFCOPY arrays are weird and kind of intrinsically broken because of the way they depend on details of the garbage collector. But nonetheless, the idea is that you can use arbitrary array manipulation operations on them and then have that eventually reflected back to the original array. One kind of array manipulation is taking a view and then writing through that view. If I understand this PR correctly, then that kind of manipulation will stop working (if it goes through np.array), which could break existing code. It also doesn't prevent many other ways that one could accidentally hold onto a reference to an UPDATEIFCOPY array, like slicing, accessing the .T attribute, or just sticking it into some container somewhere. So my worry would be that this might break existing code, but doesn't necessarily move us closer to a systematic solution to the underlying problem in exchange.

@charris
Copy link
Member Author

charris commented Jun 13, 2017

If so, why is this better than making arr.flat.__array__() stop returning UPDATEIFCOPY arrays?

I don't think we can change that without breaking things. All this PR does is avoid the side affect of locking the array a when a reference to the result of np.asarray(a.flat) is held. Strictly speaking, that idiom should not be used, but instead np.array(a.flat), but that is an unexpected programming niggle users should not need to deal with. I think making a copy is pretty transparent in this case and should not break any existing code, even if the C-API is used to get the array. In the latter case one can, if intended, end up with an updateifcopy array pointing back to another updateifcopy array, but the resulting chain should still work fine and is likely rare in practice.

I note that arr.flat.__array__() itself may be a view, a copy, or an updateifcopy array depending on arr, but it is only the last that changes the flags on arr.

This PR is not intended as a big fix, merely a small fix that should be mostly unnoticed.

@seberg
Copy link
Member

seberg commented Jun 13, 2017

It seems hard to imagine someone actually using the updateifcopy here, but its true, the clean thing would be to somehow deprecate it first.

It might be interesting to check if this (sometimes unnecessary?) copy is triggered within the numpy test suit (outside of a possibly odd .flat test)?

@charris
Copy link
Member Author

charris commented Jun 13, 2017

That said, not having this is probably not a big deal.

@njsmith
Copy link
Member

njsmith commented Jun 13, 2017

I don't know, I could be wrong, but my intuition is that there could totally be code doing

a2 = np.asarray(a.flat)
a2[0] = ...

And in fact I'd expect this to be the most common way to write code that (inadvertently) relies on the UPDATEIFCOPY of
a.flat.__array__... am I missing something? It seems more likely than that people are calling a.flat.__array__() directly and then writing to the array it returns...

@charris
Copy link
Member Author

charris commented Jun 13, 2017

I think the more common use would be

a.flat[0] = whatever

Where it is clear that a is being modified. I have no statistics on how folks think asarray(a.flat) operates, a.flat is documented to be an iterator, not an array. I suppose that can be a problem with any object that implements __array__. I certainly haven't programmed with that method in mind.

@njsmith
Copy link
Member

njsmith commented Jun 13, 2017

a.flat[0] = whatever

Sure, but that doesn't go via .flat.__array__ (or if it currently does internally then that can easily be fixed; anyway, UPDATEIFCOPY is harmless in this case because the reference never leaves numpy's C internals). So that won't be affected by any of the ideas on the table here AFAICT.

@eric-wieser
Copy link
Member

eric-wieser commented Jun 13, 2017

Is there a strong argument for UPDATEIFCOPY existing at all, other than backwards compatibility? Can't the user just be asked to make the copy manually (via a long deprecation cycle), or are there performance implications?

Perhaps something like this would also be useful:

a = np.arange(2)[::2]
with np.contiguous_view(a) as a_contig:
    a_contig[0] = 5

Implemented as

@contextlib.contextmanager
def contiguous_view(a, order='C'):
    if a.flags[order + '_CONTIGUOUS']:
        yield a
    else:
        b = np.array(a, order=order, subok=True, copy=True)
        try:
            yield b
        finally:
            a[...] = b

which does the update-if-copy behaviour, but using a predictable context manager. a_contig would be a plain numpy array, special only in that a reference is held to it by the context manager

@njsmith
Copy link
Member

njsmith commented Jun 14, 2017

In general UPDATEIFCOPY gets used in a number of places internally, and not just when you need a contiguous "view" of a non-contiguous array. For example, it can be used to work around dtype mismatches between a ufunc loop and an out argument. But, in all the cases where it makes sense, it could be replaced by something like your context manager, yeah. Or the C equivalent. We have an issue for that somewhere (can't really search on my phone), because we need to switch to the explicit cleanup version anyway if we want to work correctly on PyPy.

@charris
Copy link
Member Author

charris commented Jun 14, 2017

There are seven places where updateifcopy arrays are created:

numpy/core/src/umath/reduction.c:190:        if (PyArray_SetUpdateIfCopyBase(ret_copy, (PyArrayObject *)ret) < 0) {
numpy/core/src/multiarray/iterators.c:1111:            if (PyArray_SetUpdateIfCopyBase(ret, it->ao) < 0) {
numpy/core/src/multiarray/multiarraymodule.c:810:            if (PyArray_SetUpdateIfCopyBase(out_buf, out) < 0) {
numpy/core/src/multiarray/cblasfuncs.c:429:            if (PyArray_SetUpdateIfCopyBase(out_buf, out) < 0) {
numpy/core/src/multiarray/nditer_constr.c:2926:                if (PyArray_SetUpdateIfCopyBase(temp, op[iop]) < 0) {
numpy/core/src/multiarray/ctors.c:2010:            if (PyArray_SetUpdateIfCopyBase(ret, arr) < 0) {
numpy/core/src/multiarray/mapping.c:3208:        if (PyArray_SetUpdateIfCopyBase(a_copy, a) < 0) {

PyArray_SetUpdateIfCopyBase is defined here

numpy/core/src/multiarray/arrayobject.c:86:PyArray_SetUpdateIfCopyBase(PyArrayObject *arr, PyArrayObject *base)

@charris
Copy link
Member Author

charris commented Jun 14, 2017

Just for giggles, I returned an ordinary copy instead of an updateifcopy array from the flat iterator, and all tests pass except the one that checks if it is updateifcopy 8-)

@charris
Copy link
Member Author

charris commented Jun 14, 2017

I'm going to suggest that we change a.flat.__array__() to always return a flattened copy of the array, e.g., act just like a.flatten(). Starting with a future warning, of course.

@charris
Copy link
Member Author

charris commented Jun 14, 2017

I note that the current a.flat.__array__() also ignores the two optional positional arguments dtype and context. We should perhaps look to eliminate the __array__ method altogether in the future, I think __array_ufunc__ will handle most of the cases for which it might be useful.

@njsmith
Copy link
Member

njsmith commented Jun 14, 2017

I'm going to suggest that we change a.flat.__array__() to always return a flattened copy of the array, e.g., act just like a.flatten(). Starting with a future warning, of course.

👍

look to eliminate the __array__ method altogether in the future

Not sure about this... it seems like it's useful to have a standard way to explicitly coerce to an ndarray when you want to? Like in dask you can do np.array(dask_array) to say "ok do the computation now and give me an actual in-memory array" (I think). And in some sense __array_ufunc__ makes this more useful, because it lets you avoid implicitly getting coerced to ndarray when you don't want to be.

We might also want to consider deprecating .flat (though I doubt we'll ever be able to get rid of it entirely).

@charris
Copy link
Member Author

charris commented Jun 16, 2017

Closing this and creating an issue instead.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants