BUG,MAINT: Updateifcopy arrays should always be copied. #9249

charris · 2017-06-13T01:11:11Z

Before this, one would have

In [1]: a = np.ones(10)[::2]

In [2]: a.writeable
Out[2]: True

In [3]: b = np.array(a.flat, copy=False)

In [4]: a.writeable
Out[4]: False

because b held a reference to the updateifcopy array a.flat.array(),
keeping it from being deallocated and resetting a to writeable. That was
because the array constructor returned a.flat.array() directly
instead of making a copy. It seems that in normal usage it would be
better to return a copy in these cases and this commit makes that
change.

Closes #9244.

Before this, one would have In [1]: a = np.ones(10)[::2] In [2]: a.writeable Out[2]: True In [3]: b = np.array(a.flat, copy=False) In [4]: a.writeable Out[4]: False because b held a reference to the updateifcopy array a.flat.__array__(), keeping it from being deallocated and resetting a to writeable. That was because the array constructor returned a.flat.__array__() directly instead of making a copy. It seems that in normal usage it would be better to return a copy in these cases and this commit makes that change. Closes numpy#9244.

njsmith · 2017-06-13T10:03:16Z

So do I understand correctly that what this PR does is make it so that if the array passed into np.array has the UPDATEIFCOPY flag set, then np.array will always return a copy, regardless of whether copy=False is set?

If so, why is this better than making arr.flat.__array__() stop returning UPDATEIFCOPY arrays?

I guess my general concern is that yeah, UPDATEIFCOPY arrays are weird and kind of intrinsically broken because of the way they depend on details of the garbage collector. But nonetheless, the idea is that you can use arbitrary array manipulation operations on them and then have that eventually reflected back to the original array. One kind of array manipulation is taking a view and then writing through that view. If I understand this PR correctly, then that kind of manipulation will stop working (if it goes through np.array), which could break existing code. It also doesn't prevent many other ways that one could accidentally hold onto a reference to an UPDATEIFCOPY array, like slicing, accessing the .T attribute, or just sticking it into some container somewhere. So my worry would be that this might break existing code, but doesn't necessarily move us closer to a systematic solution to the underlying problem in exchange.

charris · 2017-06-13T15:20:10Z

If so, why is this better than making arr.flat.__array__() stop returning UPDATEIFCOPY arrays?

I don't think we can change that without breaking things. All this PR does is avoid the side affect of locking the array a when a reference to the result of np.asarray(a.flat) is held. Strictly speaking, that idiom should not be used, but instead np.array(a.flat), but that is an unexpected programming niggle users should not need to deal with. I think making a copy is pretty transparent in this case and should not break any existing code, even if the C-API is used to get the array. In the latter case one can, if intended, end up with an updateifcopy array pointing back to another updateifcopy array, but the resulting chain should still work fine and is likely rare in practice.

I note that arr.flat.__array__() itself may be a view, a copy, or an updateifcopy array depending on arr, but it is only the last that changes the flags on arr.

This PR is not intended as a big fix, merely a small fix that should be mostly unnoticed.

seberg · 2017-06-13T16:27:40Z

It seems hard to imagine someone actually using the updateifcopy here, but its true, the clean thing would be to somehow deprecate it first.

It might be interesting to check if this (sometimes unnecessary?) copy is triggered within the numpy test suit (outside of a possibly odd .flat test)?

charris · 2017-06-13T17:02:43Z

That said, not having this is probably not a big deal.

njsmith · 2017-06-13T17:12:07Z

I don't know, I could be wrong, but my intuition is that there could totally be code doing

a2 = np.asarray(a.flat)
a2[0] = ...

And in fact I'd expect this to be the most common way to write code that (inadvertently) relies on the UPDATEIFCOPY of
a.flat.__array__... am I missing something? It seems more likely than that people are calling a.flat.__array__() directly and then writing to the array it returns...

charris · 2017-06-13T17:40:56Z

I think the more common use would be

a.flat[0] = whatever

Where it is clear that a is being modified. I have no statistics on how folks think asarray(a.flat) operates, a.flat is documented to be an iterator, not an array. I suppose that can be a problem with any object that implements __array__. I certainly haven't programmed with that method in mind.

njsmith · 2017-06-13T21:03:32Z

a.flat[0] = whatever

Sure, but that doesn't go via .flat.__array__ (or if it currently does internally then that can easily be fixed; anyway, UPDATEIFCOPY is harmless in this case because the reference never leaves numpy's C internals). So that won't be affected by any of the ideas on the table here AFAICT.

eric-wieser · 2017-06-13T22:05:46Z

Is there a strong argument for UPDATEIFCOPY existing at all, other than backwards compatibility? Can't the user just be asked to make the copy manually (via a long deprecation cycle), or are there performance implications?

Perhaps something like this would also be useful:

a = np.arange(2)[::2]
with np.contiguous_view(a) as a_contig:
    a_contig[0] = 5

Implemented as

@contextlib.contextmanager
def contiguous_view(a, order='C'):
    if a.flags[order + '_CONTIGUOUS']:
        yield a
    else:
        b = np.array(a, order=order, subok=True, copy=True)
        try:
            yield b
        finally:
            a[...] = b

which does the update-if-copy behaviour, but using a predictable context manager. a_contig would be a plain numpy array, special only in that a reference is held to it by the context manager

njsmith · 2017-06-14T00:16:31Z

In general UPDATEIFCOPY gets used in a number of places internally, and not just when you need a contiguous "view" of a non-contiguous array. For example, it can be used to work around dtype mismatches between a ufunc loop and an out argument. But, in all the cases where it makes sense, it could be replaced by something like your context manager, yeah. Or the C equivalent. We have an issue for that somewhere (can't really search on my phone), because we need to switch to the explicit cleanup version anyway if we want to work correctly on PyPy.

charris · 2017-06-14T01:15:28Z

There are seven places where updateifcopy arrays are created:

numpy/core/src/umath/reduction.c:190:        if (PyArray_SetUpdateIfCopyBase(ret_copy, (PyArrayObject *)ret) < 0) {
numpy/core/src/multiarray/iterators.c:1111:            if (PyArray_SetUpdateIfCopyBase(ret, it->ao) < 0) {
numpy/core/src/multiarray/multiarraymodule.c:810:            if (PyArray_SetUpdateIfCopyBase(out_buf, out) < 0) {
numpy/core/src/multiarray/cblasfuncs.c:429:            if (PyArray_SetUpdateIfCopyBase(out_buf, out) < 0) {
numpy/core/src/multiarray/nditer_constr.c:2926:                if (PyArray_SetUpdateIfCopyBase(temp, op[iop]) < 0) {
numpy/core/src/multiarray/ctors.c:2010:            if (PyArray_SetUpdateIfCopyBase(ret, arr) < 0) {
numpy/core/src/multiarray/mapping.c:3208:        if (PyArray_SetUpdateIfCopyBase(a_copy, a) < 0) {

PyArray_SetUpdateIfCopyBase is defined here

numpy/core/src/multiarray/arrayobject.c:86:PyArray_SetUpdateIfCopyBase(PyArrayObject *arr, PyArrayObject *base)

charris · 2017-06-14T02:14:12Z

Just for giggles, I returned an ordinary copy instead of an updateifcopy array from the flat iterator, and all tests pass except the one that checks if it is updateifcopy 8-)

charris · 2017-06-14T02:45:09Z

I'm going to suggest that we change a.flat.__array__() to always return a flattened copy of the array, e.g., act just like a.flatten(). Starting with a future warning, of course.

charris · 2017-06-14T02:57:27Z

I note that the current a.flat.__array__() also ignores the two optional positional arguments dtype and context. We should perhaps look to eliminate the __array__ method altogether in the future, I think __array_ufunc__ will handle most of the cases for which it might be useful.

njsmith · 2017-06-14T23:16:04Z

I'm going to suggest that we change a.flat.__array__() to always return a flattened copy of the array, e.g., act just like a.flatten(). Starting with a future warning, of course.

👍

look to eliminate the __array__ method altogether in the future

Not sure about this... it seems like it's useful to have a standard way to explicitly coerce to an ndarray when you want to? Like in dask you can do np.array(dask_array) to say "ok do the computation now and give me an actual in-memory array" (I think). And in some sense __array_ufunc__ makes this more useful, because it lets you avoid implicitly getting coerced to ndarray when you don't want to be.

We might also want to consider deprecating .flat (though I doubt we'll ever be able to get rid of it entirely).

charris · 2017-06-16T13:45:04Z

Closing this and creating an issue instead.

charris added 00 - Bug component: numpy._core labels Jun 13, 2017

charris mentioned this pull request Jun 13, 2017

BUG?: np.array(a, copy=False) should check UPDATEIFCOPY flag. #9244

Open

charris closed this Jun 16, 2017

charris mentioned this pull request Jun 16, 2017

ENH: Replace updateifcopy arrays. #9257

Closed

njsmith mentioned this pull request Jul 19, 2017

UPDATEIFCOPY breaks on python interpreters with non-refcounted GC #7054

Closed

This was referenced Jul 21, 2017

ENH: Refactor updateifcopy #9269

Closed

API: Make a.flat.__array__ return a copy when a non-contiguous. #9447

Merged

charris deleted the always-copy-updateifcopy-arrays branch September 19, 2018 14:34

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

BUG,MAINT: Updateifcopy arrays should always be copied. #9249

BUG,MAINT: Updateifcopy arrays should always be copied. #9249

charris commented Jun 13, 2017

njsmith commented Jun 13, 2017

charris commented Jun 13, 2017 •

edited

Loading

seberg commented Jun 13, 2017

charris commented Jun 13, 2017

njsmith commented Jun 13, 2017

charris commented Jun 13, 2017

njsmith commented Jun 13, 2017

eric-wieser commented Jun 13, 2017 •

edited

Loading

njsmith commented Jun 14, 2017

charris commented Jun 14, 2017 •

edited

Loading

charris commented Jun 14, 2017

charris commented Jun 14, 2017

charris commented Jun 14, 2017

njsmith commented Jun 14, 2017

charris commented Jun 16, 2017

BUG,MAINT: Updateifcopy arrays should always be copied. #9249

BUG,MAINT: Updateifcopy arrays should always be copied. #9249

Conversation

charris commented Jun 13, 2017

njsmith commented Jun 13, 2017

charris commented Jun 13, 2017 • edited Loading

seberg commented Jun 13, 2017

charris commented Jun 13, 2017

njsmith commented Jun 13, 2017

charris commented Jun 13, 2017

njsmith commented Jun 13, 2017

eric-wieser commented Jun 13, 2017 • edited Loading

njsmith commented Jun 14, 2017

charris commented Jun 14, 2017 • edited Loading

charris commented Jun 14, 2017

charris commented Jun 14, 2017

charris commented Jun 14, 2017

njsmith commented Jun 14, 2017

charris commented Jun 16, 2017

charris commented Jun 13, 2017 •

edited

Loading

eric-wieser commented Jun 13, 2017 •

edited

Loading

charris commented Jun 14, 2017 •

edited

Loading