ENH: Refactor updateifcopy #9269

mattip · 2017-06-19T18:11:36Z

Fixes issue #7054 by refactoring the resolution of UPDATEIFCOPY into a new private API function PyArray_ResolveUpdateIfCopy. Add a call to the function wherever a temporary ndarray with that flag is created, and added tests for calls that exercise the new function.

In the second commit I made nditer into a context manager, where PyArray_ResolveUpdateIfCopy is called on __exit__(), and updated the nditer tests to use the context manager wherever the updateifcopy flag is used.

The third commit 4effdfb makes sure clip is called with a mode keyword, which seems to have been forgotten.

The last commit f012769 explicitly uses the __exit__ method to resolve updateifcopy for iterators created with nested_iters, how can I make this look nicer?

…ests

… way

charris · 2017-06-19T18:52:01Z

Oh, dangerous ;) I'm worried about downstream projects using updateifcopy arrays. Do we want to deprecate that?

mattip · 2017-06-19T18:58:08Z

it is all very backward-compatible for CPython. PyArray_ResolveUpdateIfCopy can be called multiple times since it only does the copy dance if needed and then resets flags so subsequent calls will be ignored. For backward compatibility, I left a final call to PyArray_ResolveUpdateIfCopy in the tp_dealloc function. On PyPy however, the tp_dealloc function is not reliably called at a certain point in the program, so maybe that is the "dangerous" part?

njsmith · 2017-06-21T06:14:05Z

Is the issue with nditer that on each iteration we can call PyArray_ResolveUpdateIfCopy on the array used in the previous iteration, but that doesn't work on the last iteration, so we need an __exit__ method to call it one more time at the end?

I guess the reason this feels dangerous is because it changes the behavior of the existing flag, rather than making a new flag and using a deprecation cycle to migrate to that. Whether it's actually dangerous is less clear. I guess those are the two options to consider though, so let's think it through.

I'm assuming that there are downstream projects out there using UPDATEIFCOPY and relying on the GC, and we would like them to eventually stop doing that and stop supporting it ourselves.

Option 1:

Add an optional, explicit resolve-this-UPDATEIFCOPY function (like in this PR)
Initially, keep the automatic resolve call in tp_dealloc, but deprecate it
Eventually, drop the call from tp_dealloc

Option 2:

Add a new UPDATEIFCOPY_FIXED flag (suggestions for a better name welcome :-))
Deprecate UPDATEIFCOPY in favor of the new flag
Eventually, drop UPDATEIFCOPY

So my concern about option 1 is: how should we do the deprecation? The problem is that you can't raise an error from tp_dealloc (this is a python limitation, exceptions inside the garbage collector just get printed to the console and discarded). But in the "option 1" approach we can't tell the difference between a correct, updated user and a broken, legacy user until tp_dealloc. So I guess the final step in this approach would be to just stop resolving the UPDATEIFCOPY, which could cause existing code to give wrong answer, so we'd need to go down the FutureWarning route?

OTOH with option 2, we can detect up front when they create the array whether they're using UPDATEIFCOPY or UPDATEIFCOPY_FIXED, so we can issue a DeprecationWarning warning then, and eventually raise an error on UPDATEIFCOPY. So I'm leaning towards option 2. Does that logic magic sense? Is there anything I'm missing?

The other migration we need to think about is for nditer as a context manager. My suggestion is that #if we're on pypy, we immediately make it mandatory to use the context manager version (i.e., if the user starts iterating before __enter__ is called then error out), because (a) no-one's using numpy on PyPy yet so breaking compatibility there isn't a huge deal, and (b) on PyPy you're actually risking getting wrong answers if you don't do this, so it's pretty important. And on CPython, we can't make this change immediately, but we can move in this direction by issuing a DeprecationWarning when we detect iteration-without-__enter__, and eventually make it an error, so that CPython and PyPy match again.

Question: should nditer enforce the context manager thing in all cases, or only in cases where it detects that UPDATEIFCOPY{,_FIXED} is actually needed? The argument for the latter is that it's less disruptive; the argument for the former is that it's hard to know whether UPDATEIFCOPY will be needed (could depend on obscure things like the strides of input arrays), so it's better to enforce it uniformly from the start instead of leaving an unexploded landmine to be discovered in production. I don't feel like I have a good intuition here; I'm really not very familiar with the nditer code or how it's used.

mattip · 2017-06-21T19:12:05Z

@njs Thanks for the reply, you gave this alot of thought. I would prefer option 2
Should I make this more public via the numpy-discussion mailing list?

I would prefer merging this pull request (with the addition of an nditer warning as below for PyPy only), since it is totally backward compatible and does solve issues with PyPy and the functions in TestUpdateIfCopy.
Then I will engineer the new flag (which I would call something like USE_LOCALDATA or USE_WORKINGDATA) into a new pull request that will require more debate and consideration. In this second request the nditer warning-for-updateifcopy will be turned on for all implementations.

I am +1 for warning on creating an nditer with the updateifcopy flag without a context manager. Unfortunately I do not know how to do this in the case of np.nested_iters() which simply returns a tuple, see this test. It would seem I could use an
ExitStack on python3.3 and above, but on python 2 that would entail a new dependency on contextlib2. Until I have a solution for nested_iters, the new warning will always be displayed :(

seberg · 2017-06-21T19:27:11Z

Just to throw it in, are you sure we can't deprecate the usage in nditer as well? For all I know there might be only a couple of uses for that flag in nditer and possible not a single one in real live.... NpyIter is a pretty unused beast mostly.

mattip · 2017-06-29T02:46:27Z

It turns out the ndarray under the iterator operand is created (in nditer_constr.c) with the UPDATEIFCOPY flag set
whenever the operand is writeable and:

casting is required (NPY_OP_ITFLAGS_CAST) and the (NPY_ITER_COPY or NPY_ITER_UPDATEIFCOPY) flag is used OR
the NPY_OP_ITFLAG_FORCECOPY flag is used

The ndarray can also use the UPDATEIFCOPY mechanism if the operand is readwrite or writeonly and the original data is not a straightforward ndarray (in nditer_pywrap.c).

So it is not as simple as removing the updateifcopy flag in the call to nditer(), other combinations of arguments and flags can cause the UDPDATEIFCOPY mechanism to trigger, requiring a context manager on PyPy. My latest patch provides a mechanism to detect and raise a ValueError if an iterator with an UDATEIFCOPY ndarray is used without a context manager. Currently the exception triggers on PyPy only, should I emit a warning on CPython?

mattip · 2017-07-12T21:10:32Z

My mail to numpy-discussion got no response. How should I proceed to get this merged? Is there a POC for nditer ?

seberg · 2017-07-16T14:11:53Z

numpy/core/include/numpy/ndarraytypes.h

@@ -1027,6 +1027,8 @@ typedef void (NpyIter_GetMultiIndexFunc)(NpyIter *iter,
 * eliminate overlap.
 */
 #define NPY_ITER_COPY_IF_OVERLAP            0x00002000
+/* At least one operand uses UPDATEIFCOPY semantics, use context manager */
+#define NPY_ITER_USE_CONTEXT_MANAGER        0x00004000


Some questions/comments:

Just so you know the nditer/NpyIter is mostly a C-Api beast, the python side is not used by far as much.

Is the flag really necessary? I guess you would have to check every single operand array, but that does not seem too bad? The flag may not be bad either though, so not sure what I think yet.

Why is there an explicit ResolveUpdateIfCopy in the nditer pywrap code? It seems to me that on the C-Side this should be triggered by the NpyIter_Deallocate call, so it would be more obvious to me if you would just call that. (Hopefully guarding all the other methods against the iterator being NULL or so is not too bad, but it should be done anyway).

Regarding above point: Right now it seems to me like C-Code using NpyIter is broken in PyPy, because the resolve code is never called.

In cpython, if I write for i in np.nditer(...): I know the nditer will be deallocated by the time the for loop ends, I could imagine trying to give a warning when this might not work, and we know it from the reference count, since it should be higher if there is any chance of the object living longer. (I don't care whether this warning is a )

I suppose PyPy has no way of knowing if a temporary iterator object only exists within the scope of the single for loop? (in which case you could do the same thing as above)

The warning in cpython would be produced on (the first) call to next. If there was any way to detect that the code looks like for i in np.nditer(...) (and that iteration is finished/aborted) in PyPy that would be cool since it would make the context manager normally unnecessary, but I guess there is not.

A short mention in the documentation would be nice of course.

Could also think about doing an automatic copy-back when the iteration end is reached. But since this is a rather low-level api, maybe better not.

Arrg, I guess it might not possible to guess for sure if the iterator is limited to a for-block from the code (though it would be good to be sure/check). Also C-Api usage/cython might make a refcount check impossible I guess.

@seberg "Why is there an explicit ResolveUpdateIfCopy in the nditer pywrap code? It seems to me that on the C-Side this should be triggered by the NpyIter_Deallocate call, ..."

Do you mean the call on line 2397 here in npyiter_exit?

Well, but does it call array_dealloc or does it just reduce the reference count of those arrays? Because it should do the Resolve step specifically.

Yes, it would destroy the state of the iterator, but first, on the C-side, that is how you would achieve copy-back I think and second I don't see much use entering the same nditer object twice.

HHmmmpf, I guess I have no clue how exactly pypy works anyway, and yeah for the python side and updateifcopy things get strange anyway if someone looks at the operand arrays I guess. Will have to think about it some more (and actually figure out the exact logic).

So, even on PyPy, if we stay fully within C, NpyIter_Deallocs Py_DECREFs will actually trigger array_dealloc immediatly? I just glanced over http://docs.cython.org/en/latest/src/userguide/pypy.html and I don't quite see that this happens? Only the C-Api "view" should be cleaned up, or does it know that there can't be a Python side object (because there was no access or because it never left the C-api)?

If for some reason this works even with python code in between, we could possibly check for access to the operand arrays as well? But probably not worth the trouble, right now I am mostly worried about the C-Api usage being correct for sure.

I will add "updating cython/docs/src/userguide/pypy.rst" to my TODO list. While not wrong, that page is slightly out of date and it should mention that only certain C-API functions will actually create a python-level object linked to the C one. Looking at the implementation of Py_DECREF one can see that if the refcount is 0 (which means no python-level object is created, otherwise the refcount would be >= REFCNT_FROM_PYPY) then obj.tp_deallocis called immediately.

I did try to naively add a call to PyArray_ResolveUpdateIfCopy inside NpyIter_Deallocate just before the call to Py_XDECREF(*object), but it causes a crash in the test_nditer.py tests. It seems there are cases where the operand is not solely owned by the iterator, i.e. in any of the places NpyIter_GetOperandArray, NpyIter_Copy or others are called.

As far as nditer is concerned, the context manager completely resolves any needed PyArray_ResolveUpdateIfCopy call without relying on refcount semantics or modifying NpyIter_Deallocate

Now that I issue a warning when PyArray_ResolveUpdateIfCopy is called during deallocation, it turns out I was misreading the actual place needing conversion to a context manager. Since I did not convert all the nditer() calls to context managers, and there were places in tests that reuse the same variable name, triggering the deallocation, I actually needed to convert the previous test to a context manager.

Again, once I reworked the python-level tests, no warnings of a call to PyArray_ResolveUpdateIfCopy are emitted during nditer tests. However, there are a few places it seems PyArray_ResolveUpdateIfCopy is still needed, see my comment below

mattip · 2017-07-16T19:30:46Z

added documentation, including doctest - how do I know if the doctest passes?

seberg · 2017-07-16T20:08:43Z

We don't actually use doctests, so don't worry about it. Some other C-style stuff in there too (and probably more), but I would like to figure out that the context manager this way is right.

njsmith · 2017-07-18T02:33:36Z

I feel like either we're in the bad old mode of relying on the GC, in which case it's fine to wait for the array itself to be deallocated, it's worked so far and no need to mess with it, or else we're in the new good mode that does the explicit copy back, in which case there's again no reason to mess with functions named *_Deallocate.

mattip · 2017-07-19T20:11:43Z

@njsmith I agree. If the solution of a context manager for python-level nditer is acceptable, I will continue to hone this pull request to remove dependency on refcount semantics. So far the pull request::

can raise a DeprecationWarning in situations where UPDATEIFCOPY resolution occurs during ndarray deallocation, it remains for the NumPy community to decide if and when this is appropriate
is 100% backward compatible on CPython (I should leave some old tests around that catch the Warning to prove this)
will turn that warning into an Exception on PyPy
requires downstream users to re-factor any python-level nditers into context managers to guarantee correct behaviour on PyPy

It turns out that by adding the warning, I found a few more places that emit it. I am now trying to re-factor those to use the new PyArray_ResolveUpdateIfCopy function. For instance, this test emits the warning, it seems to stem from the x += x.T in-place operation::

def test_inplace_op_simple_manual():
    rng = np.random.RandomState(1234)
    x = rng.rand(50, 50)
    x += x.T

Once I can pass the test suite without raising the Warning I will update the pull request

mattip · 2017-07-19T22:37:15Z

darn. I ironed out all the Python2 warnings by carefully looking at ufuncs and such that create nditers in C, but moving to Python3 it turns out x.flat.__array__, (iter_array in C) can use UPDATEIFCOPY if x is not contiguous. Why does this only occur on Python3? The resulting ndarray is exposed to python-level. I'm not sure turning code like

a = np.arange(9).reshape(3,3)
b = a.T.flat
c = b.__array__()
# triggers the UPDATEIFCOPY resolution, assuming refcount semantics
del c

into code like this will be acceptable ...

a = np.arange(9).reshape(3,3)
b = a.T.flat
with b.__array__() as c:
   # do something with c
    print(c.flags)
del c

Although on second thought the fact that changing c will only affect b and a when c is deleted is not really clear, and explicit trumps implicit?

njsmith · 2017-07-20T00:01:03Z

moving to Python3 it turns out x.flat.__array__, (iter_array in C) can use UPDATEIFCOPY if x is not contiguous. Why does this only occur on Python3?

Not sure what you're seeing, but I see it on python 2 as well:

Python 2.7.13 (default, Dec 18 2016, 20:19:42) [GCC 6.2.1 20161215] on linux2
>>> import numpy as np
>>> np.arange(10)[::2].flat.__array__().flags["UPDATEIFCOPY"]
True

njsmith · 2017-07-20T00:14:00Z

Do I understand correctly from #9269 (comment) that you (a) want to add a new flag that acts like UPDATEIFCOPY except for requiring explicit resolution, but (b) you want to start by adding explicit resolution to the existing UPDATEIFCOPY and then add the new flag in a future PR?

If that's the plan, then I don't think we can start issuing DeprecationWarnings until after the new flag has landed, because the point of a DeprecationWarning is to nag downstream users to fix their code, and if the fix is for them to switch to the new flag, then we need the new flag before they can fix things :-)

mattip · 2017-07-20T17:27:44Z

@njs Somewhere along the line I decided to deprecate the idea of a new flag, since in most cases such arrays are created with a direct call to PyArray_SetUpdateIfCopyBase. So now I am proposing to issue a DeprecationWarning whenever UDATEIFCOPY resolution occurs at ndarray deallocation already in this pull request. The way to mitigate the warning is::

use a nditer inside a context manager in python code
use a flat.array() inside a context manager in python code
make sure C functions that create a ndarray with the UPDATEIFCOPY flag do not leak outside that function, and call PyArray_ResolveUpdateIfCopy at the end of the function

I am willing to consider a fallback position where all the above is still implemented (and required on PyPy), but the actual warning is only emitted if a environment/compliation flag is turned on, for full backward compatibility

njsmith · 2017-07-21T03:19:31Z

@mattip: UPDATEIFCOPY is also part of numpy's public C API -- there's probably third-party code that's calling PyArray_FromArray with UPDATEIFCOPY in the flags, or calling PyArray_SetUpdateIfCopyBase directly. For example, scipy uses it. That's the code I'm worried about -- if we don't rename the flag, and they don't notice our deprecation warnings, and then we stop calling ResolveUpdateIfcopy from __del__, then their code will start silently giving the wrong results.

Our rule is that we have to use noisy FutureWarnings if we're going to make code start silently doing something different, and we try to avoid it in general. So AFAICT we have 3 options:

Keep the same flag, issue a DeprecationWarning from __del__ if the UPDATEIFCOPY hasn't been resolved, but never ever remove the resolution call from __del__. Libraries that miss the deprecation warning will keep working on cpython but silently start giving wrong answers on PyPy, in perpetuity. I guess we could soften this a bit by eventually issuing some noisy warning like a RuntimeWarning from __del__ if it detects an unresolved UPDATEIFCOPY; maybe that would be enough to convince libraries to clean up their act.
Migrate to a new flag. On CPython, issue a DeprecationWarning for those using the old flag, and eventually make attempting to use the old flag an error. On PyPy, make attempting to use the old flag an error immediately.
Keep the same flag, issue a noisy FutureWarning from __del__ if the UPDATEIFCOPY hasn't been resolved, eventually risk breaking code by removing the resolution call from __del__. (Generally not preferred.)

…FCOPY

mattip · 2017-07-21T09:18:01Z

OK, after reconsideration I propose we only lay the infrastructure in this pull request, interally refactor NumPy code where needed, and recommend using nditer as a context manager. So far all this is backward compatible on CPython but will raise Errors and Warnings on PyPy or if compiled with DEPRECATE_UPDATEIFCOPY defined. AFAICT this will be sufficient for me to work with downstream projects to fix their code for PyPy. Future enhancements will (each point is a separate enhancement/pull request/discussion):

For downstream C-API consumers: Migrate to a new flag, issue a DeprecationWarning for those using the old flag and eventually deprecate the old flag.
For nditer: Issue a warning/error (TBD) when nditer writes to an operand created with the UPDATEIFCOPY flag (not exclusively via the operand flags, also when using readwrite and in other cases too, but that is a different subject) outside a context manager
For flat.__array__: Issue a warning/error (TDB) when writing to a flat.__array__() or somehow otherwise resolve BUG,MAINT: Updateifcopy arrays should always be copied. #9249

Breaking this into pieces will make it all more managable.

mattip · 2017-07-21T09:21:54Z

(edited comment to add " outside a context manager" in point about nditer)

mattip · 2017-07-21T10:07:53Z

(edited comment to qualify 'backward compatible' WRT PyPy)

mattip · 2017-07-21T10:09:40Z

Note that this pull request modified some (not all) of the tests in test_nditer.py to use a context manager so that they mostly pass on PyPy. Should I (1) revert this file, and make these changes part of future pull requests (2) completely rework the rest of the nditer calls to use a context manager since that is now recommended practice or (3) fix tests that are still causing warnings/errors on PyPy?

mattip · 2017-07-30T21:24:38Z

I reworked much of this as pull request #9451, but things seem very quiet over there :(

mattip · 2017-08-06T11:20:17Z

ping on pull request #9451which replaces this one? Anyone?

mattip · 2017-09-01T13:17:14Z

Replaced by #9639

mattip added 4 commits June 19, 2017 00:48

refactor copying code into new API PyArray_ResolveUpdateIfCopy, add t…

9f15a2e

…ests

make nditer into a context manager, utilize for UPDATEIFCOPY

81cbb7b

BUG: test was not using mode

4effdfb

use __exit__ to force resolution of UPDATEIFCOPY for want of a better…

f012769

… way

charris added 01 - Enhancement 03 - Maintenance 24 - PyPy component: numpy._core labels Jun 19, 2017

charris changed the title ~~Refactor updateifcopy~~ ENH: Refactor updateifcopy Jun 19, 2017

add flag when nditer requires context manager, raise on PyPy

6dc0c65

seberg reviewed Jul 16, 2017

View reviewed changes

document nditer as a context manager

b8c7c38

mattip mentioned this pull request Jul 19, 2017

UPDATEIFCOPY breaks on python interpreters with non-refcounted GC #7054

Closed

cleanup, issue DeprecationWarning only if PyPy or -DDEPRECATE_UPDATEI…

7a63ca4

…FCOPY

internally use new ResolveUpdateIfCopy wherever needed

82203e1

mattip mentioned this pull request Jul 23, 2017

ENH, DEP refactor updateifcopy #9451

Closed

mattip mentioned this pull request Sep 1, 2017

MAINT: Refactor updateifcopy #9639

Merged

mattip closed this Sep 2, 2017

Uh oh!

ENH: Refactor updateifcopy #9269

ENH: Refactor updateifcopy #9269

Uh oh!

Conversation

mattip commented Jun 19, 2017

Uh oh!

charris commented Jun 19, 2017

Uh oh!

mattip commented Jun 19, 2017

Uh oh!

njsmith commented Jun 21, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

mattip commented Jun 21, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

seberg commented Jun 21, 2017

Uh oh!

mattip commented Jun 29, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

mattip commented Jul 12, 2017

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

mattip commented Jul 16, 2017

Uh oh!

seberg commented Jul 16, 2017

Uh oh!

njsmith commented Jul 18, 2017

Uh oh!

mattip commented Jul 19, 2017

Uh oh!

mattip commented Jul 19, 2017

Uh oh!

njsmith commented Jul 20, 2017

Uh oh!

njsmith commented Jul 20, 2017

Uh oh!

mattip commented Jul 20, 2017

Uh oh!

njsmith commented Jul 21, 2017

Uh oh!

mattip commented Jul 21, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

mattip commented Jul 21, 2017

Uh oh!

mattip commented Jul 21, 2017

Uh oh!

mattip commented Jul 21, 2017

Uh oh!

mattip commented Jul 30, 2017

Uh oh!

mattip commented Aug 6, 2017

Uh oh!

mattip commented Sep 1, 2017

Uh oh!

Uh oh!

njsmith commented Jun 21, 2017 •

edited

Loading

mattip commented Jun 21, 2017 •

edited

Loading

mattip commented Jun 29, 2017 •

edited

Loading

mattip commented Jul 21, 2017 •

edited

Loading