-
-
Notifications
You must be signed in to change notification settings - Fork 10.8k
MAINT: Refactor updateifcopy #9639
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
You should close older PRs if you intend to replace them with this. You can always resubmit them. |
f14c0ee
to
059f070
Compare
@njsmith could you review? I reworked the pull request:
|
doc/CAPI.rst.txt
Outdated
This is a special flag that is set if this array represents a copy | ||
made because a user required certain flags in ``PyArray_FromAny`` and | ||
a copy had to be made of some other array (and the user asked for | ||
this flag to be set in such a situation). The base attribute then | ||
points to the "misbehaved" array (which is set read_only). When | ||
the array with this flag set is deallocated, it will copy its | ||
``PyArray_ResoveWritebackIfCopy`` is called, it will copy its |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
missing l
in Resolve
.
Text should probably say something like "if you use this flag, you have to later call PyArray_ResolveWriteBackIfCopy
on the array. If the the array has the NPY_WRITEBACKIFCOPY
flag set, it will copy its contents back ... Otherwise, this operation is a no-op."
doc/CAPI.rst.txt
Outdated
``NPY_UPDATEIFCOPY`` | ||
Similar to ``NPY_WRITEBACKIFCOPY``, but deprecated since it copied the | ||
contents back when the array is deallocated, which is not explicit and | ||
relies on refcount semantics. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should probably go ahead and mention PyPy here by name, since not everyone knows why refcount semantics matter :-). "which is unreliable on alternative Python implementations like PyPy that don't use refcounting.", maybe.
numpy/add_newdocs.py
Outdated
ultimate owner of the memory exposes a writeable buffer interface, | ||
or is a string. (The exception for string is made so that unpickling | ||
can be done without copying memory.) | ||
The (deprecated) UPDATEIFCOPY and WRITEBACKIFCOPY flags can never be set |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"The WRITEBACKIFCOPY and (deprecated) UPDATEIFCOPY", to make clear that we're not saying both are deprecated.
@@ -894,11 +895,11 @@ typedef int (PyArray_FinalizeFunc)(PyArrayObject *, PyObject *); | |||
#define NPY_ARRAY_IN_ARRAY (NPY_ARRAY_CARRAY_RO) | |||
#define NPY_ARRAY_OUT_ARRAY (NPY_ARRAY_CARRAY) | |||
#define NPY_ARRAY_INOUT_ARRAY (NPY_ARRAY_CARRAY | \ | |||
NPY_ARRAY_UPDATEIFCOPY) | |||
NPY_ARRAY_WRITEBACKIFCOPY) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately it turns out third party code is using INOUT_ARRAY
(e.g. scipy/interpolate/src/_interpolate.cpp
), so I guess we can't just change its value :-(. We could define some new version like NPY_ARRAY_INOUT_ARRAY2
, or maybe we should just deprecate it and tell people that they should write NPY_ARRAY_CARRY | NPY_ARRAY_WRITEBACKIFCOPY
if that's what they want.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For backward compatibility I prefer the first option. I will extend the warning notice to reflect the possible flags that could trigger it.
#define NPY_ARRAY_IN_FARRAY (NPY_ARRAY_FARRAY_RO) | ||
#define NPY_ARRAY_OUT_FARRAY (NPY_ARRAY_FARRAY) | ||
#define NPY_ARRAY_INOUT_FARRAY (NPY_ARRAY_FARRAY | \ | ||
NPY_ARRAY_UPDATEIFCOPY) | ||
NPY_ARRAY_WRITEBACKIFCOPY) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Whatever we do for NPY_ARRAY_INOUT_ARRAY
we should do the same here.
changes pushed in response to review, once approved I will squash to a single commit |
@@ -2923,7 +2923,7 @@ npyiter_allocate_arrays(NpyIter *iter, | |||
/* If the data will be written to, set UPDATEIFCOPY */ | |||
if (op_itflags[iop] & NPY_OP_ITFLAG_WRITE) { | |||
Py_INCREF(op[iop]); | |||
if (PyArray_SetUpdateIfCopyBase(temp, op[iop]) < 0) { | |||
if (PyArray_SetWritebackIfCopyBase(temp, op[iop]) < 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this change and the next one, inside the nditer
code, correct? I thought we were keeping nditer
using UPDATEIFCOPY
for now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My goal was for nditer
to not print any warning after this pull request on CPython. The following logic allows that:
- all calls to
PyArray_SetUpdateIfCopyBase
issue aFutureWarning
, - to maintain backward compatiblity,
PyArray_ResolveWritebackIfCopy
is called inarray_dealloc
ifarr->base
exists- if
PyArray_ResolveWritebackIfCopy
actually does something, it will currently warn, but only on PyPy or if compiled with ``-DDEPRECATE_UPDATEIFCOPY"
- if
By using PyArray_SetWritebackIfCopyBase
, nditer
will neither warn when creating its operands nor when resolving in array_dealloc
. When we deal with nditer
in a future pull request, we can enable the warning in array_dealloc
on all implementations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
all calls to
PyArray_SetUpdateIfCopyBase
issue aFutureWarning
This should be a DeprecationWarning
. (Rule: DeprecationWarning
is for when we're trying to remove an API, FutureWarning
is for when we're going to keep it but change the semantics.) And I think this is where we should do the "if PyPy or -DDEPRECATE_UPDATEIFCOPY
" check.
array_dealloc
I think I'd prefer the array_dealloc
logic be:
-
If
WRITEBACKIFCOPY
is set, then issue aRuntimeWarning
and that's it (you're supposed to callResolveUpdateIfCopy
orClearUpdateIfCopy
before the array is GCed). -
If
UPDATEIFCOPY
is set, then resolve it up like we always did. We don't need to issue a warning, because we already did that whenUPDATEIFCOPY
was set.
This then suggests that nditer
should keep using UPDATEIFCOPY
until we're ready to fix it properly, which seems logical, since it will in fact be broken until then, and if you're on PyPy or have -DDEPRECATE_UPDATEIFCOPY
set then it's probably useful to see that your program is in fact using UPDATEIFCOPY
(via nditer).
numpy/core/tests/test_multiarray.py
Outdated
assert_equal(self.a.flags['U'], False) | ||
assert len(sup) == 2 | ||
assert_equal(self.a.flags.writebackifcopy, False) | ||
assert_equal(self.a.flags['X'], False) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you use numpy.testing.assert_warns
here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
then I should wrap each test in a assert_warns
context manager? (likewise line 4323)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nothing special there, the epic quest is just to never use "ignore" filter (and we have a test to make sure you don't do that). Assert warns as context filter is great, if you want to use the message text, you can use suppress_warnings
directly.
OK, I finally reached the bottom! Sorry for the drawn-out review :-) Overall it's looking quite reasonable. What state do you think it's at -- ready to go in? The one remaining high level thing that occurred to me is that |
@njsmith thanks for the non-trivial review and discussion process. Except for the If I understand correctly, What next? Should I squash the commits to a single one? Edit to mention that |
Re: With this PR currently, the |
Changing the logic exposed a missing call to The |
Also removed inconsistency between the second argument name between documentation ("totype") and code ("to").
@njsmith Ready. |
@njsmith if I could get a thumbs-up on this someone else can do the merge. Can you at some point give a stronger approval than the "I think this can go in" above? |
OK so I just read through the problem. I'm not sure I caught all the subtleties, but at least I see the point and I see why this is an improvement. So if this gets into 1.14, is the plan also to implement #9714 in time for 1.14? And if I understand correctly, the discussion at this point is about exactly when to emit deprecation warnings. In 1.14 the old updateifcopy flag will still "work", just emit warnings. Is that right? |
Also, |
@ahaldane the fix for issue #9714 is a much smaller so if there is consensus on the direction it could be ready for 1.14. Please take a look and tell me if we should deprecate the use case or make nditer into a context manager. The old flag and behaviour is still supported, and will emit warnings, except for the cases mentioned in issue #9714 (for a more complete discussion see this part of this pull request |
#define NPY_ARRAY_IN_FARRAY (NPY_ARRAY_FARRAY_RO) | ||
#define NPY_ARRAY_OUT_FARRAY (NPY_ARRAY_FARRAY) | ||
#define NPY_ARRAY_INOUT_FARRAY (NPY_ARRAY_FARRAY | \ | ||
NPY_ARRAY_UPDATEIFCOPY) | ||
#define NPY_ARRAY_INOUT_FARRAY2 (NPY_ARRAY_FARRAY | \ | ||
NPY_ARRAY_WRITEBACKIFCOPY) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So presumably this 2
version is only needed during the deprecation phase, afterwards we will redefine NPY_ARRAY_INOUT_FARRAY
to use NPY_ARRAY_WRITEBACKIFCOPY
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's one option, but it's a bit risky since we don't have any way to know when all the old code using NPY_ARRAY_INOUT_FARRAY
is gone. We might just keep the 2
version forever – it's not pretty but it's safe and works.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah we can leave the 2 version forever, but after the deprecation phase no one needs to use it any more.
What do you think of changing the release note as follows:
I want to make it easy for downstream projects to know what to change. |
@ahaldane: Are you happy? |
That's correct. Generally the deprecation procedure is to immediately emit warnings as soon as the deprecations occur. We can't do that here because there is still some internal code that calls this, but that doesn't work on pypy so we do that anyway. IMHO the next PR should rename the function so that warnings can also be emitted on CPython. |
Yeah looks good to me. |
Okay, let's do this... |
:c:data:`NPY_ARRAY_UPDATEIFCOPY` flag. Also resets the | ||
:c:data:`NPY_ARRAY_WRITEABLE` flag on the base object. This is | ||
useful for recovering from an error condition when | ||
writeback semantics are used, but will lead to wrong results. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wording isn't great here - I read this as "This is useful ..., but will lead to wrong results", which is just confusing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how about "useful for recovering from an exception when writeback semantics are used. Pending changes to the base array will be thrown away"
@@ -36,5 +36,6 @@ | |||
0x0000000a = 9b8bce614655d3eb02acddcb508203cb | |||
|
|||
# Version 11 (NumPy 1.13) Added PyArray_MapIterArrayCopyIfOverlap | |||
# Version 11 (NumPy 1.14) No Change | |||
# Version 11 (NumPy 1.14) Added PyArray_ResolveWritebackIfCopy, | |||
# PyArray_SetWritebackIfCopyBase and deprecate PyArray_SetUpdateIfCopyBase. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this get a new hash on the last line? Or does that only happen at release time? @charris?
At any rate, I think the comment should go below the last hash in the file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It gets checked and updated at release time, but should be updated on the fly. IIRC, that was how it was done with the last addition in 1.13.
BUG: test, fix problems from PR #9639
s = PyUnicode_FromString(msg); | ||
#endif | ||
if (s) { | ||
PyErr_WriteUnraisable(s); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not correct usage - the argument should be the context - the message is already printed along with the exception type. Right now, the message is printed twice.
Off the top of my head, (PyObject *)&PyArray_Type
would be a safe context to pass. In theory you could pass self
, but calling array_repr
sounds a little too risky.
if nothing else, passing NULL
would cause the message alone to be printed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we used the py3 tp_finalize
slot, then we could pass ndarray.__del__
as the context.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right now, the message is printed twice
This code should only run if DEPRECATE
(which is PyErr_WarnEx
) raises an exeption, in which case the exception will be ignored since this is a deallocator. In order to be sure the error is emitted, I wanted to print it to stderr
, and the documentation seems to indicate that is what PyErr_WriteUnraisable
does. Should I print to stderr and then emit PyErr_WriteUnraisable
as well, or simply skip it altogether? Since there was no example in NumPy I looked at how PyErr_WriteUnraisable
is used in Pandas as a hint.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the documentation seems to indicate that is what PyErr_WriteUnraisable does
Correct, but it does this by looking at sys.exc_info
, not by looking at its argument
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I understand, the entire if()
clause should be replaced by the one-liner
PyErr_WriteUnraisable((PyObject *)&PyArray_Type);
How should I proceed since this pull request has been merged? Issue another pull request based off HEAD?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm, it seems cpython prints both the context from PyErr_Fetch
and repr(obj)
for a call to PyErr_WriteUnrasiable
but PyPy only prints the repr(obj)
without the context. I will file an issue there
edit: wrong, PyPy does do the right thing, and double-prints the current message as well
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd personally go for a new PR based on the last commit of this PR, but going off head would work fine too.
Replaces pull requests #9269 and #9451, see extensive previous discussions there. Summary:
UPDATEIFCOPY
semantics fromarray_dealloc
into a new API function, call the new function both fromarray_dealloc
and from all the places that previously relied onPy_DECREF
. This is to support implementations with a different GC strategy, wherePy_DECREF
may not immediately cause callingarray_dealloc
andUPDATEIFCOPY
resolutionUPDATEIFCOPY
semantics of this change, add a new flagNPY_ARRAY_WRITEBACKIFCOPY
and a new functionPyArray_SetWritebackIfCopyBase
. Use of either of those will raise aFutureWarning
.updateifcopy
.That will be handled in a future pull request. As a consequence a warning on resolution of theUPDATEIFCOPY
semantics inarray_dealloc
will only happen now if the compiler flag-DDEPRECATE_UPDATEIFCOPY
is used. On PyPy that is the default.