-
-
Notifications
You must be signed in to change notification settings - Fork 10.8k
ENH, DEP refactor updateifcopy #9451
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
just for completeness, this is related to #9447, if that pull request is rejected there should be a call to |
Not sure why you think the other thing was not acceptable, I am just daft :), and this is slightly tricky (or I guess just hard to be sure that we got it right). Here I am not quite sure if there is much gain from adding the new flag. If we give a warning in any case (and an error in some future), does it matter when we do it? On the other hand, the error would have to be during dealloc, so python would only print it out and not actually raise it? And that means that test suits might miss it (at least for a while). I guess the new flag would mostly be good to get a warning (or maybe error) if you have a coding error? I wonder if ResolveUpdateIfCopy should steal the reference on CPython, but lets hash out details later. To be honest, the important thing is probably to figure out something and do it. There are probably few users outside numpy (even less if we might not have to count the iterator users), so please don't get discouraged if I am overly sceptical about something. |
@seberg thanks, I thought maybe the other pull request focused too much on nditer-as-a-context-manager and so this one would be more acceptable (no nditer anywhere in this one). My thoughts were once this got merged I can issue another pull request for nditer-as-a-context-manager. The addition of the new flag was for @njsmith, hopefully he will weigh in on the pros and cons of deprecating behaviour via a cycle of deprecation warnings then errors then removal |
@njsmith ping, waiting for your comments since the new flag was per your suggestion |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do like this overall approach better than #9269.
I'm not a big fan of the CLEAR_B4_EXIT
name. Maybe NPY_ARRAY_UPDATEIFCOPY_EXPLICIT_RESOLVE
?
I have a few comments below (note in particular the issue with PyArray_SetUpdateIfCopyBase
), but I think the big question is about which of two strategies to take. Right now, this PR's strategy is to define two flags with different numeric values – NPY_ARRAY_UPDATEIFCOPY
and NPY_ARRAY_UPDATEIFCOPY_WHATEVER
– but it doesn't actually use both of these flags on arrays. Instead, it tries to normalize them at array creation into the new flag (possibly emitting a warning as it does so), and then switches the rest of numpy to use the new flag exclusively (including stuff like flag reprs – so arr.flags.UPDATEIFCOPY
refers to the new flag, not NPY_ARRAY_UPDATEIFCOPY
).
The other approach would be to allow both flags to coexist as first-class entities – so if some code requests NPY_ARRAY_UPDATEIFCOPY
then it gets an array with that flag set, it prints as UPDATEIFCOPY
, etc., and if some other code requests NPY_ARRAY_UPDATEIFCOPY_EXPLICIT_RESOLVE
then it gets an array with that flag set, it prints as UPDATEIFCOPY_EXPLICIT_RESOLVE
, etc.
It's not obvious to me which of these is better. My intuition is that the latter will take a bit more code to implement, but be much easier to reason about and judge whether we got things right, so I have a weak preference for it. (For example, it eliminates worries like "what if someone is setting NPY_ARRAY_UPDATEIFCOPY
manually?".) I'd be interested to hear what others think.
@@ -46,7 +46,7 @@ | |||
#define NPY_ALIGNED NPY_ARRAY_ALIGNED | |||
#define NPY_NOTSWAPPED NPY_ARRAY_NOTSWAPPED | |||
#define NPY_WRITEABLE NPY_ARRAY_WRITEABLE | |||
#define NPY_UPDATEIFCOPY NPY_ARRAY_UPDATEIFCOPY | |||
#define NPY_UPDATEIFCOPY NPY_ARRAY_UPDATEIFCOPY_CLEAR_B4_EXIT |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be reverted -- if someone's using the deprecated NPY_UPDATEIFCOPY
alias, then they should get NPY_ARRAY_UPDATEIFCOPY
semantics.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
@@ -112,7 +112,7 @@ PyArray_SetUpdateIfCopyBase(PyArrayObject *arr, PyArrayObject *base) | |||
* references. | |||
*/ | |||
((PyArrayObject_fields *)arr)->base = (PyObject *)base; | |||
PyArray_ENABLEFLAGS(arr, NPY_ARRAY_UPDATEIFCOPY); | |||
PyArray_ENABLEFLAGS(arr, NPY_ARRAY_UPDATEIFCOPY_CLEAR_B4_EXIT); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ugh, this is dubious. PyArray_SetUpdateIfCopyBase
is public, so there might be third-party code that uses it and expects the GC to resolve the UPDATEIFCOPY
. I guess we need to make PyArray_SetUpdateIfCopyBase
keep using NPY_ARRAY_UPDATEIFCOPY
and issue a deprecation warning, and add a new function PyArray_SetUpdateIfCopyBaseWithExplicitResolve
or something like that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, will add a PyArray_CopyDataFromBase
that will set the new flag, and use it instead across the code base
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I don't understand the naming scheme here. IIRC (from my phone), PyArray_SetUpdateIfCopyBase
sets the UPDATEIFCOPY
flag and sets the ->base
member, but it doesn't copy any data does it? I was suggesting having a version that's identical except that it sets the new flag instead of the old one and doesn't warn. What does PyArray_CopyDataFromBase
do?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
correct, so maybe PyArray_SetUpToCopyData
?
*/ | ||
PyErr_Clear(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should use PyErr_WriteUnraisable()
or similar – otherwise when someone sets a warning filter to deprecations into errors, then the effect will be to hide this warning entirely!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good catch, will add a test as well
@@ -1009,7 +1009,7 @@ PyArray_NewFromDescr_int(PyTypeObject *subtype, PyArray_Descr *descr, int nd, | |||
} | |||
} | |||
else { | |||
fa->flags = (flags & ~NPY_ARRAY_UPDATEIFCOPY); | |||
fa->flags = (flags & ~NPY_ARRAY_UPDATEIFCOPY_CLEAR_B4_EXIT); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should clear both versions of the flag, I think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
(flags & NPY_ARRAY_UPDATEIFCOPY_CLEAR_B4_EXIT)) { | ||
if (flags & NPY_ARRAY_UPDATEIFCOPY) { | ||
if (DEPRECATE("Use NPY_ARRAY_UPDATEIFCOPY_CLEAR_B4_EXIT and" | ||
" be sure to call PyArray_ResolveUpdateIfCopy") < 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this have an #ifdef
around it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What kind of #ifdef
?
We do not wish to issue a DeprecationWarning in the current version of NumPy?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ultimately yeah it should be deprecated, but I thought your plan was to initially introduce this with all the warnings hidden behind a DEPRECATE_UPDATEIFCOPY
ifdef.
OK, so I will rewrite as a new commit to keep both flags, with the new name |
It's not important to get the names settled now b/c It's easy to fiddle with them later via copy/replace, but the idea behind the traditional naming is: "please give an array with the given properties based on this input, and make it a view if you can – but IF you have to COPY the input array, then please later UPDATE it". (This is when the flag is used as an input to Maybe
As a heads up, I think there are places where we call |
ETOOLONG, sorry. @njsmith I started rewriting the second changeset using the "double code" approach from your comment but it is not just a few lines here and there of more code:
I think the cost/benefit of the second approach is too high, there is a much higher chance of a mistake in such a sweeping code change. I have pushed what I have so far (still no tests and need to refactor nditer) here.I estimate this is about 85% of the code change but every changed code path needs a new test to be sure we can reason about the changes and judge whether we got things right, especially the python-level flag ones. So we have:
It is hard to estimate consumers of a NumPy API, but putting some time into searching Github for code (outside of numpy forks and copies) that use these semantics did not turn up a single case. I propose we stick with the first option. If you agree I will update the pull request, deprecating |
Not sure what you mean. If someone's accessing
Where do these single-letter codes come in?
So our plan for dealing with nditer is still somewhat unresolved... at the C level this actually seems kinda reasonable, because anyone using I looked at your preliminary branch, and the only major pieces of code duplication I saw were All the other changes (at least on a skim) seem very reasonable, and like things that would either need changing anyway, or whose semantics would be changing so they needed review anyway. I'm also a little confused about your statistics are coming from... the two approaches are your branches refactor-updateifcopy2 and refactor-updateifcopy3, right? If I restrict just to changes in the
and as noted, a lot of those insertions in the latter case are unnecessary.
There are at least 3 in scipy, FWIW. Anyway: I'm open to being convinced, but I'm not yet convinced :-) Can you give an example of a test or two that you think you need to write in the "option 2" approach but not in the "option 1" approach? (Besides the test that |
OK, I will continue to crank out code assuming you will review it or become convinced that the first alternative is better. str(array.flags) is used in logging (I have used it myself) to help debug strange cases. It will now "grow" another row. array.flags[s] supports both full upper case names and single-letter upper-case abbreviations., so we still need a single letter for the new flag. RESOLVEIFCOPY (R), WRTIEBACKIFCOPY (W) FINALIZEIFCOPY (F) all do not work. HARMONIZEIFCOPY (H)? SETTLEIFCOPY (S)? Or simply go with WRITEBACKIFCOPY (X)? I would prefer to postpone any discussion of nditer until this is behind us, Since my initial motivation for this entire saga is to fix the use of UPDATEIFCOPY semantics on nditers, you can be sure I will follow this up and not just forget it. Indeed more tests are needed in any case in both alternatives, it seems, although the first alternative would not require new tests in array.flags. I was looking in Github for |
I had to delete and re-fork the repo I issued this pull request from. I did not realize that meant I would not be able to continue this pull request, so I will be forced to close it and start again. Sorry Continued on #9639 |
Replaces pull request #9269, related to issues #9257, #7054
Commit b2837ab refactors the resolution of
UPDATEIFCOPY
into a new private API functionPyArray_ResolveUpdateIfCopy
. Add a call to the function wherever a temporaryndarray
with that flag is created, and added tests for calls that exercise the new function. This commit changes no behaviour on CPython but will raise a DeprecationWarning on PyPy (or if compiled with DEPRECATE_UPDATEIFCOPY defined) ifPyArray_ResolveUpdateIfCopy
actually does resolve anything inarray_deallocate
.Commit ec86410 changes the C API flag
NPY_ARRAY_UPDATEIFCOPY
toNPY_ARRAY_UPDATEIFCOPY_CLEAR_B4_EXIT
and raise a DeprecationWarning if a ndarray is created with the old flag. The goal here is to encourage downstream projects to be sure they are properly resolving the UPDATEIFCOPY semantics before callingarray_deallocate
Still TODO:
PyArray_ResolveUpdateIfCopy
into a public API function for downstream users to be able to use it? What should the public name be?array_dealloc
, we need to deal with nditer. Pull request ENH: Refactor updateifcopy #9269 suggested making nditer into a context manager, we could also simply disallow creation of nditers with UDATEIFCOPY semantics.NPY_ARRAY_UPDATEIFCOPY
toNPY_ARRAY_UPDATEIFCOPY_CLEAR_B4_EXIT
changeIs this approach more acceptable than #9269 ?