Skip to content

ENH, DEP refactor updateifcopy #9451

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 2 commits into from
Closed

ENH, DEP refactor updateifcopy #9451

wants to merge 2 commits into from

Conversation

mattip
Copy link
Member

@mattip mattip commented Jul 23, 2017

Replaces pull request #9269, related to issues #9257, #7054

Commit b2837ab refactors the resolution of UPDATEIFCOPY into a new private API function PyArray_ResolveUpdateIfCopy. Add a call to the function wherever a temporary ndarray with that flag is created, and added tests for calls that exercise the new function. This commit changes no behaviour on CPython but will raise a DeprecationWarning on PyPy (or if compiled with DEPRECATE_UPDATEIFCOPY defined) if PyArray_ResolveUpdateIfCopy actually does resolve anything in array_deallocate.

Commit ec86410 changes the C API flag NPY_ARRAY_UPDATEIFCOPY to NPY_ARRAY_UPDATEIFCOPY_CLEAR_B4_EXIT and raise a DeprecationWarning if a ndarray is created with the old flag. The goal here is to encourage downstream projects to be sure they are properly resolving the UPDATEIFCOPY semantics before calling array_deallocate

Still TODO:

  • Do I need to make PyArray_ResolveUpdateIfCopy into a public API function for downstream users to be able to use it? What should the public name be?
  • Before we can enable or strengthen the DeprecationWarning at array_dealloc, we need to deal with nditer. Pull request ENH: Refactor updateifcopy #9269 suggested making nditer into a context manager, we could also simply disallow creation of nditers with UDATEIFCOPY semantics.
  • There should be some test of the NPY_ARRAY_UPDATEIFCOPY to NPY_ARRAY_UPDATEIFCOPY_CLEAR_B4_EXIT change

Is this approach more acceptable than #9269 ?

@mattip
Copy link
Member Author

mattip commented Jul 23, 2017

just for completeness, this is related to #9447, if that pull request is rejected there should be a call to PyArray_ResolveUpdateIfCopy in iter_richcompare.

@seberg
Copy link
Member

seberg commented Aug 6, 2017

Not sure why you think the other thing was not acceptable, I am just daft :), and this is slightly tricky (or I guess just hard to be sure that we got it right).

Here I am not quite sure if there is much gain from adding the new flag. If we give a warning in any case (and an error in some future), does it matter when we do it? On the other hand, the error would have to be during dealloc, so python would only print it out and not actually raise it? And that means that test suits might miss it (at least for a while).

I guess the new flag would mostly be good to get a warning (or maybe error) if you have a coding error? I wonder if ResolveUpdateIfCopy should steal the reference on CPython, but lets hash out details later.

To be honest, the important thing is probably to figure out something and do it. There are probably few users outside numpy (even less if we might not have to count the iterator users), so please don't get discouraged if I am overly sceptical about something.

@mattip
Copy link
Member Author

mattip commented Aug 7, 2017

@seberg thanks, I thought maybe the other pull request focused too much on nditer-as-a-context-manager and so this one would be more acceptable (no nditer anywhere in this one). My thoughts were once this got merged I can issue another pull request for nditer-as-a-context-manager.

The addition of the new flag was for @njsmith, hopefully he will weigh in on the pros and cons of deprecating behaviour via a cycle of deprecation warnings then errors then removal

@mattip
Copy link
Member Author

mattip commented Aug 14, 2017

@njsmith ping, waiting for your comments since the new flag was per your suggestion

Copy link
Member

@njsmith njsmith left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do like this overall approach better than #9269.

I'm not a big fan of the CLEAR_B4_EXIT name. Maybe NPY_ARRAY_UPDATEIFCOPY_EXPLICIT_RESOLVE?

I have a few comments below (note in particular the issue with PyArray_SetUpdateIfCopyBase), but I think the big question is about which of two strategies to take. Right now, this PR's strategy is to define two flags with different numeric values – NPY_ARRAY_UPDATEIFCOPY and NPY_ARRAY_UPDATEIFCOPY_WHATEVER – but it doesn't actually use both of these flags on arrays. Instead, it tries to normalize them at array creation into the new flag (possibly emitting a warning as it does so), and then switches the rest of numpy to use the new flag exclusively (including stuff like flag reprs – so arr.flags.UPDATEIFCOPY refers to the new flag, not NPY_ARRAY_UPDATEIFCOPY).

The other approach would be to allow both flags to coexist as first-class entities – so if some code requests NPY_ARRAY_UPDATEIFCOPY then it gets an array with that flag set, it prints as UPDATEIFCOPY, etc., and if some other code requests NPY_ARRAY_UPDATEIFCOPY_EXPLICIT_RESOLVE then it gets an array with that flag set, it prints as UPDATEIFCOPY_EXPLICIT_RESOLVE, etc.

It's not obvious to me which of these is better. My intuition is that the latter will take a bit more code to implement, but be much easier to reason about and judge whether we got things right, so I have a weak preference for it. (For example, it eliminates worries like "what if someone is setting NPY_ARRAY_UPDATEIFCOPY manually?".) I'd be interested to hear what others think.

@@ -46,7 +46,7 @@
#define NPY_ALIGNED NPY_ARRAY_ALIGNED
#define NPY_NOTSWAPPED NPY_ARRAY_NOTSWAPPED
#define NPY_WRITEABLE NPY_ARRAY_WRITEABLE
#define NPY_UPDATEIFCOPY NPY_ARRAY_UPDATEIFCOPY
#define NPY_UPDATEIFCOPY NPY_ARRAY_UPDATEIFCOPY_CLEAR_B4_EXIT
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be reverted -- if someone's using the deprecated NPY_UPDATEIFCOPY alias, then they should get NPY_ARRAY_UPDATEIFCOPY semantics.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

@@ -112,7 +112,7 @@ PyArray_SetUpdateIfCopyBase(PyArrayObject *arr, PyArrayObject *base)
* references.
*/
((PyArrayObject_fields *)arr)->base = (PyObject *)base;
PyArray_ENABLEFLAGS(arr, NPY_ARRAY_UPDATEIFCOPY);
PyArray_ENABLEFLAGS(arr, NPY_ARRAY_UPDATEIFCOPY_CLEAR_B4_EXIT);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ugh, this is dubious. PyArray_SetUpdateIfCopyBase is public, so there might be third-party code that uses it and expects the GC to resolve the UPDATEIFCOPY. I guess we need to make PyArray_SetUpdateIfCopyBase keep using NPY_ARRAY_UPDATEIFCOPY and issue a deprecation warning, and add a new function PyArray_SetUpdateIfCopyBaseWithExplicitResolve or something like that.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, will add a PyArray_CopyDataFromBase that will set the new flag, and use it instead across the code base

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I don't understand the naming scheme here. IIRC (from my phone), PyArray_SetUpdateIfCopyBase sets the UPDATEIFCOPY flag and sets the ->base member, but it doesn't copy any data does it? I was suggesting having a version that's identical except that it sets the new flag instead of the old one and doesn't warn. What does PyArray_CopyDataFromBase do?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

correct, so maybe PyArray_SetUpToCopyData?

*/
PyErr_Clear();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should use PyErr_WriteUnraisable() or similar – otherwise when someone sets a warning filter to deprecations into errors, then the effect will be to hide this warning entirely!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch, will add a test as well

@@ -1009,7 +1009,7 @@ PyArray_NewFromDescr_int(PyTypeObject *subtype, PyArray_Descr *descr, int nd,
}
}
else {
fa->flags = (flags & ~NPY_ARRAY_UPDATEIFCOPY);
fa->flags = (flags & ~NPY_ARRAY_UPDATEIFCOPY_CLEAR_B4_EXIT);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should clear both versions of the flag, I think?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

(flags & NPY_ARRAY_UPDATEIFCOPY_CLEAR_B4_EXIT)) {
if (flags & NPY_ARRAY_UPDATEIFCOPY) {
if (DEPRECATE("Use NPY_ARRAY_UPDATEIFCOPY_CLEAR_B4_EXIT and"
" be sure to call PyArray_ResolveUpdateIfCopy") < 0) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this have an #ifdef around it?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What kind of #ifdef?
We do not wish to issue a DeprecationWarning in the current version of NumPy?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ultimately yeah it should be deprecated, but I thought your plan was to initially introduce this with all the warnings hidden behind a DEPRECATE_UPDATEIFCOPY ifdef.

@mattip
Copy link
Member Author

mattip commented Aug 24, 2017

OK, so I will rewrite as a new commit to keep both flags, with the new name NPY_ARRAY_HASBASEDATACOPY", the old NPY_ARRAY_UPDATEIFCOPY will remain. I will add a new additional API function PyArray_CopyDataFromBase that will copy the data and set the new flag and use it across the code base. Thanks for the review

@njsmith
Copy link
Member

njsmith commented Aug 28, 2017

It's not important to get the names settled now b/c It's easy to fiddle with them later via copy/replace, but the idea behind the traditional naming is: "please give an array with the given properties based on this input, and make it a view if you can – but IF you have to COPY the input array, then please later UPDATE it". (This is when the flag is used as an input to PyArray_FromAny and friends – then the flag gets reused for the actual array object, where the name doesn't make as much sense.) Anyway, this makes SetUpToCopyData feel a bit confusing to me b/c now we've flipped it around so the word "copy" is referring to what used to be called "update".

Maybe WRITEBACK_IF_COPY, PyArray_SetWritebackIfCopyBase?

new additional API function PyArray_CopyDataFromBase that will copy the data and set the new flag and use it across the code base.

As a heads up, I think there are places where we call SetUpdateIfCopyBase without actually initializing the array to contain a copy of the base, e.g. for out arrays.

@mattip
Copy link
Member Author

mattip commented Aug 30, 2017

ETOOLONG, sorry.

@njsmith I started rewriting the second changeset using the "double code" approach from your comment but it is not just a few lines here and there of more code:

  • array.flags grows another flag, all the parsing and representation code for flags needs to be adapted with proper deprecation semantics. Since this is exposed to python, anyone who is currently parsing flags will need to modify their code, and will need to modify it again when the old flag is removed.
  • we need to worry about creating a new single-letter flag representation for the new flag, not using A B C F U or W since those are already taken, with the additional cost of modifying downstream python code
  • nditer also grows another flag NPY_ITER_WRITEBACKIFCOPY and likewise needs care to make sure everything is done there, including parsing of op_flags which also grow another python-level option
  • everywhere UPDATEIFCOPY is used there is code duplication
  • additional tests are needed to make sure this all is working properly

I think the cost/benefit of the second approach is too high, there is a much higher chance of a mistake in such a sweeping code change. I have pushed what I have so far (still no tests and need to refactor nditer) here.I estimate this is about 85% of the code change but every changed code path needs a new test to be sure we can reason about the changes and judge whether we got things right, especially the python-level flag ones.

So we have:

changeset github stats tests
refactor code from dealloc into a function 119 additions, 25 deletions in code 10 tests added
replacing the flag usage (option 1) 62 additions, 32 deletions in code 0 tests added, a few needed
add parallel code paths (option 2) (started over again from master) 434 additions, 98 deletions, more on the way 11 tests added, many more needed

It is hard to estimate consumers of a NumPy API, but putting some time into searching Github for code (outside of numpy forks and copies) that use these semantics did not turn up a single case. I propose we stick with the first option. If you agree I will update the pull request, deprecating PyArray_SetUpdateIfCopyBase, adding deprecation tests, and fixing for the rest of the review comments.

@njsmith
Copy link
Member

njsmith commented Aug 30, 2017

array.flags grows another flag, all the parsing and representation code for flags needs to be adapted with proper deprecation semantics. Since this is exposed to python, anyone who is currently parsing flags will need to modify their code, and will need to modify it again when the old flag is removed.

Not sure what you mean. If someone's accessing arr.flags["UPDATEIFCOPY"], then giving a warning is probably a good thing. If they're accessing arr.flags["ALIGNED"], then why would they need to modify their code?

we need to worry about creating a new single-letter flag representation for the new flag, not using A B C F U or W since those are already taken, with the additional cost of modifying downstream python code

Where do these single-letter codes come in?

nditer also grows another flag NPY_ITER_WRITEBACKIFCOPY

So our plan for dealing with nditer is still somewhat unresolved... at the C level this actually seems kinda reasonable, because anyone using NPY_ITER_UPDATEIFCOPY will need to add something like an explicit PyArray_Resolve call... right? And at the Python level, remind me – did we come to any conclusion about which cases will issue deprecation warnings and what will need to be done to silence them?

I looked at your preliminary branch, and the only major pieces of code duplication I saw were arrayflags_writebackifcopy_set and the body of PyArray_SetWritebackIfCopyBase. Both of these seem like they could be easily eliminated though? I think we should just disallow setting flags["WRITEBACKIFCOPY"] so that goes away, and then most of the code in PyArray_Set*IfCopyBase can be factored out into a common function.

All the other changes (at least on a skim) seem very reasonable, and like things that would either need changing anyway, or whose semantics would be changing so they needed review anyway.

I'm also a little confused about your statistics are coming from... the two approaches are your branches refactor-updateifcopy2 and refactor-updateifcopy3, right? If I restrict just to changes in the numpy/ dir (so ignoring doc updates) and compare each to master, I get

  • refactor-updateifcopy2: 223 insertions, 60 delections
  • refactor-updateifcopy3: 363 insertions, 68 deletions

and as noted, a lot of those insertions in the latter case are unnecessary.

It is hard to estimate consumers of a NumPy API, but putting some time into searching Github for code (outside of numpy forks and copies) that use these semantics did not turn up a single case

There are at least 3 in scipy, FWIW.

Anyway: I'm open to being convinced, but I'm not yet convinced :-)

Can you give an example of a test or two that you think you need to write in the "option 2" approach but not in the "option 1" approach? (Besides the test that arr.flags["WRITEBACKIFCOPY"] and variants work.)

@mattip
Copy link
Member Author

mattip commented Aug 30, 2017

OK, I will continue to crank out code assuming you will review it or become convinced that the first alternative is better.

str(array.flags) is used in logging (I have used it myself) to help debug strange cases. It will now "grow" another row.

array.flags[s] supports both full upper case names and single-letter upper-case abbreviations., so we still need a single letter for the new flag. RESOLVEIFCOPY (R), WRTIEBACKIFCOPY (W) FINALIZEIFCOPY (F) all do not work. HARMONIZEIFCOPY (H)? SETTLEIFCOPY (S)? Or simply go with WRITEBACKIFCOPY (X)?

I would prefer to postpone any discussion of nditer until this is behind us, Since my initial motivation for this entire saga is to fix the use of UPDATEIFCOPY semantics on nditers, you can be sure I will follow this up and not just forget it.

Indeed more tests are needed in any case in both alternatives, it seems, although the first alternative would not require new tests in array.flags.

I was looking in Github for PyArray_SetUpdateIfCopyBase or places that set the flag true directly. The motivation to use the old flag in parallel with the new one is only in these two cases, since use of the old NP_ARRAY_UPDATEIFCOPY in any of the array creation routines can and should emit a deprecation warning in both alternatives.

@mattip
Copy link
Member Author

mattip commented Sep 1, 2017

I had to delete and re-fork the repo I issued this pull request from. I did not realize that meant I would not be able to continue this pull request, so I will be forced to close it and start again. Sorry Continued on #9639

@mattip mattip closed this Sep 2, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants