ENH: add nditer.close to solve writeback semantics in nditer #10184

Closed
wants to merge 5 commits

Conversation

mattip
Member

mattip commented Dec 9, 2017

Resolves issue #9714 by providing a recommended path, nditer.close(), for PyPy compatibility. If nditer.close is not called on CPython, no warning will be raised. On PyPy (or if compiled with the special AVOID_UPDATEIFCOPY C macro) a DeprecationWarning will be raised when the operands are deallocated.

See the extensive discussion on PR #9998.

Review:

  • is npy_common.h the correct place for the macro?
  • Where would be a good place to document PyPy compatibility (specifically, the reasoning behind nditer.close() and how/when to call PyArray_ResolveWritebackIfCopy in the C-API)?
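The intended Python-level usage can be sketched as follows (a minimal illustration, assuming NumPy >= 1.15, where nditer.close() eventually shipped; with plain readwrite operands no writeback copy is made, so close() is a harmless no-op in that case):

```python
import numpy as np

a = np.arange(6, dtype=float).reshape(2, 3)

# Writes go through the iterator's operand views; calling close()
# flushes any pending writeback (a no-op when no copy was needed)
# instead of relying on deallocation, which PyPy's GC cannot be
# counted on to run promptly.
it = np.nditer(a, op_flags=['readwrite'])
for x in it:
    x[...] = 2 * x
it.close()

print(a)  # a has been doubled in place
```

Equivalently, `with np.nditer(a, op_flags=['readwrite']) as it:` closes the iterator automatically, which is the context-manager route PR #9998 proposes.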

@@ -37,6 +37,7 @@ struct NewNpyArrayIterObject_tag {
npy_intp *innerstrides, *innerloopsizeptr;
char readflags[NPY_MAXARGS];
char writeflags[NPY_MAXARGS];
char open;
Member

Not npy_bool?

Member Author

the other flags are char, I was just following suit

@@ -1443,6 +1482,10 @@ static PyObject *npyiter_operands_get(NewNpyArrayIterObject *self)
"Iterator is invalid");
return NULL;
}
if (self->open == 0) {
PyErr_SetString(PyExc_RuntimeError, "Iterator closed");
Member

It would be great to add a comment to each of these explaining why it is no longer valid to access the data.

For instance, would there be any harm in exposing a readonly view on the operands here?

Member Author

It's more of a philosophical than practical question. Once the iter is closed, it.operands[0].data may (if writeback semantics were involved) fall out of sync with op.data, which could introduce subtle bugs. These bugs might not show up in testing, since it is difficult to predict when writeback semantics will be triggered. I would prefer, as much as possible, to disallow use of a closed iter, just in case.
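This behavior can be demonstrated from Python (a sketch assuming NumPy >= 1.15, where close() shipped; the exception type used here, RuntimeError in this PR, became ValueError in the merged version, so the sketch catches both):

```python
import numpy as np

a = np.arange(3.0)
it = np.nditer(a, op_flags=['readwrite'])
it.close()

# Any further use of a closed iterator is rejected outright rather
# than risking stale operand views. The exact exception type varied
# between this PR (RuntimeError) and what NumPy eventually shipped
# (ValueError), so both are caught here.
caught = None
try:
    it.operands[0]
except (RuntimeError, ValueError) as exc:
    caught = exc

print("closed iterator rejected:", caught)
```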

if (self->open == 0) {
PyErr_SetString(PyExc_RuntimeError, "Iterator closed");
return NULL;
}
Member

This one feels totally safe to me

Member Author

fixing

Member

Ping

Member Author

fixed but there was a merge conflict, now fixed and merged

static PyObject *
npyiter_iternext(NewNpyArrayIterObject *self)
{
if (self->open == 0) {
PyErr_SetString(PyExc_RuntimeError, "Iterator closed");
Member

This one (and only this one) should raise StopIteration for consistency with generator objects

Either that, or __next__ should be different from iternext, and the former should raise StopIteration

Member Author

Do you mean npyiter_next, which is assigned to the tp_iternext slot? I am confused about who calls npyiter_iternext, which is exposed as the nditer.iternext method; it does not seem to be documented in the Python type object model for Python 2 or Python 3.

Member Author

If I iterate over a closed file object, it raises a ValueError, not StopIteration.
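The file analogy is easy to check (Python 3 behavior):

```python
import tempfile

# Create a small file, close it, then try to iterate over it.
with tempfile.NamedTemporaryFile(mode="w", delete=False) as tmp:
    tmp.write("hello\n")
    name = tmp.name

f = open(name)
f.close()
caught = None
try:
    for line in f:
        pass
except ValueError as exc:
    caught = type(exc).__name__

print(caught)  # ValueError ("I/O operation on closed file.")
```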

Member

You're right, I do mean npyiter_next. .iternext() is part of our API, not a Python one.

Good counterexample with file

@njsmith
Member

njsmith commented Dec 19, 2017

I'm sorry I zoned out for a while and missed much of the discussion on #9998.

Looking over that discussion now, I'm afraid a critical issue was missed: it is never correct to issue a DeprecationWarning when something is deallocated. "Deprecated" means "in the future this will raise an exception", but technical limitations in Python mean that you cannot raise an exception when something is deallocated, so... it's a contradiction in terms. That's why the original plan was to make the use of a context manager mandatory, instead of providing a .close method: it's the only way we can accomplish the transition without risking wrong results from user code.

In this case, the problem would be: suppose in the future the deprecation period ends and we stop cleaning up WRITEBACKIFCOPY operands when the arrays are GCed. At that point, if someone hasn't switched to calling nditer.close, then... we just start giving the wrong results. We could still issue a warning when we see it, but the rule isn't "it's ok for people to get wrong results if we issue a warning"; we have to raise an exception. The crucial thing about the context manager is that it provides the __enter__ method, so we can see ahead of time whether the caller has been updated to clean things up properly; by the time things are being GCed it's too late.
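The transition mechanics described above can be sketched with a toy Python class (hypothetical names, not NumPy code): reaching __enter__ proves the caller has opted into explicit cleanup while it is still possible to raise, whereas __del__ can only warn.

```python
import warnings

class WritebackIter:
    """Toy stand-in for an iterator with writeback semantics."""

    def __init__(self):
        self._entered = False
        self._closed = False

    def __enter__(self):
        # Reaching __enter__ proves, *before* any data is at risk, that
        # the caller has been updated to the context-manager protocol;
        # an exception could even be raised here for legacy callers.
        self._entered = True
        return self

    def __exit__(self, *exc_info):
        self.close()

    def close(self):
        # A real iterator would flush the writeback copy here.
        self._closed = True

    def __del__(self):
        if not self._closed:
            # Deallocation is too late to raise an exception;
            # a warning is the strongest signal available.
            warnings.warn("iterator deallocated without close()",
                          RuntimeWarning)
```

Here __exit__ funnels into close(), mirroring how a file object behaves.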

CC @eric-wieser, @xoviat, @pv


What's the plan for NpyIter_Dealloc and NpyIter_Copy (which unfortunately is used by numexpr)?

Are you proposing that we have a long-term split in our API where on CPython we don't require closing the iterator, but on PyPy we do? I guess that's possible but it's not my favorite...

@ghost

ghost commented Dec 19, 2017

In this case, the problem would be, suppose in the future the deprecation period ends and we stop cleaning up WRITEBACKIFCOPY operands when the arrays are GCed. At that point, if someone hasn't switched to calling nditer.close, then... we just start giving the wrong results.

No. At that point, we intentionally leak copious amounts of memory. I know what you're thinking: that sounds terrible, but it's what CPython does for this same scenario, and it got people to fix their code.

Note: this is written half-jokingly, but it's also completely factual.

@eric-wieser
Member

eric-wieser commented Dec 19, 2017

At that point, if someone hasn't switched to calling nditer.close, then... we just start giving the wrong results.

I think you have a valid argument against using DeprecationWarning, but FutureWarning already has the semantics of "this will do something you didn't want / give the wrong results in the future".

@njsmith
Member

njsmith commented Dec 19, 2017

@xoviat: huh? how does leaking memory help anything? What's "the same scenario" that CPython encountered?

The scenario here is that the user makes modifications to the WRITEBACKIFCOPY array, then continues using their original array, thinking that the modifications have been applied -- but they haven't, so they end up using garbage data in their computations.

@eric-wieser Yeah, but the general rule is that we only use FutureWarning when we have no alternative...

@ghost

ghost commented Dec 19, 2017

@njsmith If you don't close a file explicitly in CPython 3, there will be a large memory leak. This happened on a production server and people fixed their code as a result.

Edit: the leak is through the warnings module.

@ghost

ghost commented Dec 19, 2017

See python-pillow/Pillow#2019.

@ghost

ghost commented Dec 19, 2017

In all seriousness, we could just keep putting out warnings perpetually. Results would be incorrect on PyPy, but that's the current state of things. Putting out zero warnings is not really an option though.

@njsmith
Member

njsmith commented Dec 19, 2017

@xoviat That was a CPython bug that has nothing to do with this...

@ghost

ghost commented Dec 19, 2017

No, it's not a bug. It's a feature. And it's exactly the same problem as the one here: we need to close our iterator (rather than a file) explicitly, rather than relying on GC.

@mattip
Member Author

mattip commented Dec 19, 2017

We need to synchronize our stories. I created this pull request in response to the lukewarm reception PR #9998 received. See the comment that seemed to veto the possibility of ever removing the warning.

Note also that even after this pull request, no warnings should normally be emitted, even when deallocating an nditer with no close. Only on PyPy (or if a special compilation macro is used) will a warning be emitted.

@mattip
Member Author

mattip commented Dec 19, 2017

One possible approach would be to merge this one, and wait to see if PyPy + NumPy becomes popular enough to justify PR #9998.

@ghost

ghost commented Dec 19, 2017

IMO the comment from @charris was mostly wrong (though I realize that my opinion carries significantly less weight). Using a deallocator to resolve the writeback is incorrect; that should be fixed. The backwards-compatible change is to introduce a compile-time preprocessor check, as SciPy has done.

I plan to leave the deprecations in place for a long time, perhaps forever, as there is no reason for folks to change their code except for PyPy compatibility, which they may not be interested in.

I may have a different perspective on deprecations than others: I see no reason why the existing functionality needs to be removed so quickly. It can simply remain in a deprecated status until we see that no one else encounters these warnings, which could be in a decade or so.

@ghost

ghost commented Dec 19, 2017

I think it would also be useful to explain why the changes are needed in the first place. I assume that the UPDATEIFCOPY flag is not actually replaced, but rather that a new flag WRITEBACKIFCOPY is added.

Because this is implementation-defined behavior. This is the same type of situation where something is released, but is not in a specification, and then people rely on it. This is why NumPy is attempting to remove deprecated APIs so that it can refactor internal code. The way that's done is to introduce a deprecation warning to stop the bleeding: make sure that no new code uses it, and then when (if) almost all existing code is fixed, it can finally be removed.

But it should be user, rather than schedule, driven. If people are still using deprecated functionality, then it shouldn't be removed.

@njsmith
Member

njsmith commented Dec 20, 2017

@mattip Oh, interesting, I missed the last bit of @charris's comment there.

I actually agree that we might want to leave this deprecation as a warning only for some long period of time, at least on CPython. But (a) I think we should leave ourselves the option of transitioning it to a real error eventually – refcounting is a real barrier to making Python faster, and NumPy wants Python to be faster!, and (b) we at least want to make it an error on PyPy ASAP, so that people who do try to port to PyPy find out immediately instead of getting incorrect results.

Also I think it'd really be better in the long run if NumPy's API did not vary depending on which Python implementation it happens to be built for. So I'd like to at least have a plan for making the transition on CPython and emit an invisible-by-default warning, even if we don't have any solid timetable for actually flipping to an error.

@mattip
Member Author

mattip commented Dec 20, 2017

@njsmith IMO (a) would be achieved by the other pull request, and (b) is achieved by this pull request as a warning. The long-term plan is the other pull request, once consensus is reached. I am not sure how to make it an error rather than a warning, since the action happens in a deallocation function. Good enough?

@njsmith
Member

njsmith commented Dec 20, 2017

So what does this PR accomplish? It adds a warning that we'll have to remove later...? I don't really see how "We give the wrong result on PyPy and you don't even know unless you have DeprecationWarnings turned on" is a meaningful improvement over "We give the wrong result on PyPy and you'll never know", if it doesn't move us towards a real solution either :-(.

@eric-wieser
Member

Can we not jump straight to using PyErr_WriteUnraisable in nditer.__del__, which at least solves the "warnings turned off" problem?

@njsmith
Member

njsmith commented Dec 20, 2017

@eric-wieser do you think that the best option is to never raise an error in this case, just dump loud warnings on the console?

@mattip
Member Author

mattip commented Dec 20, 2017

We can, but in many cases the PyPy GC will not be triggered anywhere close to the nditer use, so the user will not see the stderr printout.

@ghost

ghost commented Dec 20, 2017

IMO gh-9998 is fine. Yes, it's a complex PR, but we have plenty of time to review it. No one is forced to use a context manager if they don't want to, and it mirrors exactly what Python does with files.

@mattip
Member Author

mattip commented Jan 7, 2018

So what does this PR accomplish?

It provides a path, for those "responsible adults" who care, to run python code correctly on PyPy without harming CPython at all. I was not able to generate much support for the complications involved in going the whole way as in PR #9998, so this is a minimal change to allow explicit closing of an iterator.

@mattip
Member Author

mattip commented Jan 22, 2018

Can I get a yes/no/modify-it decision on this pull request? If this is rejected, I understand PR #9998 will never be accepted. If this is accepted, we can hold off on PR #9998 for a while to see how this is adopted by the community.

@eric-wieser
Member

eric-wieser commented Mar 29, 2018

@eric-wieser do you think that the best option is to never raise an error in this case, just dump loud warnings on the console?

Well, dump unraisable errors, which is essentially the same thing. Yes, I think that's a reasonable solution here.

Sorry for losing the context here: I'm still +1 on the addition of a close() method, whether or not we also go forward with #9998

@charris charris changed the title ENHANCE: add nditer.close to solve writeback semantics in nditer ENH: add nditer.close to solve writeback semantics in nditer Mar 29, 2018
@mattip
Member Author

mattip commented Apr 21, 2018

Handled via PR #9998
