ENH: add nditer.close to solve writeback semantics in nditer #10184
Conversation
@@ -37,6 +37,7 @@ struct NewNpyArrayIterObject_tag {
    npy_intp *innerstrides, *innerloopsizeptr;
    char readflags[NPY_MAXARGS];
    char writeflags[NPY_MAXARGS];
    char open;
Not `npy_bool`?
The other flags are `char`; I was just following suit.
@@ -1443,6 +1482,10 @@ static PyObject *npyiter_operands_get(NewNpyArrayIterObject *self)
            "Iterator is invalid");
        return NULL;
    }
    if (self->open == 0) {
        PyErr_SetString(PyExc_RuntimeError, "Iterator closed");
It would be great to add a comment to each of these explaining why it is no longer valid to access the data.
For instance, would there be any harm in exposing a readonly view on the operands here?
It's more of a philosophical than a practical question. Once the iter is closed, `it.operand[0].data` may (if writeback semantics were involved) fall out of sync with `op.data`, which could introduce subtle bugs. These bugs might not show up in testing, since it is difficult to predict when writeback semantics will be triggered. I would prefer as much as possible to disallow use of a closed iter, just in case.
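A minimal Python sketch of the writeback behaviour under discussion (assuming a NumPy version where `nditer` supports the context-manager protocol; the `updateifcopy` op flag plus a dtype cast is one way to force a temporary copy):

```python
import numpy as np

# The float32 -> float64 cast forces nditer to operate on a temporary
# copy of `a`; the writeback into `a` only happens when the iterator
# is closed, i.e. on exiting the `with` block.
a = np.arange(6, dtype=np.float32)
with np.nditer(a, op_flags=[['readwrite', 'updateifcopy']],
               op_dtypes=[np.float64], casting='same_kind') as it:
    for x in it:
        x[...] = x * 2

# After close, the temporary has been written back into `a`.
assert np.array_equal(a, [0., 2., 4., 6., 8., 10.])
```

Inside the `with` block, `it.operands[0]` is the float64 temporary, not `a` itself; that is exactly the out-of-sync window the guard in this diff is protecting against.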
    if (self->open == 0) {
        PyErr_SetString(PyExc_RuntimeError, "Iterator closed");
        return NULL;
    }
This one feels totally safe to me
fixing
Ping
Fixed, but there was a merge conflict; now fixed and merged.
static PyObject *
npyiter_iternext(NewNpyArrayIterObject *self)
{
    if (self->open == 0) {
        PyErr_SetString(PyExc_RuntimeError, "Iterator closed");
This one (and only this one) should raise `StopIteration`, for consistency with generator objects. Either that, or `__next__` should be different from `iternext`, and the former should raise `StopIteration`.
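A small sketch of the generator behaviour being cited here: calling `next()` on a closed generator raises `StopIteration`, not `RuntimeError`:

```python
# Generators end the iteration protocol cleanly after close():
def gen():
    yield 1
    yield 2

g = gen()
assert next(g) == 1
g.close()  # analogous to closing the iterator mid-stream

try:
    next(g)
    raised = False
except StopIteration:
    raised = True
assert raised  # a closed generator raises StopIteration, not an error
```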
If I iterate over a closed file object, it raises a `ValueError`, not a `StopIteration`.
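The file counterexample can be checked directly (using `io.StringIO` to stand in for a real file, which behaves the same way after `close()`):

```python
import io

# File-like objects raise ValueError, not StopIteration, when
# iterated after close().
f = io.StringIO("line1\nline2\n")
assert next(f) == "line1\n"
f.close()

try:
    next(f)
    err = None
except ValueError as e:  # "I/O operation on closed file"
    err = e
assert err is not None
```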
You're right, I do mean `npyiter_next`. `.iternext()` is part of our API, not a Python one. Good counterexample with `file`.
I'm sorry I zoned out for a while and missed much of the discussion on #9998. Looking over that discussion now, I'm afraid a critical issue was missed: it is never correct to issue a DeprecationWarning when something is deallocated. "Deprecated" means "in the future this will raise an exception", but technical limitations in Python mean that you cannot raise an exception when something is deallocated, so... it's a contradiction in terms. That's why the original plan was to make the use of a context manager mandatory, instead of providing a

In this case, the problem would be: suppose in the future the deprecation period ends and we stop cleaning up WRITEBACKIFCOPY operands when the arrays are GCed. At that point, if someone hasn't switched to calling

CC @eric-wieser, @xoviat, @pv

What's the plan for

Are you proposing that we have a long-term split in our API where on CPython we don't require closing the iterator, but on PyPy we do? I guess that's possible but it's not my favorite...
No. At that point, we intentionally leak copious amounts of memory. I know what you're thinking: that sounds terrible, but it's what CPython does for this same scenario, and it got people to fix their code. Note: this is written half-jokingly, but it's also completely factual.
I think you have a valid argument against using
@xoviat: huh? How does leaking memory help anything? What's "the same scenario" that CPython encountered? The scenario here is that the user makes modifications to the WRITEBACKIFCOPY array, then continues using their original array, thinking that the modifications have been applied -- but they haven't, so they end up using garbage data in their computations.

@eric-wieser Yeah, but the general rule is that we only use
@njsmith If you don't close a file explicitly in CPython 3, there will be a large memory leak. This happened on a production server and people fixed their code as a result. Edit: the leak is through the
In all seriousness, we could just keep putting out warnings perpetually. Results would be incorrect on PyPy, but that's the current state of things. Putting out zero warnings is not really an option though.
@xoviat That was a CPython bug that has nothing to do with this...
No, it's not a bug. It's a feature. And it's exactly the same problem as the one here: we need to close our iterator (rather than file) explicitly rather than on GC.
We need to synchronize our stories. I created this pull request in response to the lukewarm reception PR #9998 received. See the comment that seemed to veto the possibility of ever removing the warning. Note also that even after this pull request, no warnings should normally be emitted, even when deallocating a
One possible approach would be to merge this one, and wait to see if PyPy + NumPy becomes popular enough to justify PR #9998.
IMO the comment from @charris was mostly wrong (though I realize that my opinion carries significantly less weight). Using a deallocator to resolve the writeback is incorrect; that should be fixed. The backwards-compatible change is to introduce a compile-time preprocessor check, as SciPy has done.
I may have a different perspective on deprecations than others: I see no reason why the existing functionality needs to be removed so quickly. It can simply remain in a deprecated status until we see that no one else encounters these warnings, which could be in a decade or so.
Because this is implementation-defined behavior. This is the same type of situation where something is released, but is not in a specification, and then people rely on it. This is why NumPy is attempting to remove deprecated APIs so that it can refactor internal code. The way that's done is to introduce a deprecation warning to stop the bleeding: make sure that no new code uses it, and then when (if) almost all existing code is fixed, it can finally be removed. But it should be user, rather than schedule, driven. If people are still using deprecated functionality, then it shouldn't be removed.
@mattip Oh, interesting, I missed the last bit of @charris's comment there. I actually agree that we might want to leave this deprecation as a warning only for some long period of time, at least on CPython. But (a) I think we should leave ourselves the option of transitioning it to a real error eventually – refcounting is a real barrier to making Python faster, and NumPy wants Python to be faster! And (b) we at least want to make it an error on PyPy ASAP, so that people who do try to port to PyPy find out immediately instead of getting incorrect results. Also, I think it'd really be better in the long run if NumPy's API did not vary depending on which Python implementation it happens to be built for. So I'd like to at least have a plan for making the transition on CPython and emit an invisible warning, even if we don't have any solid timetable for actually flipping to an error.
@njs IMO (a) would be achieved by the other pull request, and (b) is achieved by this pull request as a warning. The long-term plan is the other pull request, once consensus is reached. I am not sure how to make it an error rather than a warning, since the action happens in a deallocation function. Good enough?
So what does this PR accomplish? It adds a warning that we'll have to remove later...? I don't really see how "We give the wrong result on PyPy and you don't even know unless you have DeprecationWarnings turned on" is a meaningful improvement over "We give the wrong result on PyPy and you'll never know", if it doesn't move us towards a real solution either :-(.
Can we not jump straight to using
@eric-wieser do you think that the best option is to never raise an error in this case, just dump loud warnings on the console?
We can, but in many cases the PyPy GC will not be triggered anywhere close to the nditer use, so the user will not see the stderr printout.
IMO gh-9998 is fine. Yes, it's a complex PR, but we have plenty of time to review it. No one is forced to use a context manager if they don't want to, and it mirrors exactly what Python does with files.
It provides a path, for those "responsible adults" who care, to run Python code correctly on PyPy without harming CPython at all. I was not able to generate much support for the complications involved in going the whole way as in PR #9998, so this is a minimal change to allow explicit closing of an iterator.
Well, dump unraisable errors, which is essentially the same thing. Yes, I think that's a reasonable solution here. Sorry for losing the context here: I'm still +1 on the addition of a
Handled via PR #9998.
Resolves issue #9714 by providing a recommended path to use `nditer.close()` for PyPy compatibility. If `nditer.close` is not called on CPython, no warning will be raised. On PyPy (or if compiled with the special `AVOID_UPDATEIFCOPY` C macro) a DeprecationWarning will be raised when the operands are deallocated. See the extensive discussion on PR #9998.

Review: (`nditer.close()` and how/when to call `PyArray_ResolveWritebackIfCopy` in the C-API)
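A sketch of the explicit-close pattern this PR recommends (assuming a NumPy version that exposes `nditer.close()`): call `close()` when done instead of relying on deallocation, so the code also behaves correctly under PyPy's deferred garbage collection.

```python
import numpy as np

a = np.arange(3)
it = np.nditer(a, op_flags=[['readwrite']])
for x in it:
    x[...] = x + 1
it.close()  # resolves any pending writeback immediately, GC-independent

assert np.array_equal(a, [1, 2, 3])
```

After `close()`, further use of the iterator (e.g. accessing `it.operands` or calling `it.iternext()`) is rejected, which is the guard added by the diff hunks above.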