ENH: add nditer.close to solve writeback semantics in nditer #10184

Closed
wants to merge 5 commits

Conversation

mattip
Member

mattip commented Dec 9, 2017

Resolves issue #9714 by providing a recommended path, nditer.close(), for PyPy compatibility. If nditer.close is not called on CPython, no warning will be raised. On PyPy (or if compiled with the special AVOID_UPDATEIFCOPY C macro) a DeprecationWarning will be raised when the operands are deallocated.

See the extensive discussion on PR #9998.

Review:

  • is npy_common.h the correct place for the macro?
  • Where would be a good place to document PyPy compatibility (specifically, the reasoning behind nditer.close() and how/when to call PyArray_ResolveWritebackIfCopy in the C-API)?
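The intended Python-level usage can be sketched as follows (a minimal illustration, assuming NumPy >= 1.15, where nditer.close() eventually shipped; with plain readwrite operands no writeback copy is made, so close() is a harmless no-op in that case):

```python
import numpy as np

a = np.arange(6, dtype=float).reshape(2, 3)

# Writes go through the iterator's operand views; calling close()
# flushes any pending writeback (a no-op when no copy was needed)
# instead of relying on deallocation, which PyPy's GC cannot be
# counted on to run promptly.
it = np.nditer(a, op_flags=['readwrite'])
for x in it:
    x[...] = 2 * x
it.close()

print(a)  # a has been doubled in place
```

Equivalently, `with np.nditer(a, op_flags=['readwrite']) as it:` closes the iterator automatically, which is the context-manager route PR #9998 proposes.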

@@ -37,6 +37,7 @@ struct NewNpyArrayIterObject_tag {
npy_intp *innerstrides, *innerloopsizeptr;
char readflags[NPY_MAXARGS];
char writeflags[NPY_MAXARGS];
char open;
Member

Not npy_bool?

Member Author

the other flags are char, I was just following suit

@@ -1443,6 +1482,10 @@ static PyObject *npyiter_operands_get(NewNpyArrayIterObject *self)
"Iterator is invalid");
return NULL;
}
if (self->open == 0) {
PyErr_SetString(PyExc_RuntimeError, "Iterator closed");
Member

It would be great to add a comment to each of these explaining why it is no longer valid to access the data.

For instance, would there be any harm in exposing a readonly view on the operands here?

Member Author

It's more of a philosophical than practical question. Once the iter is closed, it.operands[0].data may (if writeback semantics were involved) fall out of sync with op.data, which could introduce subtle bugs. These bugs might not show up in testing, since it is difficult to predict when writeback semantics will be triggered. I would prefer, as much as possible, to disallow use of a closed iter, just in case.
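This behavior can be demonstrated from Python (a sketch assuming NumPy >= 1.15, where close() shipped; the exception type used here, RuntimeError in this PR, became ValueError in the merged version, so the sketch catches both):

```python
import numpy as np

a = np.arange(3.0)
it = np.nditer(a, op_flags=['readwrite'])
it.close()

# Any further use of a closed iterator is rejected outright rather
# than risking stale operand views. The exact exception type varied
# between this PR (RuntimeError) and what NumPy eventually shipped
# (ValueError), so both are caught here.
caught = None
try:
    it.operands[0]
except (RuntimeError, ValueError) as exc:
    caught = exc

print("closed iterator rejected:", caught)
```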

if (self->open == 0) {
PyErr_SetString(PyExc_RuntimeError, "Iterator closed");
return NULL;
}
Member

This one feels totally safe to me

Member Author

fixing

Member

Ping

Member Author

fixed but there was a merge conflict, now fixed and merged

static PyObject *
npyiter_iternext(NewNpyArrayIterObject *self)
{
if (self->open == 0) {
PyErr_SetString(PyExc_RuntimeError, "Iterator closed");
Member

This one (and only this one) should raise StopIteration for consistency with generator objects

Either that, or __next__ should be different from iternext, and the former should raise StopIteration

Member Author

Do you mean npyiter_next, which is assigned to the tp_iternext slot? I am confused about who calls npyiter_iternext, which is exposed as the nditer.iternext method; it does not seem to be documented in the Python type object model for Python 2 or Python 3.

Member Author

If I iterate over a closed file object, it raises a ValueError, not StopIteration.
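The file analogy is easy to check (Python 3 behavior):

```python
import tempfile

# Create a small file, close it, then try to iterate over it.
with tempfile.NamedTemporaryFile(mode="w", delete=False) as tmp:
    tmp.write("hello\n")
    name = tmp.name

f = open(name)
f.close()
caught = None
try:
    for line in f:
        pass
except ValueError as exc:
    caught = type(exc).__name__

print(caught)  # ValueError ("I/O operation on closed file.")
```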

Member

You're right, I do mean npyiter_next. .iternext() is part of our API, not a Python one.

Good counterexample with file

@njsmith
Member

njsmith commented Dec 19, 2017

I'm sorry I zoned out for a while and missed much of the discussion on #9998.

Looking over that discussion now, I'm afraid a critical issue was missed: it is never correct to issue a DeprecationWarning when something is deallocated. "Deprecated" means "in the future this will raise an exception", but technical limitations in Python mean that you cannot raise an exception when something is deallocated, so... it's a contradiction in terms. That's why the original plan was to make the use of a context manager mandatory, instead of providing a .close method: it's the only way we can accomplish the transition without risking wrong results from user code.

In this case, the problem would be: suppose in the future the deprecation period ends and we stop cleaning up WRITEBACKIFCOPY operands when the arrays are GCed. At that point, if someone hasn't switched to calling nditer.close, then... we just start giving the wrong results. We could still issue a warning when we see it, but the rule isn't "it's ok for people to get wrong results if we issue a warning"; we have to raise an exception. The crucial thing about the context manager is that it provides the __enter__ method, so we can see ahead of time whether the caller has been updated to clean things up properly; by the time things are being GCed it's too late.
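The transition mechanics described above can be sketched with a toy Python class (hypothetical names, not NumPy code): reaching __enter__ proves the caller has opted into explicit cleanup while it is still possible to raise, whereas __del__ can only warn.

```python
import warnings

class WritebackIter:
    """Toy stand-in for an iterator with writeback semantics."""

    def __init__(self):
        self._entered = False
        self._closed = False

    def __enter__(self):
        # Reaching __enter__ proves, *before* any data is at risk, that
        # the caller has been updated to the context-manager protocol;
        # an exception could even be raised here for legacy callers.
        self._entered = True
        return self

    def __exit__(self, *exc_info):
        self.close()

    def close(self):
        # A real iterator would flush the writeback copy here.
        self._closed = True

    def __del__(self):
        if not self._closed:
            # Deallocation is too late to raise an exception;
            # a warning is the strongest signal available.
            warnings.warn("iterator deallocated without close()",
                          RuntimeWarning)
```

Here __exit__ funnels into close(), mirroring how a file object behaves.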

CC @eric-wieser, @xoviat, @pv


What's the plan for NpyIter_Dealloc and NpyIter_Copy (which unfortunately is used by numexpr)?

Are you proposing that we have a long-term split in our API where on CPython we don't require closing the iterator, but on PyPy we do? I guess that's possible but it's not my favorite...

@ghost

ghost commented Dec 19, 2017

In this case, the problem would be, suppose in the future the deprecation period ends and we stop cleaning up WRITEBACKIFCOPY operands when the arrays are GCed. At that point, if someone hasn't switched to calling nditer.close, then... we just start giving the wrong results.

No. At that point, we intentionally leak copious amounts of memory. I know what you're thinking: that sounds terrible, but it's what CPython does for this same scenario, and it got people to fix their code.

Note: this is written half-jokingly, but it's also completely factual.

@eric-wieser
Member

eric-wieser commented Dec 19, 2017

At that point, if someone hasn't switched to calling nditer.close, then... we just start giving the wrong results.

I think you have a valid argument against using DeprecationWarning, but FutureWarning already has the semantics of "this will do something you didn't want / give the wrong results in the future".

@njsmith
Member

njsmith commented Dec 19, 2017

@xoviat: huh? how does leaking memory help anything? What's "the same scenario" that CPython encountered?

The scenario here is that the user makes modifications to the WRITEBACKIFCOPY array, then continues using their original array, thinking that the modifications have been applied -- but they haven't, so they end up using garbage data in their computations.

@eric-wieser Yeah, but the general rule is that we only use FutureWarning when we have no alternative...

@ghost

ghost commented Dec 19, 2017

@njsmith If you don't close a file explicitly in CPython 3, there will be a large memory leak. This happened on a production server and people fixed their code as a result.

Edit: the leak is through the warnings module.

@ghost

ghost commented Dec 19, 2017

See python-pillow/Pillow#2019.

@ghost

ghost commented Dec 19, 2017

In all seriousness, we could just keep putting out warnings perpetually. Results would be incorrect on PyPy, but that's the current state of things. Putting out zero warnings is not really an option though.

@njsmith
Member

njsmith commented Dec 19, 2017

@xoviat That was a CPython bug that has nothing to do with this...

@ghost

ghost commented Dec 19, 2017

No, it's not a bug. It's a feature. And it's exactly the same problem as the one here: we need to close our iterator (rather than a file) explicitly, rather than relying on GC.

@mattip
Member Author

mattip commented Dec 19, 2017

We need to synchronize our stories. I created this pull request in response to the lukewarm reception PR #9998 received. See the comment that seemed to veto the possibility of ever removing the warning.

Note also that even after this pull request, no warnings should normally be emitted, even when deallocating an nditer with no close. Only on PyPy (or if a special compilation macro is used) will a warning be emitted.

@mattip
Member Author

mattip commented Dec 19, 2017

One possible approach would be to merge this one, and wait to see if PyPy + NumPy becomes popular enough to justify PR #9998.

@ghost

ghost commented Dec 19, 2017

IMO the comment from @charris was mostly wrong (though I realize that my opinion carries significantly less weight). Using a deallocator to resolve the writeback is incorrect; that should be fixed. The backwards-compatible change is to introduce a compile-time preprocessor check, as SciPy has done.

I plan to leave the deprecations in place for a long time, perhaps forever, as there is no reason for folks to change their code except for PyPy compatibility, which they may not be interested in.

I may have a different perspective on deprecations than others: I see no reason why the existing functionality needs to be removed so quickly. It can simply remain in a deprecated status until we see that no one else encounters these warnings, which could be in a decade or so.

@ghost

ghost commented Dec 19, 2017

I think it would also be useful to explain why the changes are needed in the first place. I assume that the UPDATEIFCOPY flag is not actually replaced, but rather that a new flag WRITEBACKIFCOPY is added.

Because this is implementation-defined behavior. This is the same type of situation where something is released, but is not in a specification, and then people rely on it. This is why NumPy is attempting to remove deprecated APIs so that it can refactor internal code. The way that's done is to introduce a deprecation warning to stop the bleeding: make sure that no new code uses it, and then when (if) almost all existing code is fixed, it can finally be removed.

But it should be user, rather than schedule, driven. If people are still using deprecated functionality, then it shouldn't be removed.

@njsmith
Member

njsmith commented Dec 20, 2017

@mattip Oh, interesting, I missed the last bit of @charris's comment there.

I actually agree that we might want to leave this deprecation as a warning only for some long period of time, at least on CPython. But (a) I think we should leave ourselves the option of transitioning it to a real error eventually – refcounting is a real barrier to making Python faster, and NumPy wants Python to be faster!, and (b) we at least want to make it an error on PyPy ASAP, so that people who do try to port to PyPy find out immediately instead of getting incorrect results.

Also I think it'd really be better in the long run if NumPy's API did not vary depending on which Python implementation it happens to be built for. So I'd like to at least have a plan for making the transition on CPython and emit an invisible-by-default warning, even if we don't have any solid timetable for actually flipping to an error.

@mattip
Member Author

mattip commented Dec 20, 2017

@njsmith IMO (a) would be achieved by the other pull request, and (b) is achieved by this pull request as a warning. The long-term plan is the other pull request, once consensus is reached. I am not sure how to make it an error rather than a warning, since the action happens in a deallocation function. Good enough?

@njsmith
Member

njsmith commented Dec 20, 2017

So what does this PR accomplish? It adds a warning that we'll have to remove later...? I don't really see how "We give the wrong result on PyPy and you don't even know unless you have DeprecationWarnings turned on" is a meaningful improvement over "We give the wrong result on PyPy and you'll never know", if it doesn't move us towards a real solution either :-(.

@eric-wieser
Member

Can we not jump straight to using PyErr_WriteUnraisable in nditer.__del__, which at least solves the "warnings turned off" problem?

@njsmith
Member

njsmith commented Dec 20, 2017

@eric-wieser do you think that the best option is to never raise an error in this case, just dump loud warnings on the console?

@mattip
Member Author

mattip commented Dec 20, 2017

We can, but in many cases the PyPy GC will not be triggered anywhere close to the nditer use, so the user will not see the stderr printout.

@ghost

ghost commented Dec 20, 2017

IMO gh-9998 is fine. Yes, it's a complex PR, but we have plenty of time to review it. No one is forced to use a context manager if they don't want to, and it mirrors exactly what Python does with files.

@mattip
Member Author

mattip commented Jan 7, 2018

So what does this PR accomplish?

It provides a path, for those "responsible adults" who care, to run python code correctly on PyPy without harming CPython at all. I was not able to generate much support for the complications involved in going the whole way as in PR #9998, so this is a minimal change to allow explicit closing of an iterator.

@mattip
Member Author

mattip commented Jan 22, 2018

Can I get a yes/no/modify-it decision on this pull request? If this is rejected, I understand PR #9998 will never be accepted. If this is accepted, we can hold off on PR #9998 for a while to see how this is adopted by the community.

@eric-wieser
Member

eric-wieser commented Mar 29, 2018

@eric-wieser do you think that the best option is to never raise an error in this case, just dump loud warnings on the console?

Well, dump unraisable errors, which is essentially the same thing. Yes, I think that's a reasonable solution here.

Sorry for losing the context here: I'm still +1 on the addition of a close() method, whether or not we also go forward with #9998

@charris charris changed the title ENHANCE: add nditer.close to solve writeback semantics in nditer ENH: add nditer.close to solve writeback semantics in nditer Mar 29, 2018
@mattip
Member Author

mattip commented Apr 21, 2018

Handled via PR #9998
