Skip to content

BUG: it.close() disallows access to iterator, fixes #10950 #10951

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Apr 23, 2018

Conversation

mattip
Copy link
Member

@mattip mattip commented Apr 21, 2018

PR #9998 added a close method to nditer that resolved writeback semantics but did not actually disallow access to the iterator operands

@@ -2847,7 +2847,7 @@ def test_writebacks():
enter = it.__enter__
assert_raises(ValueError, enter)

def test_close():
def test_close_equivalent():
''' using a context amanger and using nditer.close are equivalent
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

amanger -> manager

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

assert_equal (next(it), 0)
it.close()
assert_raises(StopIteration, next, it)
assert_raises(ValueError, getattr, it, 'operands')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we maybe allow access to the operands even after .close()?

Maybe I'm forgetting how this all works, but I think the operands are still safe to use after closing. It is the temp buffers which are unsafe.

In fact, isn't it the case that the output operand should only be used by other code after close is called?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer to make explicit that it.operands can no longer be trusted, it may retrieve stale data from a closed iterator. I will add documentation to that effect, but am open to other opinons

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm with @ahaldane on this one. Doesn't operands point to the array to write back to, in which case it's not even valid until after the it.close() is called?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to document the behavour wrt writeback semantics here
If writeback semantics were used and have been resolved, the operand's data is no longer related to the input array's data. If writeback semantics were not used, the operand shares data with the input array. So we could create a situation where code is fragile: it works without writeback semantics but fails when they are used.

Note this was true even before PR #9998. The previous trigger to resolve was a call to NpyIter_Deallocate, which was done by del it or it = None. Since it was no longer valid, new access to the operands was prevented.

We started discussing this in the nditer.close pull request in Dec, but I guess it never really got resolved.

I am open to suggestions. The motivation/root-cause of this whole mess is that the operand can have a different, non-compatible dtype from the array. At some point that incompatibility needs to be resolved, and from that point forward the operand is unusable or we need some kind of flag to track between the (now writeable) data in the array and the still-valid(?), writeable operand.

Copy link
Member

@ahaldane ahaldane Apr 22, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, you're right, I was misunderstanding it. The it.operands are indeed the temporary buffers, and not the original output array, and so should not be accessed after closing. Eg,

>>> a = np.arange(24, dtype='f8')
>>> aiter = np.nditer(a, [], [['readwrite', 'updateifcopy']],
...                  casting='same_kind',op_dtypes=[np.dtype('f4')])
>>> with aiter as it:
...     x = it.operands[0]
>>> x[0] = 10
>>> a[0]
0.0

But in that case all the examples above which do things like return it.operands[1] don't make sense, since they appear to be returning the temp buffer rather than the original array. We need to provide a way for the user to access the allocated output arrays even after the iterator is closed.

Perhaps it.operators should point to the original array instead of the temp buffer? Or (more likely) maybe we need to add a new attribute to the nditer object which points to the original arrays.

Copy link
Member

@ahaldane ahaldane Apr 22, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The nditer docstring says:

operands | (tuple of operand(s)) The array(s) to be iterated over.

which sounds to me like it is describing the "original" arrays, not the buffer copies. All of the code which does return it.operands[1] seems to implicitly assume it.operands refers to the original arrays. ... except we just found this isn't true.

The one exception is in the reduction examples, which start by doing it.operands[1][..] = 0 to initialize the reduction, which assumes that it.operands refers to the buffers. However, there seems to be a whole complicated discussion of a special flag delay_bufalloc and a special method it.reset() which, looking at it in retrospect, seem to me to be hacks only necessary because of the lack of distinction between the "original" arrays and the buffer arrays which we have discovered above.

Maybe now that we have a context manager, we can deprecate all of that and add some better behavior? That is, you would do it.operands[1][..] = 0 before the context is entered, (so context entrance replaces it.reset)?

Maybe that's all going too far, though.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe the examples should do

-     return it.operands[1]
+    if out:
+        return out
+    else:
+        return it.operand[1]

As for delay_bufalloc it relates to the buffered flag, which allocates memory independent of the operands.

@@ -20,7 +20,7 @@

typedef struct NewNpyArrayIterObject_tag NewNpyArrayIterObject;

enum NPYITER_CONTEXT {CONTEXT_NOTENTERED, CONTEXT_INSIDE, CONTEXT_EXITED};
enum NPYITER_CONTEXT {CONTEXT_NOTENTERED, CONTEXT_INSIDE, CLOSED};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the distinction between CONTEXT_NOTENTERED and CONTEXT_INSIDE even used anywhere? I feel like npy_bool is_closed might be all that is needed here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

adopted

@@ -1517,7 +1517,8 @@ static PyObject *npyiter_itviews_get(NewNpyArrayIterObject *self)
static PyObject *
npyiter_next(NewNpyArrayIterObject *self)
{
if (self->iter == NULL || self->iternext == NULL || self->finished) {
if (self->iter == NULL || self->iternext == NULL ||
self->finished || (self->managed == CLOSED)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: would drop the inner parens here - they're inconsistent with the rest of the expression

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

Copy link
Member

@eric-wieser eric-wieser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this PR is strict improvement over master, and the .operands discussion was something we should have folded into the original issue.

So perhaps we should merge this, and open a new issue about how to handle it.operands?

@ahaldane
Copy link
Member

Fair enough, let's do that. I'll go ahead with the merge.

Thanks Matti!

@ahaldane ahaldane merged commit f2888db into numpy:master Apr 23, 2018
@mattip mattip deleted the nditer-close-fixes branch May 7, 2018 05:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants