-
-
Notifications
You must be signed in to change notification settings - Fork 10.8k
Fix index_tricks issue #445
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Ouch, it looks like the new code depends on ndindex(*shape). This is the wrong fix, then. |
The build error on the 2nd try is not related to the pull request.. |
This still fails:
Should be:
|
Yeah, you're right. That is because the nditer does not accept an empty shape. I had to put in somewhat of a hack to make it work.. ndindex now checks if the given shape was empty and returns an empty index in that case. |
@@ -533,8 +533,13 @@ class ndindex(object): | |||
|
|||
""" | |||
def __init__(self, *shape): | |||
# Semi-backward compatibility check. | |||
if len(shape) and isinstance(shape[0], tuple): shape = shape[0] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if
statements should always be split across two lines. I'd also make this if len(shape) == 1
for clarity. And the comment really should explain what exactly this is accomplishing -- I think this is for handling both ndindex(2, 3)
and ndindex((2, 3))
syntax?
And if that's correct, then we need a test that both syntaxes are handled so this doesn't break again.
This fixes an issue with ndindex shape tuple recognition, and an issue in the nditer where scalar input did not produce an empty index tuple. To be able to fix nditer, an extra flag has been added: NPY_ITFLAG_SCALAR and a new function NpyIter_IsScalar has been added to the nditer API. Also a few tests have been added to make sure the ndindex behaves as intended.
Ok, I implemented the nditer changes as Mark suggested and incorporated it into a new commit together with the suggestions of Nathaniel. This changes the nditer API and, as such, the NumPy API. Someone should check if the API number has to increase, as I don't know if we are still in between releases. |
Again, the build failure is not related.. |
Is there a short explanation of why we need to add a special-case hack to the nditer API for handling arrays with this particular shape? 0-dim arrays are exotic looking at first glance, but in fact almost everything about them follows naturally from the rules for handling generic n-dim arrays without any special cases, so I haven't had time to dig into the API in depth but this NpyIter_IsScalar function really smells funny... |
Not at all, it is very simple in fact. An iterator needs data and dimensions to iterate. If you do not present dimensions, it will not be possible to iterate at all. You have to at least be able to count to 1 to make a cycle. Therefore, counting zero is not intuitive for an iterator and thus needs to be special-cased. |
Here is it in code form:
|
Yes, but not necessarily at the API-level. My question is do we really need a new exposed API function. On Sep 20, 2012, at 9:28 AM, Han Genuit wrote:
|
Ah. I guess not, although it would fit in with the rest of the code.. I'll disable the API part. |
For purposes of getting 1.7 out the door, the current patch (without the API change) seems fine to me, as would dropping the original ndindex patch from 1.7 entirely. (Obviously I was overoptimistic here... this patch should never have been in 1.7 in the first place.) For the larger question of how 0-dim arrays should be handled, I did spend a few minutes looking at the nditer API, and my observations are:
|
[I seem to have confused myself... I wrote the following message earlier, and thought I'd posted it, but only just discovered it sitting unposted in another tab. So just pretend this comes before my previous comment on this thread...] @87: I'm afraid I don't understand what you're saying at all. Obviously an iterator needs something to iterate over. And obviously iterating over the dimensions of a zero-dimensional array will take zero iterations. But nditer is a tool for iterating over data values, not for iterating over dimensions. The number of values in an array is prod(shape), which for a 0-dim array is 1. So there's no problem iterating over the values in a 0-dim array... I looked through your list of special cases, and it seems to be (1) code that for some reason has decided that 0-dim arrays should be automatically flattened, for unknown and arguably buggy reasons (np.insert, np.nonzero -- the latter in particular simply doesn't satisfy its documented invariants for 0-dim arrays), (2) code that is looking for a scalar, and will cast 0-dim arrays to scalar as necessary, (3) a binary IO routine I didn't bother to unravel. None of these are anywhere near as central to the code as the core iteration routines, and none of them require the user of an API to use inconsistent hacks to make the API work properly... |
Before this drags into a discussion we don't want to have, let's be plain: I want to fix stuff, not have endless arguments about what should and should not theoretically be possible. The nditer iterates over dimensions, as well as data. The zero-dimensional array is in itself a special case, because naturally it should not be an array. The reason you have to increase the dimension, is to be able to handle it like an array, otherwise you get the same result as shown above (#nothing). I realized the examples from the search were not very complete, but if you look through the code you will find them. The 'hack' is less inconsistent than one might think, because it compensates for the fact that the dimension had to be increased in order to be able to return a value, which in turn led to the problems with ndindex. This only works for the multi_index, which can seem inconsistent, but it is the only place where this can work.
|
Hmm, I read your last message first, sorry about that. |
I also would rather fix stuff than have endless theoretical arguments, but sometimes it's hard to know what "fixed" means without understanding the theory :-). I'm not sure how to explain this briefly, but I think your understanding of how 0-d arrays fit into the basic numpy data model (data pointer + strides + shape + dtype) is just incomplete. In that model, 0-d arrays naturally exist, and work exactly the way they actually do work in numpy (e.g., holding a single value). Not having 0-d arrays would require a special case in the code just to check for and disallow it. The error message you got in your code example is misleading - 0-d arrays can be indexed. You just have to use a 0-d index to do so, and your code is using a 1-d index. It's like writing In any case, like Travis says, even if the nditer code needs some special cases internally to handle 0-d arrays, it shouldn't export those special cases as part of the API... |
I think the correct behavior is for the iterator to immediately yield a StopIteration on an empty array. @njsmith As for dropping the patch, I think the situation is much better now than before: we have unit tests (there were none before), and the minor issue of 0-d arrays is addressed by @87 's patch. True, we can't always know what we break by such a refactoring, but now that we have tests we can guarantee continued consistent behavior in the future. |
@stefanv: That's true, but a 0-dim array is totally different from an empty array (= an array with a 0-length dimension). And re: dropping the patch, it isn't a judgement on anything (and Ondrej may not do it), it's just that as a policy, backporting new features into an already-delayed release branch is always a bad idea :-). |
@njsmith Ah, I didn't realize that was what we were talking about, sorry! |
I am not familiar with the theoretical model, but I am familiar with the implementation. The reason a zero-dimensional array works the way it works is because strides and shape are allowed to be NULL while size is 1. The reason
But it has special properties in any case:
How would you track a 0-dimensional index and iterate over non-iterable data without using a special case? |
The current indexing code has all kinds of arcane and likely misguided warts in it... but here's the idea I'm thinking of. Let's say we have an array
This function has no special-case for 0-d arrays, but it works correctly for all arrays, including 0-d arrays (and discontiguous arrays, and weird stride-trick-using arrays, and 0-sized arrays, etc.). It's quite beautiful actually how simple it is.
|
Hmm, that principle works elegantly in Python, but I am not sure about C. If all the input is standardized (e.g. coerced to a standard known data-format, that includes zero-dimensions) before being processed, it might work. It takes advantage of some principles, like multiplying with an empty array gives an empty array and the sum of an empty array is zero. And I am not sure what the type of the strides of the strides would be, for instance. And I would be concerned about the overhead, but as idea it looks nice. |
It's not really Python, it's Math with a python-flavoured notation... the "standard known data-format" here is exactly what ndarray is, strides are integer counts of bytes, and the sum of an empty sequence is always zero, even in C:
This is exactly how the numpy C code actually works, except that the numpy C code has lots of extra spaghetti to handle different sorts of integers, slices, the special case for 1-d indexing returning (generalized) rows, blah blah blah. But this is the basic structure underneath all of that. |
Okay, if you would use a C array of byte counts for strides, how would that map to a zero-dimensional array? |
For a 0-d array the strides are just an empty C array.
|
So, basically, what you propose is to have a singular indexing routine for requesting data from an array, and have a offset routine that outputs zero if nothing happens (e.g. offset is zero for zero-dimensional array). That might work. |
But I still think the interfacing would need to have special cases for iterators, for instance, or working with all kinds of different indexing types, thus it would not be that simple.. |
In any case, it would require a large rewrite of the current code base, imo, and I'm not convinced the gain would outweigh the effort. This patch fixes an issue that was exposed by 5a471b5, which implemented a nice performance improvement. The API change has been reverted. Why not accept this as fix and be done with it? |
+1 Iterating over 0-d is uncommon enough that a band-aid is more than sufficient for now. |
I'm not talking about rewriting the code-base, just fixing the nditer API so that it doesn't have weird special cases and produce inconsistent results (like the .ndim value being set wrong for 0-d arrays). That has to be fixed sooner or later, so if I can convince you do it now then numpy will be better. If not then it will just stay on the TODO list, that's all :-). If the choice is between this patch and nothing then we should accept this patch -- it doesn't fix the underlying problem, but since it doesn't change the public API, it doesn't make it worse either. |
(toggling closed/open to ping travis-ci) |
Hmm, I see there are some new errors from CI, I'll have to take a look at it. |
Hmm, yeah, some non-deterministic-looking failure in |
This should be superceded by gh-3104, so closing. |
Commit 5a471b5 caused an issue with ndindex, which did convert the incoming shape information into an extra tuple, which in turn corrupted the array interface.
The ndindex class now accepts both packed and unpacked tuples as shape information. Also, the nditer now returns an empty index for scalar values.