-
-
Notifications
You must be signed in to change notification settings - Fork 11.2k
ENH: Use array indexing preparation routines for flatiter objects #28590
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
198df6b
to
75aaed0
Compare
75aaed0
to
9f2d51f
Compare
assert_raises(ValueError, ia, x.flat, s, np.zeros(9, dtype=float)) | ||
assert_raises(ValueError, ia, x.flat, s, np.zeros(11, dtype=float)) | ||
assert_raises(IndexError, ia, x.flat, s, np.zeros(9, dtype=float)) | ||
assert_raises(IndexError, ia, x.flat, s, np.zeros(11, dtype=float)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While this is certainly more consistent and I'd even call it a bugfix, it is a behavior change and someone might have code relying on the old behavior. Needs a release note at least. You also need another release note for the new features.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed. Added a release note that lists all the most important changes.
8f2b322
to
33109dd
Compare
This is a big refactor, so I think we'll need at least two experienced developers to go over the C code changes, so that might take a while. I'll try to do a pass focusing on the correctness of the C code soon. On a first, high-level pass this looks like mostly simplification and cleanup. I think you should also try running the indexing benchmarks to see if there are any significant regressions in existing benchmarks. I think It would also be nice to get new entries in the |
These are the results of running the (old & new) benchmarks:
It looks like having special cases for tuple, ellipses etc. (instead of going through |
Probably |
I added a couple of special cases for an empty tuple and boolean indexes. This fixes the two worst performance regressions. I feel that the rest are acceptable, since this goes through a much more complex code path to make sure that everything is set up correctly. |
I added the 2.3 milestone to make sure we don't drop reviewing this before cutting the release. |
@ngoldbaum I am about to push this off to 2.4 unless you want to put it in very soon. |
I spoke with Lysandros and he said it's OK to push this off. We'll coordinate on getting this reviewed soon. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for not looking at it much. Overall looks nice, I need to do a pass to see for refcount issues, etc.
I am slightly worried that some of the bad bool cases should maybe have a FutureWarning
(or just go to an error for a bit?!), to enforce correct behavior.
Overall, I am happy that this seemed to have worked well to integrate, the diff is a bit unwieldy, but it can't be helped.
@seberg Sorry for taking so long with this and thanks a lot for the thorough review. It was really helpful! I've addressed all of the feedback so it should be in a better state now. Looking foward to hearing your thoughts. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry this is so tedious. I have to look further, but wanted to comment for those things.
I do wonder more now if we shouldn't push the errors/warning down into the index prep helper. As annoying as it is, we just duplicate things and while that code is spaghetti, it is at least pretty linear spaghetti.
if (!PyArray_Check(ind) && !PyArrayIter_Check(ind)) { | ||
PyErr_SetString(PyExc_IndexError, | ||
"Non-array indices are not supported because they can lead to unexpected behavior. " | ||
"This is expected to change in future versions."); | ||
goto finish; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is tedious :/. I don't think it is right here, we should move things down because arr.flat[[1, 2, 3, 4]]
is really OK.
So there are two things here:
a.flat[[1.]]
floats are at this point, rejected by the original preparation. Changing that this stops working is also the biggest break (of sensible behavior). I could see adding a branch to the index-prep code to deprecate it there (or we gamble and see if someone notices, even considering that array indexing is strict for a decade, I am not sure).a.flat[[True, False]]
was always nonsense, and I am certainly happy to not worry about deprecation. Two possible solutions:- Just move this (similarly) into the
HAS_BOOL
branch. That works, although the current will also fail forarr.flat[np.array([True, False]),]
as it is a tuple. Admittedly, that is rather niche. - We also push this into the prep-index code, where we have the original object at hand to give the warning.
- Just move this (similarly) into the
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The float case is now fixed in that it works but is derecated (added a branch in the index-prep code). I'm a bit confused about the bool case though. When bool indices are used that do not match the iterator's shape, that raises an IndexError. That's what we want, so I think we're okay like this?
Friendly ping here. I'm gonna be on PTO starting Aug 11th, so if we could get another round of feedback on this before I go, that'd be very helpful! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, had a look through. I think two of the comments are probably relevant (avoid the often unnecessary copy and actually raise the error we said we would raise).
But otherwise, I think this is turning out very nice and much simpler than the old code (even if that means making the index prep function slightly larger).
else { | ||
Py_INCREF(ind); | ||
obj = ind; | ||
if (index_type == HAS_BOOL) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The release notes promise that this raises an error for the cases were the result would change.
We have to actually add that error here (or actually probably easier in the index prep function as we don't have to worry about tuples there).
That is, raise a specific IndexError
that informs the user that this will be treated as a boolean index in the future (depending on where the error is, a non-matching length of the boolean one may be raised earlier, I don't care):
In [6]: a = np.arange(3)
In [7]: a[[True, False, True]]
Out[7]: array([0, 2]) # changed behavior
May be nice to mention that this will be allowed in the future and that np.asarray(index)
is a work-around.
Or we decide to live with it and put it more prominently in the release notes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's okay now. This path raises as well.
Tests look good. This should be really close now @seberg! Also, thanks for the reviews and the patience! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, there is one issue around []
I think, but I'll just apply the fix and hope CI passes :).
Py_DECREF(obj); | ||
PyErr_SetString(PyExc_IndexError, | ||
"only integers, slices (`:`), ellipsis (`...`) and integer or boolean " | ||
"arrays are valid indices" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I suspect this can't be reached, but it seems good to have as a back-stop. I'll just apply the ==
above and hope it works fine.
(Just to be a bit pedantic.)
Unfortunately the change to include
|
I want to fix it to do the same thing as for arrays, but I am slightly confused why |
That's probably because of this new
The string array is cast to an integer array and since it's got no elements, there's no issues there. |
Oh, nvm, let's just just remove the test, or only test for a string array. The array case just lets this one pass, because it can't doesn't distinguish between a |
The point is, that it isn't a string array, its a string flatiter... And we never handle it correctly, so I don't think we should care for this PR. |
"only integers, slices (`:`), ellipsis (`...`)%s and integer or boolean " | ||
"arrays are valid indices", | ||
is_flatiter_object ? "" : ", numpy.newaxis (`None`)" | ||
); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just realized that this also changes the array path. We may want to add a change release note, but overall it's not super important.
(Also wonder whether we should chain the error, but doubt that it is helpful).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It doesn't for this specific case. Do you know of any examples that would reach this?
>>> import numpy as np
>>> a = np.arange(3)
>>> b = np.array(["a"], dtype="S")
>>> a[b]
Traceback (most recent call last):
File "<python-input-8>", line 1, in <module>
a[b]
~^^^
IndexError: arrays used as indices must be of integer (or boolean) type
>>> a.flat[b.flat]
Traceback (most recent call last):
File "<python-input-9>", line 1, in <module>
a.flat[b.flat]
~~~~~~^^^^^^^^
IndexError: only integers, slices (`:`), ellipsis (`...`) and integer or boolean arrays are valid indices
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I suppose it is practically impossible to reach indeed, since it only enters for length zero.
Thanks @lysnikolaou this was a really nice cleanup, I think. And fixing up flatiter a bit more is also nice. |
prepare_index
initer_subscript
anditer_ass_subscript
. This fixes various cases that were broken before:arr.flat[[True, True]]
arr.flat[[1.0, 1.0]]
arr.flat[()] = 0
flatiter
indexing operationsCloses #28314.