Skip to content

ENH: Use array indexing preparation routines for flatiter objects #28590

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 26 commits into from
Aug 8, 2025

Conversation

lysnikolaou
Copy link
Member

  • Use prepare_index in iter_subscript and iter_ass_subscript. This fixes various cases that were broken before:
    • arr.flat[[True, True]]
    • arr.flat[[1.0, 1.0]]
    • arr.flat[()] = 0
  • Add more extensive tests for flatiter indexing operations

Closes #28314.

@lysnikolaou lysnikolaou changed the title [ENH] Use array indexing preparation routines for flatiter objects ENH: Use array indexing preparation routines for flatiter objects Mar 26, 2025
@lysnikolaou lysnikolaou force-pushed the use-prepare-index-flatiter branch from 198df6b to 75aaed0 Compare March 26, 2025 10:23
@lysnikolaou lysnikolaou force-pushed the use-prepare-index-flatiter branch from 75aaed0 to 9f2d51f Compare March 26, 2025 10:30
assert_raises(ValueError, ia, x.flat, s, np.zeros(9, dtype=float))
assert_raises(ValueError, ia, x.flat, s, np.zeros(11, dtype=float))
assert_raises(IndexError, ia, x.flat, s, np.zeros(9, dtype=float))
assert_raises(IndexError, ia, x.flat, s, np.zeros(11, dtype=float))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While this is certainly more consistent and I'd even call it a bugfix, it is a behavior change and someone might have code relying on the old behavior. Needs a release note at least. You also need another release note for the new features.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. Added a release note that lists all the most important changes.

@lysnikolaou lysnikolaou force-pushed the use-prepare-index-flatiter branch from 8f2b322 to 33109dd Compare March 28, 2025 13:52
@ngoldbaum
Copy link
Member

This is a big refactor, so I think we'll need at least two experienced developers to go over the C code changes, so that might take a while. I'll try to do a pass focusing on the correctness of the C code soon. On a first, high-level pass this looks like mostly simplification and cleanup.

I think you should also try running the indexing benchmarks to see if there are any significant regressions in existing benchmarks. I think bench_indexing already captures several workflows that go through the changed low-level C code path.

It would also be nice to get new entries in the FlatIterIndexing benchmark for newly added functionality.

@ngoldbaum ngoldbaum self-assigned this Mar 28, 2025
@lysnikolaou
Copy link
Member Author

lysnikolaou commented Mar 28, 2025

These are the results of running the (old & new) benchmarks:

| Change   | Before [93898621] <main>   | After [cfcdabf0] <use-prepare-index-flatiter>   |     Ratio | Benchmark (Parameter)                                       |
|----------|----------------------------|-------------------------------------------------|-----------|-------------------------------------------------------------|
| +        | 87.0±0.4ns                 | 43.0±0.2ms                                      | 494276    | bench_indexing.FlatIterIndexing.time_flat_empty_tuple_index |
| +        | 115±3ns                    | 479±8ns                                         |      4.15 | bench_indexing.FlatIterIndexing.time_flat_bool_index_0d     |
| +        | 39.4±0.3ms                 | 42.8±0.3ms                                      |      1.09 | bench_indexing.FlatIterIndexing.time_flat_ellipsis_index    |
| +        | 3.95±0.04ms                | 4.29±0.07ms                                     |      1.09 | bench_indexing.FlatIterIndexing.time_flat_slice_index       |

It looks like having special cases for tuple, ellipses etc. (instead of going through prepare_index) did have an impact on performance. Should we try and keep those special cases in?

@ngoldbaum
Copy link
Member

Should we try and keep those special cases in?

Probably

@lysnikolaou
Copy link
Member Author

Should we try and keep those special cases in?

Probably

I added a couple of special cases for an empty tuple and boolean indexes. This fixes the two worst performance regressions. I feel that the rest are acceptable, since this goes through a much more complex code path to make sure that everything is set up correctly.

@ngoldbaum ngoldbaum added this to the 2.3.0 release milestone Apr 23, 2025
@ngoldbaum
Copy link
Member

I added the 2.3 milestone to make sure we don't drop reviewing this before cutting the release.

@charris
Copy link
Member

charris commented May 19, 2025

I added the 2.3 milestone

@ngoldbaum I am about to push this off to 2.4 unless you want to put it in very soon.

@ngoldbaum
Copy link
Member

I spoke with Lysandros and he said it's OK to push this off. We'll coordinate on getting this reviewed soon.

@charris charris modified the milestones: 2.3.0 release, 2.4.0 release May 19, 2025
@seberg seberg added the 56 - Needs Release Note. Needs an entry in doc/release/upcoming_changes label Jun 11, 2025
@seberg seberg self-requested a review June 11, 2025 17:49
Copy link
Member

@seberg seberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for not looking at it much. Overall looks nice, I need to do a pass to see for refcount issues, etc.

I am slightly worried that some of the bad bool cases should maybe have a FutureWarning (or just go to an error for a bit?!), to enforce correct behavior.

Overall, I am happy that this seemed to have worked well to integrate, the diff is a bit unwieldy, but it can't be helped.

@lysnikolaou
Copy link
Member Author

@seberg Sorry for taking so long with this and thanks a lot for the thorough review. It was really helpful!

I've addressed all of the feedback so it should be in a better state now. Looking foward to hearing your thoughts.

Copy link
Member

@seberg seberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry this is so tedious. I have to look further, but wanted to comment for those things.
I do wonder more now if we shouldn't push the errors/warning down into the index prep helper. As annoying as it is, we just duplicate things and while that code is spaghetti, it is at least pretty linear spaghetti.

if (!PyArray_Check(ind) && !PyArrayIter_Check(ind)) {
PyErr_SetString(PyExc_IndexError,
"Non-array indices are not supported because they can lead to unexpected behavior. "
"This is expected to change in future versions.");
goto finish;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is tedious :/. I don't think it is right here, we should move things down because arr.flat[[1, 2, 3, 4]] is really OK.

So there are two things here:

  1. a.flat[[1.]] floats are at this point, rejected by the original preparation. Changing that this stops working is also the biggest break (of sensible behavior). I could see adding a branch to the index-prep code to deprecate it there (or we gamble and see if someone notices, even considering that array indexing is strict for a decade, I am not sure).
  2. a.flat[[True, False]] was always nonsense, and I am certainly happy to not worry about deprecation. Two possible solutions:
    • Just move this (similarly) into the HAS_BOOL branch. That works, although the current will also fail for arr.flat[np.array([True, False]),] as it is a tuple. Admittedly, that is rather niche.
    • We also push this into the prep-index code, where we have the original object at hand to give the warning.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The float case is now fixed in that it works but is derecated (added a branch in the index-prep code). I'm a bit confused about the bool case though. When bool indices are used that do not match the iterator's shape, that raises an IndexError. That's what we want, so I think we're okay like this?

@lysnikolaou
Copy link
Member Author

Friendly ping here. I'm gonna be on PTO starting Aug 11th, so if we could get another round of feedback on this before I go, that'd be very helpful!

Copy link
Member

@seberg seberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, had a look through. I think two of the comments are probably relevant (avoid the often unnecessary copy and actually raise the error we said we would raise).

But otherwise, I think this is turning out very nice and much simpler than the old code (even if that means making the index prep function slightly larger).

else {
Py_INCREF(ind);
obj = ind;
if (index_type == HAS_BOOL) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The release notes promise that this raises an error for the cases were the result would change.
We have to actually add that error here (or actually probably easier in the index prep function as we don't have to worry about tuples there).

That is, raise a specific IndexError that informs the user that this will be treated as a boolean index in the future (depending on where the error is, a non-matching length of the boolean one may be raised earlier, I don't care):

In [6]: a = np.arange(3)

In [7]: a[[True, False, True]]
Out[7]: array([0, 2])  # changed behavior

May be nice to mention that this will be allowed in the future and that np.asarray(index) is a work-around.


Or we decide to live with it and put it more prominently in the release notes.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's okay now. This path raises as well.

@lysnikolaou
Copy link
Member Author

Tests look good. This should be really close now @seberg! Also, thanks for the reviews and the patience!

Copy link
Member

@seberg seberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, there is one issue around [] I think, but I'll just apply the fix and hope CI passes :).

Py_DECREF(obj);
PyErr_SetString(PyExc_IndexError,
"only integers, slices (`:`), ellipsis (`...`) and integer or boolean "
"arrays are valid indices"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I suspect this can't be reached, but it seems good to have as a back-stop. I'll just apply the == above and hope it works fine.

(Just to be a bit pedantic.)

@lysnikolaou
Copy link
Member Author

lysnikolaou commented Aug 8, 2025

Unfortunately the change to include PyArray_SIZE(tmp_arr) != 0 fails the following test because it doesn't warn anymore. What do we wanna do here? Is this acceptable?

    def test_empty_string_flat_index_on_flatiter(self):
        a = np.arange(9).reshape((3, 3))
        b = np.array([], dtype="S")
        with pytest.warns(DeprecationWarning,
                          match="Invalid non-array indices for iterator objects are "
                                "deprecated"):
            assert_equal(a.flat[b.flat], np.array([]))

@seberg
Copy link
Member

seberg commented Aug 8, 2025

I want to fix it to do the same thing as for arrays, but I am slightly confused why np.arange(3)[np.array([], dtype="S")] does the right thing.

@lysnikolaou
Copy link
Member Author

That's probably because of this new if in mapping.c.

            if (PyArray_SIZE(tmp_arr) == 0
                || (is_flatiter_object && !PyArray_ISINTEGER(tmp_arr) && !PyArray_ISBOOL(tmp_arr))) {

The string array is cast to an integer array and since it's got no elements, there's no issues there.

@seberg
Copy link
Member

seberg commented Aug 8, 2025

Oh, nvm, let's just just remove the test, or only test for a string array. The array case just lets this one pass, because it can't doesn't distinguish between a [] and array([]).flat.

@seberg
Copy link
Member

seberg commented Aug 8, 2025

The point is, that it isn't a string array, its a string flatiter... And we never handle it correctly, so I don't think we should care for this PR.

"only integers, slices (`:`), ellipsis (`...`)%s and integer or boolean "
"arrays are valid indices",
is_flatiter_object ? "" : ", numpy.newaxis (`None`)"
);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just realized that this also changes the array path. We may want to add a change release note, but overall it's not super important.
(Also wonder whether we should chain the error, but doubt that it is helpful).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't for this specific case. Do you know of any examples that would reach this?

>>> import numpy as np
>>> a = np.arange(3)
>>> b = np.array(["a"], dtype="S")
>>> a[b]
Traceback (most recent call last):
  File "<python-input-8>", line 1, in <module>
    a[b]
    ~^^^
IndexError: arrays used as indices must be of integer (or boolean) type
>>> a.flat[b.flat]
Traceback (most recent call last):
  File "<python-input-9>", line 1, in <module>
    a.flat[b.flat]
    ~~~~~~^^^^^^^^
IndexError: only integers, slices (`:`), ellipsis (`...`) and integer or boolean arrays are valid indices

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I suppose it is practically impossible to reach indeed, since it only enters for length zero.

@seberg seberg merged commit a0515ad into numpy:main Aug 8, 2025
79 checks passed
@seberg
Copy link
Member

seberg commented Aug 8, 2025

Thanks @lysnikolaou this was a really nice cleanup, I think. And fixing up flatiter a bit more is also nice.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
56 - Needs Release Note. Needs an entry in doc/release/upcoming_changes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: Fix flatiter indexing
4 participants