Skip to content

ENH: Inverse indices from np.unique() sharing the input array's shape #20638

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
honno opened this issue Dec 21, 2021 · 4 comments
Closed

ENH: Inverse indices from np.unique() sharing the input array's shape #20638

honno opened this issue Dec 21, 2021 · 4 comments

Comments

@honno
Copy link
Contributor

honno commented Dec 21, 2021

Currently inverse indices are always(?) a 1-D array. This means input arrays with !=1 ndims cannot just be reconstructed by indexing the unique values with the inverse indices e.g. in the docs we have the following reconstruction example

>>> a = np.array([1, 2, 6, 4, 2, 3, 2])
>>> u, indices = np.unique(a, return_inverse=True)
>>> u[indices]
array([1, 2, 6, 4, 2, 3, 2])

but we can't do the same for 0-D/2+ dimensional arrays

>>> a = np.array([[1, 2, 6], [2, 3, 2]])
>>> u, indices = np.unique(a, return_inverse=True)
>>> u[indices]
array([1, 2, 6, 2, 3, 2])  # shape is not 2-D

Right now you need to use reshape() to actually reconstruct the original array

>>> u[indices.reshape(a.shape)]
array([[1, 2, 6],
       [2, 3, 2]])

So IMO it would be a nice usability change for these inverse indices to share the input array shape so you can just do

>>> u[indices]
array([[1, 2, 6],
       [2, 3, 2]])

Another benefit is that you will fix a current bug in array_api.unique_inverse/array_api.unique_all, as the Array API spec specifies inverse indices should indeed share the same shape as the input array. Admittedly that can easily be solved with an internal reshaping. cc @asmeurer

I wonder if there's some reasoning I'm missing for always returning 1-D arrays. Maybe integer indexing came after return_inverse was added. And whilst the shape of the returned inverse indices is not documented, I wonder if there are unintended consequences to changing this behaviour for downstream libraries.

@Dan-Patterson
Copy link

perhaps the axis parameter is what you are looking for and perhaps return_index if you want the same order.

a = np.array([[1, 2, 6], [2, 3, 2], [1, 2, 6]])
u, idx, cnts = np.unique(a, return_index=True, return_counts=True, axis=0)

u
array([[1, 2, 6],
       [2, 3, 2]])

a[np.sort(idx)]
array([[1, 2, 6],
       [2, 3, 2]])

cnts
array([2, 1], dtype=int64)`

@seberg
Copy link
Member

seberg commented Dec 23, 2021

I agree that returning the original shape seems strictly more useful, even though axis=None does usually mean "work on this as if it is 1-D", applying that meaning to an output that has naturally the same shape as the input (and even reconstructs it) seems a bit weird.
I am not even sure that axis=range(arr.ndim) could possibly denote the requested version either. My current thought is: seems right when it comes to ideal future, but I am not yet sure how the transition can/would look like.

@honno
Copy link
Contributor Author

honno commented Jan 22, 2024

If I'm correct in saying #25553 resolves this PR, but hesitant to close just yet, i.e. from Ralf

This broke SciPy in two places, in ways that were a little nontrivial to diagnose: scipy/scipy#19868. The fix does make sense, but more complaints may roll in - this probably broke most usages of the returned reverse indices.

I don't know, I'd give it a week or so - if no issues turn up for other libraries that test against numpy nightlies, it's probably okay to keep the change.

@seberg
Copy link
Member

seberg commented Jul 13, 2024

Closing, the behavior is now:

  • With axis=None, the inverse should indeed have the same shape.
  • With axis=0, 1, ... the inverse must remain a 1-D array.

This is now the case for unique, it isn't fixed for unique_inverse, I think that is a mistake, but I opened on issue in the array API and that is as far as I am willing to annoy/push this.

FWIW, the data-api website right now says:

The array must have the same shape as x and have the default array index data type.
Which is incorrect as it is impossible for axis=0. This was interpreted as "broadcastable" along the way, I guess. Which is unfortunately the wrong concept for unique (it doesn't behave like argmin, the non-specified axes are core axes, not broadcast axes).

@seberg seberg closed this as completed Jul 13, 2024
brendan-m-murphy added a commit to brendan-m-murphy/pytensor that referenced this issue Jan 24, 2025
From numpy PR numpy/numpy#22449,
the repr of scalar values has changed, e.g. from "1" to
"np.int64(1)", which caused two doctests to fail.

The doctest for tensor.extra_ops.Unique was failing because
the output shape for the inverse indices has changed when
axis is None: numpy/numpy#20638
brendan-m-murphy added a commit to brendan-m-murphy/pytensor that referenced this issue Feb 4, 2025
From numpy PR numpy/numpy#22449,
the repr of scalar values has changed, e.g. from "1" to
"np.int64(1)", which caused two doctests to fail.

The doctest for tensor.extra_ops.Unique was failing because
the output shape for the inverse indices has changed when
axis is None: numpy/numpy#20638
brendan-m-murphy added a commit to brendan-m-murphy/pytensor that referenced this issue Feb 4, 2025
From numpy PR numpy/numpy#22449,
the repr of scalar values has changed, e.g. from "1" to
"np.int64(1)", which caused two doctests to fail.

The doctest for tensor.extra_ops.Unique was failing because
the output shape for the inverse indices has changed when
axis is None: numpy/numpy#20638
brendan-m-murphy added a commit to brendan-m-murphy/pytensor that referenced this issue Feb 4, 2025
From numpy PR numpy/numpy#22449,
the repr of scalar values has changed, e.g. from "1" to
"np.int64(1)", which caused two doctests to fail.

The doctest for tensor.extra_ops.Unique was failing because
the output shape for the inverse indices has changed when
axis is None: numpy/numpy#20638
brendan-m-murphy added a commit to brendan-m-murphy/pytensor that referenced this issue Feb 4, 2025
From numpy PR numpy/numpy#22449,
the repr of scalar values has changed, e.g. from "1" to
"np.int64(1)", which caused two doctests to fail.

The doctest for tensor.extra_ops.Unique was failing because
the output shape for the inverse indices has changed when
axis is None: numpy/numpy#20638
brendan-m-murphy added a commit to brendan-m-murphy/pytensor that referenced this issue Feb 4, 2025
From numpy PR numpy/numpy#22449,
the repr of scalar values has changed, e.g. from "1" to
"np.int64(1)", which caused two doctests to fail.

The doctest for tensor.extra_ops.Unique was failing because
the output shape for the inverse indices has changed when
axis is None: numpy/numpy#20638
brendan-m-murphy added a commit to brendan-m-murphy/pytensor that referenced this issue Feb 5, 2025
From numpy PR numpy/numpy#22449,
the repr of scalar values has changed, e.g. from "1" to
"np.int64(1)", which caused two doctests to fail.

The doctest for tensor.extra_ops.Unique was failing because
the output shape for the inverse indices has changed when
axis is None: numpy/numpy#20638
brendan-m-murphy added a commit to brendan-m-murphy/pytensor that referenced this issue Feb 5, 2025
From numpy PR numpy/numpy#22449,
the repr of scalar values has changed, e.g. from "1" to
"np.int64(1)", which caused two doctests to fail.

The doctest for tensor.extra_ops.Unique was failing because
the output shape for the inverse indices has changed when
axis is None: numpy/numpy#20638
brendan-m-murphy added a commit to brendan-m-murphy/pytensor that referenced this issue Feb 6, 2025
From numpy PR numpy/numpy#22449,
the repr of scalar values has changed, e.g. from "1" to
"np.int64(1)", which caused two doctests to fail.

The doctest for tensor.extra_ops.Unique was failing because
the output shape for the inverse indices has changed when
axis is None: numpy/numpy#20638
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants