Skip to content

Revert returning a single NaN for np.unique in 1.22.0? #20326

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
rgommers opened this issue Nov 8, 2021 · 18 comments
Closed

Revert returning a single NaN for np.unique in 1.22.0? #20326

rgommers opened this issue Nov 8, 2021 · 18 comments

Comments

@rgommers
Copy link
Member

rgommers commented Nov 8, 2021

This mailing list thread discusses reverting the change in 1.21.0 to make unique return a single nan in its output when the input array contains multiple nans. This issue is to track that, needs a decision before 1.22.0rc1 I think.

@rgommers
Copy link
Member Author

rgommers commented Nov 8, 2021

Also see gh-19655, which shows that np.ma.unique still returns multiple nans in 1.21.0.

@asmeurer
Copy link
Member

asmeurer commented Nov 8, 2021

If NumPy keeps its current behavior (single NaN), it would be useful to have an internal function that computes the old behavior, so that it can be used in the array_api submodule which requires the old, multiple NaN behavior. Otherwise it would have to workaround this in a way that involves building a new array from the result from np.unique.

The inverse, if we do revert, could also be useful (or it could be added as a flag to unique), but wouldn't be required as far as array_api is concerned.

@asmeurer
Copy link
Member

asmeurer commented Nov 8, 2021

By the way, it would be useful to add a label in the issue tracker for numpy.array_api.

@rgommers
Copy link
Member Author

rgommers commented Nov 8, 2021

By the way, it would be useful to add a label in the issue tracker for numpy.array_api.

Done: https://github.com/numpy/numpy/pulls?q=is%3Aopen+is%3Apr+label%3A%22component%3A+numpy.array_api%22

@charris
Copy link
Member

charris commented Nov 8, 2021

@asmeurer I was thinking a keyword would be useful, something line equal_nans=True. Note that if nans are ever made singletons, the old behavior will change anyway.

asmeurer added a commit to asmeurer/numpy that referenced this issue Nov 9, 2021
unique() in the array API was replaced with three separate functions,
unique_all(), unique_inverse(), and unique_values(), in order to avoid
polymorphic return types.

Additionally, it should be noted that these functions to not currently conform
to the spec with respect to NaN behavior. The spec requires multiple NaNs to
be returned, but np.unique() returns a single NaN. Since this is currently an
open issue in NumPy to possibly revert, I have not yet worked around this. See
numpy#20326.
@rgommers
Copy link
Member Author

rgommers commented Nov 9, 2021

@asmeurer I was thinking a keyword would be useful, something line equal_nans=True.

That may be a good option if we have implementations for both behaviors and we know that there is demand for both from downstream libraries.

@HaoZeke
Copy link
Member

HaoZeke commented Nov 10, 2021

I think it makes more sense to keep the new behaviour, returning NaNs as a single unique value instead of multiple; which is also the behaviour of R:

> a=c(NA,3,1,5,6,7,NA,2,NA,2)
> unique(a)
[1] NA  3  1  5  6  7  2

Not that is a strong case in its favor, but it is logically more consistent, since none of the NaNs are distinguishable (typically).

@rgommers
Copy link
Member Author

R's NA is not nan, it's "missing value" - R has both and they're conceptually different.

@HaoZeke
Copy link
Member

HaoZeke commented Nov 10, 2021

Sorry, my bad. Still the same behaviour though:

> a=c(NaN,3,1,5,6,7,NaN,2,NaN,2)
> unique(a)
[1] NaN   3   1   5   6   7   2

For reference:

  1. NA (Not Available)
  2. NaN Numbers

@seberg
Copy link
Member

seberg commented Nov 10, 2021

In the meeting, we seemed to converge on: lets just add a kwarg. The not quite clear followup to that is what the default should be.

@rgommers
Copy link
Member Author

Agreed with a kwarg, because there's a desire for both sets of behaviors. It's indeed unclear what the default should be. This was a silent change in behavior in 1.21.0, which could have broken existing valid code (e.g. people chopping off the last values of some returned array, rather than post-processing with np.isnan). So if we had caught this sooner, it'd be pretty clear-cut - then the default value should have been backwards-compatible, i.e. equal_nans=False.

Now that we're 4-5 months in though, it's more fuzzy. There may be people who updated their code, and changing it back would break them again. So I'd say perhaps we should leave it as it is in 1.21.0.

Leaving this on the 1.22.0 milestone, we should add this option even if we only get to it post .rc1 (or perhaps we should make sure to include it - @charris?).

@charris
Copy link
Member

charris commented Nov 12, 2021

we should add this option even if we only get to it post .rc1

Should not be a problem as long as there is a test for both behaviors.

rgommers pushed a commit that referenced this issue Nov 12, 2021
* Allow casting in the array API asarray()

* Restrict multidimensional indexing in the array API namespace

The spec has recently been updated to only require multiaxis (i.e., tuple)
indices in the case where every axis is indexed, meaning there are either as
many indices as axes or the index has an ellipsis.

* Fix type promotion for numpy.array_api.where

where does value-based promotion for 0-dimensional arrays, so we use the same
trick as in the Array operators to avoid this.

* Print empty array_api arrays using empty()

Printing behavior isn't required by the spec. This is just to make things
easier to understand, especially with the array API test suite.

* Fix an incorrect slice bounds guard in the array API

* Disallow multiple different dtypes in the input to np.array_api.meshgrid

* Remove DLPack support from numpy.array_api.asarray()

from_dlpack() should be used to create arrays using DLPack.

* Remove __len__ from the array API array object

* Add astype() to numpy.array_api

* Update the unique_* functions in numpy.array_api

unique() in the array API was replaced with three separate functions,
unique_all(), unique_inverse(), and unique_values(), in order to avoid
polymorphic return types.

Additionally, it should be noted that these functions to not currently conform
to the spec with respect to NaN behavior. The spec requires multiple NaNs to
be returned, but np.unique() returns a single NaN. Since this is currently an
open issue in NumPy to possibly revert, I have not yet worked around this. See
#20326.

* Add the stream argument to the array API to_device method

This does nothing in NumPy, and is just present so that the signature is valid
according to the spec.

* Use the NamedTuple classes for the type signatures

* Add unique_counts to the array API namespace

* Remove some unused imports

* Update the array_api indexing restrictions

The "multiaxis indexing must index every axis explicitly or use an ellipsis"
was supposed to include any type of index, not just tuple indices.

* Use a simpler type annotation for the array API to_device method

* Fix a test failure in the array_api submodule

The array_api cannot use the NumPy testing functions because array_api arrays
do not mix with NumPy arrays, and also NumPy testing functions may use APIs
that aren't supported in the array API.

* Add dlpack support to the array_api submodule
@rgommers
Copy link
Member Author

This will not make it into 1.22.0rc1, so I suggest to stay with equal_nans=True as the default (matching 1.21.x behavior).

@charris charris added the 09 - Backport-Candidate PRs tagged should be backported label Nov 16, 2021
@charris
Copy link
Member

charris commented Nov 16, 2021

Pushing off to 1.22.1, but might go into 1.22.0 if it arrives soon enough. Probably an enhancement at this point, but it would be nice to provide a compatibility path for array_api testing.

@asmeurer
Copy link
Member

It should be noted that numpy.array_api.unique (and related unique functions) currently do not conform to the array API spec, because I didn't find it worth trying to work around NumPy's current behavior given that a flag is planned. So when we add the flag, we should be sure to also use it in the array API submodule so that it becomes correct (unique is not yet tested in the array-api-tests test suite, but it should be added soon).

@charris
Copy link
Member

charris commented Jan 13, 2022

Pushing off to 1.23.

@seberg
Copy link
Member

seberg commented May 4, 2022

Brought it up briefly in the triage meeting, this seems like we need to push this one off. But IIRC we were probably going to add a new kwarg here.

@seberg seberg modified the milestones: 1.23.0 release, 1.24.0 release May 4, 2022
@charris
Copy link
Member

charris commented May 5, 2022

Adding a keyword looks like a fairly simple enhancement that we should try to get into 1.23. At this point I think it preferable to keep the current behavior as the default.

@mattip mattip self-assigned this May 18, 2022
@mattip mattip closed this as completed in 6cada27 Jun 1, 2022
@charris charris removed the 09 - Backport-Candidate PRs tagged should be backported label Sep 6, 2022
asmeurer added a commit to data-apis/array-api-strict that referenced this issue Jan 22, 2024
* Allow casting in the array API asarray()

* Restrict multidimensional indexing in the array API namespace

The spec has recently been updated to only require multiaxis (i.e., tuple)
indices in the case where every axis is indexed, meaning there are either as
many indices as axes or the index has an ellipsis.

* Fix type promotion for numpy.array_api.where

where does value-based promotion for 0-dimensional arrays, so we use the same
trick as in the Array operators to avoid this.

* Print empty array_api arrays using empty()

Printing behavior isn't required by the spec. This is just to make things
easier to understand, especially with the array API test suite.

* Fix an incorrect slice bounds guard in the array API

* Disallow multiple different dtypes in the input to np.array_api.meshgrid

* Remove DLPack support from numpy.array_api.asarray()

from_dlpack() should be used to create arrays using DLPack.

* Remove __len__ from the array API array object

* Add astype() to numpy.array_api

* Update the unique_* functions in numpy.array_api

unique() in the array API was replaced with three separate functions,
unique_all(), unique_inverse(), and unique_values(), in order to avoid
polymorphic return types.

Additionally, it should be noted that these functions to not currently conform
to the spec with respect to NaN behavior. The spec requires multiple NaNs to
be returned, but np.unique() returns a single NaN. Since this is currently an
open issue in NumPy to possibly revert, I have not yet worked around this. See
numpy/numpy#20326.

* Add the stream argument to the array API to_device method

This does nothing in NumPy, and is just present so that the signature is valid
according to the spec.

* Use the NamedTuple classes for the type signatures

* Add unique_counts to the array API namespace

* Remove some unused imports

* Update the array_api indexing restrictions

The "multiaxis indexing must index every axis explicitly or use an ellipsis"
was supposed to include any type of index, not just tuple indices.

* Use a simpler type annotation for the array API to_device method

* Fix a test failure in the array_api submodule

The array_api cannot use the NumPy testing functions because array_api arrays
do not mix with NumPy arrays, and also NumPy testing functions may use APIs
that aren't supported in the array API.

* Add dlpack support to the array_api submodule

Original NumPy Commit: ff2e2a1e7eea29d925063b13922e096d14331222
asmeurer added a commit to data-apis/array-api-strict that referenced this issue Jan 22, 2024
* Allow casting in the array API asarray()

* Restrict multidimensional indexing in the array API namespace

The spec has recently been updated to only require multiaxis (i.e., tuple)
indices in the case where every axis is indexed, meaning there are either as
many indices as axes or the index has an ellipsis.

* Fix type promotion for numpy.array_api.where

where does value-based promotion for 0-dimensional arrays, so we use the same
trick as in the Array operators to avoid this.

* Print empty array_api arrays using empty()

Printing behavior isn't required by the spec. This is just to make things
easier to understand, especially with the array API test suite.

* Fix an incorrect slice bounds guard in the array API

* Disallow multiple different dtypes in the input to np.array_api.meshgrid

* Remove DLPack support from numpy.array_api.asarray()

from_dlpack() should be used to create arrays using DLPack.

* Remove __len__ from the array API array object

* Add astype() to numpy.array_api

* Update the unique_* functions in numpy.array_api

unique() in the array API was replaced with three separate functions,
unique_all(), unique_inverse(), and unique_values(), in order to avoid
polymorphic return types.

Additionally, it should be noted that these functions to not currently conform
to the spec with respect to NaN behavior. The spec requires multiple NaNs to
be returned, but np.unique() returns a single NaN. Since this is currently an
open issue in NumPy to possibly revert, I have not yet worked around this. See
numpy/numpy#20326.

* Add the stream argument to the array API to_device method

This does nothing in NumPy, and is just present so that the signature is valid
according to the spec.

* Use the NamedTuple classes for the type signatures

* Add unique_counts to the array API namespace

* Remove some unused imports

* Update the array_api indexing restrictions

The "multiaxis indexing must index every axis explicitly or use an ellipsis"
was supposed to include any type of index, not just tuple indices.

* Use a simpler type annotation for the array API to_device method

* Fix a test failure in the array_api submodule

The array_api cannot use the NumPy testing functions because array_api arrays
do not mix with NumPy arrays, and also NumPy testing functions may use APIs
that aren't supported in the array API.

* Add dlpack support to the array_api submodule

Original NumPy Commit: ff2e2a1e7eea29d925063b13922e096d14331222
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants