Revert returning a single NaN for np.unique in 1.22.0? #20326
Comments
Also see gh-19655, which shows that |
If NumPy keeps its current behavior (single NaN), it would be useful to have an internal function that computes the old behavior, so that it can be used in the array_api submodule, which requires the old, multiple NaN behavior. Otherwise it would have to work around this in a way that involves building a new array from the result from The inverse, if we do revert, could also be useful (or it could be added as a flag to |
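To make the "build a new array from the result" workaround concrete, here is a minimal sketch. The helper name `expand_nans` is hypothetical and is not part of NumPy or `numpy.array_api`. It assumes a 1-D floating-point input and pads the NaN tail of a `np.unique` result so that it carries one NaN per NaN in the original input, whichever behavior the installed NumPy has.

```python
import numpy as np


def expand_nans(unique_values, original):
    """Return unique values with one NaN per NaN found in `original`.

    Hypothetical helper for illustration only; assumes 1-D float data.
    """
    unique_values = np.asarray(unique_values, dtype=float)
    original = np.asarray(original, dtype=float)

    # Count NaNs in the input; each one should appear in the output.
    n_nan = int(np.isnan(original).sum())

    # Drop whatever NaNs np.unique returned (one or many), then re-append.
    non_nan = unique_values[~np.isnan(unique_values)]
    return np.concatenate([non_nan, np.full(n_nan, np.nan)])


x = [np.nan, 3.0, 1.0, np.nan, 2.0]
print(expand_nans(np.unique(x), x))
```

The extra pass over the input is the cost of doing this outside `np.unique` itself, which is why an internal function or a flag would be nicer.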
By the way, it would be useful to add a label in the issue tracker for numpy.array_api. |
Done: https://github.com/numpy/numpy/pulls?q=is%3Aopen+is%3Apr+label%3A%22component%3A+numpy.array_api%22 |
@asmeurer I was thinking a keyword would be useful, something like |
unique() in the array API was replaced with three separate functions, unique_all(), unique_inverse(), and unique_values(), in order to avoid polymorphic return types. Additionally, it should be noted that these functions do not currently conform to the spec with respect to NaN behavior. The spec requires multiple NaNs to be returned, but np.unique() returns a single NaN. Since this is currently an open issue in NumPy to possibly revert, I have not yet worked around this. See numpy#20326.
That may be a good option if we have implementations for both behaviors and we know that there is demand for both from downstream libraries. |
I think it makes more sense to keep the new behaviour, returning
> a=c(NA,3,1,5,6,7,NA,2,NA,2)
> unique(a)
[1] NA 3 1 5 6 7 2
Not that this is a strong case in its favor, but it is logically more consistent, since none of the |
R's |
Sorry, my bad. Still the same behaviour though:
> a=c(NaN,3,1,5,6,7,NaN,2,NaN,2)
> unique(a)
[1] NaN 3 1 5 6 7 2
For reference: |
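For comparison with the R session above, a rough pure-Python sketch of the same semantics: first-seen order, with all NaNs collapsed into one. The function name `unique_first_seen` is made up for this illustration. Note that `np.unique` returns sorted output, so this mirrors R's ordering rather than NumPy's.

```python
import math


def unique_first_seen(values):
    """Order-preserving unique that treats every NaN as the same value."""
    seen = set()
    seen_nan = False
    result = []
    for v in values:
        if isinstance(v, float) and math.isnan(v):
            # NaN != NaN, so a plain set lookup cannot be relied on to dedupe it.
            if not seen_nan:
                seen_nan = True
                result.append(v)
        elif v not in seen:
            seen.add(v)
            result.append(v)
    return result


a = [math.nan, 3, 1, 5, 6, 7, math.nan, 2, math.nan, 2]
print(unique_first_seen(a))  # [nan, 3, 1, 5, 6, 7, 2]
```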
In the meeting, we seemed to converge on: let's just add a kwarg. The not-quite-clear follow-up to that is what the default should be.
Agreed with a kwarg, because there's a desire for both sets of behaviors. It's indeed unclear what the default should be. This was a silent change in behavior in Now that we're 4-5 months in though, it's more fuzzy. There may be people who updated their code, and changing it back would break them again. So I'd say perhaps we should leave it as it is in Leaving this on the |
Should not be a problem as long as there is a test for both behaviors. |
* Allow casting in the array API asarray()
* Restrict multidimensional indexing in the array API namespace
  The spec has recently been updated to only require multiaxis (i.e., tuple) indices in the case where every axis is indexed, meaning there are either as many indices as axes or the index has an ellipsis.
* Fix type promotion for numpy.array_api.where
  where does value-based promotion for 0-dimensional arrays, so we use the same trick as in the Array operators to avoid this.
* Print empty array_api arrays using empty()
  Printing behavior isn't required by the spec. This is just to make things easier to understand, especially with the array API test suite.
* Fix an incorrect slice bounds guard in the array API
* Disallow multiple different dtypes in the input to np.array_api.meshgrid
* Remove DLPack support from numpy.array_api.asarray()
  from_dlpack() should be used to create arrays using DLPack.
* Remove __len__ from the array API array object
* Add astype() to numpy.array_api
* Update the unique_* functions in numpy.array_api
  unique() in the array API was replaced with three separate functions, unique_all(), unique_inverse(), and unique_values(), in order to avoid polymorphic return types. Additionally, it should be noted that these functions do not currently conform to the spec with respect to NaN behavior. The spec requires multiple NaNs to be returned, but np.unique() returns a single NaN. Since this is currently an open issue in NumPy to possibly revert, I have not yet worked around this. See #20326.
* Add the stream argument to the array API to_device method
  This does nothing in NumPy, and is just present so that the signature is valid according to the spec.
* Use the NamedTuple classes for the type signatures
* Add unique_counts to the array API namespace
* Remove some unused imports
* Update the array_api indexing restrictions
  The "multiaxis indexing must index every axis explicitly or use an ellipsis" was supposed to include any type of index, not just tuple indices.
* Use a simpler type annotation for the array API to_device method
* Fix a test failure in the array_api submodule
  The array_api cannot use the NumPy testing functions because array_api arrays do not mix with NumPy arrays, and also NumPy testing functions may use APIs that aren't supported in the array API.
* Add dlpack support to the array_api submodule
This will not make it into |
Pushing off to 1.22.1, but might go into 1.22.0 if it arrives soon enough. Probably an enhancement at this point, but it would be nice to provide a compatibility path for array_api testing. |
It should be noted that |
Pushing off to 1.23. |
Brought it up briefly in the triage meeting, this seems like we need to push this one off. But IIRC we were probably going to add a new kwarg here. |
Adding a keyword looks like a fairly simple enhancement that we should try to get into 1.23. At this point I think it preferable to keep the current behavior as the default. |
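As a rough illustration of what such a keyword could look like from the caller's side, here is a sketch written as a wrapper around `np.unique`. The function name `unique_nan_flag` and the keyword `collapse_nan` are hypothetical and chosen only for this example; they are not NumPy API. The default keeps the current single-NaN behavior, as suggested above, and the wrapper assumes floating-point input that can be flattened to 1-D.

```python
import numpy as np


def unique_nan_flag(values, *, collapse_nan=True):
    """Sorted unique values with switchable NaN handling.

    collapse_nan=True  -> at most one NaN in the output (current behavior).
    collapse_nan=False -> one NaN per NaN in the input (old behavior).
    Hypothetical sketch; assumes float data.
    """
    arr = np.asarray(values, dtype=float).ravel()
    is_nan = np.isnan(arr)

    # The NaN-free part does not depend on which NumPy version is installed.
    out = np.unique(arr[~is_nan])

    n_nan = int(is_nan.sum())
    if collapse_nan:
        n_nan = min(n_nan, 1)
    return np.concatenate([out, np.full(n_nan, np.nan)])


data = [np.nan, 3.0, 1.0, np.nan, 2.0, 2.0]
print(unique_nan_flag(data))                      # single trailing NaN
print(unique_nan_flag(data, collapse_nan=False))  # two trailing NaNs
```

Having both branches behind one entry point also makes it easy to keep a test for each behavior, as mentioned earlier in the thread.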
* Allow casting in the array API asarray()
* Restrict multidimensional indexing in the array API namespace
  The spec has recently been updated to only require multiaxis (i.e., tuple) indices in the case where every axis is indexed, meaning there are either as many indices as axes or the index has an ellipsis.
* Fix type promotion for numpy.array_api.where
  where does value-based promotion for 0-dimensional arrays, so we use the same trick as in the Array operators to avoid this.
* Print empty array_api arrays using empty()
  Printing behavior isn't required by the spec. This is just to make things easier to understand, especially with the array API test suite.
* Fix an incorrect slice bounds guard in the array API
* Disallow multiple different dtypes in the input to np.array_api.meshgrid
* Remove DLPack support from numpy.array_api.asarray()
  from_dlpack() should be used to create arrays using DLPack.
* Remove __len__ from the array API array object
* Add astype() to numpy.array_api
* Update the unique_* functions in numpy.array_api
  unique() in the array API was replaced with three separate functions, unique_all(), unique_inverse(), and unique_values(), in order to avoid polymorphic return types. Additionally, it should be noted that these functions do not currently conform to the spec with respect to NaN behavior. The spec requires multiple NaNs to be returned, but np.unique() returns a single NaN. Since this is currently an open issue in NumPy to possibly revert, I have not yet worked around this. See numpy/numpy#20326.
* Add the stream argument to the array API to_device method
  This does nothing in NumPy, and is just present so that the signature is valid according to the spec.
* Use the NamedTuple classes for the type signatures
* Add unique_counts to the array API namespace
* Remove some unused imports
* Update the array_api indexing restrictions
  The "multiaxis indexing must index every axis explicitly or use an ellipsis" was supposed to include any type of index, not just tuple indices.
* Use a simpler type annotation for the array API to_device method
* Fix a test failure in the array_api submodule
  The array_api cannot use the NumPy testing functions because array_api arrays do not mix with NumPy arrays, and also NumPy testing functions may use APIs that aren't supported in the array API.
* Add dlpack support to the array_api submodule

Original NumPy Commit: ff2e2a1e7eea29d925063b13922e096d14331222
This mailing list thread discusses reverting the change in 1.21.0 to make unique return a single nan in its output when the input array contains multiple nans. This issue is to track that, needs a decision before 1.22.0rc1 I think.