ENH, BUG: add equals_nans parameter to np.unique and fixed functionality with equal_nans along an axis #20896
I am a little wary of the performance cost of checking for NaNs along an axis.
np.unique(a, axis=0)
With 1-D arrays we only check for duplicate NaNs at the end of the sorted array, so the cost scales with the number of NaNs rather than with the size of the array. Any ideas for optimization would be much appreciated.
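To make the 1-D case concrete, here is a minimal sketch of the idea described above. It assumes a float input; `unique_1d` is a hypothetical helper written for illustration, not the code in this PR. After sorting, all NaNs sit at the end, so a binary search finds the first one and only the trailing block needs to be touched.

```python
import numpy as np


def unique_1d(values, equal_nans=True):
    """Hypothetical helper: sorted unique values of a 1-D float array."""
    aux = np.sort(np.asarray(values, dtype=float))  # NaNs are sorted to the end

    # mask[i] is True where aux[i] starts a new run of equal values.
    # Because NaN != NaN, every NaN initially counts as its own run.
    mask = np.empty(aux.shape, dtype=bool)
    mask[:1] = True
    mask[1:] = aux[1:] != aux[:-1]

    if equal_nans and aux.size > 0 and np.isnan(aux[-1]):
        # Binary search for the first NaN, then drop every NaN after it.
        first_nan = np.searchsorted(aux, aux[-1], side="left")
        mask[first_nan + 1:] = False

    return aux[mask]


collapsed = unique_1d([3.0, np.nan, 1.0, np.nan, 3.0])
separate = unique_1d([3.0, np.nan, 1.0, np.nan, 3.0], equal_nans=False)
```

Here `collapsed` keeps a single trailing NaN, while `separate` keeps both. The extra work for the NaN handling is one binary search plus a slice assignment over the NaN block.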
…gument simultaneously
The first part is merged now (so this needs a rebase). The second part is still interesting, but at first glance I believe it needs a bit more thought. After a brief look, I do not think we can go via strings.
with:
Which is the same, except that it also returns
The best part. That should effectively also fix object arrays containing NaNs.
Addresses #20326 and #20873
np.unique previously had its behavior changed so that multiple NaN values are treated as equal and collapsed into a single entry.
This PR puts that behavior behind a parameter, equal_nans (default: True).
In addition, it fixes the inability to use equal_nans and axis together.
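As a rough illustration of what combining equal_nans with axis=0 means, the sketch below deduplicates the rows of a 2-D float array. `unique_rows` is a hypothetical helper and is not the implementation in this PR; it assumes a 2-D float input and simply sorts the rows lexicographically, then compares neighbouring rows, optionally counting NaN as equal to NaN.

```python
import numpy as np


def unique_rows(arr, equal_nans=True):
    """Hypothetical helper: unique rows of a 2-D float array, sorted."""
    arr = np.asarray(arr, dtype=float)
    if arr.shape[0] == 0:
        return arr

    # np.lexsort treats the LAST key as the primary one, so reverse the
    # columns to make column 0 the primary sort key. NaNs sort last.
    order = np.lexsort(arr.T[::-1])
    rows = arr[order]

    # Element-wise comparison of each row with the row before it.
    same = rows[1:] == rows[:-1]
    if equal_nans:
        # Treat a NaN as equal to a NaN in the same column.
        same |= np.isnan(rows[1:]) & np.isnan(rows[:-1])

    # Keep a row unless it matches its predecessor in every column.
    keep = np.ones(rows.shape[0], dtype=bool)
    keep[1:] = ~same.all(axis=1)
    return rows[keep]


data = np.array([[1.0, np.nan], [2.0, 3.0], [1.0, np.nan], [2.0, 3.0]])
merged = unique_rows(data)                       # NaN rows collapse
distinct = unique_rows(data, equal_nans=False)   # NaN rows stay separate
```

With this input, `merged` has two rows and `distinct` has three, because the two `[1, nan]` rows only count as duplicates when NaNs compare equal.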
Example:
np.unique(arr, axis=0)
Result:
np.unique(arr, axis=0, equal_nans=False)
Result: