-
-
Notifications
You must be signed in to change notification settings - Fork 10.8k
Add axis
argument to numpy.unique
#7742
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Hi Martino, thanks for reviving this. I seem to recall from the original discussion that we weren't able to reach an agreement on what If you could go ahead and bring the discussion up on the mailing list again, that'd be great... We really need to get some consensus on this beforewe can merge your code, which other than that seems mostly ready to go. |
RE unique rows or unique columns For me, it's |
The current behaviour is: >>> a = array([[0, 0, 0],
[1, 1, 1],
[0, 0, 0]])
>>> unique(a, axis=0)
array([[0, 0, 0],
[1, 1, 1]]) So |
@martinosorb - Glad to see you revived this! Things have been a bit hectic for me lately, so I never got around to adding the |
Hi @joferkington, I tried to attract your attention at your old pull request, but I couldn't contact you. I hope you don't mind me using your code. |
@martinosorb - I don't mind at all! I actually did get your ping on the old PR, but it got lost amongst everything else in my inbox, and I forgot to reply. My apologies for that. |
There was only one reply on the mailing list. How shall we take this forward? |
The code look OK at first sight, but I'll do a proper code review during this week. |
This appears to solve a problem I posted earlier this year (#7273). Any more review necessary on the patch? |
We were waiting for opinions on the mailing list, there was only one, which agreed with the current solution. I don't know about anything else. |
Even though we didn't get a lot of replies on the mailing list, I think this is a simple and useful enough extension of current functionality that this is OK from the perspective of expanding the API. |
if abs(axis) > ar.ndim: | ||
raise ValueError('Invalid axis kwarg specified for unique') | ||
|
||
ar = np.swapaxes(ar, axis, 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use np.moveaxis(ar, axis, 0)
instead, which avoids the need to swap axes back at the end.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I may be missing something. Why so? That axis doesn't disappear.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed, you are correct
Not sure if you guys get notified of comments under comments, so writing here. Thoughts? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a couple of minor suggestions, but otherwise this looks good to me!
@@ -190,7 +192,7 @@ def unique(ar, return_index=False, return_inverse=False, | |||
ar = np.asanyarray(ar) | |||
if axis is None: | |||
return _unique1d(ar, return_index, return_inverse, return_counts) | |||
if abs(axis) > ar.ndim: | |||
if abs(axis) > ar.ndim or axis == ar.ndim: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would suggest (basically one less case to think about):
if not (-ar.ndim <= axis < ar.ndim):
@martinosorb I do get notified of comments... but I lost track of this one. Thanks for the reminder! |
Done as you said, thanks for the comments. Let's see what CI says. |
@charris I think this is basically ready to merge (after squashing and adding release notes). Am I correct that this is too late for 1.12 and should be targeted for 1.13 instead? |
Yes, the line needs to be drawn somewhere. If you feel that it is really important I can sneak it in, but otherwise I'd rather move on with 1.13. Small bug fixes will still go into 1.12. |
@charris Thanks, that's what I thought. @martinosorb Could you please rebase this to a single commit and add something about this to the 1.13 release notes? |
Should be OK now. |
Thanks @martinosorb ! |
try: | ||
consolidated = ar.view(dtype) | ||
except TypeError: | ||
# There's no good way to do this for object arrays, etc... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed in #8514
This is an attempt at reviving joferkington's pull request (#3584, I was unable to contact him and his fork is no longer available). It makes use mostly of his code, but updates it to supporting return_counts, which was added later.
The idea is being able to return the unique rows (or columns) of a 2d array, or the unique (N-1)-dimensional slices of a N-dimensional array. If an N-dimensional array is provided, the function will return an N-dimensional array where the duplicate (N-1)-dimensional arrays found along the provided axis were removed.
Future work could involve providing more than an axis at once, for example when one wants to find the unique rows of a 10 dimensional array.