Skip to content

Add axis argument to numpy.unique #7742

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Nov 13, 2016
Merged

Add axis argument to numpy.unique #7742

merged 1 commit into from
Nov 13, 2016

Conversation

martinosorb
Copy link

@martinosorb martinosorb commented Jun 14, 2016

This is an attempt at reviving joferkington's pull request (#3584, I was unable to contact him and his fork is no longer available). It makes use mostly of his code, but updates it to supporting return_counts, which was added later.

The idea is being able to return the unique rows (or columns) of a 2d array, or the unique (N-1)-dimensional slices of a N-dimensional array. If an N-dimensional array is provided, the function will return an N-dimensional array where the duplicate (N-1)-dimensional arrays found along the provided axis were removed.

Future work could involve providing more than an axis at once, for example when one wants to find the unique rows of a 10 dimensional array.

@jaimefrio
Copy link
Member

Hi Martino, thanks for reviving this. I seem to recall from the original discussion that we weren't able to reach an agreement on what axis should mean. There were conflicting views on whether axis=0 should mean unique rows or unique columns, I believe in this PR it is unique rows, right?

If you could go ahead and bring the discussion up on the mailing list again, that'd be great... We really need to get some consensus on this beforewe can merge your code, which other than that seems mostly ready to go.

@shoyer
Copy link
Member

shoyer commented Jun 14, 2016

RE unique rows or unique columns

For me, it's axis=0 meaning unique rows is the "obvious" answer. That way the axis argument would affect axes in a similar way to the existing aggregation routines (sum, mean, etc.). But I don't think this choice matters too much, as long as it's clearly documented with examples. Either way, this functionality is clearly very welcome!

@martinosorb
Copy link
Author

The current behaviour is:

>>> a = array([[0, 0, 0],
               [1, 1, 1],
               [0, 0, 0]])

>>> unique(a, axis=0)
array([[0, 0, 0],
       [1, 1, 1]])

So axis=0 gives unique rows.

@joferkington
Copy link
Contributor

@martinosorb - Glad to see you revived this! Things have been a bit hectic for me lately, so I never got around to adding the return_counts functionality after that was added to unique. Good to see the PR coming back around.

@martinosorb
Copy link
Author

Hi @joferkington, I tried to attract your attention at your old pull request, but I couldn't contact you. I hope you don't mind me using your code.

@joferkington
Copy link
Contributor

@martinosorb - I don't mind at all! I actually did get your ping on the old PR, but it got lost amongst everything else in my inbox, and I forgot to reply. My apologies for that.

@martinosorb
Copy link
Author

There was only one reply on the mailing list. How shall we take this forward?

@jaimefrio
Copy link
Member

The code look OK at first sight, but I'll do a proper code review during this week.

@machinaut
Copy link

This appears to solve a problem I posted earlier this year (#7273). Any more review necessary on the patch?

@martinosorb
Copy link
Author

We were waiting for opinions on the mailing list, there was only one, which agreed with the current solution. I don't know about anything else.

@shoyer
Copy link
Member

shoyer commented Aug 19, 2016

Even though we didn't get a lot of replies on the mailing list, I think this is a simple and useful enough extension of current functionality that this is OK from the perspective of expanding the API.

if abs(axis) > ar.ndim:
raise ValueError('Invalid axis kwarg specified for unique')

ar = np.swapaxes(ar, axis, 0)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use np.moveaxis(ar, axis, 0) instead, which avoids the need to swap axes back at the end.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I may be missing something. Why so? That axis doesn't disappear.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, you are correct

@martinosorb
Copy link
Author

Not sure if you guys get notified of comments under comments, so writing here. Thoughts?

Copy link
Member

@shoyer shoyer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a couple of minor suggestions, but otherwise this looks good to me!

@@ -190,7 +192,7 @@ def unique(ar, return_index=False, return_inverse=False,
ar = np.asanyarray(ar)
if axis is None:
return _unique1d(ar, return_index, return_inverse, return_counts)
if abs(axis) > ar.ndim:
if abs(axis) > ar.ndim or axis == ar.ndim:
Copy link
Member

@shoyer shoyer Nov 12, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would suggest (basically one less case to think about):

if not (-ar.ndim <= axis < ar.ndim):

@shoyer
Copy link
Member

shoyer commented Nov 12, 2016

Not sure if you guys get notified of comments under comments, so writing here. Thoughts?

@martinosorb I do get notified of comments... but I lost track of this one. Thanks for the reminder!

@martinosorb
Copy link
Author

Done as you said, thanks for the comments. Let's see what CI says.

@shoyer
Copy link
Member

shoyer commented Nov 13, 2016

@charris I think this is basically ready to merge (after squashing and adding release notes). Am I correct that this is too late for 1.12 and should be targeted for 1.13 instead?

@charris
Copy link
Member

charris commented Nov 13, 2016

Yes, the line needs to be drawn somewhere. If you feel that it is really important I can sneak it in, but otherwise I'd rather move on with 1.13. Small bug fixes will still go into 1.12.

@shoyer
Copy link
Member

shoyer commented Nov 13, 2016

@charris Thanks, that's what I thought.

@martinosorb Could you please rebase this to a single commit and add something about this to the 1.13 release notes?

@martinosorb
Copy link
Author

Should be OK now.

@shoyer shoyer merged commit 5d062f3 into numpy:master Nov 13, 2016
@shoyer
Copy link
Member

shoyer commented Nov 13, 2016

Thanks @martinosorb !

try:
consolidated = ar.view(dtype)
except TypeError:
# There's no good way to do this for object arrays, etc...
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in #8514

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants