Skip to content

[MRG] Backport np.unique with return_counts in sklearn.utils.fixes #9976

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 3 commits into from

Conversation

johny-c
Copy link

@johny-c johny-c commented Oct 22, 2017

Reference Issues/PRs

During the discussion of #8602 (comment) , it came up that maybe the newer (since 1.9.0) unique function
of numpy that also returns counts of the unique elements might be useful and should be backported,
since the required numpy version is not going to be 1.9.0 in the near future.

What does this implement/fix? Explain your changes.

Backports the return_counts functionality of np.unique to builds with numpy version < 1.9.0.
I copied the function body from the latest commit in numpy 1.9.X to sklearn.utils.fixes.

Any other comments?

No.

@johny-c johny-c changed the title Backport np.unique with return_counts in sklearn.utils.fixes [MRG] Backport np.unique with return_counts in sklearn.utils.fixes Oct 22, 2017

if np_version < (1, 9, 0):
# Allow unique to return counts
# https://github.com/numpy/numpy/commit/da3c6a28f651877dfb4cfdbbbe76dab985669252
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The following link might be more relevant:

https://github.com/numpy/numpy/commit/09fb4205a1d56090e13257a181f23514684f532b

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True, thanks!

ret += (perm[flag],)
if return_inverse:
iflag = np.cumsum(flag) - 1
iperm = perm.argsort()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This part is not up-to-date with numpy since numpy/numpy@036efee

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, but this change was in numpy 1.10.0b1. Shouldn't the 1.9.X version be backported? Otherwise why not backport the current version 1.13.3 that also has an axis keyword argument?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, the function from 1.9.X might be more likely to work with versions < 1.9.
Yet, if the function from 1.13.3 works in the tests, it should be better, since a bug fix and an speed improvement have been added.

What do you think?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the axis feature is pretty useful so I would be for including it if it passes the tests.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know though. Is it common to backport something from numpy's master branch instead of a maintenance branch?

@amueller
Copy link
Member

I find it strange to backport something that's not used anywhere....

@johny-c
Copy link
Author

johny-c commented Oct 24, 2017

You mean unique in general, or the axis argument?

@amueller
Copy link
Member

I mean return_counts. You added a util that is not used anywhere in the sklearn code, right?

@johny-c
Copy link
Author

johny-c commented Oct 26, 2017

I guess the point is to make it available for cases where a np.unique (without return_counts) is followed by a np.bincount. For instance in LMNN (#8602) I would use it to obtain the class sizes directly. And I think np.bincount is used quite a lot in that sense, no?

@jnothman
Copy link
Member

jnothman commented Oct 27, 2017 via email

@jnothman
Copy link
Member

jnothman commented Oct 27, 2017 via email

@lesteve
Copy link
Member

lesteve commented Oct 27, 2017

I find it strange to backport something that's not used anywhere....

Same feeling here. If you can find a good use of this in the current scikit-learn code, then do that in this PR. If not then close this PR and do the backport in #8602.

@johny-c
Copy link
Author

johny-c commented Oct 27, 2017

Some git grep-ing shows what @jnothman suggested, the most common use case is (or could be replaced by) LabelEncoder. So I close this one.

@johny-c johny-c closed this Oct 27, 2017
@jnothman
Copy link
Member

jnothman commented Oct 28, 2017 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants