-
-
Notifications
You must be signed in to change notification settings - Fork 26k
[MRG] Backport np.unique with return_counts in sklearn.utils.fixes #9976
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
sklearn/utils/fixes.py
Outdated
|
||
if np_version < (1, 9, 0): | ||
# Allow unique to return counts | ||
# https://github.com/numpy/numpy/commit/da3c6a28f651877dfb4cfdbbbe76dab985669252 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The following link might be more relevant:
https://github.com/numpy/numpy/commit/09fb4205a1d56090e13257a181f23514684f532b
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True, thanks!
ret += (perm[flag],) | ||
if return_inverse: | ||
iflag = np.cumsum(flag) - 1 | ||
iperm = perm.argsort() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This part is not up-to-date with numpy since numpy/numpy@036efee
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, but this change was in numpy 1.10.0b1. Shouldn't the 1.9.X version be backported? Otherwise why not backport the current version 1.13.3 that also has an axis
keyword argument?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed, the function from 1.9.X
might be more likely to work with versions < 1.9
.
Yet, if the function from 1.13.3
works in the tests, it should be better, since a bug fix and an speed improvement have been added.
What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the axis feature is pretty useful so I would be for including it if it passes the tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know though. Is it common to backport something from numpy's master branch instead of a maintenance branch?
I find it strange to backport something that's not used anywhere.... |
You mean |
I mean |
I guess the point is to make it available for cases where a |
it probably is used quite a lot, but many such uses of unique are through
LabelEncoder or similar...
|
not that it would hurt to store training distribution on LabelEncoder
necessary
|
Same feeling here. If you can find a good use of this in the current scikit-learn code, then do that in this PR. If not then close this PR and do the backport in #8602. |
Some git grep-ing shows what @jnothman suggested, the most common use case is (or could be replaced by) |
well using this backport in LabelEncoder may be a fruitful idea if it helps
streamline our code?
…On 28 Oct 2017 2:17 am, "John Chiotellis" ***@***.***> wrote:
Some git grep-ing shows what @jnothman <https://github.com/jnothman>
suggested, the most common use case is (or could be replaced by)
LabelEncoder. So I close this one.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#9976 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AAEz67dVABCDWCH9Dr9O9Dnq59zpF6y5ks5swfQFgaJpZM4QCCQb>
.
|
Reference Issues/PRs
During the discussion of #8602 (comment) , it came up that maybe the newer (since 1.9.0) unique function
of numpy that also returns counts of the unique elements might be useful and should be backported,
since the required numpy version is not going to be 1.9.0 in the near future.
What does this implement/fix? Explain your changes.
Backports the
return_counts
functionality ofnp.unique
to builds with numpy version < 1.9.0.I copied the function body from the latest commit in numpy 1.9.X to sklearn.utils.fixes.
Any other comments?
No.