-
-
Notifications
You must be signed in to change notification settings - Fork 25.8k
[MRG] Fix #13134: Ensure sorted bin edges for KBinsDiscretizer strategy kmeans #13135
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[MRG] Fix #13134: Ensure sorted bin edges for KBinsDiscretizer strategy kmeans #13135
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've not confirmed the test fails at master, but this lgtm
Please add an entry to the change log at doc/whats_new/v0.20.rst
under 0.20.3. Like the other entries there, please reference this pull request with :issue:
and credit yourself (and other contributors if applicable) with :user:
It fails at master |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks @SandroCasagrande
…retizer (scikit-learn#13135)" This reverts commit 7762f1f.
…retizer (scikit-learn#13135)" This reverts commit 7762f1f.
Reference Issues/PRs
Fixes #13134
What does this implement/fix? Explain your changes.
Since centers returned by 1d kmeans can be unsorted, I simply added a sort before constructing the bin edges. I also added a test which reproduces the issue when not sorting centers.
Any other comments?
Performance of sorting centers should not be an issue compared to kmeans itself. In most circumstances the list of centers is already sorted, but afaik in this situation checking if the numpy array is sorted is not better than invoking sort() directly (https://stackoverflow.com/questions/47004506/check-if-a-numpy-array-is-sorted), so I skipped the check.