-
-
Notifications
You must be signed in to change notification settings - Fork 25.8k
FIX Improve best run detection in kmeans when n_init > 1 #21195
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
FIX Improve best run detection in kmeans when n_init > 1 #21195
Conversation
It seems to fix it indeed on my machine:
|
I guess it would be great to have a few tests for |
I also observe this on my machine. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thank you @jeremiedbb!
LGTM, one thing I am wondering is whether the changelog should go in v1.0.rst or v1.1.rst ... |
v1.0 because the next bug fix release will be 1.0.1 |
Merging, thanks! |
Fixes #21160
Instead of relying of some tolerance to take rouding errors into account, we can simply check that the clustering of the new run is the same as the clustering of the best run so far (up to a permutation of the labels).
The check is really quick: ~200ms for 100_000_000 samples, which is negligible compared to a complete run of kmeans.
The non regression test is already there: test_k_means_fit_predict (see the linked issue), even if it does not fail in the CI because it's an effect quite hard to trigger. I think checking that it now passes is enough. @lesteve could you confirm that ?