Skip to content

Cosine metric for K-Means algorithm #31541

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 8 commits into from
Closed

Conversation

Radu1999
Copy link

@Radu1999 Radu1999 commented Jun 13, 2025

Reference Issues/PRs

Addresses #31450

What does this implement/fix? Explain your changes.

This PR adds support for an optional cosine distance metric in the K-means algorithm, allowing it to function as Spherical K-means when selected.

Changes:

  • Added a new paramter: metric - defaults to 'euclidean'
  • If metric is cosine, and normalize both input and centroids. Then, euclidean distance will be equivalent to cosine distance.
  • Added coverage test for the new functionality.

Any other comments?

Relevant sources:

Copy link

github-actions bot commented Jun 13, 2025

❌ Linting issues

This PR is introducing linting issues. Here's a summary of the issues. Note that you can avoid having linting issues by enabling pre-commit hooks. Instructions to enable them can be found here.

You can see the details of the linting issues under the lint job here


ruff format

ruff detected issues. Please run ruff format locally and push the changes. Here you can see the detected issues. Note that the installed ruff version is ruff=0.11.7.


--- sklearn/cluster/tests/test_k_means.py
+++ sklearn/cluster/tests/test_k_means.py
@@ -1367,7 +1367,7 @@
 @pytest.mark.parametrize("Estimator", [KMeans, MiniBatchKMeans])
 def test_cosine_distance_metric(Estimator, global_random_seed):
     """Test that cosine distance metric works correctly for
-      both KMeans and MiniBatchKMeans."""
+    both KMeans and MiniBatchKMeans."""
     # Create points on a unit circle and add some noise
     rng = np.random.RandomState(global_random_seed)
     n_samples = 100

1 file would be reformatted, 923 files already formatted

Generated for commit: 60ae6c8. Link to the linter CI: here

rchivereanu added 2 commits June 13, 2025 14:37
@Radu1999 Radu1999 closed this Jun 13, 2025
@Radu1999
Copy link
Author

Closed temporarily due to not respecting contribution guidlines

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant