
Spherical K-means support (unit norm centroids and input) #31450


Open
Radu1999 opened this issue May 28, 2025 · 10 comments
Labels
Needs Decision - Include Feature Requires decision regarding including feature New Feature

Comments


Radu1999 commented May 28, 2025

Describe the workflow you want to enable

Hi,
I was wondering if there is—or has been—any initiative to support cosine similarity in the KMeans implementation (i.e., spherical KMeans). I find the algorithm quite useful and would be happy to propose an implementation. The addition should be relatively straightforward.

Describe your proposed solution

Enable the use of cosine similarity with KMeans or implement a separate SphericalKMeans class.
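For readers unfamiliar with the algorithm, here is a minimal NumPy sketch of what spherical k-means does: samples are projected onto the unit sphere, each sample is assigned to the centroid with the highest cosine similarity, and each centroid is re-normalised to unit length after averaging. This is only an illustration under stated assumptions, not a proposed scikit-learn implementation: the function name `spherical_kmeans` and its parameters are hypothetical, initialisation is plain random sampling, and every input row is assumed to be non-zero.

```python
import numpy as np


def spherical_kmeans(X, n_clusters, n_iter=50, seed=0):
    """Illustrative spherical k-means (hypothetical helper, not scikit-learn API).

    Assumes every row of X has a non-zero norm.
    """
    rng = np.random.default_rng(seed)
    # Project every sample onto the unit sphere.
    X = X / np.linalg.norm(X, axis=1, keepdims=True)
    # Initialise centroids from randomly chosen (already unit-norm) samples.
    centers = X[rng.choice(len(X), size=n_clusters, replace=False)]
    for _ in range(n_iter):
        # For unit vectors, the dot product equals the cosine similarity.
        labels = np.argmax(X @ centers.T, axis=1)
        for k in range(n_clusters):
            members = X[labels == k]
            if len(members) == 0:
                continue  # keep the previous centroid for an empty cluster
            total = members.sum(axis=0)
            norm = np.linalg.norm(total)
            if norm > 0:
                centers[k] = total / norm  # re-normalise to unit length
    # Final assignment against the last set of centroids.
    labels = np.argmax(X @ centers.T, axis=1)
    return labels, centers
```

Because the input is normalised first, rescaling individual rows of `X` does not change the resulting labels, which is the property that separates this from Euclidean k-means on raw data.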

Describe alternatives you've considered, if relevant

No response

Additional context

No response

@Radu1999 Radu1999 added New Feature Needs Triage Issue requires triage labels May 28, 2025

Namit24 commented Jun 1, 2025

Hey I'd like to take up this @Radu1999

Author

Radu1999 commented Jun 2, 2025

> Hey I'd like to take up this @Radu1999

No, it's ok, I was planning to implement it once I confirm there is interest for it.


Namit24 commented Jun 2, 2025

Fair, go for it then.

Member

betatim commented Jun 4, 2025

@scikit-learn/core-devs any interest in this?

@Radu1999 to help evaluate this, could you provide some references and context that helps answer the questions from https://scikit-learn.org/stable/faq.html#what-are-the-inclusion-criteria-for-new-algorithms


rohnsha0 commented Jun 6, 2025

+1


rohnsha0 commented Jun 6, 2025

> @Radu1999 to help evaluate this, could you provide some references and context that helps answer the questions from https://scikit-learn.org/stable/faq.html#what-are-the-inclusion-criteria-for-new-algorithms

@betatim the paper has more than 200 citations and was published in 2012.
IMO it excels at clustering normalized, directional data (like text), where vector direction matters more than magnitude.
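To make "direction matters more than magnitude" concrete, the short check below (an illustrative sketch; the helper name `cosine_similarity` is my own) shows that rescaling a vector leaves its cosine similarity unchanged, and that for unit vectors the squared Euclidean distance equals `2 - 2 * cos`, so ranking by one is equivalent to ranking by the other once the data is normalized.

```python
import numpy as np


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


a = np.array([3.0, 4.0])
b = np.array([1.0, 2.0])

# Scaling a vector changes its Euclidean distance to b but not the cosine.
same_direction = np.isclose(cosine_similarity(a, b), cosine_similarity(10 * a, b))

# For unit vectors u and v: ||u - v||^2 == 2 - 2 * cos(u, v).
u = a / np.linalg.norm(a)
v = b / np.linalg.norm(b)
identity_holds = np.isclose(np.sum((u - v) ** 2), 2 - 2 * cosine_similarity(u, v))
```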


rohnsha0 commented Jun 6, 2025

@betatim I'd like to take this!

@virchan virchan added Needs Decision Requires decision Needs Decision - Include Feature Requires decision regarding including feature and removed Needs Triage Issue requires triage Needs Decision Requires decision labels Jun 8, 2025
Member

GaelVaroquaux commented Jun 10, 2025 via email

@adrinjalali
Member

I'd say with a small maintainable implementation, I'd be happy to have it.

@Radu1999
Author

I'd suggest adding a configurable distance metric, with 'euclidean' as the default, to the existing KMeans rather than implementing a separate class, just like here: https://spark.apache.org/docs/latest/api/python/_modules/pyspark/ml/clustering.html#KMeans
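As a rough sketch of what a configurable metric could look like in the assignment step (PySpark's `KMeans` exposes this choice through its `distanceMeasure` parameter): the function `assign_labels` and its `metric` argument below are hypothetical names for illustration only and do not exist in scikit-learn. A full implementation would also need to change the centroid update for the cosine case, which this sketch leaves out.

```python
import numpy as np


def assign_labels(X, centers, metric="euclidean"):
    """Hypothetical helper: nearest-centroid assignment under a chosen metric."""
    if metric == "euclidean":
        # Squared Euclidean distance between every sample and every centroid.
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    elif metric == "cosine":
        # Normalise both sides so the dot product is the cosine similarity.
        Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
        Cn = centers / np.linalg.norm(centers, axis=1, keepdims=True)
        dist = 1.0 - Xn @ Cn.T  # cosine distance
    else:
        raise ValueError(f"unsupported metric: {metric!r}")
    return np.argmin(dist, axis=1)


# A long vector pointing almost along the x-axis, and two candidate centroids.
X = np.array([[10.0, 1.0]])
centers = np.array([[1.0, 0.0], [6.0, 6.0]])
euclidean_labels = assign_labels(X, centers, metric="euclidean")
cosine_labels = assign_labels(X, centers, metric="cosine")
```

In this example the two metrics disagree: the sample is closer to the second centroid in Euclidean terms but is better aligned in direction with the first.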
