Skip to content

Performance regression in KMeans.transform #26100

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
ogrisel opened this issue Apr 5, 2023 · 2 comments · Fixed by #26275
Closed

Performance regression in KMeans.transform #26100

ogrisel opened this issue Apr 5, 2023 · 2 comments · Fixed by #26275

Comments

@ogrisel
Copy link
Member

ogrisel commented Apr 5, 2023

https://scikit-learn.org/scikit-learn-benchmarks/#cluster.KMeansBenchmark.time_transform?p-representation='dense'&p-algorithm='elkan'&p-init='k-means%2B%2B'&commits=ab7e3d19-b397b8f2

The increment is quite small in absolute value but quite consistent so it could be worth investigating what could have caused it (between ab7e3d1 and b397b8f).

@betatim
Copy link
Member

betatim commented Apr 6, 2023

Looking at the diff between those two commits there are two things that could be relevant. There is no obvious change.

The lockfiles were updated but I doubt this is what caused this.

There was also the change by @jeremiedbb to introduce IntNotReal for parameter validation. However it isn't used directly in KMeans. Do you think it is worth investigating this aspect further Jeremie?

@jeremiedbb
Copy link
Member

A profiling is necessary, but I strongly suspect that it comes from #25864 which added validation on gen_batches used many times in euclidean_distances on float32. This is another motivation for #25815

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants