-
-
Notifications
You must be signed in to change notification settings - Fork 25.8k
MNT avoid thread limit for non nested parts of KMeans #16499
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
MNT avoid thread limit for non nested parts of KMeans #16499
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good
can degrade perfs if the init is 'k-means++'
Is this because k-means++ uses blas? (I haven't checked)
failure looks like something fun to debug ^^
# Threadpoolctl wrappers to limit the number of threads in second level of | ||
# nested parallelism (i.e. BLAS) to avoid oversubsciption. | ||
def elkan_iter_chunked_dense(*args, **kwargs): | ||
with threadpool_limits(limits=1, user_api="blas"): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thoughts regarding using the CM that deep in the code, versus doing it sooner e.g. in _kmeans_single_lloyd
after _init_centroids
is called?
I guess we may miss oversubscription issues if one day we decide to re-write _inertia_*
Only curious what you think
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess we may miss oversubscription issues if one day we decide to re-write inertia*
True, but right now it uses no prange and no blas so I don't see it changing soon :)
And more importantly, we should think of the implications each time we want to add a prange somewhere.
Thoughts regarding using the CM that deep in the code, versus doing it sooner e.g. in _kmeans_single_lloyd after _init_centroids is called?
I wanted to put it where we're sure there will be no BLAS call took inside the context manager. That way we make sure that we can change the python code more safely.
Besides, In _kmeans_single_elkan you have:
for i in range(max_iter):
elkan_iter(...)
...
euclidean_distances(...) # BLAS
we don't want to include euclidean distances in the CM. So we have to put the CM around elkan_iter.
Yes, to compute some distances. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM but can you please do a quick benchmark on a multicore machine to check that we regain approximately the expected scalability during the init ? E.g. with KMeans
with max_iter=1
and a large enough number of clusters + features and samples to get maximum BLAS parallelism during k-means++.
Here's a benchmark. I used taskset to simulate different #cores machines. The difference between now and before 11950 might be due to how I made the benchmark. Since i can't benchmark k-means++ alone (the issue comes form using k-means++ in the fit), the reported timings are actually the difference between a 1 iter run with 'k-means++' init and a 1 iter run with 'random' init. |
Thanks for the confirmation benchmark. The scalability is not very good beyond 4 threads but it's already a nice net improvement. Still +1 for merge. |
Maybe we should explore a |
Thanks @jeremiedbb |
follow up on #11950. I just realized that the initialization was included in the threadpoolctl context, which can degrade perfs if the init is 'k-means++'.
I moved the context manager as close as possible to the nested prans/BLAS part of the code.
ping @ogrisel @NicolasHug it should still be fresh in your minds :)