MNT avoid thread limit for non nested parts of KMeans #16499

jeremiedbb · 2020-02-20T17:22:52Z

Follow-up to #11950. I just realized that the initialization was included in the threadpoolctl context, which can degrade perfs if the init is 'k-means++'.

I moved the context manager as close as possible to the nested prange/BLAS part of the code.

ping @ogrisel @NicolasHug it should still be fresh in your minds :)

NicolasHug

Looks good

> can degrade perfs if the init is 'k-means++'

Is this because k-means++ uses blas? (I haven't checked)

The failure looks like something fun to debug ^^

NicolasHug · 2020-02-20T17:56:18Z

sklearn/cluster/_k_means_elkan.pyx

+# Threadpoolctl wrappers to limit the number of threads in second level of
+# nested parallelism (i.e. BLAS) to avoid oversubscription.
+def elkan_iter_chunked_dense(*args, **kwargs):
+    with threadpool_limits(limits=1, user_api="blas"):


Thoughts regarding using the CM that deep in the code, versus doing it sooner e.g. in _kmeans_single_lloyd after _init_centroids is called?

I guess we may miss oversubscription issues if one day we decide to re-write _inertia_*

Only curious what you think

> I guess we may miss oversubscription issues if one day we decide to re-write _inertia_*

True, but right now it uses neither prange nor BLAS, so I don't see that changing soon :)
And more importantly, we should think about the implications each time we want to add a prange somewhere.

> Thoughts regarding using the CM that deep in the code, versus doing it sooner e.g. in _kmeans_single_lloyd after _init_centroids is called?

I wanted to put it where we're sure that no BLAS call gets caught inside the context manager. That way we can change the Python code more safely.

Besides, in _kmeans_single_elkan you have:

```python
for i in range(max_iter):
    elkan_iter(...)
    ...
    euclidean_distances(...)  # BLAS
```

We don't want to include euclidean_distances in the CM, so we have to put the CM around elkan_iter.
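The loop structure described above can be sketched as follows, assuming numpy and threadpoolctl. `elkan_iter_kernel` and `fit_sketch` are hypothetical names: the kernel is a plain Lloyd-style assignment/update step standing in for the compiled Elkan iteration (it does not implement Elkan's bounds), and the center-to-center distances Elkan needs between iterations are computed with BLAS outside the context manager.

```python
import numpy as np
from threadpoolctl import threadpool_limits


def sq_distances(A, B):
    # ||a - b||^2 = ||a||^2 - 2 a.b + ||b||^2; the A @ B.T product is a BLAS GEMM.
    d = (A * A).sum(axis=1)[:, None] - 2 * A @ B.T + (B * B).sum(axis=1)[None, :]
    return np.maximum(d, 0)  # clip tiny negatives caused by rounding


def elkan_iter_kernel(X, centers):
    # Hypothetical stand-in for the compiled iteration (the real one uses
    # prange + BLAS). Here: assign each sample, then recompute the means.
    labels = sq_distances(X, centers).argmin(axis=1)
    new_centers = centers.copy()
    for k in range(centers.shape[0]):
        members = X[labels == k]
        if len(members):
            new_centers[k] = members.mean(axis=0)
    return labels, new_centers


def fit_sketch(X, n_clusters, max_iter=10, seed=0):
    rng = np.random.default_rng(seed)
    # Initialization runs outside any thread limit, so its BLAS calls may use
    # all cores (random init here; k-means++ would be BLAS-heavy).
    centers = X[rng.choice(len(X), n_clusters, replace=False)]
    for _ in range(max_iter):
        # Limit BLAS to 1 thread only around the nested-parallel kernel.
        with threadpool_limits(limits=1, user_api="blas"):
            labels, centers = elkan_iter_kernel(X, centers)
        # Center-center distances (used by Elkan's bounds) stay outside the
        # context manager, so this BLAS call keeps all its threads.
        center_dist = np.sqrt(sq_distances(centers, centers))
    return labels, centers, center_dist
```

Entering the context manager once per iteration costs a few threadpoolctl calls, which is typically small next to an iteration over a large dataset.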

jeremiedbb · 2020-02-21T13:21:32Z

The failure is fixed in ~~#16514~~ #16506
It was not so hard to debug :)

jeremiedbb · 2020-02-21T13:22:11Z

> Is this because k-means++ uses blas? (I haven't checked)

Yes, to compute some distances.
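To illustrate where those BLAS calls come from, here is a simplified k-means++ seeding in numpy. It is a sketch only: `kmeans_plusplus_sketch` is a hypothetical name, and scikit-learn's implementation differs in details (e.g. it evaluates several candidate centers per step). The squared distances are expanded as ||x||^2 - 2 x·c + ||c||^2, so the dominant cost is a matrix-vector product that numpy dispatches to BLAS, which is why limiting BLAS to one thread slows this phase down.

```python
import numpy as np


def kmeans_plusplus_sketch(X, n_clusters, seed=0):
    """Simplified k-means++ seeding (one candidate per step), for illustration."""
    rng = np.random.default_rng(seed)
    n_samples = X.shape[0]
    x_sq = (X * X).sum(axis=1)
    centers = np.empty((n_clusters, X.shape[1]), dtype=X.dtype)
    centers[0] = X[rng.integers(n_samples)]
    # Squared distance from every sample to its closest center so far.
    # X @ c is a BLAS matrix-vector product (GEMV).
    closest_sq = np.maximum(x_sq - 2 * X @ centers[0] + centers[0] @ centers[0], 0)
    for k in range(1, n_clusters):
        # Draw the next center with probability proportional to D(x)^2.
        probs = closest_sq / closest_sq.sum()
        idx = rng.choice(n_samples, p=probs)
        centers[k] = X[idx]
        new_sq = np.maximum(x_sq - 2 * X @ centers[k] + centers[k] @ centers[k], 0)
        closest_sq = np.minimum(closest_sq, new_sq)
    return centers
```

The sketch assumes X contains at least n_clusters distinct points; otherwise the probabilities can become undefined.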

ogrisel

LGTM, but can you please do a quick benchmark on a multicore machine to check that we regain approximately the expected scalability during the init? E.g. with KMeans with max_iter=1 and large enough numbers of clusters, features and samples to get maximum BLAS parallelism during k-means++.

jeremiedbb · 2020-02-27T10:44:00Z

Here's a benchmark. I used taskset to simulate machines with different numbers of cores.
This PR restores the scalability of k-means++ that was lost during fit.

The difference between now and before #11950 might be due to how I made the benchmark. Since I can't benchmark k-means++ alone (the issue comes from using k-means++ inside fit), the reported timings are actually the difference between a 1-iteration run with 'k-means++' init and a 1-iteration run with 'random' init.
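The timing method described above (time of a 1-iteration fit with 'k-means++' minus time of a 1-iteration fit with 'random') can be sketched with the standard library. `median_runtime` and `init_cost_estimate` are hypothetical helper names; the callables to pass in are up to the person running the benchmark.

```python
import statistics
import time


def median_runtime(fn, n_repeat=5):
    # Median wall-clock time of n_repeat calls to fn.
    times = []
    for _ in range(n_repeat):
        start = time.perf_counter()
        fn()
        times.append(time.perf_counter() - start)
    return statistics.median(times)


def init_cost_estimate(fit_kmeanspp, fit_random, n_repeat=5):
    # Estimated k-means++ init cost:
    #   time(1-iteration fit with k-means++) - time(1-iteration fit with random init)
    return median_runtime(fit_kmeanspp, n_repeat) - median_runtime(fit_random, n_repeat)
```

For example, `fit_kmeanspp` could be `lambda: KMeans(n_clusters=100, init="k-means++", n_init=1, max_iter=1).fit(X)` and `fit_random` the same call with `init="random"`, with the script launched under `taskset -c 0-3` to restrict it to 4 cores. The subtraction only approximately cancels the single Lloyd iteration, since the two inits lead to different centers, which is consistent with the caveat above.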

ogrisel · 2020-02-27T11:12:42Z

Thanks for the confirmation benchmark. The scalability is not very good beyond 4 threads but it's already a nice net improvement. Still +1 for merge.

ogrisel · 2020-02-27T11:16:44Z

Maybe we should explore a prange on n_samples to get better scalability, but let's do that in a later PR.

NicolasHug · 2020-03-31T12:14:04Z

Thanks @jeremiedbb

…6499)

move threadpoolctl context to avoid limiting non nested parts

7458c1f

jeremiedbb added Performance module:cluster labels Feb 20, 2020

NicolasHug reviewed Feb 20, 2020


jeremiedbb mentioned this pull request Feb 21, 2020

[MRG] TST Adapt rtol to precision in a sparsefuncs test #16514

Closed

don't limit to 1 thread for mini batch

b4e976f

ogrisel approved these changes Feb 25, 2020


jeremiedbb added this to the 0.23 milestone Mar 3, 2020

NicolasHug approved these changes Mar 31, 2020


NicolasHug changed the title ~~[MRG] Move threadpoolctl context in KMeans to avoid limiting non nested parts~~ MNT avoid thread limit for non nested parts of KMeans Mar 31, 2020

NicolasHug merged commit ab01816 into scikit-learn:master Mar 31, 2020

gio8tisu pushed a commit to gio8tisu/scikit-learn that referenced this pull request May 15, 2020

MNT avoid thread limit for non nested parts of KMeans (scikit-learn#1…

dc0f7a9

…6499)

ogrisel mentioned this pull request May 25, 2020

KMeans(init='k-means++') performance issue with OpenBLAS #17334

Open
