Add num_threads in the prange loop of init_bounds in KMeans #22773

jeremiedbb · 2022-03-11T23:01:50Z

Fixes #20483

adrinjalali · 2022-03-12T13:29:18Z

Can this be tested? 😁

jeremiedbb · 2022-03-12T17:27:16Z

I don't see how to do that easily. We don't do it for the other pranges. It should mainly be noticeable when running in, e.g., a Docker container with a quota on CPU resources on a machine with many cores.
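To illustrate the quota scenario described above, here is a minimal sketch of the arithmetic behind a thread budget in a CPU-limited container. The helper name `effective_n_threads`, its parameters, and the rounding rule are assumptions made for illustration; this is not scikit-learn's `_openmp_effective_n_threads`.

```python
import math


def effective_n_threads(n_logical_cpus, cfs_quota_us=None, cfs_period_us=None):
    """Hypothetical helper: thread budget under an optional CPU quota.

    cfs_quota_us / cfs_period_us mimic a cgroup-style CPU quota (assumed
    inputs); a missing or non-positive quota means "no limit".
    """
    if cfs_quota_us is None or cfs_period_us is None or cfs_quota_us <= 0:
        return n_logical_cpus
    # Round the fractional CPU allowance up, but never go below one thread.
    quota_cpus = max(1, math.ceil(cfs_quota_us / cfs_period_us))
    # Never use more threads than the machine has logical cores.
    return min(n_logical_cpus, quota_cpus)


# A 64-core host with a quota worth 2 CPUs: a loop that spawns one thread
# per core would run 64 threads on 2 CPUs' worth of time.
print(effective_n_threads(64, cfs_quota_us=200_000, cfs_period_us=100_000))
```

A parallel loop that ignores such a budget and defaults to one thread per visible core is what leads to oversubscription in this setting.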

thomasjpfan · 2022-03-12T17:43:03Z

At a high level, we already test that the results are the same with different n_threads settings:

scikit-learn/sklearn/cluster/tests/test_k_means.py

Lines 932 to 934 in f9d7423

    
    def test_result_equal_in_diff_n_threads(Estimator):
        # Check that KMeans/MiniBatchKMeans give the same results in parallel mode
        # than in sequential mode.

jeremiedbb · 2022-03-12T17:46:19Z

This PR is not about parallelizing some part of the code. This function is already parallel; it's just that we don't control the number of threads the way we do for the other pranges, i.e. with _openmp_effective_n_threads. That's why testing this change is particularly hard. The parallelism itself is indeed already covered by the test mentioned above.
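The distinction between "already parallel" and "parallel with a controlled thread count" can be sketched in pure Python. The code below is only an analogue under stated assumptions: `init_bounds_sketch` and `_bounds_for_chunk` are hypothetical names, a thread pool stands in for the Cython prange loop, and the per-sample work (nearest center, distance to it, distances to all centers) is a simplified picture of what a bounds initialization computes.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def _bounds_for_chunk(X_chunk, centers):
    # For each sample: nearest center index, distance to it (upper bound),
    # and distances to every center (lower bounds).
    out = []
    for x in X_chunk:
        dists = [math.dist(x, c) for c in centers]
        label = min(range(len(centers)), key=dists.__getitem__)
        out.append((label, dists[label], dists))
    return out


def init_bounds_sketch(X, centers, n_threads):
    """Hypothetical analogue of a parallel loop with an explicit worker count."""
    if not X:
        return []
    # The caller decides the budget; the loop does not pick its own default.
    n_threads = max(1, min(n_threads, len(X)))
    chunk = math.ceil(len(X) / n_threads)
    chunks = [X[i:i + chunk] for i in range(0, len(X), chunk)]
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        # map() preserves chunk order, so the output order matches X.
        results = pool.map(_bounds_for_chunk, chunks, [centers] * len(chunks))
        return [row for part in results for row in part]
```

The result does not depend on `n_threads`, which is why an equality test across thread counts cannot detect whether the budget is honored; only the number of workers actually spawned changes.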

thomasjpfan

LGTM

thomasjpfan · 2022-03-12T18:02:34Z

Sorry for being unclear. I was trying to say that we already test with different n_threads settings, which shows that this PR does not introduce any regressions.

As long as _openmp_effective_n_threads is correct, I do not think we need to test functions that use n_threads for oversubscription.

ogrisel

Good catch. LGTM!

add num_threads in kmeans init_bounds

985b79c

github-actions bot added module:cluster cython labels Mar 11, 2022

jeremiedbb added the Quick Review For PRs that are quick to review label Mar 11, 2022

thomasjpfan approved these changes Mar 12, 2022

ogrisel approved these changes Mar 14, 2022

ogrisel merged commit 09a7293 into scikit-learn:main Mar 14, 2022

glemaitre pushed a commit to glemaitre/scikit-learn that referenced this pull request Apr 6, 2022

add num_threads in kmeans init_bounds (scikit-learn#22773)

511d232
