[MRG] Fix parallelisation of kmeans clustering #12955


Merged
7 commits merged into scikit-learn:master on Jan 16, 2019

Conversation

@nixphix (Contributor) commented Jan 11, 2019

Reference Issues/PRs

Fixes #12949

What does this implement/fix? Explain your changes.

Fixes parallelisation of kmeans clustering.
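For context, a minimal sketch of the dispatch pattern at issue, assuming the same effective_n_jobs truthiness bug that is quoted from dict_learning.py later in this thread. The helper names (run_n_init, single_run) are hypothetical and this is not the actual scikit-learn diff:

```python
# Sketch of the n_init dispatch pattern (hypothetical helper names, not the
# actual scikit-learn diff). effective_n_jobs(n_jobs) always returns an
# integer >= 1, so a bare "if effective_n_jobs(n_jobs):" is always true: the
# sequential branch always runs and the Parallel branch is unreachable.
# Comparing against 1 restores the intended parallel behaviour.
from joblib import Parallel, delayed, effective_n_jobs


def run_n_init(single_run, seeds, n_jobs=None):
    """Run one k-means-style initialisation per seed, serially or in parallel."""
    if effective_n_jobs(n_jobs) == 1:
        # one effective job: run the initialisations sequentially
        return [single_run(seed) for seed in seeds]
    # several effective jobs: one initialisation per joblib worker
    return Parallel(n_jobs=n_jobs)(delayed(single_run)(seed) for seed in seeds)


if __name__ == "__main__":
    # toy usage: each "initialisation" simply echoes its seed
    print(run_n_init(lambda seed: seed, seeds=range(4), n_jobs=2))
```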

@nixphix nixphix changed the title Fix parallelisation of kmeans clustering [MGR] Fix parallelisation of kmeans clustering Jan 11, 2019
@nixphix (Contributor, Author) commented Jan 11, 2019

@jnothman the kmeans doctest fails because the cluster centroids changed; the following is a depiction of the doctest clusters before and after the fix.
[image: doctest clusters before and after the fix]

Should I update the expected cluster result in the doctest to match the new centroids, or try different test data?

@nixphix nixphix changed the title [MGR] Fix parallelisation of kmeans clustering [MRG] Fix parallelisation of kmeans clustering Jan 12, 2019
@jnothman (Member) commented:

Strange that this doctest result changes... the doctest hasn't been changed in 40e6c43 or subsequently :|

@jnothman (Member) commented:

But it's only failing on the latest numpy and scipy... so it's probably due to an upstream change. Can you change the example to make it deterministic? For example, move the top-right point further right?

@jeremiedbb (Member) commented:

I can reproduce the issue and I confirm that this PR fixes it.

@nixphix (Contributor, Author) commented Jan 15, 2019

I have updated the doctest to be more deterministic by moving one set of cluster points far apart from the other.
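As an illustration of the idea (a sketch, not necessarily the exact doctest that was committed): with two groups of points placed far apart, KMeans with a fixed random_state converges to the same centroids on every platform.

```python
# Two well-separated groups, one around x = 1 and one far away around x = 10,
# so the fitted centroids are simply the group means regardless of platform
# or BLAS version.
import numpy as np
from sklearn.cluster import KMeans

X = np.array([[1, 2], [1, 4], [1, 0],
              [10, 2], [10, 4], [10, 0]])

kmeans = KMeans(n_clusters=2, random_state=0).fit(X)
print(kmeans.cluster_centers_)  # the group means: [1, 2] and [10, 2]
print(kmeans.labels_)           # each group receives one stable label
```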

@jnothman (Member) commented:

Please add an entry to the change log for version 0.20.3 at doc/whats_new/v0.20.rst. Like the other entries there, please reference this pull request with :issue: and credit yourself (and other contributors if applicable) with :user:.

@qinhanmin2014 (Member) left a review comment:


Thanks @nixphix, I agree that we don't need a test here.
Please add a what's new entry.
Btw, why does the current code use all the cores when n_clusters=1 (and not when n_clusters>1)?

@qinhanmin2014 (Member) commented:

And it seems that we have a similar problem:

if effective_n_jobs(n_jobs) or algorithm == 'threshold':

Feel free to fix it in another PR, otherwise we'll open an issue.
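A short demonstration of why a bare effective_n_jobs(n_jobs) check is a bug, in both the kmeans code fixed here and the dict_learning line quoted above: the call returns an integer that is always >= 1, hence always truthy, so the sequential branch is taken no matter what n_jobs is.

```python
# effective_n_jobs() always returns an integer >= 1, so it is always truthy.
from joblib import effective_n_jobs

for n_jobs in (1, 2, 4, -1):
    n = effective_n_jobs(n_jobs)
    # A bare "if effective_n_jobs(n_jobs):" therefore always takes the
    # sequential branch; "== 1" is needed to detect the single-job case.
    print(f"n_jobs={n_jobs:>2} -> effective_n_jobs={n}, truthy={bool(n)}, serial={n == 1}")
```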

@jeremiedbb (Member) commented:

> Btw, why does the current code use all the cores when n_clusters=1 (and not when n_clusters>1)?

I can't reproduce that. However, it might be that the multithreading comes from MKL in that case. Running the snippet with the environment variable MKL_NUM_THREADS=1 makes things easier to check. With that setting, only one core is used in every situation before this PR, and all cores are used in every situation after this PR.
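For reference, a hypothetical check along these lines (the snippet from the original issue is not reproduced here, and the file name check_kmeans_njobs.py is made up): pin MKL/OpenMP to a single thread so that any additional busy cores must come from joblib, then fit with n_jobs > 1 and watch CPU usage.

```python
# Hypothetical check (not the snippet from the original issue). Run e.g. as:
#   MKL_NUM_THREADS=1 OMP_NUM_THREADS=1 python check_kmeans_njobs.py
# so that BLAS/MKL is pinned to one thread and any additional busy cores
# must come from joblib. Note that KMeans accepted an n_jobs parameter at
# the time of this PR (scikit-learn 0.20.x); it was later removed.
import numpy as np
from sklearn.cluster import KMeans

X = np.random.RandomState(0).rand(100_000, 50)

# Before this PR: one core stays busy (the n_init runs execute sequentially).
# After this PR: roughly one busy core per joblib worker.
KMeans(n_clusters=10, init='random', n_init=8, n_jobs=4).fit(X)
```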

@qinhanmin2014 (Member) commented:

> I can't reproduce that. However, it might be that the multithreading comes from MKL in that case. Running the snippet with the environment variable MKL_NUM_THREADS=1 makes things easier to check. With that setting, only one core is used in every situation before this PR, and all cores are used in every situation after this PR.

Thanks, I don't have time to reproduce it, but your version seems more reasonable. According to the code, before this PR there's no parallelism at the scikit-learn level. I guess the reporter made a mistake accidentally.

@nixphix (Contributor, Author) commented Jan 16, 2019

> And it seems that we have a similar problem:
>
> scikit-learn/sklearn/decomposition/dict_learning.py, line 303 in ff46f6e:
>
>     if effective_n_jobs(n_jobs) or algorithm == 'threshold':
>
> Feel free to fix it in another PR, otherwise we'll open an issue.

Sure, I will fix it in another PR.

@jnothman merged commit 8a604f7 into scikit-learn:master on Jan 16, 2019
@jnothman (Member) commented:

Thanks @nixphix

This pull request was later referenced by commits in jnothman/scikit-learn (Feb 19, 2019), xhluca/scikit-learn (Apr 28, 2019), and koenvandevelde/scikit-learn (Jul 12, 2019).
Linked issue closed by this pull request: KMeans not running in parallel when init='random' (#12949)