[MRG+1] MAINT: Refactor and speed up silhoutte_score (samples) #6151

MechCoder · 2016-01-10T07:15:57Z

A refactor of metrics.silhouette_score. I get at least a 2x speedup in most cases (roughly 1.7x at n_samples=10000, up to about 3.8x at n_samples=100):

 n_samples=10000, n_labels=6
 In this branch: 13.4s, in master: 23.2s
 n_samples=10000, n_labels=4
 In this branch: 14.4s, in master: 23.9s

 n_samples=1000, n_labels=6
 In this branch: 115ms, in master: 287ms
 n_samples=1000, n_labels=4
 In this branch: 110ms, in master: 249ms

 n_samples=100, n_labels=6
 In this branch: 1.87ms, in master: 7.13ms
 n_samples=100, n_labels=4
 In this branch: 1.54ms, in master: 5.84ms

Also fixed a bug related to non-encoded labels.
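
For context on what "non-encoded labels" means: cluster labels need not be the contiguous integers 0..n_labels-1 (they may be arbitrary integers or strings), and code that indexes arrays by label then breaks. The thread does not show how the bug was fixed, so the following is only a minimal sketch of one common way to encode labels with NumPy. The `raw_labels` values are made up for illustration.

```python
import numpy as np

# Hypothetical raw labels that are NOT encoded as 0..n_labels-1.
raw_labels = np.array([10, 10, 42, 7, 42, 7])

# return_inverse=True maps every label to its index in the sorted array of
# unique values, which gives contiguous integer codes.
classes, encoded = np.unique(raw_labels, return_inverse=True)

print(classes)  # [ 7 10 42]
print(encoded)  # [1 1 2 0 2 0]

# Contiguous codes are safe to use with np.bincount or as array indices.
cluster_sizes = np.bincount(encoded)
print(cluster_sizes)  # [2 2 2]
```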

GaelVaroquaux · 2016-01-10T09:49:37Z

I think that this code would greatly benefit from a few comments, and maybe from giving 'A' and 'B' more explicit names. Right now, it is very hard to follow, harder than the code it replaces.

MechCoder · 2016-01-10T15:47:26Z

@GaelVaroquaux I have added comments and renamed a few variables. Can you tell me if it is still harder to follow?

MechCoder · 2016-01-10T15:52:31Z

In general, I think the tests can be made stronger, but I'll leave that for another PR.

TomDLT · 2016-01-13T09:54:46Z

sklearn/metrics/cluster/unsupervised.py

-    B = np.array([_nearest_cluster_distance(distances[i], labels, i)
-                  for i in range(n)])
-    sil_samples = (B - A) / np.maximum(A, B)
+    n_labels = labels.shape[0]


this is n_samples, not n_labels
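
To illustrate the distinction raised in this review comment: the length of the label array is the number of samples, while the number of labels is the count of distinct values in it. A small sketch with made-up labels:

```python
import numpy as np

# Hypothetical cluster assignments: 7 samples spread over 3 clusters.
labels = np.array([0, 0, 1, 1, 2, 2, 2])

n_samples = labels.shape[0]            # one entry per sample: 7
n_labels = np.unique(labels).shape[0]  # number of distinct labels: 3

print(n_samples, n_labels)
```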

MechCoder · 2016-01-13T13:42:57Z

@TomDLT looks ok now?

MechCoder · 2016-01-13T13:46:18Z

@agramfort if you have the time :-)

TomDLT · 2016-01-13T14:01:41Z

LGTM, you can squash

amueller · 2016-01-15T21:00:29Z

needs rebase ;)

amueller · 2016-01-15T21:01:45Z

how does it scale to very large n_labels?

MechCoder · 2016-01-15T21:16:14Z

It scales pretty well! :-) (best of 3)

For n_samples=10000, n_labels=100
In this branch: 12.3s, In master: 28s
For n_samples=1000, n_labels=100
In this branch: 277ms, In master: 992ms

Do I haz your +1 as well?
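
For readers following the scaling question, here is a minimal sketch of one way to compute per-sample silhouette values from a precomputed distance matrix, using per-cluster distance sums so that the only Python-level loop runs over labels rather than samples. This is not the code from this PR. The function name `silhouette_samples_sketch` and the variable names are hypothetical. The sketch assumes at least two clusters and assigns 0 to samples in singleton clusters, a convention chosen here for illustration.

```python
import numpy as np

def silhouette_samples_sketch(distances, labels):
    """Per-sample silhouette from an (n_samples, n_samples) distance matrix.

    Illustrative only; assumes at least two distinct labels.
    """
    classes, encoded = np.unique(labels, return_inverse=True)
    n_samples = encoded.shape[0]
    n_labels = classes.shape[0]
    sizes = np.bincount(encoded, minlength=n_labels)

    # cluster_sums[i, k] = sum of distances from sample i to cluster k.
    cluster_sums = np.zeros((n_samples, n_labels))
    for k in range(n_labels):
        cluster_sums[:, k] = distances[:, encoded == k].sum(axis=1)

    rows = np.arange(n_samples)
    own_size = sizes[encoded]

    # Mean distance to the other members of the sample's own cluster
    # (the sample's zero distance to itself is excluded from the count).
    intra = cluster_sums[rows, encoded] / np.maximum(own_size - 1, 1)

    # Mean distance to the nearest cluster the sample does not belong to.
    means = cluster_sums / sizes
    means[rows, encoded] = np.inf
    inter = means.min(axis=1)

    sil = (inter - intra) / np.maximum(intra, inter)
    sil[own_size == 1] = 0.0  # convention used in this sketch
    return sil

# Usage on four made-up points on a line, in two clusters.
X = np.array([0.0, 1.0, 10.0, 11.0])
D = np.abs(X[:, None] - X[None, :])
sil = silhouette_samples_sketch(D, np.array(["a", "a", "b", "b"]))
print(sil)
```

In this sketch the per-sample work is done by array operations, and the explicit loop has n_labels iterations, so its cost grows with the number of labels rather than with a Python loop over every sample.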

MechCoder · 2016-01-15T21:18:30Z

rebased

GaelVaroquaux · 2016-01-17T12:32:27Z

👍 on my side. This is a good rewrite. It is clear and fast. I am merging this!

MechCoder · 2016-01-17T15:13:17Z

thanks for the reviews!

MechCoder force-pushed the silhouette_score_refactor branch from ff2f719 to 76b1b3d Compare January 10, 2016 07:33

MechCoder force-pushed the silhouette_score_refactor branch from ec3ce2f to 7bb62bd Compare January 10, 2016 15:51

TomDLT reviewed Jan 13, 2016

TomDLT changed the title ~~[MRG] MAINT: Refactor and speed up silhoutte_score (samples)~~ [MRG+1] MAINT: Refactor and speed up silhoutte_score (samples) Jan 13, 2016

TomDLT added the Waiting for Reviewer label Jan 13, 2016

MechCoder force-pushed the silhouette_score_refactor branch from f91abff to 11f9b08 Compare January 13, 2016 14:07

MAINT: Refactor and speed up silhoutte_score

c7ce0ab

MechCoder force-pushed the silhouette_score_refactor branch from 11f9b08 to c7ce0ab Compare January 15, 2016 21:18

GaelVaroquaux added a commit that referenced this pull request Jan 17, 2016

Merge pull request #6151 from MechCoder/silhouette_score_refactor

769127c

[MRG+1] MAINT: Refactor and speed up silhoutte_score (samples)

GaelVaroquaux merged commit 769127c into scikit-learn:master Jan 17, 2016

MechCoder deleted the silhouette_score_refactor branch January 17, 2016 15:03

jnothman mentioned this pull request Aug 11, 2016

[MRG] Block-wise silhouette calculation to avoid memory consumption #7177

Closed

10 tasks
