-
-
Notifications
You must be signed in to change notification settings - Fork 26k
[MRG+1] MAINT: Refactor and speed up silhoutte_score (samples) #6151
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[MRG+1] MAINT: Refactor and speed up silhoutte_score (samples) #6151
Conversation
ff2f719
to
76b1b3d
Compare
I think that this code would greatly benefit from a few comments, and maybe calling 'A' and 'B' with more explicite names. Right now, it is very hard to follow. Harder than the code it replaces. |
@GaelVaroquaux I have added comments and replaced a few variable names. Can you tell me if it still harder to follow? |
ec3ce2f
to
7bb62bd
Compare
In general, I think the tests can be made more stronger but I'll leave that for another PR |
B = np.array([_nearest_cluster_distance(distances[i], labels, i) | ||
for i in range(n)]) | ||
sil_samples = (B - A) / np.maximum(A, B) | ||
n_labels = labels.shape[0] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is n_samples
, not n_labels
@TomDLT looks ok now? |
@agramfort if you have the time :-) |
LGTM, you can squash |
f91abff
to
11f9b08
Compare
needs rebase ;) |
how does it scale to very large |
It scales pretty well ! :-) (best of 3) For n_samples=10000, n_labels=100 Do I haz your +1 as well? |
11f9b08
to
c7ce0ab
Compare
rebased |
👍 on my side. This is a good rewrite. It is clear and fast. I am merging this! |
[MRG+1] MAINT: Refactor and speed up silhoutte_score (samples)
thanks for the reviews! |
A refactor of
metrics.silhouette_score
. Able to get at least 2 times speedup in most cases.Also fixed a bug related to non-encoded labels.