[MRG] support sample_weight in silhouette_score #4087

jnothman · 2015-01-12T13:35:12Z

I wanted sample_weight support in silhouette_score so that a point standing in for several merged points is counted accordingly when average distances are calculated. Hacking it into the current implementation resulted in a very slow solution, so this PR also rewrites the implementation. The result is a bit slower than the version at master, but it supports sample_weight.

I've also added tests for correctness which I haven't otherwise found in the code.
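
To make the intended semantics concrete, here is a minimal dependency-free sketch. It is not the vectorized code in this PR. It assumes each weight is an integer count of identical copies of a point, and `weighted_silhouette` is a hypothetical helper name, not scikit-learn API. Under that assumption, scoring the merged points with weights should agree with scoring the expanded dataset without weights.

```python
from math import dist


def weighted_silhouette(points, labels, weights=None):
    """Mean silhouette where weights[i] counts identical copies of points[i].

    Illustrative only; assumes at least two clusters and count-like weights.
    """
    n = len(points)
    if weights is None:
        weights = [1] * n
    clusters = sorted(set(labels))
    # Total weight (number of underlying points) per cluster.
    total = {c: sum(w for w, l in zip(weights, labels) if l == c)
             for c in clusters}

    score_sum = 0.0
    for i in range(n):
        # Weighted sum of distances from sample i to every cluster.
        dsum = {c: 0.0 for c in clusters}
        for j in range(n):
            dsum[labels[j]] += weights[j] * dist(points[i], points[j])
        own = labels[i]
        if total[own] <= 1:
            s = 0.0  # singleton cluster: silhouette taken to be 0
        else:
            # Other copies of sample i lie at distance 0, so they add nothing
            # to the sum; only the count of "other" points drops by one.
            a = dsum[own] / (total[own] - 1)
            b = min(dsum[c] / total[c] for c in clusters if c != own)
            s = (b - a) / max(a, b) if max(a, b) > 0 else 0.0
        score_sum += weights[i] * s
    return score_sum / sum(weights)


# Three distinct points, with (0, 0) standing in for 3 copies and (5, 0) for 2.
merged = weighted_silhouette(
    [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0)], [0, 0, 1], weights=[3, 1, 2])
# The same data written out in full, with unit weights.
expanded = weighted_silhouette(
    [(0.0, 0.0)] * 3 + [(1.0, 0.0)] + [(5.0, 0.0)] * 2, [0, 0, 0, 0, 1, 1])
```

The two results should match up to floating-point rounding. How this should extend to real-valued weights is a separate question that the sketch does not address.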

amueller · 2015-01-13T23:14:26Z

sklearn/metrics/cluster/tests/test_unsupervised.py

                         silhouette_score, X, y)
+
+
+def test_paper_example():


Can you maybe add the reference "Silhouettes: a Graphical Aid to the Interpretation and Validation of Cluster Analysis" again?

amueller · 2015-01-13T23:27:54Z

LGTM apart from an explanation of the strategy and possibly a reference.

jnothman · 2015-06-03T10:53:18Z

Btw, I don't know its value as a reference, but http://cs.au.dk/~simina/weighted.pdf intuits the notion of weighted clustering as I did (last paragraph of section 4). I've not yet read on to see how they extend this to the real-valued case.

amueller · 2015-06-03T19:22:29Z

needs a rebase btw.

jnothman · 2015-06-06T14:01:40Z

rebased

MechCoder · 2015-11-10T22:38:08Z

sklearn/metrics/cluster/unsupervised.py

    distances = pairwise_distances(X, metric=metric, **kwds)
+    if sample_weight is not None:


This can also be viewed as follows: for a given sample, if another sample in the same cluster has a very high sample weight and is far away, it should reduce the silhouette score more, right? And vice versa.
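
A small 1-D sketch of that reading, assuming weights act as copy counts; `silhouette_of` is a hypothetical helper for illustration and not part of this PR. It computes the silhouette of a single sample and shows how moving weight onto a far or a near cluster-mate changes it.

```python
def silhouette_of(i, xs, labels, weights):
    """Weighted silhouette of sample i for 1-D points (illustrative only).

    Assumes sample i's cluster has total weight > 1 and another cluster exists.
    """
    own = labels[i]

    def weighted_dist_sum(cluster):
        # Sum of w_j * |x_i - x_j| over the members of one cluster.
        return sum(w * abs(xs[i] - x)
                   for x, l, w in zip(xs, labels, weights) if l == cluster)

    def total_weight(cluster):
        return sum(w for l, w in zip(labels, weights) if l == cluster)

    # a: weighted mean distance to the rest of the sample's own cluster.
    a = weighted_dist_sum(own) / (total_weight(own) - 1)
    # b: smallest weighted mean distance to any other cluster.
    b = min(weighted_dist_sum(c) / total_weight(c)
            for c in set(labels) if c != own)
    return (b - a) / max(a, b)


xs = [0.0, 1.0, 4.0, 10.0]
labels = [0, 0, 0, 1]

uniform = silhouette_of(0, xs, labels, [1, 1, 1, 1])
far_heavy = silhouette_of(0, xs, labels, [1, 1, 5, 1])   # heavy mate at x=4
near_heavy = silhouette_of(0, xs, labels, [1, 5, 1, 1])  # heavy mate at x=1
```

With these numbers, the heavy far-away cluster-mate raises the within-cluster mean distance `a` and lowers the silhouette of sample 0, while the heavy nearby one does the opposite: `far_heavy < uniform < near_heavy`.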

MechCoder · 2015-11-10T23:05:26Z

You need to rebase.

Again.

MechCoder · 2015-11-10T23:06:19Z

sklearn/metrics/cluster/tests/test_unsupervised.py

@@ -68,3 +68,64 @@ def test_correct_labelsize():
                         'Number of labels is %d\. Valid values are 2 '
                         'to n_samples - 1 \(inclusive\)' % len(np.unique(y)),
                         silhouette_score, X, y)
+
+
+def test_paper_example():


test_paper_example is not a great name for a test :P. It would be great to mention the name of the paper.

haiatn · 2023-08-26T20:43:43Z

Is it possible that #11135 should have closed this? I see the current version has D_chunk that calculates sample_weights, although I must say I am not familiar with the algorithm. I also see the paper example test is in the main branch.

adrinjalali · 2024-03-06T09:11:18Z

@jnothman would you be able to give this a fresh update?

If not, @StefanieSenger would you be able to take this over and update it to match our current codebase?

jnothman force-pushed the silhouette branch 2 times, most recently from ccf6cf9 to e56d3af Compare January 12, 2015 13:52

amueller reviewed Jan 13, 2015

terrycojones mentioned this pull request Mar 1, 2015

Added missing space to exception message #4308

Merged

jnothman force-pushed the silhouette branch from e56d3af to aa45432 Compare June 6, 2015 14:01

jnothman force-pushed the silhouette branch from aa45432 to baf5bd0 Compare June 14, 2015 14:35

jnothman added 3 commits June 15, 2015 00:37

ENH vectorized silhouette calculation

408b8e7

TST add data-based test to silhouette score

59cffea

ENH sample_weight support in silhouette_score

baf5bd0

MechCoder reviewed Nov 10, 2015

amueller added the Waiting for Reviewer label Dec 10, 2015

jnothman mentioned this pull request Aug 11, 2016

[MRG] Block-wise silhouette calculation to avoid memory consumption #7177

Closed

10 tasks

github-actions bot added the module:metrics label Mar 2, 2020

cmarmo removed the Waiting for Reviewer label Dec 14, 2020

Base automatically changed from master to main January 22, 2021 10:48

adrinjalali added the Stalled label Mar 6, 2024

adrinjalali added the help wanted label Apr 17, 2024
