[MRG+2] Raise ValueError for metrics.cluster.supervised with too many classes #5445

tomMoral · 2015-10-19T10:09:55Z

Fix for #4976

- Add a `max_n_classes` parameter to the `metrics.cluster.supervised` metrics
- Add tests for the `ValueError`
- Check that `n_clusters` and `n_classes` are not too high in the contingency matrix

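The kind of guard described above can be sketched in plain Python. This is only an illustration, not the code merged in this PR: the helper name `check_label_counts` and the error wording are hypothetical, and only the `max_n_classes=5000` default is taken from the diff.

```python
def check_label_counts(labels_true, labels_pred, max_n_classes=5000):
    """Hypothetical guard: refuse to build a huge dense contingency matrix.

    A dense contingency matrix has n_classes * n_clusters cells, so
    continuous targets passed by accident (almost every value unique)
    would make it enormous.
    """
    n_classes = len(set(labels_true))
    n_clusters = len(set(labels_pred))
    if n_classes > max_n_classes:
        raise ValueError(
            "Too many classes for a clustering metric: %d > max_n_classes=%d. "
            "Were continuous targets passed by mistake?"
            % (n_classes, max_n_classes)
        )
    if n_clusters > max_n_classes:
        raise ValueError(
            "Too many clusters for a clustering metric: %d > max_n_classes=%d. "
            "Were continuous targets passed by mistake?"
            % (n_clusters, max_n_classes)
        )
    return n_classes, n_clusters
```

With the default limit, ordinary label vectors pass through unchanged, while a vector of distinct real values longer than the limit raises before any matrix is allocated.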

TomDLT · 2015-10-19T11:26:42Z

sklearn/metrics/cluster/supervised.py

@@ -671,7 +728,7 @@ def adjusted_mutual_info_score(labels_true, labels_pred):
    return ami


-def normalized_mutual_info_score(labels_true, labels_pred):
+def normalized_mutual_info_score(labels_true, labels_pred, max_n_classes=5000):


~~You seem to be above column 79 here~~

It's a 79-character line, I think.


TomDLT · 2015-10-19T12:28:29Z

LGTM

GaelVaroquaux · 2015-10-19T12:34:09Z

sklearn/metrics/cluster/supervised.py

@@ -226,6 +246,11 @@ def homogeneity_completeness_v_measure(labels_true, labels_pred):
    labels_pred : array, shape = [n_samples]
        cluster labels to evaluate

+    max_n_classes : int
+        Maximal number of class handled by the adjusted_rand_score


typo: "number of classes", IMHO

GaelVaroquaux · 2015-10-19T14:23:26Z

LGTM. Waiting for Travis to finish and then merging.

GaelVaroquaux · 2015-10-19T21:31:25Z

Travis ran. Merging!


amueller · 2015-10-21T09:13:42Z

That's a clever way to deal with this. I'm not sure it is better than the type_of_target logic, which uses _is_integral_float, though. The issue came up because someone gave floats to the functions. And I think we should deal with this issue consistently.
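To make the alternative concrete, here is a rough sketch of validating on the values themselves instead of on the number of distinct labels. It is not scikit-learn's `type_of_target` or `_is_integral_float`; the helper names `has_non_integral_floats` and `validate_cluster_labels` are assumptions made up for this illustration.

```python
def has_non_integral_floats(labels):
    """Return True if any label is a float that is not a whole number."""
    return any(isinstance(x, float) and not x.is_integer() for x in labels)


def validate_cluster_labels(labels):
    """Hypothetical validation step: reject labels that look continuous."""
    if has_non_integral_floats(labels):
        raise ValueError(
            "Clustering metrics expect discrete labels, "
            "but non-integral float values were given."
        )
    return list(labels)
```

Under this approach `[0.0, 1.0, 1.0]` is accepted because every float is integral, while `[0.3, 1.7]` is rejected regardless of how many distinct values there are.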

amueller · 2015-10-21T09:14:40Z

If we do keep this solution, it needs a whatsnew entry referring to this PR, and a versionadded for the option.

jnothman · 2016-08-17T23:28:08Z

FWIW, I think this is the wrong solution, when most of those metrics should be using a sparse contingency matrix, as per #4788. (Apart from concerns like passing continuous targets by accident, there are problem settings for clustering where the assignments are very sparse, such as coreference resolution.) Would anyone mind if we deprecated max_n_classes? I think it's very obscure.
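A minimal sketch of why a sparse contingency matrix sidesteps the problem: only the (class, cluster) pairs that actually occur are stored, so memory is bounded by the number of samples, not by `n_classes * n_clusters`. This is plain Python for illustration, not the implementation proposed in #4788; the function names are hypothetical.

```python
import math
from collections import Counter


def sparse_contingency(labels_true, labels_pred):
    """Count co-occurring (class, cluster) pairs; absent pairs take no space."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("labels_true and labels_pred must have the same length")
    return Counter(zip(labels_true, labels_pred))


def mutual_info_from_counts(counts):
    """Mutual information (in nats) computed from the non-zero cells only."""
    n = sum(counts.values())
    class_totals = Counter()
    cluster_totals = Counter()
    for (c, k), v in counts.items():
        class_totals[c] += v
        cluster_totals[k] += v
    return sum(
        (v / n) * math.log(n * v / (class_totals[c] * cluster_totals[k]))
        for (c, k), v in counts.items()
    )
```

Even when every sample has its own class and its own cluster, the counter holds at most one entry per sample, whereas a dense matrix would need `n_samples ** 2` cells.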

jnothman · 2016-09-14T10:18:07Z

The issue here is really a validation issue. If someone passes continuous scores to a clustering metric, we can either explicitly catch that and raise an error or, failing that, return a result of 0.

There are, on the other hand, real clustering problems with a large number of clusters, but any likely clustering is sparse.

ogrisel · 2016-09-14T11:16:51Z

For the sake of navigating between issues, a rebased version of #4788 to implement sparse contingency matrix computation has been re-submitted as #7203.

jnothman · 2016-09-14T11:27:08Z

Never mind that, I'm about to make a PR of a more complete version of #7203.

ogrisel · 2016-09-14T11:38:11Z

I was about to do the same :) I will wait for yours.

jnothman · 2016-09-14T11:41:46Z

oh dear. :)


- Add max_n_classes param to cluster.supervised metric

fbd4b79

- Add testing for the Value Error - Check that n_clusters,n_classes are not too high in contingency matrix

TomDLT reviewed Oct 19, 2015

Reduce test time.

e916008

-Now the overall test time is below .2s for cluster metrics

GaelVaroquaux reviewed Oct 19, 2015

GaelVaroquaux changed the title ~~[MRG] Raise ValueError for metrics.cluster.supervised with too many classes~~ [MRG+1] Raise ValueError for metrics.cluster.supervised with too many classes Oct 19, 2015

GaelVaroquaux changed the title ~~[MRG+1] Raise ValueError for metrics.cluster.supervised with too many classes~~ [MRG+2] Raise ValueError for metrics.cluster.supervised with too many classes Oct 19, 2015

Correct Typo with Gael comment

1dd5518

tomMoral force-pushed the devTom branch from 9b08b13 to 1dd5518 Compare October 19, 2015 14:38

GaelVaroquaux added a commit that referenced this pull request Oct 19, 2015

Merge pull request #5445 from tomMoral/devTom

f6b85d9

[MRG+2] Raise ValueError for metrics.cluster.supervised with too many classes

GaelVaroquaux merged commit f6b85d9 into scikit-learn:master Oct 19, 2015

GaelVaroquaux mentioned this pull request Oct 19, 2015

metrics.mutual_info_score hangs when given real vectors #4976

Closed

tomMoral deleted the devTom branch October 19, 2015 21:38

ogrisel mentioned this pull request Sep 14, 2016

[MRG + 1] More versionadded everywhere! #7403

Merged

jnothman mentioned this pull request Sep 14, 2016

[MRG+1] Use sparse cluster contingency matrix by default #7419

Closed
