-
-
Notifications
You must be signed in to change notification settings - Fork 25.8k
[MRG+2] Raise ValueError for metrics.cluster.supervised with too many classes #5445
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
- Add testing for the Value Error - Check that n_clusters,n_classes are not too high in contingency matrix
@@ -671,7 +728,7 @@ def adjusted_mutual_info_score(labels_true, labels_pred): | |||
return ami | |||
|
|||
|
|||
def normalized_mutual_info_score(labels_true, labels_pred): | |||
def normalized_mutual_info_score(labels_true, labels_pred, max_n_classes=5000): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You seems to be above column 79 here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a 79 line I think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
my mistake
-Now the overall test time is below .2s for cluster metrics
LGTM |
@@ -226,6 +246,11 @@ def homogeneity_completeness_v_measure(labels_true, labels_pred): | |||
labels_pred : array, shape = [n_samples] | |||
cluster labels to evaluate | |||
|
|||
max_n_classes : int | |||
Maximal number of class handled by the adjusted_rand_score |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo: "number of classes", IMHO
LGTM. Waiting for travis to finish and then merging. |
Travis ran. Merging! |
[MRG+2] Raise ValueError for metrics.cluster.supervised with too many classes
That's a clever way to deal with this. I'm not sure it is better than the |
If we do keep this solution, it needs a whatsnew entry, refering to this PR and a |
FWIW, I think this is the wrong solution, when most of those metrics should be using a sparse contingency matrix, as per #4788. (Apart from concerns like passing continuous targets by accident, there are problem setting for clustering where the assignments are very sparse, such as coreference resolution.) Would anyone mind if we deprecated |
The issue here is really a validation issue. If someone puts continuous scores into a clustering metric we can either explicitly catch that and raise an error; otherwise, we should return a result of 0. There are, on the other hand, real clustering problems with a large number of clusters, but any likely clustering is sparse. |
Never mind that, I'm about to make a PR of a more complete version of #7203. |
I was about to do the same :) I will wait for yours. |
oh dear. :) On 14 September 2016 at 21:38, Olivier Grisel notifications@github.com
|
Fix for #4976