[WIP] ENH Multilabel confusion matrix #4126
Conversation
(force-pushed from f05e92a to bdff9e2)
sklearn/metrics/classification.py (Outdated)

    mcc = np.empty(shape=(n_labels, 4), dtype=int)

    for label_idx in range(n_labels):
        y_pred_col = y_pred.getcol(label_idx)
I can't remember the state of things, but I think this needs to support dense or sparse y, and you should use the count_nonzero helper.
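Not the PR's code, but a minimal sketch of what that suggestion could look like, assuming binary indicator input that is either both dense or both CSR sparse, and using the existing sklearn.utils.sparsefuncs.count_nonzero helper (which only accepts CSR matrices). The function name per_label_counts is hypothetical:

    import numpy as np
    from scipy.sparse import issparse
    from sklearn.utils.sparsefuncs import count_nonzero

    def per_label_counts(y_true, y_pred):
        """Return an (n_labels, 4) array of [TP, FP, TN, FN] per label.

        Assumes y_true and y_pred are binary indicator matrices of shape
        (n_samples, n_labels), either both dense or both CSR sparse.
        """
        n_samples = y_true.shape[0]
        if issparse(y_true):
            # elementwise AND of indicator matrices; count_nonzero needs CSR
            tp = count_nonzero(y_true.multiply(y_pred).tocsr(), axis=0)
            true_sum = count_nonzero(y_true.tocsr(), axis=0)
            pred_sum = count_nonzero(y_pred.tocsr(), axis=0)
        else:
            tp = np.logical_and(y_true, y_pred).sum(axis=0)
            true_sum = (np.asarray(y_true) != 0).sum(axis=0)
            pred_sum = (np.asarray(y_pred) != 0).sum(axis=0)
        fp = pred_sum - tp
        fn = true_sum - tp
        tn = n_samples - tp - fp - fn
        return np.column_stack([tp, fp, tn, fn])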
Thanks for the quick comment! Will fix this by tomorrow... soon ;)
(force-pushed from 72fc7b0 to b811206)
(force-pushed from b811206 to 7b2e166)
I am unable to validate the below inputs for [...]

If this is done, the test which checks for a mix of binary and multilabel-sequences should be scrapped... Is that okay? Should [...]? EDIT: Or perhaps I should locally deal with it and binarize it before the [...]. Also looks like [...]
(force-pushed from 7b2e166 to 3752f9e)
Don't worry about multilabel-sequences. They're deprecated and do not need to be supported in new code.
(force-pushed from 8f16312 to 63b16a9)
(force-pushed from c55cd16 to 7d5741d)
@jnothman This is done... please take a look when you find time!
(force-pushed from 7d5741d to 530bf04)
sklearn/metrics/classification.py (Outdated)

    @@ -186,6 +187,74 @@ def accuracy_score(y_true, y_pred, normalize=True, sample_weight=None):
         return _weighted_sum(score, sample_weight, normalize)


    def multilabel_confusion_matrix(y_true, y_pred):
        """Compute True positive, False positive, True negative, False negative
        for a multilabel classification problem.
Please follow PEP 257: a one-line short description, a blank line, then an optional multi-line detailed description.

Having a short first line is important for many dev tools (e.g. IDEs) and API documentation tools such as the API summary page built with sphinx.

    def multilabel_confusion_matrix(y_true, y_pred):
        """Compute the confusion matrix for a multilabel classification problem.

        Compute True positive, False positive, True negative, False negative
        rates for each binary classification sub-problem.

        ...
        """
        ...
@ogrisel I think I did this wrong... According to table 7 of the reference paper and your answer on metaoptimize ( ;) ), the implementation should use tp tn fp fn but present it in a different way. Could you advise whether my understanding is true and whether I should modify it that way?
The website seems to be down... Kindly refer to the cached page instead.
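For context on the "present it in a different way" question: one alternative arranges each label's counts as its own 2x2 matrix, which is the shape the eventually merged implementation (#11179) adopted. A hedged sketch of that conversion from this PR's (n_labels, 4) layout; counts_to_2x2 is a hypothetical helper, not code from either PR:

    import numpy as np

    def counts_to_2x2(mcm):
        """Reshape (n_labels, 4) rows of [TP, FP, TN, FN] into one
        [[TN, FP], [FN, TP]] matrix per label, shape (n_labels, 2, 2)."""
        tp, fp, tn, fn = mcm[:, 0], mcm[:, 1], mcm[:, 2], mcm[:, 3]
        return np.stack([tn, fp, fn, tp], axis=1).reshape(-1, 2, 2)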
doc/modules/model_evaluation.rst (Outdated)

    Each row contains the values for the 4 expressions of the F-Score, namely
    :math:`TP, FP, TN, FN`.

    For each label (row) the values are computed as follows:
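The quoted diff is truncated at this point; as a reconstruction (not the PR's doc text), the per-label counts it goes on to define can be computed with numpy like so:

    import numpy as np

    # binary indicator matrices of shape (n_samples, n_labels)
    y_true = np.array([[1, 0], [1, 1], [0, 1]])
    y_pred = np.array([[1, 1], [0, 1], [0, 1]])

    tp = np.sum((y_true == 1) & (y_pred == 1), axis=0)  # [1, 2]
    fp = np.sum((y_true == 0) & (y_pred == 1), axis=0)  # [0, 1]
    tn = np.sum((y_true == 0) & (y_pred == 0), axis=0)  # [1, 0]
    fn = np.sum((y_true == 1) & (y_pred == 0), axis=0)  # [1, 0]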
This is described more succinctly under Binary Classification. Reference that instead.
I think as @ogrisel says on metaoptimize, the notion of a confusion matrix when it comes to multilabel data is not especially standardised. This tp, fp, fn, tn breakdown is useful, but admittedly it doesn't really show confusion (and the same information is more-or-less conveyed by [...]).

The sort of confusion matrix shown in table 7 of the reference paper may be useful, although I'm not actually sure that the partial scores (i.e. dividing the counts among the number of predicted) help much with interpretation; I admit that at least the marginals make some sense. Are there other references that show this confusion matrix (a textbook would be a good start)?

Another interesting analysis akin to a confusion matrix would essentially treat each distinct labelling as a class; while this combinatoric explosion might make a complete multiclass confusion matrix hard to use, just getting a list of the most frequent confusions might be insightful (see the sketch below).

Rather than doing this blind, why don't you make an example called "Analysing confusion in multilabel classification" and use a real dataset (e.g. Reuters Corpus classification) to actually work out which tabulation (or visualisation) best helps explain what's going on?
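A hedged sketch of that last idea (the function name and the example data are illustrative, not from the PR): treat each distinct label set as a class and tally the most frequent disagreeing (true, predicted) pairs.

    from collections import Counter

    def most_frequent_confusions(y_true_sets, y_pred_sets, n=10):
        """Tally (true labelset, predicted labelset) pairs that disagree.

        y_true_sets, y_pred_sets: iterables of label collections per sample.
        """
        pairs = Counter(
            (frozenset(t), frozenset(p))
            for t, p in zip(y_true_sets, y_pred_sets)
            if frozenset(t) != frozenset(p)
        )
        return pairs.most_common(n)

    # e.g. on Reuters-style label sets:
    # most_frequent_confusions([{"grain"}, {"wheat", "grain"}],
    #                          [{"grain"}, {"grain"}])
    # -> [((frozenset({'wheat', 'grain'}), frozenset({'grain'})), 1)]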
(force-pushed from 530bf04 to c31c96a)
(force-pushed from c31c96a to ecf4441)
@raghavrv Were you able to get your changes in, or is it still under review? Do you need any help?
@pramitchoudhary If you would like to take over, please go ahead! There is some comparison (between the possible forms) that needs to be done to fix the API.
FIX Use structured array for accessing labels of the multilabel binarized confusion matrix.
DOC Minor doctest fix.
FIX Clean up and minor fixes.
FIX map function from Python 2 to Python 3.
(force-pushed from ecf4441 to 96189d9)
Closing, resolved in #11179.
Fixes #3452

Based on #3614 (this is rewritten to conform to the referenced implementations). Thanks @Magellanea!

- Renamed binarized_multilabel_confusion_matrix -> multilabel_confusion_matrix.
- Return an array of shape (n_labels, 4) instead of a numpy struct array.
- Added a multilabel_confusion_matrix entry to test_common.py.
- Added multilabel_confusion_matrix tests to test_classification.py.

TODO:

- Rewrite multilabel_confusion_matrix to conform to more standard approaches.

Ref 1 - http://www.clips.ua.ac.be/~vincent/scripts/confusionmatrix.py
Ref 2 - http://www.clips.ua.ac.be/~vincent/pdf/microaverage.pdf
Ref 3 - http://metaoptimize.com/qa/questions/8964/confusion-matrix-for-multiclass-multilabel-classification
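Since the thread closes pointing to #11179, a short usage sketch of the API as it was eventually merged into scikit-learn, which returns per-label 2x2 matrices rather than this PR's (n_labels, 4) rows:

    import numpy as np
    from sklearn.metrics import multilabel_confusion_matrix

    y_true = np.array([[1, 0, 1],
                       [0, 1, 0]])
    y_pred = np.array([[1, 1, 0],
                       [0, 1, 0]])

    mcm = multilabel_confusion_matrix(y_true, y_pred)
    print(mcm.shape)  # (3, 2, 2): one [[TN, FP], [FN, TP]] matrix per label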