[MRG] Added metrics support for multiclass-multioutput classification #3681
Conversation
@MechCoder could you please help in figuring out the test failure?

The errors look like somehow you're transforming metric outputs into integers...

Your current code in

@jnothman The issue was with the
    if y_type == 'multilabel-sequences':
        labels = unique_labels(y_true, y_pred)
        binarizer = MultiLabelBinarizer(classes=labels, sparse_output=True)
        y_true = binarizer.fit_transform(y_true)
        y_pred = binarizer.fit_transform(y_pred)
        y_type = 'multilabel-indicator'
    if y_type == 'multiclass-multioutput':
This clause is redundant.
@jnothman Yes, I will remove that. Apart from that, does this look good?
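For context, a minimal standalone sketch of what the multilabel-sequences branch above does. It uses only MultiLabelBinarizer with an explicit class list (the unique_labels call and sparse output from the diff are elided for brevity):

    from sklearn.preprocessing import MultiLabelBinarizer

    # Two samples whose labels are given as sequences (tuples).
    y_true = [(1, 2), (3,)]
    y_pred = [(1,), (3,)]

    # Binarize both with a shared, explicit class list so the columns align,
    # mirroring the unique_labels + MultiLabelBinarizer pattern in the diff.
    mlb = MultiLabelBinarizer(classes=[1, 2, 3])
    y_true_ind = mlb.fit_transform(y_true)   # array([[1, 1, 0], [0, 0, 1]])
    y_pred_ind = mlb.fit_transform(y_pred)   # array([[1, 0, 0], [0, 0, 1]])

After this conversion, y_type is set to 'multilabel-indicator' and the metric proceeds on the indicator matrices.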
Thanks for tackling this issue! Can you also update the documentation and the narrative documentation?
    assert_equal(zero_one_loss(y1, y2), 0.5)
    assert_equal(zero_one_loss(y1, y1), 0)
    assert_equal(zero_one_loss(y2, y2), 0)
    assert_equal(zero_one_loss(y2, [(), ()]), 1)
This is not multi-class multi-output, but multi-label sequence. This should result in an error.
@arjoly I have made the changes you suggested!
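To make the intended multiclass-multioutput semantics concrete, here is a plain-NumPy sketch of subset (exact-row-match) accuracy, which I take to be what the PR computes for 2-D multiclass targets (the same convention accuracy_score already uses for multilabel indicator matrices):

    import numpy as np

    # Each row is one sample with two multiclass outputs.
    y_true = np.array([[0, 1], [2, 3], [1, 1], [3, 0]])
    y_pred = np.array([[0, 1], [2, 0], [1, 1], [3, 0]])

    exact_match = (y_true == y_pred).all(axis=1)   # [True, False, True, True]
    accuracy = exact_match.mean()                  # 0.75
    zero_one = 1.0 - accuracy                      # 0.25, the zero_one_loss analogue
    print(accuracy, zero_one)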
    y_pred = random_state.randint(0, 4, size=(20, 5))
    n_samples = y_true.shape[0]

    for name in ["accuracy_score", "zero_one_loss"]:
Here I would add a constant METRICS_WITH_MULTICLASS_MULITOUTPUT at the top and loop over it.
@arjoly Does this look good now?
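For reference, a rough sketch of the constant-and-loop test structure the reviewer suggests. This is not the PR's code; ALL_METRICS is the metric registry from sklearn's common metric tests, and the arrays are the random multiclass-multioutput targets generated above:

    from sklearn.utils import check_random_state

    # The registry of all metrics used by sklearn's common metric tests.
    from sklearn.metrics.tests.test_common import ALL_METRICS

    METRICS_WITH_MULTICLASS_MULITOUTPUT = ["accuracy_score", "zero_one_loss"]

    random_state = check_random_state(0)
    y_true = random_state.randint(0, 4, size=(20, 5))
    y_pred = random_state.randint(0, 4, size=(20, 5))

    for name in METRICS_WITH_MULTICLASS_MULITOUTPUT:
        metric = ALL_METRICS[name]
        # Under this PR these calls succeed; released scikit-learn without
        # the patch raises ValueError for multiclass-multioutput targets.
        score = metric(y_true, y_pred)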
    @@ -74,8 +74,9 @@ tasks :ref:`Decision Trees <tree>`, :ref:`Random Forests <forest>`,

     .. warning::

    -    At present, no metric in :mod:`sklearn.metrics`
    -    supports the multioutput-multiclass classification task.
    +    At present, metrics such as accuracy_score and zero_one_loss in
I would say: At present, only :func:`accuracy_score` and :func:`zero_one_loss` support the multioutput-multiclass classification task.
Hm, this paragraph could be removed, since thanks to you we will have supporting metrics now.
For the narrative doc, I was thinking of updating this page / file
@arjoly I have addressed the comments. Does this look good?
    random_state = check_random_state(0)
    y_true = random_state.randint(0, 4, size=(20, 5))
    y_pred = random_state.randint(0, 4, size=(20, 5))
    for name in ALL_METRICS.keys():
You don't need .keys() to iterate over all the keys.
@arjoly Sorry for the late reply. I was busy with my university exams.
    for name in ALL_METRICS:
        if (name not in METRICS_WITH_MULTICLASS_MULITOUTPUT and
                name not in MULTIOUTPUT_METRICS):
Here, I think it should be an or instead of an and.
@arjoly I don't think so. The test is to raise an exception for all the metrics that do not support multiclass-multioutput inputs, and note that the metrics in MULTIOUTPUT_METRICS do support them, hence the and.
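A quick sanity check of the boolean logic under discussion. By De Morgan's law, "name not in A and name not in B" equals "not (name in A or name in B)", i.e. it selects exactly the metrics that support neither convention (the set contents below are illustrative stand-ins, not the real constants):

    A = {"accuracy_score", "zero_one_loss"}   # METRICS_WITH_MULTICLASS_MULITOUTPUT
    B = {"mean_absolute_error", "r2_score"}   # stand-ins for MULTIOUTPUT_METRICS

    for name in ["f1_score", "accuracy_score", "r2_score"]:
        unsupported = name not in A and name not in B
        assert unsupported == (not (name in A or name in B))
        print(name, "should raise:", unsupported)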
Hm, sorry, I misread the not.

Can you ensure that we still get a meaningful error message? Now we have

while previously it was returning

?
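For what it's worth, a sketch of how such an error-message check could look today, using pytest rather than the PR's test helpers. It assumes the ValueError raised by _check_targets for unsupported metrics mentions "multiclass-multioutput", as it does in current scikit-learn releases:

    import numpy as np
    import pytest
    from sklearn.metrics import f1_score

    y_true = np.array([[0, 1], [2, 3]])
    y_pred = np.array([[0, 1], [2, 0]])

    # f1_score does not support multiclass-multioutput targets; the
    # informative part of the message is the named target type.
    with pytest.raises(ValueError, match="multiclass-multioutput"):
        f1_score(y_true, y_pred)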
@arjoly Any more changes to be made?
    @@ -293,6 +300,10 @@ In the multilabel case with binary label indicators: ::

       >>> accuracy_score(np.array([[0, 1], [1, 1]]), np.ones((2, 2)))
       0.5

    +In the case of multiclass-multioutput: ::
Please add a blank line for readability of the source. Also, you can write multiclass-multioutput:: directly instead of multiclass-multioutput: ::
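The doctest that follows that heading is truncated in this thread; a plausible continuation, assuming the exact-row-match semantics discussed above (the 0.5 value and the example arrays are illustrative, not the merged documentation):

    >>> import numpy as np
    >>> from sklearn.metrics import accuracy_score
    >>> y_true = np.array([[0, 1], [2, 3]])
    >>> y_pred = np.array([[0, 1], [2, 0]])
    >>> accuracy_score(y_true, y_pred)
    0.5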
@Akshay0724, there was a request at #3453 that this be finished up. Do you intend to complete it, or should we find another contributor?
Doesn't any developer want to resolve the conflicts in this? @jnothman I'll be happy to contribute to this one, as I need it personally.
Thanks for the ping. I don't think it would hurt to support multioutput for accuracy. We should probably review this.
Fix for #3453
Ping @arjoly. Added support for zero_one_loss and accuracy_score.