[MRG] Added metrics support for multiclass-multioutput classification #3681

Closed · akshayah3 wants to merge 19 commits into scikit-learn:master from akshayah3:metrics

Conversation

@akshayah3 (Contributor)

Fix for #3453
Ping @arjoly. Added support for zero_one_loss and accuracy_score.
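
For context, a minimal sketch of the behaviour this PR targets (the values assume subset-accuracy semantics, i.e. a sample counts as correct only when every output matches; this is a reading of the PR, not its verbatim tests):

import numpy as np
from sklearn.metrics import accuracy_score, zero_one_loss

# Multiclass-multioutput targets: 2 samples, 2 outputs, labels beyond {0, 1}.
y_true = np.array([[1, 2], [0, 3]])
y_pred = np.array([[1, 2], [0, 1]])

# Only the first sample matches on every output.
accuracy_score(y_true, y_pred)   # expected: 0.5
zero_one_loss(y_true, y_pred)    # expected: 0.5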

@akshayah3 (Contributor Author)

@MechCoder Could you please help figure out the test failure?

@jnothman (Member)

The errors look like you're somehow transforming metric outputs into integers...

@jnothman (Member)

Your current code in _check_targets is reporting the type as multilabel-indicator when the input is multiclass-multioutput. I don't see why that's creating the current barrage of errors, but it can't possibly be correct behaviour.
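
For reference, sklearn.utils.multiclass.type_of_target is what distinguishes the two formats; a quick check:

import numpy as np
from sklearn.utils.multiclass import type_of_target

# A 2D array with labels outside {0, 1} is multiclass-multioutput:
print(type_of_target(np.array([[1, 2], [0, 3], [4, 3]])))  # 'multiclass-multioutput'

# A 2D binary indicator matrix is multilabel-indicator:
print(type_of_target(np.array([[0, 1], [1, 1]])))  # 'multilabel-indicator'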

@coveralls

Coverage Status

Coverage decreased (-0.04%) when pulling 392f18a on akshayah3:metrics into 9580431 on scikit-learn:master.

@akshayah3 (Contributor Author)

@jnothman The issue was with the _check_targets method. I fixed it; could you please review the code?

if y_type == 'multilabel-sequences':
    labels = unique_labels(y_true, y_pred)
    binarizer = MultiLabelBinarizer(classes=labels, sparse_output=True)
    y_true = binarizer.fit_transform(y_true)
    y_pred = binarizer.fit_transform(y_pred)
    y_type = 'multilabel-indicator'

if y_type == 'multiclass-multioutput':

@jnothman (Member)

This clause is redundant.

@akshayah3 (Contributor Author)

@jnothman Yes, I will remove that. Apart from that, does this look good?

@arjoly (Member) commented Sep 22, 2014

Thanks for tackling this issue!

Can you also update the docstrings and the narrative documentation?

assert_equal(zero_one_loss(y1, y2), 0.5)
assert_equal(zero_one_loss(y1, y1), 0)
assert_equal(zero_one_loss(y2, y2), 0)
assert_equal(zero_one_loss(y2, [(), ()]), 1)

@arjoly (Member)

This is not multi-class multi-output, but multi-label sequence. This should result in an error.
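
To illustrate the distinction (illustrative literals only; the sequence-of-sequences format was deprecated later):

# Multilabel-sequences: each sample is a variable-length collection of labels.
y_sequences = [(1, 2), (3,)]

# Multiclass-multioutput: a fixed-width 2D array with one column per output.
y_multioutput = [[1, 2], [0, 3]]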

@akshayah3 (Contributor Author)

@arjoly I have made the changes you suggested!

y_pred = random_state.randint(0, 4, size=(20, 5))
n_samples = y_true.shape[0]

for name in ["accuracy_score", "zero_one_loss"]:

@arjoly (Member)

Here I would add a constant METRICS_WITH_MULTICLASS_MULITOUTPUT at the top and loop over it.
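
A minimal sketch of the suggested structure (the constant's contents and the test body are illustrative; ALL_METRICS and check_random_state come from the test module's existing imports):

# At the top of the test module, next to the other metric groupings:
METRICS_WITH_MULTICLASS_MULITOUTPUT = ["accuracy_score", "zero_one_loss"]

def test_multiclass_multioutput_support():
    random_state = check_random_state(0)
    y_true = random_state.randint(0, 4, size=(20, 5))
    y_pred = random_state.randint(0, 4, size=(20, 5))
    for name in METRICS_WITH_MULTICLASS_MULITOUTPUT:
        # Each supporting metric should accept 2D multiclass targets.
        ALL_METRICS[name](y_true, y_pred)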

@coveralls

Coverage Status

Coverage increased (+0.01%) when pulling 04c02b7 on akshayah3:metrics into 9580431 on scikit-learn:master.

@coveralls

Coverage Status

Coverage increased (+0.01%) when pulling 43861b6 on akshayah3:metrics into 9580431 on scikit-learn:master.

@coveralls

Coverage Status

Coverage increased (+0.01%) when pulling edfe486 on akshayah3:metrics into 9580431 on scikit-learn:master.

@akshayah3 (Contributor Author)

@arjoly Does this look good now?

@@ -74,8 +74,9 @@ tasks :ref:`Decision Trees <tree>`, :ref:`Random Forests <forest>`,

 .. warning::

-    At present, no metric in :mod:`sklearn.metrics`
-    supports the multioutput-multiclass classification task.
+    At present, metrics such as accuracy_score and zero_one_loss in

Member

I would say:

At present, only :func:`accuracy_score` and :func:`zero_one_loss` support the multioutput-multiclass classification task.

Member

Hm, this paragraph could be removed, since thanks to you we will have such metrics now.

@arjoly (Member) commented Sep 22, 2014

For the narrative doc, I was thinking of updating this page / file.

@akshayah3 (Contributor Author)

@jnothman @arjoly Any changes to be done?

@akshayah3 (Contributor Author)

@arjoly I have addressed the comments. Does this look good?

random_state = check_random_state(0)
y_true = random_state.randint(0, 4, size=(20, 5))
y_pred = random_state.randint(0, 4, size=(20, 5))
for name in ALL_METRICS.keys():

@arjoly (Member)

You don't need .keys() to iterate over all the keys.

@akshayah3 (Contributor Author)

@arjoly Sorry for the late reply. I was busy with my university exams.
Could you review the latest commit?


for name in ALL_METRICS:
    if (name not in METRICS_WITH_MULTICLASS_MULITOUTPUT and
            name not in MULTIOUTPUT_METRICS):

@arjoly (Member)

Here, I think it should be an or instead of an and.

@akshayah3 (Contributor Author)

@arjoly I don't think so. The test is to raise an exception for all the metrics that do not support multiclass-multioutput inputs; note that MULTIOUTPUT_METRICS do support them, hence the and.
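
Spelling the condition out (a paraphrase using standard set identities, not code from the PR):

# "not in A and not in B" is, by De Morgan's laws, "not (in A or in B)",
# i.e. the metric supports neither format:
unsupported = (name not in METRICS_WITH_MULTICLASS_MULITOUTPUT and
               name not in MULTIOUTPUT_METRICS)
# Equivalently, non-membership in the union of the two supporting groups:
unsupported = name not in (set(METRICS_WITH_MULTICLASS_MULITOUTPUT) |
                           set(MULTIOUTPUT_METRICS))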

@arjoly (Member)

Hm sorry, I misread the not.

@arjoly (Member) commented Nov 7, 2014

Can you ensure that we still get a meaningful error message?

Now, we have

In [1]: from sklearn.metrics import precision_score

In [2]: import numpy as np

In [3]: precision_score(np.array([[1, 2], [0, 3], [4, 3]]), np.array([[1, 2], [0, 3], [4, 3]]))
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-3-c99aa70a6c5e> in <module>()
----> 1 precision_score(np.array([[1, 2], [0, 3], [4, 3]]), np.array([[1, 2], [0, 3], [4, 3]]))

/Users/ajoly/git/scikit-learn/sklearn/metrics/classification.py in precision_score(y_true, y_pred, labels, pos_label, average, sample_weight)
   1043                                                  average=average,
   1044                                                  warn_for=('precision',),
-> 1045                                                  sample_weight=sample_weight)
   1046     return p
   1047 

/Users/ajoly/git/scikit-learn/sklearn/metrics/classification.py in precision_recall_fscore_support(y_true, y_pred, beta, labels, pos_label, average, warn_for, sample_weight)
    843     label_order = labels  # save this for later
    844     if labels is None:
--> 845         labels = unique_labels(y_true, y_pred)
    846     else:
    847         labels = np.asarray(labels)

/Users/ajoly/git/scikit-learn/sklearn/utils/multiclass.py in unique_labels(*ys)
     85     # Check that we don't mix label format
     86 
---> 87     ys_types = set(type_of_target(x) for x in ys)
     88     if ys_types == set(["binary", "multiclass"]):
     89         ys_types = set(["multiclass"])

/Users/ajoly/git/scikit-learn/sklearn/utils/multiclass.py in <genexpr>((x,))
     85     # Check that we don't mix label format
     86 
---> 87     ys_types = set(type_of_target(x) for x in ys)
     88     if ys_types == set(["binary", "multiclass"]):
     89         ys_types = set(["multiclass"])

/Users/ajoly/git/scikit-learn/sklearn/utils/multiclass.py in type_of_target(y)
    297         # known to fail in numpy 1.3 for array of arrays
    298         return 'unknown'
--> 299     if y.ndim > 2 or (y.dtype == object and len(y) and
    300                       not isinstance(y.flat[0], string_types)):
    301         return 'unknown'

TypeError: len() of unsized object

While previously it was returning

In [6]: precision_score(np.array([[1, 2], [0, 3], [4, 3]]), np.array([[1, 2], [0, 3], [4, 3]]))
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-6-c99aa70a6c5e> in <module>()
----> 1 precision_score(np.array([[1, 2], [0, 3], [4, 3]]), np.array([[1, 2], [0, 3], [4, 3]]))

/Users/ajoly/git/scikit-learn/sklearn/metrics/classification.py in precision_score(y_true, y_pred, labels, pos_label, average, sample_weight)
   1033                                                  average=average,
   1034                                                  warn_for=('precision',),
-> 1035                                                  sample_weight=sample_weight)
   1036     return p
   1037 

/Users/ajoly/git/scikit-learn/sklearn/metrics/classification.py in precision_recall_fscore_support(y_true, y_pred, beta, labels, pos_label, average, warn_for, sample_weight)
    829         raise ValueError("beta should be >0 in the F-beta score")
    830 
--> 831     y_type, y_true, y_pred = _check_targets(y_true, y_pred)
    832 
    833     label_order = labels  # save this for later

/Users/ajoly/git/scikit-learn/sklearn/metrics/classification.py in _check_targets(y_true, y_pred)
     89     if (y_type not in ["binary", "multiclass", "multilabel-indicator",
     90                        "multilabel-sequences"]):
---> 91         raise ValueError("{0} is not supported".format(y_type))
     92 
     93     if y_type in ["binary", "multiclass"]:

ValueError: multiclass-multioutput is not supported

?
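
A minimal sketch of the kind of early, explicit check being asked for here (simplified; the real _check_targets in sklearn/metrics/classification.py does more, and the supported list varies per metric):

from sklearn.utils.multiclass import type_of_target

def _check_targets_sketch(y_true, y_pred, supported):
    # Determine the target type up front, before any label processing,
    # so an unsupported format fails with a clear ValueError rather than
    # a TypeError from deep inside unique_labels / type_of_target.
    y_type = type_of_target(y_true)
    if y_type not in supported:
        raise ValueError("{0} is not supported".format(y_type))
    return y_type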

@akshayah3 (Contributor Author)

@arjoly Any more changes to be made?

@@ -293,6 +300,10 @@ In the multilabel case with binary label indicators: ::
>>> accuracy_score(np.array([[0, 1], [1, 1]]), np.ones((2, 2)))
0.5

In the case of multiclass-multioutput: ::

Member

Please add a blank line for readability of the source. Also, you can write multiclass-multioutput:: directly instead of multiclass-multioutput: ::.

@jnothman (Member)

@Akshay0724, there was a request at #3453 that this be finished up. Do you intend to complete it, or should we find another contributor?

@arf1372 commented Feb 4, 2019

Doesn't any developer want to resolve the conflicts in this?
I need a multiclass-multioutput metric for grid search in my task, and unfortunately I see no support for such a metric in scikit-learn.

@jnothman I'll be happy to contribute to this one, as I need it personally.
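
In the meantime, a hedged sketch of one workaround (multioutput_accuracy is a hypothetical helper, not a scikit-learn function; it scores each output cell independently):

import numpy as np
from sklearn.metrics import make_scorer

def multioutput_accuracy(y_true, y_pred):
    # Fraction of individual output cells predicted correctly,
    # averaged over all samples and outputs.
    return np.mean(np.asarray(y_true) == np.asarray(y_pred))

# Pass as `scoring=scorer` to GridSearchCV.
scorer = make_scorer(multioutput_accuracy)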

@jnothman (Member) commented Feb 4, 2019 via email
