[MRG] Multi-class roc_auc_score #10481

maskani-moh · 2018-01-16T16:34:19Z

Reference Issues/PRs

Fixes #7663
See also #3298

What does this implement/fix? Explain your changes.

This PR takes over the work initiated in PR #7663 and complements it given the comments on the same thread.

This PR incorporates ROC AUC computations as defined by:

Hand & Till (2001), one vs one
Provost & Domingos (2000), one vs rest
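For readers unfamiliar with the one-vs-one definition, here is a minimal pure-Python sketch of the unweighted Hand & Till average. It is not the code in this PR: `binary_auc` and `hand_till_auc` are hypothetical names, and the binary AUC is computed with the pairwise (Mann-Whitney) formulation rather than by integrating a ROC curve.

```python
from itertools import combinations


def binary_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair (Mann-Whitney U statistic).
    """
    pos = [s for y, s in zip(labels, scores) if y]
    neg = [s for y, s in zip(labels, scores) if not y]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative.")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def hand_till_auc(y_true, y_score):
    """Unweighted one-vs-one AUC: mean over pairs of [A(a|b) + A(b|a)] / 2.

    y_true holds class indices; y_score[i][k] is the score of class k
    for sample i.
    """
    n_classes = len(y_score[0])
    pair_scores = []
    for a, b in combinations(range(n_classes), 2):
        # Keep only the samples that belong to one of the two classes.
        rows = [i for i, y in enumerate(y_true) if y in (a, b)]
        a_vs_b = binary_auc([y_true[i] == a for i in rows],
                            [y_score[i][a] for i in rows])
        b_vs_a = binary_auc([y_true[i] == b for i in rows],
                            [y_score[i][b] for i in rows])
        pair_scores.append((a_vs_b + b_vs_a) / 2)
    return sum(pair_scores) / len(pair_scores)


y_true = [0, 0, 1, 1, 2, 2]
y_score = [
    [0.7, 0.2, 0.1],
    [0.4, 0.5, 0.1],
    [0.3, 0.6, 0.1],
    [0.5, 0.3, 0.2],
    [0.1, 0.2, 0.7],
    [0.2, 0.2, 0.6],
]
ovo_auc = hand_till_auc(y_true, y_score)
```

In this toy data only the (0, 1) pair is imperfectly separated, so the average is pulled below 1 by that pair alone.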

It also:

Adds tests for OvO, OvR, and invariance under permutation.
Validates the input in the multiclass case: np.allclose(1, y_score.sum(axis=1)).
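The input check can be illustrated without NumPy. This is a sketch only: `check_scores_sum_to_one` is a hypothetical helper, and its tolerance formula is written out to match the defaults of `np.allclose` (`rtol=1e-5`, `atol=1e-8`).

```python
def check_scores_sum_to_one(y_score, rtol=1e-5, atol=1e-8):
    """Raise ValueError unless every row of y_score sums to 1.

    Mirrors np.allclose(1, row_sum), which tests
    |1 - row_sum| <= atol + rtol * |row_sum|.
    """
    for i, row in enumerate(y_score):
        total = sum(row)
        if abs(1 - total) > atol + rtol * abs(total):
            raise ValueError(
                "Target scores should sum up to 1.0 for all samples; "
                f"row {i} sums to {total!r}."
            )


# Rows that are valid probability distributions pass silently.
check_scores_sum_to_one([[0.2, 0.3, 0.5], [1.0, 0.0, 0.0]])
```

Raw decision-function outputs would generally fail this check, which is why it matters which averaging modes actually require normalized scores.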

Any other comments?

Due to rebase issues, I had to create a new PR from scratch including the work previously done.

maskani-moh · 2018-01-16T16:59:41Z

sklearn/metrics/ranking.py

+            # Hand & Till (2001) implementation
+            return _average_multiclass_ovo_score(
+                _binary_roc_auc_score, y_true, y_score, average)
+        elif multiclass == "ovr" and average == "weighted":


Is it the best way to use the P&D definition?
Should we state in the docstring that anyone who wants to use the P&D implementation should set the parameters multiclass="ovr" and average="weighted"?

What happens if someone sets multiclass='ovr' and average='macro' right now?
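To make the question concrete, the sketch below contrasts the two one-vs-rest averages. It assumes, as the code comment in the diff suggests, that the Provost & Domingos variant weights each class's AUC by its prevalence, while "macro" takes a plain mean. `binary_auc` and `ovr_auc` are hypothetical names, not functions from this PR.

```python
def binary_auc(labels, scores):
    """Pairwise (Mann-Whitney) AUC; ties count as half a correct pair."""
    pos = [s for y, s in zip(labels, scores) if y]
    neg = [s for y, s in zip(labels, scores) if not y]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative.")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def ovr_auc(y_true, y_score, average="macro"):
    """One-vs-rest AUC, averaged uniformly or by class prevalence."""
    n_classes = len(y_score[0])
    per_class = [
        binary_auc([y == k for y in y_true], [row[k] for row in y_score])
        for k in range(n_classes)
    ]
    if average == "macro":
        return sum(per_class) / n_classes
    if average == "weighted":
        # Weight each class by the number of samples that carry its label.
        counts = [sum(y == k for y in y_true) for k in range(n_classes)]
        return sum(c * s for c, s in zip(counts, per_class)) / len(y_true)
    raise ValueError(f"Unsupported average: {average!r}")


# Imbalanced toy data: three samples of class 0, two of class 1, one of class 2.
y_true = [0, 0, 0, 1, 1, 2]
y_score = [
    [0.6, 0.3, 0.1],
    [0.5, 0.3, 0.2],
    [0.2, 0.5, 0.3],
    [0.3, 0.6, 0.1],
    [0.4, 0.4, 0.2],
    [0.1, 0.2, 0.7],
]
macro = ovr_auc(y_true, y_score, average="macro")
weighted = ovr_auc(y_true, y_score, average="weighted")
```

With imbalanced classes the two results differ, so an unhandled `multiclass='ovr', average='macro'` combination would be a distinct metric rather than an alias.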

jnothman · 2018-01-17T03:58:51Z

You have test failures.

maskani-moh · 2018-01-19T17:54:45Z

TL;DR: one test fails because of check_array where dtype='object'. How to handle this case?

The only failing test is test_invariance_string_vs_numbers_labels (under metrics/tests/test_common.py) because of measure_with_strobj = metric(y1_str.astype('O'), y2) (here)

What happens is that in the roc_auc_score function (ranking.py) I validate the array with y_true = check_array(y_true, ensure_2d=False) (here), which fails when the elements of the array have dtype object (while it does not for other types such as str, float, ...). The line in question in check_array is this one.

@jnothman any thoughts on how I could handle this case?

maskani-moh · 2018-01-19T18:03:16Z

Is it somehow related to this issue?

mfenner1 · 2018-02-07T21:50:29Z

sklearn/metrics/base.py

+    pair_scores = np.empty(n_pairs)
+
+    ix = 0
+    for a, b in itertools.combinations(range(n_classes), 2):


Consider using enumerate to avoid the manual ix and increment?
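A sketch of what that suggestion looks like; `score_pair` is a hypothetical stand-in for the per-pair AUC computation, and a plain list replaces the `np.empty` array to keep the example dependency-free.

```python
from itertools import combinations

n_classes = 4
n_pairs = n_classes * (n_classes - 1) // 2
pair_scores = [0.0] * n_pairs


def score_pair(a, b):
    """Hypothetical placeholder for the binary metric on classes a and b."""
    return a + b / 10


# enumerate supplies the running index, so no manual counter is needed.
for ix, (a, b) in enumerate(combinations(range(n_classes), 2)):
    pair_scores[ix] = score_pair(a, b)
```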

jnothman · 2018-02-20T09:59:50Z

Have you tried just using dtype=None in that check_array call?

amueller · 2018-03-16T19:21:45Z

@maskani-moh any progress on this? Do you need my help?

amueller · 2018-03-26T14:51:21Z

still erroring ;) (just lines too long)

maskani-moh · 2018-03-26T14:53:25Z

Flake issue ... fixing that now

jnothman

I'm not sure I can review this all right now, but surely you should be modifying metrics/tests/test_common.py to run common tests on the multiclass variants?

jnothman · 2018-01-17T03:59:15Z

sklearn/metrics/ranking.py

+                                  y_score.shape[1] > 2):
+        # validation of the input y_score
+        if not np.allclose(1, y_score.sum(axis=1)):
+            raise ValueError("Target scores should sum up to 1.0 for all"


space missing between "all" and "samples"

We only need this for OvO, not for OvR, right?

jnothman · 2018-03-26T21:57:44Z

sklearn/metrics/ranking.py

+        # do not support partial ROC computation for multiclass
+        if max_fpr is not None and max_fpr != 1.:
+            raise ValueError("Partial AUC computation not available in "
+                             "multiclass setting. Parameter 'max_fpr' must be"


Please be consistent within a string about whether white space appears at the end or start of a line
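The reason this matters is that adjacent string literals are concatenated with nothing in between, so a missing space silently fuses two words. A minimal sketch with every separator kept at the end of its line follows. `check_max_fpr_multiclass` is a hypothetical helper, and the wording after "must be" is invented for illustration because the diff is cut off at that point.

```python
# Adjacent literals are joined verbatim: a forgotten space fuses the words.
fused = ("Target scores should sum up to 1.0 for all"
         "samples")


def check_max_fpr_multiclass(max_fpr):
    """Reject partial AUC in the multiclass setting (hypothetical helper)."""
    if max_fpr is not None and max_fpr != 1.0:
        # Each piece ends with its separating space, never starts with one.
        raise ValueError(
            "Partial AUC computation not available in "
            "multiclass setting. Parameter 'max_fpr' must be "
            f"set to None, received max_fpr={max_fpr!r}."
        )


# None and 1.0 both mean "full curve", so neither raises.
check_max_fpr_multiclass(None)
check_max_fpr_multiclass(1.0)
```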

GTimothy · 2018-05-14T13:58:44Z

Hi!
Would love to see multi-class ROC AUC capability!
I think that @maskani-moh has addressed the change requests on his branch, am I wrong?
(thanks for this great library!)

jnothman · 2018-05-15T11:52:33Z

common metric tests aren't currently testing this case, so no, it needs work

janvanrijn · 2018-07-26T03:16:50Z

@jnothman I am happy to do some work on this PR.

I just forked this branch and pulled master into it; now 4 common test cases fail on my side (test_root_import_all_completeness, test_non_meta_estimators[GaussianProcess-GaussianProcess-check_fit2d_1sample], test_non_meta_estimators[GaussianProcess-GaussianProcess-check_supervised_y_2d], test_non_meta_estimators[GaussianProcess-GaussianProcess-check_estimators_overwrite_params]). This is unexpected behavior, right? (I conclude this because both the unit tests in this branch and the unit tests in master seem to pass. I'll spare you further details and stack traces.)

Can you please explain what you mean by common metric tests? When I remove roc_auc_score from METRIC_UNDEFINED_MULTICLASS, no additional tests are invoked (i.e., the number of tests is still 5158). Is there any documentation on this that I could read?

jnothman · 2018-07-26T21:28:16Z

We haven't changed the common metric tests to use pytest collection, so yes, it's hard to tell if tests are being run. The issue here, iirc, is that we don't currently have common metric tests designed for score-based multiclass metrics.

janvanrijn · 2018-07-26T21:38:51Z

So if I understand the todo for this PR correctly, it should add common metric tests for score-based multi-class metrics?

jnothman · 2018-07-29T03:40:51Z

You should double check what code paths a multiclass roc_auc_score would go through in test_common.py (unfortunately) and make sure that it's actually covered.

jnothman · 2018-07-29T03:41:24Z

Really test_common should be better at telling us when something is not fully tested.

amueller · 2018-10-05T19:21:53Z

sklearn/metrics/base.py

+            if average == "weighted" else np.average(pair_scores))
+
+
+def _average_multiclass_ovr_score(binary_metric, y_true, y_score, average):


is this not the same as _average_binary_score?

amueller · 2019-07-17T21:32:28Z

fixed in #12789

maskani-moh added 3 commits January 16, 2018 11:23

Add Hand & Till (OvO) and Provost & Domingos (OvR) implementations

a666180

Add multi-class implementation in roc_auc_score method

118a700

Add tests for multi-class settings OvO and OvR

3371b1d

maskani-moh mentioned this pull request Jan 16, 2018

[MRG] Support for multi-class roc_auc scores #7663

Closed

4 tasks

maskani-moh added 5 commits January 17, 2018 15:06

Fix binary case roc computation

d74ce16

Make scores add up to 1.0

805d804

Fix typo

2bd693e

Differenciate binary case explicitly to avoid error when multilabel-indicator format

fc54dde

Fix prediciton scores

133a09a

mfenner1 reviewed Feb 7, 2018

Merge remote-tracking branch 'upstream/master' into multiclass-roc-auc-new

bc40110

maskani-moh added 3 commits March 26, 2018 10:02

Merge remote-tracking branch 'upstream/master' into multiclass-roc-auc-new

0d035e3

Fix test error by setting param dtype=None

d08f084

Quick fix

4c7a656

maskani-moh added 2 commits March 26, 2018 11:25

Raise error for partial computation in multiclass

4723b00

Fix pep8

aa6dd49

jnothman requested changes Mar 26, 2018

amueller mentioned this pull request Apr 5, 2018

permutation_test_score has no user guide #10905

Closed

This was referenced Oct 5, 2018

test_sample_order_invariance in common metrics tests applied to threshold metrics #12308

Open

WIP Multiclass roc auc #12311

Closed

amueller reviewed Oct 5, 2018

thomasjpfan mentioned this pull request Dec 14, 2018

[MRG] Adds multiclass ROC AUC #12789

Merged

amueller closed this Jul 17, 2019
