BUG Fixes error with multiclass roc auc scorer #15274
Conversation
sklearn/metrics/_scorer.py
Outdated
@@ -323,6 +323,12 @@ def _score(self, method_caller, clf, X, y, sample_weight=None):
                                     self._score_func.__name__))
             elif isinstance(y_pred, list):
                 y_pred = np.vstack([p[:, -1] for p in y_pred]).T
+            else:  # multiclass
+                try:
+                    y_pred = method_caller(clf, "predict_proba", X)
Should we try decision_function first? _ThresholdScorer means that we "Evaluate decision function output for X relative to y_true".
I agree, _ThresholdScorer should be consistent and always try to use decision_function first, whatever the type of output: it should work for any kind of model that has un-normalized but continuous class assignment scores.
If ROC AUC with multiclass averaging needs normalized class assignment probabilities instead, we should encode this requirement in the definitions of the scorer instances instead:
Currently, scorer.py has:
# Score functions that need decision values
roc_auc_scorer = make_scorer(roc_auc_score, greater_is_better=True,
                             needs_threshold=True)
average_precision_scorer = make_scorer(average_precision_score,
                                       needs_threshold=True)
roc_auc_ovo_scorer = make_scorer(roc_auc_score, needs_threshold=True,
                                 multi_class='ovo')
roc_auc_ovo_weighted_scorer = make_scorer(roc_auc_score, needs_threshold=True,
                                          multi_class='ovo',
                                          average='weighted')
roc_auc_ovr_scorer = make_scorer(roc_auc_score, needs_threshold=True,
                                 multi_class='ovr')
roc_auc_ovr_weighted_scorer = make_scorer(roc_auc_score, needs_threshold=True,
                                          multi_class='ovr',
                                          average='weighted')
If needed, we should replace needs_threshold=True by needs_proba=True, but I do not see why this would be the case for ROC AUC.
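For illustration only, here is a minimal sketch of what the multiclass scorer definitions would look like under the alternative being discussed, i.e. switching to needs_proba=True; this is an assumed variant, not the change decided in this PR:

from sklearn.metrics import make_scorer, roc_auc_score

# Hypothetical variants of the built-in scorers above, requiring
# predict_proba output instead of decision_function scores.
roc_auc_ovo_scorer = make_scorer(roc_auc_score, needs_proba=True,
                                 multi_class='ovo')
roc_auc_ovr_weighted_scorer = make_scorer(roc_auc_score, needs_proba=True,
                                          multi_class='ovr',
                                          average='weighted')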
If needed, we should replace needs_threshold=True by needs_proba=True but I do not see why this would be the case for ROC AUC.
I see, so we should use needs_proba=True; we do not consider decision_function in _ProbaScorer.
For roc_auc_scorer, I think we should use _ThresholdScorer because we accept the output from decision_function.
In my previous comment I said:
But are you sure we really need to use proba for multiclass ROC AUC scores? Why couldn't we use unnormalized class assignment scores also in the multiclass case?
Replying to myself: when calling roc_auc_score with a multi_class option, the scores are expected to be normalized probabilities.
So indeed the fix implemented in this PR is correct.
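To make the distinction concrete, here is a small illustration (not code from this PR) of why the two prediction methods differ for a multiclass estimator: predict_proba rows sum to one, while decision_function returns unnormalized per-class scores.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_classes=3, n_informative=4, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X, y)

proba = clf.predict_proba(X)       # shape (n_samples, 3), rows sum to 1
scores = clf.decision_function(X)  # shape (n_samples, 3), unnormalized
print(np.allclose(proba.sum(axis=1), 1.0))   # True
print(np.allclose(scores.sum(axis=1), 1.0))  # False in general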
Please don't forget to add an entry to the changelog.
Also, a few more nits:
LGTM!
@qinhanmin2014 any further comment?
doc/whats_new/v0.22.rst
Outdated
@@ -493,6 +493,10 @@ Changelog
   ``multioutput`` parameter.
   :pr:`14732` by :user:`Agamemnon Krasoulis <agamemnonc>`.

- |Fix| The scorers: 'roc_auc_ovr', 'roc_auc_ovo', 'roc_auc_ovr_weighted',
Since these were never released, it doesn't really make sense to advertise them as a separate changelog entry, does it?
I had not realized that. Indeed we should remove that changelog entry.
Moved the mention of these scorers to the entry about the introduction of the multiclass roc_auc metric.
I think there's still an annoying issue here: when y_type is binary, this can still be a multiclass problem, but the input validation in _ProbaScorer will then raise a ValueError.
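A quick illustration of that scenario (an assumed example, not code from the PR): type_of_target only sees the target values it is given, so a fold of a three-class problem that happens to contain only two labels is reported as binary.

import numpy as np
from sklearn.utils.multiclass import type_of_target

# y drawn from a 3-class problem, but this particular fold only
# contains labels 0 and 1, so it is typed as "binary".
y_fold = np.array([0, 1, 0, 1])
print(type_of_target(y_fold))  # -> 'binary'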
One possible solution is to remove the input validation in the scorer. We can do the input validation in the function where we calculate the score.
For reference, the weird condition relates to calls like the following:

import numpy as np
from sklearn.metrics import roc_auc_score

y_true = np.array([0, 1, 0, 1])
y_score = np.array([[0.1 , 0.8 , 0.1 ],
                    [0.3 , 0.4 , 0.3 ],
                    [0.35, 0.5 , 0.15],
                    [0. , 0.2 , 0.8 ]])
roc_auc_score(y_true, y_score, labels=[0, 1, 2], multi_class='ovo')

Without the labels argument, the three classes could not be inferred from y_true here.
Even if we remove that restriction, consider the following:

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import make_scorer, roc_auc_score

scorer = make_scorer(roc_auc_score, multi_class='ovo',
                     labels=[0, 1, 2], needs_proba=True)
X, y = make_classification(n_classes=3, n_informative=3, n_samples=20,
                           random_state=0)
lr = LogisticRegression(multi_class="multinomial").fit(X, y)
scorer(lr, X, y == 0)
@thomasjpfan When defining built-in scorers, we often use the default values of the parameters. If users do not want to use the default values, they'll need to define a scorer themselves through make_scorer. I think that's reasonable. In multiclass roc_auc_score, we infer the labels by default. When y_true does not contain all the labels, it's impossible to infer them, so it's reasonable to require users to define a scorer themselves. The current issue is that when y_true only contains two labels, users can't define a scorer themselves because of the input validation in _ProbaScorer, so I think we need to remove that.
Updated the check in _ProbaScorer. Edit: if it does not look multiclass, then it raises the ValueError as before.
Maybe update _ThresholdScorer at the same time?
@@ -247,7 +247,7 @@ def _score(self, method_caller, clf, X, y, sample_weight=None):
         if y_type == "binary":
             if y_pred.shape[1] == 2:
                 y_pred = y_pred[:, 1]
-            else:
+            elif y_pred.shape[1] == 1:  # not multiclass
Hmm, why is this useful?
@pytest.mark.parametrize('scorer_name', [
    'roc_auc_ovr', 'roc_auc_ovo',
    'roc_auc_ovr_weighted', 'roc_auc_ovo_weighted'])
def test_multiclass_roc_no_proba_scorer_errors(scorer_name):
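The test body is not shown in this excerpt; a plausible sketch of what such a parametrized test could check (the estimator choice, the get_scorer usage, and the expected exception are assumptions, not the PR's actual assertions):

import pytest
from sklearn.datasets import make_classification
from sklearn.metrics import get_scorer
from sklearn.svm import LinearSVC

@pytest.mark.parametrize('scorer_name', [
    'roc_auc_ovr', 'roc_auc_ovo',
    'roc_auc_ovr_weighted', 'roc_auc_ovo_weighted'])
def test_multiclass_roc_no_proba_scorer_errors(scorer_name):
    # LinearSVC exposes decision_function but not predict_proba, so a
    # probability-based multiclass ROC AUC scorer should fail loudly.
    X, y = make_classification(n_classes=3, n_informative=3, random_state=0)
    clf = LinearSVC().fit(X, y)
    scorer = get_scorer(scorer_name)
    with pytest.raises(AttributeError):
        scorer(clf, X, y)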
Could you please tell me why multiclass roc_auc_score does not support the output of decision_function? Thanks.
The paper this was based on used probabilities for ranking: https://link.springer.com/content/pdf/10.1023%2FA%3A1010920819831.pdf
For reference this was discussed in the original issue: #7663 (comment)
Thanks, though I still think that multiclass roc_auc could accept the output of decision_function. Not all estimators in scikit-learn have predict_proba :)
Adds multiclass support for threshold metric for roc_auc_score