Improve error message with invalid pos_label in plot_roc_curve and implicit pos_label in precision_recall_curve #15405

Closed

Conversation

ogrisel
Member

@ogrisel ogrisel commented Oct 30, 2019

This is a potential fix for the issue raised by @amueller in #15303.

Edit: this PR now aims to improve the error messages in cases of invalid inputs. The actual fix for #15303 for plot_roc_curve is to be implemented in #15316.

@ogrisel ogrisel added this to the 0.22 milestone Oct 30, 2019
@ogrisel
Member Author

ogrisel commented Oct 30, 2019

Note that this only raises a ValueError when calling _binary_clf_curve via plot_roc_curve.

Ideally I would also like to have roc_auc_score raise a ValueError when y_true has string labels and pos_label is not provided by the caller (left to None):

diff --git a/sklearn/metrics/tests/test_ranking.py b/sklearn/metrics/tests/test_ranking.py
index 60ab3784a..f8c0ead08 100644
--- a/sklearn/metrics/tests/test_ranking.py
+++ b/sklearn/metrics/tests/test_ranking.py
@@ -662,6 +662,14 @@ def test_auc_score_non_binary_class():
             roc_auc_score(y_true, y_pred)
 
 
+def test_roc_auc_score_implicit_pos_label():
+    y_true = ["a", "b", "a"]
+    y_pred = [0.2, 0.3, 0.1]
+    err_msg = "42"
+    with pytest.raises(ValueError, match=err_msg):
+        roc_auc_score(y_true, y_pred)
+
+
 def test_binary_clf_curve():
     rng = check_random_state(404)
     y_true = rng.randint(0, 3, size=10)

However, at the moment roc_auc_score does automated label encoding internally, which hides the problem.
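To make the inconsistency concrete, here is a toy illustration (made-up labels and scores, behavior as of the scikit-learn version discussed in this thread): roc_auc_score silently label-encodes the string targets, while roc_curve without pos_label refuses them.

import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

y_true = np.array(["a", "b", "a", "b"])
y_score = np.array([0.1, 0.4, 0.35, 0.8])

print(roc_auc_score(y_true, y_score))  # works: "b" is implicitly treated as positive
try:
    roc_curve(y_true, y_score)  # no pos_label passed
except ValueError as exc:
    print(exc)  # complains that pos_label is not specified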

If we decide that the fix for plot_roc_curve in this PR is valid, then we should open a sister issue for roc_auc_score, but that one will require a FutureWarning, with the actual change only implemented two releases later.

@@ -532,7 +532,10 @@ def _binary_clf_curve(y_true, y_score, pos_label=None, sample_weight=None):
             np.array_equal(classes, [0]) or
             np.array_equal(classes, [-1]) or
             np.array_equal(classes, [1]))):
-        raise ValueError("Data is not binary and pos_label is not specified")
+        raise ValueError("y_true takes value in {classes} and pos_label is "
Member

Do we need something like the following?

if (pos_label is None and _determine_key_type(classes) == 'str'
    ...
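One possible completion of this suggestion, as a sketch only (the exact condition merged in the PR may differ). It is meant to live inside _binary_clf_curve, where classes, pos_label, np and _determine_key_type are already in scope; checking the key type first short-circuits the np.array_equal calls on string classes, which also avoids the numpy FutureWarning discussed further down.

# Sketch of a possible check inside _binary_clf_curve (not the merged code).
if pos_label is None and (
        _determine_key_type(classes) == 'str' or
        not (np.array_equal(classes, [0, 1]) or
             np.array_equal(classes, [-1, 1]) or
             np.array_equal(classes, [0]) or
             np.array_equal(classes, [-1]) or
             np.array_equal(classes, [1]))):
    # Error message consistent with the one exercised by the new test below.
    raise ValueError(
        "y_true takes value in {classes} and pos_label is not "
        "specified: either make y_true take integer value in "
        "{{0, 1}} or {{-1, 1}} or pass pos_label "
        "explicitly.".format(classes=classes))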

Member Author

Thanks indeed, I updated my PR based on your suggestion and I could get rid of the remaining numpy FutureWarning in array_equal.

@qinhanmin2014
Member

I think Andy wants to infer pos_label automatically? I also prefer to infer pos_label since we accept a trained estimator here (it seems that Thomas also agrees with Andy?).
We can raise an error when pos_label is not in estimator.classes_ but I guess that's not a blocker.
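As a rough sketch of that validation (hypothetical code, not necessarily what this PR ends up merging), assuming estimator and pos_label as they appear inside plot_roc_curve:

# Hypothetical sketch: validate pos_label against the fitted estimator's classes_.
if pos_label is not None and pos_label not in estimator.classes_:
    raise ValueError(
        "pos_label={!r} is not a valid label: it should be one of {}"
        .format(pos_label, list(estimator.classes_)))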

@qinhanmin2014
Member

qinhanmin2014 commented Oct 31, 2019

Ideally I would also like to have roc_auc_score raise a ValueError when y_true has string labels and pos_label is not provided by the caller (left to None):

There's a PR but we finally decided to infer pos_label automatically. I was involved in that decision but now I regret it. I think it's better to introduce a pos_label parameter in roc_auc_score (which would be consistent with roc_curve).

@thomasjpfan
Member

I can go either way as long as we are consistent. I agree with @ogrisel that with string targets pos_label should be set and not automatically inferred.

@@ -532,7 +532,10 @@ def _binary_clf_curve(y_true, y_score, pos_label=None, sample_weight=None):
             np.array_equal(classes, [0]) or
             np.array_equal(classes, [-1]) or
             np.array_equal(classes, [1]))):
-        raise ValueError("Data is not binary and pos_label is not specified")
+        raise ValueError("y_true takes value in {classes} and pos_label is "
Member

I agree that the error message here needs to be updated.

@qinhanmin2014
Member

I can go either way as long as we are consistent. I agree with @ogrisel that with string targets pos_label should be set and not automatically inferred.

This will actually close Andy's issue?

@qinhanmin2014
Member

I think the best solution is to deprecate pos_label=None and use pos_label=1 as the default (this is consistent with P/R/F, jaccard and average_precision_score; we would need a deprecation in brier_score_loss, roc_curve and precision_recall_curve, and to introduce pos_label in roc_auc_score).
See my comment in #10010:
We don't have a correct implementation of pos_label=None in scikit-learn currently. The meaning of pos_label=None is hard for users to understand and hard to define. If there are two labels, fine, we pick the greater one. But what about the situation where there is only one label? Pick that one? That is not always correct, e.g., when there are only negative samples.

@jnothman
Member

Ideally I would also like to have roc_auc_score raise a ValueError when y_true has string labels and pos_label is not provided by the caller

Just a reminder that if this changes, the 'roc_auc' scorer still needs to work in binary cases with string labels. The choice of positive class will not affect the result.
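A quick sketch of that constraint (toy data; get_scorer and LogisticRegression are used only for illustration):

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import get_scorer

rng = np.random.RandomState(0)
X = rng.randn(100, 3)
y = rng.choice(["red", "blue"], size=100)

clf = LogisticRegression().fit(X, y)
print(get_scorer("roc_auc")(clf, X, y))  # must keep working despite string labels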

@ogrisel
Member Author

ogrisel commented Nov 4, 2019

The choice of positive class will not affect the result.

I am not sure I understand. If you have y_true with values in {"red", "blue"}, roc_auc_score(y_true, y_pred) will give different values depending on whether "blue" or "red" is considered positive. The "red-is-positive" ROC AUC score should be equal to (1 - the "blue-is-positive" ROC AUC score).

@ogrisel
Member Author

ogrisel commented Nov 4, 2019

The fact that you get different scores with different string label encodings of fundamentally the same data should be considered a bug:

>>> from sklearn.metrics import roc_auc_score
>>> import numpy as np
>>> y_true = np.random.choice(["red", "blue"], size=100)
... y_true2 = np.where(y_true == "red", "0 red", "1 blue")
... y_pred = np.random.randn(100)
... print(roc_auc_score(y_true, y_pred))
... print(roc_auc_score(y_true2, y_pred))
...
0.5466893039049237
0.4533106960950764

@jnothman
Member

jnothman commented Nov 4, 2019

I mean that you can train and test a scikit-learn estimator with whichever labelling of binary classes, and using the roc scorer you will get the same result. This is not true of precision, recall, etc.
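A rough illustration of this point (toy data): renaming the binary labels flips both the estimator's classes_ and the implicit positive class used by the scorer, so the score obtained through the 'roc_auc' scorer stays the same, unlike the direct roc_auc_score calls in the snippet above.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import get_scorer

rng = np.random.RandomState(0)
X = rng.randn(200, 5)
y = rng.choice(["red", "blue"], size=200)
y_renamed = np.where(y == "red", "0 red", "1 blue")  # reverses the sort order of the labels

scorer = get_scorer("roc_auc")
print(scorer(LogisticRegression().fit(X, y), X, y))
print(scorer(LogisticRegression().fit(X, y_renamed), X, y_renamed))
# the two printed scores match (up to numerical noise)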

@jnothman
Member

jnothman commented Nov 5, 2019

I'm fine with this solution in plot_* in any case. I'll try to review it later.

Member

@jnothman jnothman left a comment

Tests erroring on some platforms btw, @ogrisel

@ogrisel
Member Author

ogrisel commented Nov 6, 2019

I mean that you can train and test a scikit-learn estimator with whichever labeling of binary classes, and using the roc scorer you will get the same result. This is not true of precision, recall, etc.

It's true that when we use a scorer on a sklearn classifier, the implicitly inferred positive class will be consistently the same for the estimator and the scorer (the label encoder used internally in the classifier will yield consistent results with the one used internally in roc_auc_score). But this is only thanks to implicit conventions.

Tests erroring on some platforms btw, @ogrisel

I will have a look into it.

ogrisel and others added 2 commits November 6, 2019 11:43
assert_equal on str elements with numerical elements.
Co-Authored-By: Thomas J Fan <thomasjpfan@gmail.com>
@ogrisel
Member Author

ogrisel commented Nov 6, 2019

Actually, I think I am changing my mind: we should probably implement #15316 and just document the positive class inference in case the user passes strings.

msg = ("make y_true take integer value in {0, 1} or {-1, 1}"
" or pass pos_label explicitly.")
with pytest.raises(ValueError, match=msg):
precision_recall_curve(y_true, y_pred)
Member Author

@ogrisel ogrisel Nov 6, 2019

@qinhanmin2014 I have changed the test in this PR to trigger the exception in _binary_clf_curve from precision_recall_curve rather than from plot_roc_curve, so that we can both merge #15316 and improve the error message.

Member Author

@ogrisel ogrisel Nov 6, 2019

This PR is still useful to:

  • raise a meaningful error message when the user passes an invalid pos_label value;
  • improve the error message when passing string labels without an explicit pos_label, for functions that do not automatically label-encode string labels (e.g. precision_recall_curve);
  • have more tests.

@ogrisel ogrisel changed the title Improve error message with implicit or invalid pos_label in plot_roc_curve Improve error message with invalid pos_label in plot_roc_curve and implicit pos_label in precision_recall_curve Nov 6, 2019
@qinhanmin2014
Member

The fact that you get different scores with different string labels encoding of fundamentally the same data should be considered a bug:

I agree. I prefer to introduce pos_label and set the default to 1.

@ogrisel
Member Author

ogrisel commented Nov 6, 2019

I agree. I prefer to introduce pos_label and set the default to 1.

But I changed my mind in the context of plot_roc_curve, because in this context, the estimator already makes this implicit assumption (last class label in alphabetical order is positive) internally. So we can just introspect it and stay consistent with it.

@ogrisel
Member Author

ogrisel commented Nov 6, 2019

I prefer to introduce pos_label and set the default to 1.

Actually, in the context of the plot_roc_curve function, I don't see the point in letting the user pass a pos_label that is different from estimator.classes_[1] when estimator.classes_.shape == (2,): for all scikit-learn binary classifiers, estimator.classes_[1] is always the positive class.

Furthermore, setting the default of pos_label to 1 would be problematic if y_train were [1, 2], for instance:

>>> from sklearn.linear_model import LogisticRegression
>>> clf = LogisticRegression().fit([[0], [1]], [1, 2])
>>> clf.classes_
array([1, 2])

For the classifier, the positive class is 2, not 1. Plotting the ROC curve with pos_label=1 would be meaningless: it would be significantly below the random/constant baseline model represented by the diagonal curve.
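A minimal sketch of the inference described here, assuming the scikit-learn convention that classes_[1] is the positive class of a fitted binary classifier (this is not the merged plot_roc_curve implementation):

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve

X = np.array([[0.0], [0.2], [0.8], [1.0]])
y = np.array([1, 1, 2, 2])

clf = LogisticRegression().fit(X, y)
pos_label = clf.classes_[1]            # 2: the positive class for this classifier
y_score = clf.predict_proba(X)[:, 1]   # probability of classes_[1]
fpr, tpr, _ = roc_curve(y, y_score, pos_label=pos_label)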

@amueller
Member

amueller commented Nov 6, 2019

For the classifier, the positive class is 2, not 1. Plotting the ROC curve with pos_label=1 would be meaningless: it would be significantly below the random/constant baseline model represented by the diagonal curve.

I think I (and possibly others) were a bit confused here. It would make sense to treat 1 as the positive class, but setting pos_label=1 would not achieve that. So indeed, pos_label can be perfectly inferred from the estimator with the current semantics.

However, we could provide an interface that allows the user to specify which of the classes is the semantically positive class, and plot_roc_curve would take the appropriate column from predict_proba.

Basically: if you only apply pos_label to roc_curve and always use predict_proba(X)[:, 1] then there is only one obvious choice.
But if you allow slicing the predict_proba then any of the classes make sense.
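A hypothetical sketch of that richer interface (the helper name and signature are invented for illustration): let the user name the semantically positive class and slice predict_proba accordingly.

from sklearn.metrics import roc_curve

def roc_curve_for_class(estimator, X, y, positive_class):
    # pick the predict_proba column that corresponds to the requested class
    col = list(estimator.classes_).index(positive_class)
    y_score = estimator.predict_proba(X)[:, col]
    return roc_curve(y, y_score, pos_label=positive_class)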

@amueller
Member

amueller commented Nov 6, 2019

So if we want a quick fix, we can remove the pos_label parameter and infer it from the estimator. If we want to allow the user to pick the pos_label in a meaningful way, we need to do slightly more work. I'm not sure if still calling that pos_label would be useful, though...

@ogrisel
Member Author

ogrisel commented Nov 7, 2019

This PR still has useful improvements not directly related to the pos_label option of plot_roc_curve. Let me open another PR.

@qinhanmin2014
Member

This PR still has useful improvements not directly related to the pos_label option of plot_roc_curve. Let me open another PR.

So numpy will raise a FutureWarning for things like np.array_equal(["a", "b"], [1, 2])? That's strange.
Yes, please open another PR or modify this one directly.

@ogrisel
Member Author

ogrisel commented Nov 7, 2019

Indeed:

>>> import numpy as np
>>> np.array_equal(["a", "b"], [1, 2])
/home/ogrisel/miniconda3/envs/pip/lib/python3.7/site-packages/numpy/core/numeric.py:2339: FutureWarning: elementwise comparison failed; returning scalar instead, but in the future will perform elementwise comparison
  return bool(asarray(a1 == a2).all())

@ogrisel
Member Author

ogrisel commented Nov 7, 2019

I extracted the good part and rebased on top of current master in #15562.
