
ENH add a parameter pos_label in roc_auc_score #17704


Closed
wants to merge 18 commits

Conversation

glemaitre
Member

closes #17572

Add a pos_label parameter to make it possible to specify the positive class in the binary classification case.

We should also handle the use case where GridSearchCV is used together with scoring='roc_auc'.
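
A minimal sketch of the intended usage under this proposal (the pos_label argument below is what this PR would add; it is not part of the existing roc_auc_score signature):

    from sklearn.metrics import roc_auc_score

    y_true = ["cancer", "not cancer", "cancer", "not cancer"]
    # probability of the "cancer" class for each sample
    y_score = [0.9, 0.2, 0.6, 0.4]
    roc_auc_score(y_true, y_score, pos_label="cancer")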

@glemaitre
Member Author

So here is a proposal that handles roc_auc in the grid-search as well.

@glemaitre
Member Author

ping @thomasjpfan @ogrisel @jnothman

@@ -296,6 +302,13 @@ def _score(self, method_caller, clf, X, y, sample_weight=None):
            y_pred = method_caller(clf, "predict", X)
        else:
            try:
                if (
                    y_type == "binary"
Member Author

So here, we could have a ScorerProperty defining whether the score is symmetric and requires pos_label, instead of hard-coding roc_auc_score.

@thomasjpfan
Member

Thank you for working on this @glemaitre !

@glemaitre
Member Author

glemaitre commented Aug 5, 2020

Firstly: I remain unconvinced that there is a problem with users getting
incorrect roc auc results when using the standard scorer: our convention is
clearly to encode probabilities to match classes_, which should be sorted.
I think that the need to allow for pos_label in roc_auc_score, where we do
not explicitly require the input to come from a scikit-learn-compatible
classifier is reasonable and separate.

@jnothman I agree with your argument, but there is still something to solve here:

    import numpy as np
    from sklearn.datasets import load_breast_cancer
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split
    from sklearn.utils import shuffle
    from sklearn.metrics import roc_auc_score

    X, y = load_breast_cancer(return_X_y=True)
    # create a highly imbalanced version of the dataset
    idx_positive = np.flatnonzero(y == 1)
    idx_negative = np.flatnonzero(y == 0)
    idx_selected = np.hstack([idx_negative, idx_positive[:25]])
    X, y = X[idx_selected], y[idx_selected]
    X, y = shuffle(X, y, random_state=42)
    # only use 2 features to make the problem even harder
    X = X[:, :2]
    y = np.array(
        ["cancer" if c == 1 else "not cancer" for c in y], dtype=object
    )
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, stratify=y, random_state=0,
    )

    classifier = LogisticRegression()
    classifier.fit(X_train, y_train)

    # sanity check: the positive class is classes_[0], so the usual
    # "positive class == greater label" assumption does not hold here
    assert classifier.classes_.tolist() == ["cancer", "not cancer"]
    
    y_pred = classifier.predict_proba(X_test)
    y_pred_pos = y_pred[:, 0]
    roc_auc_score(y_test, y_pred_pos)

So here the usage is fine but the result is incorrect. The issue comes from a wrong assumption made in the underlying roc_curve call.

Basically, we pass y_pred_pos, which is column #0, and y_test will be encoded in the same order as the columns of y_pred.
The AUC is computed with roc_curve, which is equivalent to calling roc_curve(y_test, y_pred_pos). However, when pos_label is not given, roc_curve assumes that 1 (i.e. the greater label) is the positive class, which is not the case with our encoding.
roc_curve(y_test, y_pred_pos, pos_label=0) on the encoded labels would lead to the right result.
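
For reference, using the variables from the snippet above, the correct value can already be obtained today by going through roc_curve with an explicit pos_label and computing the area with auc:

    from sklearn.metrics import auc, roc_curve

    # y_pred_pos is the probability of "cancer" (classes_[0]); telling
    # roc_curve which label is positive yields the correct AUC
    fpr, tpr, _ = roc_curve(y_test, y_pred_pos, pos_label="cancer")
    auc(fpr, tpr)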

So I am not sure how we can solve the problem without introducing a pos_label parameter: since the user already gave us the positive-class column, we have no way to recover this information if it is not provided.

@jnothman
Member

jnothman commented Aug 5, 2020

So I am not sure how we can solve the problem without introducing a pos_label parameter: since the user already gave us the positive-class column, we have no way to recover this information if it is not provided.

I am fine with adding pos_label to roc_auc_score and roc_curve. But that doesn't require modifying the scorer.

@glemaitre
Member Author

glemaitre commented Aug 5, 2020

I am fine with adding pos_label to roc_auc_score and roc_curve.

Basically, roc_curve already has a pos_label parameter, so it is just a matter of passing pos_label through from roc_auc_score to roc_curve.
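
As a rough sketch (hypothetical, not the actual scikit-learn implementation), the binary branch of roc_auc_score could simply forward the new parameter:

    from sklearn.metrics import auc, roc_curve

    # hypothetical helper: forward pos_label to roc_curve and compute the
    # area under the resulting curve
    def _binary_roc_auc(y_true, y_score, pos_label=None, sample_weight=None):
        fpr, tpr, _ = roc_curve(
            y_true, y_score, pos_label=pos_label, sample_weight=sample_weight
        )
        return auc(fpr, tpr)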

But that doesn't require modifying the scorer.

This is where it becomes tricky. It would involve a regression, as we saw in #17594:
roc_auc_scorer (returned when using scoring='roc_auc') will not have any pos_label.
Therefore, the above example will fail if the classifier is used within a GridSearchCV.

As mentioned, we have 2 solutions:

  • The user passes make_scorer(roc_auc_score, pos_label="cancer") (see the sketch after this list), but we still don't support scoring='roc_auc';
  • Or, since the score is symmetric, we can modify _ThresholdScorer to make the appropriate column selection of y_pred depending on the given y_true:
    • we add the pos_label (with the mutable aspects discussed in the other PR), or
    • we encode y_true in _ThresholdScorer. However, this encoding should only be done for scorers with a symmetric scoring function; introducing it for f1_score would create a bug. So here we would need some kind of scorer property?
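
A sketch of the first option, assuming roc_auc_score gained the pos_label parameter proposed here and reusing the classifier from the example above (make_scorer stores extra keyword arguments and forwards them to the score function at scoring time):

    from sklearn.metrics import make_scorer, roc_auc_score
    from sklearn.model_selection import GridSearchCV

    # hypothetical until roc_auc_score accepts pos_label: the kwarg is stored
    # by make_scorer and forwarded when the scorer is called
    scorer = make_scorer(roc_auc_score, needs_threshold=True, pos_label="cancer")
    grid_search = GridSearchCV(
        classifier, param_grid={"C": [0.1, 1, 10]}, scoring=scorer
    )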

@glemaitre
Member Author

Pinging @adrinjalali since it would also be nice to have your thoughts.

Comment on lines +318 to +319
                    self._score_func.__name__ == "roc_auc_score"
                    and "pos_label" not in self._kwargs
Member

we add the pos_label (with the mutable aspects discussed in the other PR)

I am okay with this if we add a symmetric property to _BaseScorer that defaults to False. This way, we can be generic and not depend on the name of the score function.
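
As a rough illustration (hypothetical, not actual scikit-learn code), the flag could simply be stored on the base scorer and inspected by _ThresholdScorer before deciding how to select or re-encode the inputs:

    # hypothetical sketch of a "symmetric" flag on the base scorer class
    class _BaseScorer:
        def __init__(self, score_func, sign, kwargs, symmetric=False):
            self._score_func = score_func
            self._sign = sign
            self._kwargs = kwargs
            # True when swapping the two classes only flips the score, so a
            # thresholded scorer may safely re-encode y_true
            self._symmetric = symmetric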

@amueller
Member

amueller commented Aug 5, 2020

My understanding is that pos_label does not describe the semantics of the problem, but the semantics of the predict_proba/decision_function output that you passed. If you have two classes, 'a' and 'b', and a decision function, then pos_label='a' means that high values of the decision function are supposed to correspond to the class 'a' being likely.
The pos_label parameter was introduced so we don't need to make assumptions about the order of classes. It has nothing to do with which class you consider semantically positive; it tells you which string label corresponds to high values in the classifier, i.e. the order of entries in classes_. That is particularly required if your test set only has one label, so you cannot tell whether it is the positive or the negative one. This was the original motivation for pos_label.

There is currently no way to define the positive class in roc_auc_score, except by changing y to y == pos_label before training the model, which is actually the easiest fix that will make everything work, in particular with scorers.
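
For completeness, a minimal sketch of that workaround on the example earlier in this thread (recoding y so that the semantically positive class becomes the greater label):

    # recode the labels to booleans so that the positive class ("cancer")
    # becomes True, i.e. the greater label; the default conventions of
    # roc_auc_score and of the 'roc_auc' scorer then apply directly
    y_train_bin = y_train == "cancer"
    y_test_bin = y_test == "cancer"

    classifier = LogisticRegression().fit(X_train, y_train_bin)
    roc_auc_score(y_test_bin, classifier.predict_proba(X_test)[:, 1])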

We could add an argument that allows you to specify the semantically positive class, but it should definitely not be called pos_label which already has this different meaning.

The code in your comment just has a bug: you always need to pass the column of the greater label (y_pred[:, 1]) to roc_auc_score, and sklearn has no way to do what you'd like directly.

@glemaitre
Member Author

OK, so I got a couple of things wrong then.

The code in your comment just has a bug: you always need to pass the column of the greater label (y_pred[:, 1]) to roc_auc_score, and sklearn has no way to do what you'd like directly.

So I assume that you mean that I should have done:

roc_auc_score(y_test, y_pred[:, 1])

To be honest, I find this really confusing. It is true that the documentation does not say to pass the probability of the positive class, but it is far from clear which column to slice:

In the binary and multilabel cases, these can be either probability estimates or non-thresholded decision values (as returned by decision_function on some classifiers).

I think I was even more confused because average_precision_score explicitly requires the probability of the positive class. Would it make sense to be consistent about which column to select, and have the scores always expect the probability of the positive class?
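
To make the contrast concrete, reusing the fitted classifier from the example above (average_precision_score lets you name the positive class, while roc_auc_score expects the scores of the class with the greater label):

    from sklearn.metrics import average_precision_score, roc_auc_score

    proba = classifier.predict_proba(X_test)
    # average precision: pass the positive-class column and name it explicitly
    average_precision_score(y_test, proba[:, 0], pos_label="cancer")
    # ROC AUC: pass the column of the greater label, here "not cancer"
    roc_auc_score(y_test, proba[:, 1])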

@jnothman
Member

jnothman commented Aug 5, 2020 via email

@ogrisel
Member

ogrisel commented Aug 6, 2020

We face a related problem for the calibration error I believe: #11096.

@glemaitre
Member Author

I don't think that depends on symmetry, except insofar as for
non-symmetric thresholded scores, you might want to allow the user to
specify the "semantically positive class".

I am confused here. I think that a concrete example would help.
From what I see, average precision and ROC AUC are the only metrics used to create "thresholded" scorers.
If I understand your comment correctly, the average precision scorer would be a non-symmetric thresholded scorer and you would actually need to pass pos_label.

I wrote the following tests: https://github.com/scikit-learn/scikit-learn/pull/18107/files#diff-fcdae0622eeb4bf500b43048996b2af5R774-R828
to illustrate the behaviour that I would expect. However, for the ROC I am using the positive class while @amueller mentioned earlier that one should use y_pred[:, 1] in all cases.

@glemaitre
Member Author

Oh, now I see that this is actually written in the documentation:

The binary case expects a shape (n_samples,), and the scores must be the scores of the class with the greater label.

It should be in bold :)

@amueller
Member

amueller commented Aug 6, 2020

Of course @jnothman is right, and actually both meanings of pos_label seem to be currently present in the code-base.
And I agree, that is very confusing. I commented in #18101.

@glemaitre
Member Author

OK, so it seems that I have figured some things out. I will close all my PRs and open the following:

  • Improve the documentation of roc_auc_score. There is actually no bug there, but the documentation could be more explicit. (We might rediscuss the semantics of y_score, but that would require much more work and API changes);
  • Solve the issue in the Scorer classes so that pos_label is taken into account when it is passed to make_scorer;
  • Improve the documentation regarding the last point.

glemaitre closed this Aug 6, 2020