ENH add a parameter pos_label in roc_auc_score #17594

Merged · 4 commits · Jun 24, 2020

Conversation

@glemaitre
Member

Closes #17572

Add a pos_label parameter to make it possible to specify the positive class in binary classification.

@glemaitre
Member Author

Arfff, this breaks backward compatibility. I would consider the current behaviour a bug because the way the positive class is inferred is really opaque: it depends on the result of an np.unique call on the labels. When labels are given as str, it seems better to just ask for pos_label to remove any ambiguity.
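
For illustration, a minimal sketch of the kind of inference involved (assuming, as label_binarize does, that the greater of the two sorted labels ends up as the positive class):

import numpy as np

y = np.array(["cancer", "not cancer"])

# np.unique returns the labels in sorted order; the heuristic then takes
# the last (greater) label as the positive class.
classes = np.unique(y)
print(classes)      # ['cancer' 'not cancer']
print(classes[-1])  # 'not cancer' -- silently treated as positive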

Any thoughts, @ogrisel @thomasjpfan @amueller?

@ogrisel
Member

ogrisel commented Jun 24, 2020

Oops, I realised I merged the wrong PR. I wanted to merge #17569 instead but I was on the wrong tab. Let me review this a posteriori...

@ogrisel
Member

ogrisel commented Jun 24, 2020

> Arfff, this breaks backward compatibility. I would consider the current behaviour a bug because the way the positive class is inferred is really opaque: it depends on the result of an np.unique call on the labels. When labels are given as str, it seems better to just ask for pos_label to remove any ambiguity.

I agree we should ask for an explicit pos_label in this case. What is still unclear is the interaction with its use as a scorer in a grid search. Let me give it a closer look.

@glemaitre
Member Author

> Oops, I realised I merged the wrong PR. I wanted to merge #17569 instead but I was on the wrong tab. Let me review this a posteriori...

That is a good strategy to get a PR merged ;)

@ogrisel
Member

ogrisel commented Jun 24, 2020

I am afraid we will have to revert this bug fix because it breaks the ability to run a GridSearchCV with scoring="roc_auc" and string class labels.

ValueError: y_true takes value in {'cancer', 'not cancer'} and pos_label is not specified: either make y_true take value in {0, 1} or {-1, 1} or pass pos_label explicitly.

when running this script with this PR:

https://gist.github.com/ogrisel/bfadf2dffef144d401b6e7c52d744219

This used to work before, so this would be a regression for the model selection use case.
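
The gist is not inlined here; a minimal self-contained reproduction along the same lines (synthetic data, hypothetical string labels) would be:

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

X, y_int = make_classification(random_state=0)
y = np.where(y_int == 1, "cancer", "not cancer")

# scoring="roc_auc" gives the scorer no channel through which to receive
# pos_label, so it has to infer the positive class from y on its own.
gs = GridSearchCV(LogisticRegression(), {"C": [0.1, 1.0]}, scoring="roc_auc")
gs.fit(X, y)  # works on master; raises the ValueError above with this PR
print(gs.best_score_)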

@thomasjpfan
Member

I think we need pos_label=None to be a little more magical to handle the string case. (We cannot use 'auto' as a sentinel because it is itself a string.)
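
A hypothetical sketch of that "magical" None handling (the helper name _infer_pos_label is made up here for illustration):

import numpy as np

def _infer_pos_label(y_true, pos_label=None):
    # Hypothetical helper: keep the usual {0, 1} / {-1, 1} fast paths
    # and only demand an explicit pos_label for ambiguous string labels.
    if pos_label is not None:
        return pos_label
    classes = set(np.unique(y_true))
    if classes <= {0, 1} or classes <= {-1, 1}:
        return 1
    raise ValueError(
        f"y_true takes value in {classes} and pos_label is not specified: "
        "either make y_true take value in {0, 1} or {-1, 1} or pass "
        "pos_label explicitly."
    )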

@glemaitre
Member Author

Fair enough. So we revert the change and I will reopen a PR. We should probably add a test for the case mentioned earlier.

@ogrisel
Member

ogrisel commented Jun 24, 2020

> I think we need pos_label=None to be a little more magical to handle the string case. (We cannot use 'auto' as a sentinel because it is itself a string.)

But I am not so sure this is a good solution in general. For ROC AUC, swapping the positive and negative classes is fine because the score is symmetric, but this is not the case for other binary classification metrics, such as average precision on imbalanced data. (Intuitively, taking the other class as positive and using the complementary probability column reverses both the labels and the ranking, and for ROC AUC the two reversals cancel out, whereas average precision depends on the prevalence of the positive class.)

Here is what I mean by "symmetric":

import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.utils import shuffle
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, average_precision_score


# Build an imbalanced binary problem: keep all the negative samples but
# only 25 positive ones, keep 2 features, and relabel with strings.
X, y = load_breast_cancer(return_X_y=True)
idx_positive = np.flatnonzero(y == 1)
idx_negative = np.flatnonzero(y == 0)
idx_selected = np.hstack([idx_negative, idx_positive[:25]])
X, y = X[idx_selected], y[idx_selected]
X, y = shuffle(X, y, random_state=42)
X = X[:, :2]
y = np.array(
    ["cancer" if c == 1 else "not cancer" for c in y], dtype=object
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0,
)

classifier = LogisticRegression()
classifier.fit(X_train, y_train)
assert classifier.classes_.tolist() == ["cancer", "not cancer"]

def compute_score(classifier, score_func, X_test, y_test, pos_label):
    # Score using the probability column of the requested positive class.
    y_pred = classifier.predict_proba(X_test)
    pos_idx = classifier.classes_.tolist().index(pos_label)
    score = score_func(y_test, y_pred[:, pos_idx], pos_label=pos_label)
    print(f"{score_func.__name__} with {pos_label=}: {score=:.4f}")


compute_score(classifier, roc_auc_score, X_test, y_test, "cancer")
compute_score(classifier, roc_auc_score, X_test, y_test, "not cancer")

compute_score(classifier, average_precision_score, X_test, y_test, "cancer")
compute_score(classifier, average_precision_score, X_test, y_test, "not cancer")

output:

roc_auc_score with pos_label='cancer': score=0.9568
roc_auc_score with pos_label='not cancer': score=0.9568
average_precision_score with pos_label='cancer': score=0.6339
average_precision_score with pos_label='not cancer': score=0.9953

@ogrisel
Member

ogrisel commented Jun 24, 2020

So I believe the only option would be to make it possible to pass pos_label also to GridSearchCV & co (any model that accepts a scoring argument).

Furthermore, we should find a way to let GridSearchCV & co know which classification scorers are not symmetric, so that they can raise a ValueError when pos_label is not passed explicitly, while keeping some kind of automatic behaviour for the symmetric ones such as roc_auc_score.
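
A hypothetical sketch of that idea (the symmetry registry and check_pos_label helper are made up for illustration, not an existing API):

import numpy as np

# Hypothetical registry: which scorers are invariant to swapping the
# positive and negative classes.
SCORER_IS_SYMMETRIC = {
    "roc_auc": True,
    "average_precision": False,
}

def check_pos_label(scoring, pos_label, y):
    # For labels other than {0, 1} / {-1, 1}, an asymmetric scorer needs
    # an explicit pos_label; a symmetric one can keep an automatic choice.
    classes = set(np.unique(y))
    if classes not in ({0, 1}, {-1, 1}):
        if pos_label is None and not SCORER_IS_SYMMETRIC[scoring]:
            raise ValueError(
                f"scoring={scoring!r} is not symmetric in the positive "
                "class; pass pos_label explicitly."
            )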

@thomasjpfan
Member

> But I am not so sure this is a good solution in general. For ROC AUC, swapping the positive and negative classes is fine because the score is symmetric, but this is not the case for other binary classification metrics, such as average precision on imbalanced data.

Good point

> So I believe the only option would be to make it possible to pass pos_label also to GridSearchCV & co (any model that accepts a scoring argument).

Can this be extended to passing a "scorer_kwargs" that gets passed down to the metric? Concretely:

gs = GridSearchCV(..., scorer_kwargs={'pos_label': 'cancer'},
                  scoring='average_precision')
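
For comparison, something close to this is already expressible today by binding pos_label into a custom scorer with make_scorer:

from sklearn.metrics import average_precision_score, make_scorer
from sklearn.model_selection import GridSearchCV

# needs_threshold=True makes the scorer use continuous outputs
# (decision_function / predict_proba) rather than hard predictions.
ap_cancer = make_scorer(
    average_precision_score, needs_threshold=True, pos_label="cancer"
)
# gs = GridSearchCV(estimator, param_grid, scoring=ap_cancer)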

@jnothman
Member

jnothman commented Jun 24, 2020 via email

@jnothman
Member

jnothman commented Jun 24, 2020 via email

@glemaitre
Member Author

glemaitre commented Jun 25, 2020

> And I've previously noted that the symmetry of roc_auc means that the default heuristic is fine for scoring with it without specifying pos_label.

Yes, so I have to revisit my initial example.

Oh, you probably mean in the grid search.
