Skip to content

FIX Infer pos_label automatically in plot_roc_curve #15316

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 16 commits into from

Conversation

qinhanmin2014
Copy link
Member

Fixes #15303
I think we should drop parameter pos_label in plot_roc_curve, because we do not have parameter pos_label in roc_auc_score. In roc_auc_score, for binary y_true, y_score is supposed to be the score of the class with greater label.

@qinhanmin2014 qinhanmin2014 added this to the 0.22 milestone Oct 21, 2019
@thomasjpfan thomasjpfan self-requested a review October 21, 2019 15:05
@jnothman
Copy link
Member

I think we should drop parameter pos_label in plot_roc_curve, because we do not have parameter pos_label in roc_auc_score.

The reason we lack the parameter in roc_auc_score is that the score is symmetric with respect to the choice of the positive label. roc_auc_score(y_true, score) == roc_auc_score(y_true, -score) for binary y_true. But this same operation tranposes the plotted curve, so for plotting the choice of pos_label is significant.

@qinhanmin2014
Copy link
Member Author

The reason we lack the parameter in roc_auc_score is that the score is symmetric with respect to the choice of the positive label. roc_auc_score(y_true, score) == roc_auc_score(y_true, -score) for binary y_true

Sorry but I can't understand this part, should be roc_auc_score(y_true, score) == 1 - roc_auc_score(y_true, -score)?

@qinhanmin2014
Copy link
Member Author

But this same operation tranposes the plotted curve, so for plotting the choice of pos_label is significant.

Well we can keep pos_label parameter and define pos_label=None as the greater label (though this will be inconsistent with roc_auc_curve), but I can't understand why the choice of pos_label is significant here, the roc_auc_score of a model can be less than 0.5?

@jnothman
Copy link
Member

jnothman commented Oct 23, 2019 via email

@qinhanmin2014
Copy link
Member Author

Yes, you're right... now trying to remember which similar lemma is the reason we don't need pos_label in roc_auc_score. I know I've justified it to myself before!

Yes the decision was (mainly) made by you (#2723 (comment), #2723 (comment)) and the PR was submitted by me. Honestly I'm still unable to understand the reason. (Please kindly forgive me.)

Perhaps it's better to drop pos_label parameter here, because in roc_auc_score, we regard the class with greater label as the positive class. If we want to support pos_lable here, perhaps we need to support it in roc_auc_score at the same time.

@jnothman
Copy link
Member

jnothman commented Oct 23, 2019 via email

@qinhanmin2014
Copy link
Member Author

Firstly can we agree that plot_roc_curve is like roc_curve, not like roc_auc_score? So support for pos_label should reflect roc_curve.

If we keep consistent with roc_curve (we do so currently), seems that we won't be able to solve Andy's issue, because for roc_curve, pos_label = None(default) is the same as pos_label=1.

while I don't think it makes sense for the user to configure the pos_label for roc_auc_score where the scores are coming from a scikit-learn binary estimator,

Sorry I can'r understand this part. average_precision_score also comes from binary estimator but we support pos_label there?

@qinhanmin2014 qinhanmin2014 changed the title FIX Remove pos_label parameter in plot_roc_curve FIX Infer pos_label automatically in plot_roc_curve Oct 30, 2019
@qinhanmin2014
Copy link
Member Author

ping @thomasjpfan ready for a review

@ogrisel
Copy link
Member

ogrisel commented Oct 30, 2019

Sorry for late reaction but I have a concurrent suggestion complementary PR in #15405.

Copy link
Member

@ogrisel ogrisel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@qinhanmin2014 I pushed some improvements. Otherwise LGTM.

@ogrisel
Copy link
Member

ogrisel commented Nov 6, 2019

Based on the discussion in #15405 (comment), maybe we should actually remove pos_label from plot_roc_curve and let it always use estimator.classes_[1] internally.

WDYT @jnothman @qinhanmin2014 @thomasjpfan @amueller?

@amueller
Copy link
Member

amueller commented Nov 6, 2019

See #15405 (comment)

Unless we want to change the behavior to also slice predict_proba, then we should remove it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

plot_roc_curve doesn't correctly infer pos_label
4 participants