[MRG+1] Completely support binary y_true in roc_auc_score #9828


Merged 7 commits into scikit-learn:master on Oct 11, 2017

Conversation

@qinhanmin2014 (Member) commented Sep 25, 2017

Reference Issue

Fixes #2723, proposed by @jnothman
Also see the discussions in #9805, #9567, #6874, #6873, #2616

What does this implement/fix? Explain your changes.

Currently, roc_auc_score only supports binary y_true with labels in {0, 1} or {-1, 1}.
This PR adds support for arbitrary binary y_true. The basic idea is that, for binary y_true, y_score is taken to be the score of the class with the greater label (see the sketch below).
The common tests serve as the regression test.
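
For illustration, here is a minimal sketch of that convention (my own example, not code from the PR): since 'b' > 'a', 'b' is treated as the positive class.

import numpy as np
from sklearn.metrics import roc_auc_score

# Illustrative example (not from the PR): with arbitrary binary labels,
# y_score is interpreted as the score of the greater label, here 'b'.
y_true = np.array(['a', 'b', 'b', 'a'])
y_score = np.array([0.1, 0.8, 0.6, 0.3])  # higher score = more likely 'b'
print(roc_auc_score(y_true, y_score))  # 1.0: every 'b' outscores every 'a'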

Any other comments?

cc @jnothman

@jnothman (Member) left a comment


Otherwise LGTM

@@ -595,7 +595,8 @@ def test_invariance_string_vs_numbers_labels():

for name, metric in THRESHOLDED_METRICS.items():
if name in ("log_loss", "hinge_loss", "unnormalized_log_loss",
@jnothman (Member):

What are we excluding by this? what happens when we remove the condition?

@qinhanmin2014 (Author):

@jnothman Thanks for the review.

> What are we excluding by this? what happens when we remove the condition?

The test will fail because some metrics still support only {0, 1} or {-1, 1} y_true.

After this PR, the following THRESHOLDED_METRICS are still excluded by the if statement:

# average_precision_score
average_precision_score
macro_average_precision_score
micro_average_precision_score
samples_average_precision_score
weighted_average_precision_score

# Multilabel ranking metrics
coverage_error
label_ranking_average_precision_score
label_ranking_loss

@jnothman (Member) commented Sep 25, 2017 via email

@qinhanmin2014 (Author):

@jnothman Thanks.
It seems a possible approach is to use METRIC_UNDEFINED_BINARY instead of the awkward list. That also looks reasonable because we are testing binary y_true here. But this should wait for #9786 (which moves roc_auc_score out of METRIC_UNDEFINED_BINARY).

@jnothman (Member) commented Sep 25, 2017 via email

@qinhanmin2014 (Author) commented Sep 25, 2017

@jnothman Thanks for your prompt reply.

> would it be a good idea to merge the PRs so we can see the state of affairs more completely?

From my perspective, #9786 actually solves a different problem (improving the stability of roc_auc_score) and is almost finished. There is hardly any direct relationship between the two PRs, so it might be better not to combine them unless you insist.

> should we have a regression test to show that cross_val_score works regardless of which label is positive?

Sorry, but I don't quite see the need for such a test. If roc_auc_score works correctly, then the scorer built on it, and hence cross_val_score, should also work correctly. I can't find a similar test at the moment; if you can point me to something comparable, I'll be able to understand the problem better.
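
For reference, a hypothetical version of such a regression test might look like this (a sketch under my own assumptions, not code from this PR):

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Hypothetical regression test: the 'roc_auc' scorer should give the same
# cross-validated result no matter which string label plays the positive role.
X, y = make_classification(n_samples=100, random_state=0)
for labels in (np.array(['neg', 'pos']), np.array(['pos', 'neg'])):
    scores = cross_val_score(LogisticRegression(), X, labels[y],
                             scoring='roc_auc', cv=5)
    print(scores.mean())  # both iterations should print the same mean AUC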

> And I'm not sure I get why average precision would be undefined-binary, but I'm still not in a position to look at the code.

According to the docs and a glance at the source code, it seems we should also move average_precision_score out of METRIC_UNDEFINED_BINARY. I'll take care of it after #9786.

I have summarized our discussion about the common tests, along with some of my own opinions, in #9829.

@jnothman (Member):

Sounds good

@jnothman changed the title from [MRG] Completely support binary y_true in roc_auc_score to [MRG+1] Completely support binary y_true in roc_auc_score on Sep 25, 2017
@qinhanmin2014 (Author):

> I'd rather the condition be a blacklist (suggesting "not yet implemented" or "not applicable") than a whitelist, which would seem to defy the purpose of common tests

@jnothman Now we can get rid of the awkward list. Is that OK with you? Thanks.
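
Presumably the simplified condition now keys off the shared blacklist rather than an inline tuple of names. A rough, self-contained sketch of that shape (illustrative only, not the exact diff; the real THRESHOLDED_METRICS and METRIC_UNDEFINED_BINARY live in sklearn/metrics/tests/test_common.py, and the blacklist contents below are hypothetical):

from sklearn.metrics import log_loss, roc_auc_score

# Toy stand-ins for the collections in test_common.py.
THRESHOLDED_METRICS = {"roc_auc_score": roc_auc_score, "log_loss": log_loss}
METRIC_UNDEFINED_BINARY = {"log_loss"}  # hypothetical blacklist content

for name, metric in THRESHOLDED_METRICS.items():
    if name in METRIC_UNDEFINED_BINARY:
        continue  # skip metrics that cannot handle arbitrary binary y_true
    print(name)  # metrics that must pass the string-vs-numbers invariance test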

@jnothman (Member):

That looks much better!

@qinhanmin2014 (Author):

@jnothman Could you please give me some suggestions on how to make lgtm run? Thanks :)

@qinhanmin2014 (Author):

@lesteve @amueller Could you please give a second review? Thanks a lot :)

@TomDLT (Member) commented Oct 11, 2017

This is indeed much better than #9567, #6874, #2616

The basic use case seems to work:

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

np.random.seed(0)
n_samples, n_features = 100, 10
est = LogisticRegression()
X = np.random.randn(n_samples, n_features)
y = np.random.randint(2, size=n_samples)
classes = np.array(['good', 'not-good'])

# Swap which string label is attached to which class; the AUC should be
# identical either way.
for y_true in (classes[y], classes[1 - y]):
    est.fit(X, y_true)
    y_score = est.decision_function(X)
    print(roc_auc_score(y_true, y_score))
# 0.678090575275
# 0.678090575275

LGTM

@TomDLT merged commit 94db658 into scikit-learn:master on Oct 11, 2017
@qinhanmin2014 deleted the my-feature-3 branch on October 11, 2017