[MRG+1] Completely support binary y_true in roc_auc_score #9828
Conversation
Otherwise LGTM
sklearn/metrics/tests/test_common.py
@@ -595,7 +595,8 @@ def test_invariance_string_vs_numbers_labels():

    for name, metric in THRESHOLDED_METRICS.items():
        if name in ("log_loss", "hinge_loss", "unnormalized_log_loss",
What are we excluding by this? What happens when we remove the condition?
@jnothman Thanks for the review.
The test will fail because some metrics still only support {0, 1} or {-1, 1} binary y_true. After this PR, the following THRESHOLDED_METRICS are still excluded by the if statement:

# average_precision_score
average_precision_score
macro_average_precision_score
micro_average_precision_score
samples_average_precision_score
weighted_average_precision_score

# Multilabel ranking metrics
coverage_error
label_ranking_average_precision_score
label_ranking_loss
I can't check now, but is that set complementary in some way, like supporting pos_label? I'd rather the condition be a blacklist (suggesting "not yet implemented" or "not applicable") than a whitelist, which would seem to defy the purpose of common tests.
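For illustration only (not code from this PR, and METRICS_REQUIRING_NUMERIC_Y is a hypothetical name rather than an actual scikit-learn test constant), the blacklist idea could look roughly like this: list the metrics that cannot handle string labels, skip only those, and every newly added metric is exercised by the common test by default.

import numpy as np

# Hypothetical blacklist of metrics that do not apply to string labels
# (or do not support them yet).
METRICS_REQUIRING_NUMERIC_Y = {
    "coverage_error",
    "label_ranking_loss",
    "label_ranking_average_precision_score",
}

def check_string_vs_numbers_labels(name, metric, y_num, y_str, y_score):
    # Skip blacklisted metrics; everything else must give the same result
    # for numeric and string labels.
    if name in METRICS_REQUIRING_NUMERIC_Y:
        return
    np.testing.assert_allclose(metric(y_num, y_score), metric(y_str, y_score))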
Would it be a good idea to merge the PRs so we can see the state of affairs more completely?

Should we have a regression test to show that cross_val_score works regardless of which label is positive?

And I'm not sure I get why average precision would be undefined-binary, but I'm still not in a position to look at the code.
@jnothman Thanks for your instant reply.
From my perspective, #9786 actually solves a different problem (improving the stability of roc_auc_score) and is almost finished. There's hardly any direct relationship between the two PRs, so it might be better not to combine #9786 with this PR unless you insist.
Sorry, but I don't quite understand the necessity of such a test. If roc_auc_score works correctly, then the scorer based on roc_auc_score, as well as cross_val_score, should also work correctly, shouldn't they? I can't find a similar test currently; if you can point me to something similar, I will be able to understand the problem better.
According to the docs and a glance at the source code, it seems we should also move average_precision_score out of METRIC_UNDEFINED_BINARY. I'll take care of it after #9786. I have wrapped up our discussions about the common test, along with some opinions of my own, in #9829.
Sounds good.
@jnothman Now we can get rid of the awkward list. Is it OK with you? Thanks.
That looks much better!
@jnothman Could you please give me some suggestions on how to make lgtm run? Thanks :)
This is indeed much better than #9567, #6874, #2616. The basic use case seems to work:

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
np.random.seed(0)
n_samples, n_features = 100, 10
est = LogisticRegression()
X = np.random.randn(n_samples, n_features)
y = np.random.randint(2, size=n_samples)
classes = np.array(['good', 'not-good'])
for y_true in (classes[y], classes[1 - y]):
    est.fit(X, y_true)
    y_score = est.decision_function(X)
    print(roc_auc_score(y_true, y_score))
# 0.678090575275
# 0.678090575275

LGTM
Reference Issue
Fixes #2723, proposed by @jnothman
Also see the discussions in #9805, #9567, #6874, #6873, #2616
What does this implement/fix? Explain your changes.
Currently, roc_auc_score only supports binary y_true encoded as {0, 1} or {-1, 1}.
This PR adds complete support for binary y_true. The basic idea is that for binary y_true, y_score is taken to be the score of the class with the greater label.
The common test serves as the regression test.
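As a small illustration of that convention (the values below are made up for the example, not taken from the PR): with string labels, the class that sorts last is treated as the positive class, so y_score should be its score.

from sklearn.metrics import roc_auc_score

# 'spam' sorts after 'ham', so it plays the role of the greater (positive)
# label and y_score is interpreted as the score for 'spam'.
y_true = ['ham', 'spam', 'spam', 'ham', 'spam']
y_score = [0.1, 0.8, 0.7, 0.3, 0.9]
print(roc_auc_score(y_true, y_score))           # 1.0
print(roc_auc_score([0, 1, 1, 0, 1], y_score))  # 1.0, same as with numeric labels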
Any other comments?
cc @jnothman