Asymmetry of roc_auc_score #10247
Comments
Yes,
I do not understand how you find that
while I understand that this is in accordance with the function documentation, which forces y_true and y_pred to be of the same shape, i.e. both are per-class predictions. This is explicitly not the format in which a classifier expects the y_true input, which it wants as a single-column array of class labels (compare docs). This effectively prohibits the use of
Oh. You're right. Although this issue is fixed in master. That's quite nasty. I'll need to track down what fixed this...
This behaviour changed in master in ee2025f. The issue is that the 'roc_auc' scorer is defined internally with needs_threshold=True. We should probably consider some documentation improvements, if not changing the interface.
@mzoll The problem is fixed in #9521. Now you get the same result using _ProbaScorer (by setting needs_proba=True) and _ThresholdScorer (by setting needs_threshold=True):

make_scorer(roc_auc_score, needs_proba=True)(clf, X_test, y_test)
0.97641161457146342
make_scorer(roc_auc_score, needs_threshold=True)(clf, X_test, y_test)
0.97641161457146342

But according to the doc of make_scorer, it is recommended to set needs_threshold=True. Here is the definition from scikit-learn:
scikit-learn/sklearn/metrics/scorer.py, lines 506 to 507 in ab984a6
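For readers without the permalink handy, those two lines define the scorer roughly as follows (reconstructed from the 0.19-era source, not copied from the embed above):

# sklearn/metrics/scorer.py (0.19-era), reconstructed rather than quoted verbatim
roc_auc_scorer = make_scorer(roc_auc_score, greater_is_better=True,
                             needs_threshold=True)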
For the doc, there's already some information in the make_scorer doc and the user guide. Maybe we still need more explanation about _ProbaScorer (needs_proba) and _ThresholdScorer (needs_threshold); at least I don't quite understand why we need _ProbaScorer. According to the code, it seems that it could be replaced by _ThresholdScorer?
Well, there are certainly metrics that can't take an unnormalised decision function, but they may not necessitate a separate class.
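A quick sketch of that distinction (an assumed example, not code from this thread): roc_auc_score only needs a ranking of the samples, so the raw decision function of an SVM is fine, whereas a metric like log_loss needs probabilities in [0, 1] and cannot consume unnormalised margins.

# Sketch with assumed data/model, illustrating ranking-based vs probability-based metrics
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import roc_auc_score, log_loss

X, y = make_classification(n_samples=300, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

svm = SVC(kernel="linear").fit(X_train, y_train)
scores = svm.decision_function(X_test)   # unnormalised margins, not probabilities

roc_auc_score(y_test, scores)            # fine: only the ordering of the scores matters
# log_loss(y_test, scores)               # not meaningful (and may error): log_loss expects probabilities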
@jnothman Thanks a lot for the clarification :)
I'm +1 for improving the doc and the user guide.
@qinhanmin2014 @jnothman Hi, I came across the same problem. Thanks for your updates, I got the AUC score using make_scorer. But when I use roc_curve to plot the ROC curve, the same issue occurred:

---------------------------------------------------------------------------
in train_predict_evaluate_model(classifier, train_features, train_labels, test_features, test_labels)
~/anaconda3/lib/python3.6/site-packages/sklearn/metrics/ranking.py in roc_curve(y_true, y_score, pos_label, sample_weight, drop_intermediate)
~/anaconda3/lib/python3.6/site-packages/sklearn/metrics/ranking.py in _binary_clf_curve(y_true, y_score, pos_label, sample_weight)
~/anaconda3/lib/python3.6/site-packages/sklearn/utils/validation.py in column_or_1d(y, warn)
ValueError: bad input shape (134, 2)

Do you have any solution to this problem? Please kindly advise. Thank you in advance!
Please provide self-contained example code, including imports and data (if possible), so that other contributors can just run it and reproduce your issue. Ideally your example code should be minimal.
I just figured out the problem. I didn't notice that there are two columns in the result of predict_proba(), so the result cannot be passed directly to roc_curve(). The problem was solved by selecting the second column of predict_proba() as y_score. Thanks!
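For reference, a minimal sketch of that fix with made-up data (the original code was not posted in this thread):

# Sketch with assumed data/model; the point is slicing the positive-class column
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_curve, auc

X, y = make_classification(n_samples=300, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = LogisticRegression().fit(X_train, y_train)

proba = clf.predict_proba(X_test)   # shape (n_samples, 2): one column per class
y_score = proba[:, 1]               # keep only the positive-class probability

fpr, tpr, thresholds = roc_curve(y_test, y_score)
roc_auc = auc(fpr, tpr)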
Description
For binary tasks with custom scoring functions wrapped via the metrics.make_scorer scheme, roc_auc_score behaves unexpectedly. When the probability output from a binary classifier is requested, which is a shape (n, 2) object, while the training/testing labels are the expected shape (n,) input, scoring will fail. However, the binary task and the internal handling of the differing y shapes are incidentally understood correctly by metrics.log_loss through its internal evaluations, while roc_auc_score currently fails at this. This is especially cumbersome if the scoring function is wrapped inside a cross_val_score and a make_scorer much deeper in the code, with possibly nested pipelines etc., where automatic correct evaluation of this particular metric is required.

Steps/Code to Reproduce
This should illustrate what is failing
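The original snippet did not survive this copy of the issue; a minimal sketch of the kind of setup that fails on 0.19.x (reconstructed with assumed data, not the reporter's exact code) could look like:

# Reconstructed sketch (assumed data/model, not from the original report); behaviour refers to 0.19.x
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score, log_loss, make_scorer

X, y = make_classification(n_samples=500, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = LogisticRegression().fit(X_train, y_train)

proba = clf.predict_proba(X_test)   # shape (n_samples, 2)

log_loss(y_test, proba)             # works: log_loss accepts the (n, 2) probability output
make_scorer(log_loss, needs_proba=True,
            greater_is_better=False)(clf, X_test, y_test)   # also works when wrapped

roc_auc_score(y_test, proba)        # ValueError in 0.19.1: bad input shape (n, 2)
# make_scorer(roc_auc_score, needs_proba=True)(clf, X_test, y_test) fails the same way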
Expected Results
roc_auc_score should behave in a similar way as log_loss, guessing the binary classification task and handling the different-shape input correctly.

Versions
Windows-10-10.0.15063-SP0
Python 3.6.1 |Anaconda 4.4.0 (64-bit)| (default, May 11 2017, 13:25:24) [MSC v.1900 64 bit (AMD64)]
NumPy 1.13.3
SciPy 0.19.1
Scikit-Learn 0.19.1