average_precision_score() overestimates AUC value #13074
Comments
Thank you for your comment. I did see some of these older issues, but not all of them. I did actually find some cases where the AUC value is underestimated as well, which makes the problem a bit more complex than I initially thought. For datasets with a small number of precision and recall thresholds, it seems better for now to use the interpolated area under the curve (i.e. sklearn.metrics.auc() or np.trapz()), or am I mistaken?
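To make the difference between the two quantities concrete, here is a minimal pure-Python sketch on hypothetical operating points (made-up numbers, not the data from this issue). `step_ap` follows the step sum documented for average_precision_score, AP = sum over n of (R_n - R_{n-1}) * P_n; `trapezoid_auc` applies the trapezoidal rule, which is what sklearn.metrics.auc() or np.trapz() would compute on the same points. Both function names are illustrative only.

```python
# Hypothetical (recall, precision) operating points, sorted by increasing recall.
# The leading (0.0, 1.0) mirrors the end point that precision_recall_curve adds;
# the other values are invented for illustration.
points = [(0.0, 1.0), (0.25, 1.0), (0.5, 0.5), (1.0, 0.5)]


def step_ap(points):
    """Uninterpolated AP: sum of (R_n - R_{n-1}) * P_n, so each recall
    increment is weighted by the precision at its right-hand end."""
    return sum((r1 - r0) * p1 for (r0, _), (r1, p1) in zip(points, points[1:]))


def trapezoid_auc(points):
    """Trapezoidal area: each recall increment is weighted by the mean precision
    of its two ends, i.e. linear interpolation between operating points."""
    return sum((r1 - r0) * (p0 + p1) / 2 for (r0, p0), (r1, p1) in zip(points, points[1:]))


print(step_ap(points), trapezoid_auc(points))  # 0.625 0.6875
```

Here the trapezoid is larger because precision falls from 1.0 to 0.5 over one segment, and the straight line credits area above the step. If precision instead rises along a segment, the step sum is the larger of the two, so the methods can differ in either direction.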
We had a previous implementation like that, and extensive debate when changing it.
I don't think the current implementation overestimates. Consider your second example:
If I told you I wanted a recall of 0.5, you couldn't give me the second operating point ( Your implementation uses
As part of scikit-learn's triaging guidelines, I am closing this issue because it is a duplicate of #4577.
Description
The average_precision_score() function in sklearn doesn't return a correct AUC value.
Steps/Code to Reproduce
Example:
Expected Results
AUC without interpolation = (0.294 - 0.235) * 0.800 = 0.0472
AUC with trapezoidal interpolation = 0.0472 + (0.294 - 0.235) * (0.833 - 0.800) / 2 = 0.0482
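A quick arithmetic check of these two figures, using only the numbers quoted in them: one recall interval from 0.235 to 0.294 whose end points have precision 0.800 and 0.833. The issue does not say which end carries which precision, so the variable names below only say "lower" and "higher".

```python
# Numbers quoted in "Expected Results".
recall_a, recall_b = 0.235, 0.294
precision_lower, precision_higher = 0.800, 0.833

width = recall_b - recall_a                  # recall increment, about 0.059
rectangle = width * precision_lower          # step area, no interpolation
triangle = width * (precision_higher - precision_lower) / 2  # extra area of the trapezoid
trapezoid = rectangle + triangle

print(round(rectangle, 4), round(trapezoid, 4))  # 0.0472 0.0482
```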
Actual Results
This is what sklearn implements for AUC without interpolation (https://scikit-learn.org/stable/modules/generated/sklearn.metrics.average_precision_score.html):
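The documented definition is AP = sum over n of (R_n - R_{n-1}) * P_n, where R_n and P_n are the recall and precision at the n-th threshold. Below is a minimal pure-Python sketch of that step sum, assuming binary 0/1 labels with at least one positive and predictions of the form `score >= threshold`. It builds the precision-recall points itself instead of calling precision_recall_curve, so it illustrates the formula rather than reproducing sklearn's code; the helper names are hypothetical. The data are the four samples used in the scikit-learn documentation example for average_precision_score.

```python
def precision_recall_points(y_true, y_score):
    """(recall, precision) for every distinct threshold t, predicting positive
    when score >= t, in order of decreasing t (non-decreasing recall)."""
    n_pos = sum(y_true)
    points = []
    for t in sorted(set(y_score), reverse=True):
        predicted = [y for y, s in zip(y_true, y_score) if s >= t]
        tp = sum(predicted)  # labels are 0/1, so the sum counts true positives
        points.append((tp / n_pos, tp / len(predicted)))
    return points


def average_precision(y_true, y_score):
    """Step-function AP: sum of (R_n - R_{n-1}) * P_n with R_0 = 0."""
    ap, prev_recall = 0.0, 0.0
    for recall, precision in precision_recall_points(y_true, y_score):
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap


y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]
print(round(average_precision(y_true, y_score), 4))  # 0.8333
```

Each recall increment is multiplied by the precision of the operating point that reaches it, so thresholds that leave recall unchanged add nothing.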
This is what I think is correct (no longer; see edit):
EDIT: I found that the above 'correct' implementation doesn't always underestimate. It depends on the input. Therefore I have revised the uninterpolated AUC calculation to this:
This has the advantage that the AUC calculation is more consistent; it is either equal or underestimated, but never overestimated (compared to the current uninterpolated AUC function). Below I show some examples of what it does:
Versions
Windows-10-10.0.17134-SP0
Python 3.6.4 |Anaconda, Inc.| (default, Jan 16 2018, 10:22:32) [MSC v.1900 64 bit (AMD64)]
NumPy 1.14.0
SciPy 1.0.0
Scikit-Learn 0.19.1