average_precision_score() overestimates AUC value #13074

@barrydebruin

Description

The average_precision_score() function in sklearn can overestimate the area under the precision-recall curve (AUC).

Steps/Code to Reproduce

Example:

import numpy as np
"""
    Desc: average_precision_score returns overestimated AUC of precision-recall curve
"""
# pathological example
p = [0.833, 0.800] # precision
r = [0.294, 0.235] # recall

# computation of average_precision_score()
print("AUC       = {:3f}".format(-np.sum(np.diff(r) * np.array(p)[:-1]))) # _binary_uninterpolated_average_precision()

# computation of auc() with trapezoid interpolation
print("AUC TRAP. = {:3f}".format(-np.trapz(p, r)))

# possible fix in _binary_uninterpolated_average_precision() **(edited)**
print("AUC FIX   = {:3f}".format(-np.sum(np.diff(r) * np.minimum(p[:-1], p[1:]))))

#>> AUC       = 0.049147
#>> AUC TRAP. = 0.048174
#>> AUC FIX   = 0.047200

Expected Results

AUC without interpolation = (0.294 - 0.235) * 0.800 = 0.0472
AUC with trapezoidal interpolation = 0.0472 + (0.294 - 0.235) * (0.833 - 0.800) / 2 = 0.0482
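
The arithmetic above can be checked with a few lines of plain Python. This is only a sketch of the single-segment case from the example; the variable names (`width`, `lower`, `triangle`, `trapezoid`) are mine and do not come from sklearn.

```python
# Single segment of the precision-recall curve from the example above.
p = [0.833, 0.800]  # precision
r = [0.294, 0.235]  # recall

width = r[0] - r[1]                        # recall step, 0.059
lower = width * min(p)                     # rectangle under the lower precision
triangle = width * (max(p) - min(p)) / 2   # extra area from linear interpolation
trapezoid = lower + triangle

print("without interpolation: {:.4f}".format(lower))      # 0.0472
print("trapezoidal:           {:.4f}".format(trapezoid))  # 0.0482
```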

Actual Results

This is what sklearn implements for AUC without interpolation (https://scikit-learn.org/stable/modules/generated/sklearn.metrics.average_precision_score.html):

sum((r[i] - r[i+1]) * p[i] for i in range(len(p)-1))
>> 0.049147

This is what I think is correct (no longer; see edit):

sum((r[i] - r[i+1]) * p[i+1] for i in range(len(p)-1))
>> 0.047200

EDIT: I found that the above 'correct' implementation doesn't always underestimate. It depends on the input. Therefore I have revised the uninterpolated AUC calculation to this:

sum((r[i] - r[i+1]) * min(p[i], p[i+1]) for i in range(len(p)-1))
>> 0.047200
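
To illustrate why neither endpoint rule is safe on its own, here is a minimal sketch that runs the three formulas quoted above on two single-segment curves. The function names `ap_left`, `ap_right`, and `ap_min` are hypothetical labels for this comparison, not sklearn functions.

```python
def ap_left(p, r):
    """Rule currently implemented: weight each recall step by p[i]."""
    return sum((r[i] - r[i + 1]) * p[i] for i in range(len(p) - 1))

def ap_right(p, r):
    """First proposal: weight each recall step by p[i+1]."""
    return sum((r[i] - r[i + 1]) * p[i + 1] for i in range(len(p) - 1))

def ap_min(p, r):
    """Revised proposal: weight each recall step by min(p[i], p[i+1])."""
    return sum((r[i] - r[i + 1]) * min(p[i], p[i + 1]) for i in range(len(p) - 1))

# Precision rises along the segment: the p[i+1] rule picks the larger endpoint.
p, r = [0.3, 1.0], [1.0, 0.0]
print(ap_left(p, r), ap_right(p, r), ap_min(p, r))  # 0.3 1.0 0.3

# Precision falls along the segment: now the p[i] rule picks the larger endpoint.
p, r = [1.0, 0.3], [1.0, 0.0]
print(ap_left(p, r), ap_right(p, r), ap_min(p, r))  # 1.0 0.3 0.3
```

In both cases the `min` rule returns the smaller of the two endpoint estimates, so it cannot exceed either of them.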

The revised formula has the advantage that the AUC calculation is more consistent: compared to the current uninterpolated AUC function, its result is either equal or lower, but never higher. Below I show some examples of what it does:

  • Example 1: everything works fine
p = [0.3, 1.0]
r = [1.0, 0.0]

#Results:
>> 0.30    # sklearn's _binary_uninterpolated_average_precision()
>> 0.30    # my consistent _binary_uninterpolated_average_precision()
>> 0.65    # np.trapz() (trapezoidal interpolation)

[Image: pr_curve1]

  • Example 2: sklearn's _binary_uninterpolated_average_precision() returns an inaccurate number
p = [1.0, 0.3]
r = [1.0, 0.0]

#Results:
>> 1.00    # sklearn's _binary_uninterpolated_average_precision()
>> 0.30    # my consistent _binary_uninterpolated_average_precision()
>> 0.65    # np.trapz() (trapezoidal interpolation)

[Image: pr_curve2]

  • Example 3: extra example
p = [0.4, 0.1, 1.0]
r = [1.0, 0.9, 0.0]

#Results:
>> 0.13      # sklearn's _binary_uninterpolated_average_precision()
>> 0.10      # my consistent _binary_uninterpolated_average_precision()
>> 0.52      # np.trapz() (trapezoidal interpolation)

[Image: pr_curve3]
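
The three examples can be reproduced without sklearn or NumPy. The sketch below assumes the formulas quoted earlier in this issue; `ap_current`, `ap_min`, and `auc_trapezoid` are hypothetical names, and the trapezoid is written out by hand rather than calling np.trapz().

```python
def ap_current(p, r):
    """Uninterpolated rule quoted from sklearn: step width times p[i]."""
    return sum((r[i] - r[i + 1]) * p[i] for i in range(len(p) - 1))

def ap_min(p, r):
    """Proposed rule: step width times the lower of the two precisions."""
    return sum((r[i] - r[i + 1]) * min(p[i], p[i + 1]) for i in range(len(p) - 1))

def auc_trapezoid(p, r):
    """Trapezoidal rule: step width times the mean of the two precisions."""
    return sum((r[i] - r[i + 1]) * (p[i] + p[i + 1]) / 2 for i in range(len(p) - 1))

# (precision, recall) pairs copied from the examples above.
examples = {
    "Example 1": ([0.3, 1.0], [1.0, 0.0]),
    "Example 2": ([1.0, 0.3], [1.0, 0.0]),
    "Example 3": ([0.4, 0.1, 1.0], [1.0, 0.9, 0.0]),
}

results = {}
for name, (p, r) in examples.items():
    results[name] = (ap_current(p, r), ap_min(p, r), auc_trapezoid(p, r))
    print("{}: current={:.2f}  min={:.2f}  trapezoid={:.2f}".format(name, *results[name]))
```

On these inputs the `min` estimate is never above the current estimate or the trapezoidal one, which is the consistency property argued for above.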

Versions

Windows-10-10.0.17134-SP0
Python 3.6.4 |Anaconda, Inc.| (default, Jan 16 2018, 10:22:32) [MSC v.1900 64 bit (AMD64)]
NumPy 1.14.0
SciPy 1.0.0
Scikit-Learn 0.19.1
