Implemented step-function interpolation for average_precision_score #6065
The current implementation of `average_precision_score` (in `metrics`) computes the area under the curve by linearly interpolating between operating points. This systematically mis-estimates the area under the precision-recall curve. As an example, consider a 'do-nothing' `y_score` of 0.5 for all examples. This is guaranteed to have an average precision score greater than 0.5 (assuming at least one ground-truth positive datum) under the current implementation, because it linearly interpolates between the two operating points (0, 1) and (1, x), where x is the fraction of positive examples in the sample.

This is particularly damaging in the case of a sparse sample with very few positive examples. It is very difficult to beat an average precision score of 0.5 with any genuine model, because we have to turn more false negatives into true positives than true negatives into false positives. (See this ppt for a concrete example.)
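As a quick check of the claim above (the 10% positive rate is an assumption chosen for illustration), the linear interpolation between (0, 1) and (1, x) forms a trapezoid whose area exceeds 0.5 for any x > 0:

```python
# With a constant y_score there is a single operating point (recall=1,
# precision=x), plus the conventional endpoint (recall=0, precision=1).
# Linear interpolation between them forms a trapezoid over recall [0, 1]:
x = 0.1  # hypothetical: 10% of the sample is positive
area = (1 + x) / 2
print(area)  # 0.55 -- strictly greater than 0.5 whenever x > 0
```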
I propose a first-step solution, which is to interpolate using the precision value of the next operating point (i.e. fill in gaps with a horizontal line from the operating point to the right). There are two justifications for this. First, if the underlying `y_scores` are continuous and we only have rounded versions of them, the best guess we can make for the 'original' `y_scores` is to order them randomly; with a large sample, a random ordering of `y_scores` approximates a horizontal line (with precision equal to the positive rate in the subsample). Second, this interpolation is supported in the literature: see for example the discussion on the LingPipe blog or the chapter in Manning et al.'s Information Retrieval book. The latter proposes a more extreme solution, choosing the highest p' for all r' > r, but this may lead to unexpected behavior since many operating points would be ignored.
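The proposed right-sided step interpolation can be sketched as follows (the function name `step_average_precision` is a hypothetical helper, not part of this PR's diff; it assumes arrays in the order returned by `precision_recall_curve`, with recall decreasing from 1 to 0):

```python
import numpy as np

def step_average_precision(precision, recall):
    """Area under the PR curve using right-sided step interpolation:
    each recall gap is filled with a horizontal line at the precision
    of the operating point to its right (higher recall)."""
    precision, recall = np.asarray(precision), np.asarray(recall)
    # recall decreases along the array, so np.diff(recall) is negative;
    # negate the sum to get a positive area.
    return -np.sum(np.diff(recall) * precision[:-1])

# A one-operating-point curve (constant score, 10% positives):
print(step_average_precision([0.1, 1.0], [1.0, 0.0]))  # 0.1
```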
To recreate the problem, consider:
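A minimal sketch of the constant-score case described above (the sample size of 50 with 5 positives is an assumption for illustration; the original snippet contained two inputs, of which this reproduces the constant-score one and computes both interpolations side by side):

```python
import numpy as np
from sklearn.metrics import precision_recall_curve

# Hypothetical data: 5 positives among 50 examples (10% positive rate),
# with a 'do-nothing' constant score of 0.5 for every example.
y_true = np.array([1] * 5 + [0] * 45)
y_score = np.full(50, 0.5)

# Yields precision = [0.1, 1.0], recall = [1.0, 0.0]: a single operating
# point plus the conventional (recall=0, precision=1) endpoint.
precision, recall, _ = precision_recall_curve(y_true, y_score)

# Linear (trapezoidal) interpolation -- the current behavior:
ap_linear = -np.sum(np.diff(recall) * (precision[:-1] + precision[1:]) / 2)

# Right-sided step interpolation -- the proposed behavior:
ap_step = -np.sum(np.diff(recall) * precision[:-1])

print(ap_linear, ap_step)  # 0.55 0.1
```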
The current implementation gives average precision scores of 0.55... and 0.10... respectively, while my implementation gives 0.10... for both.
A problem with making this change is that the graph produced by naively plotting the recall and precision lists from `precision_recall_curve` would show a curve whose AUC is clearly different from the reported `average_precision_score`. The amended docstring should help to clarify the difference.