
Implemented step-function interpolation for average_precision_score #6065

Closed

wants to merge 1 commit into from

Conversation

ndingwall
Contributor

The current implementation of average_precision_score (in metrics) computes the area under the curve by linearly interpolating between operating points. This systematically mis-estimates the area under the precision-recall curve. As an example, consider a 'do-nothing' y_score of 0.5 for all examples. Under the current implementation this is guaranteed to achieve an average precision score greater than 0.5 (assuming at least one ground-truth positive datum), because the score linearly interpolates between the two operating points (0, 1) and (1, x), where x is the fraction of positive examples in the sample.
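As a quick check of the arithmetic, here is the trapezoid area for a hypothetical positive rate of x = 0.1 (the value used in the reproduction below):

x = 0.1                       # hypothetical fraction of positive examples
trapezoid_area = (1 + x) / 2  # linear interpolation between (0, 1) and (1, x)
print(trapezoid_area)         # 0.55, and always above 0.5 for any x > 0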

This is particularly damaging for a sparse sample with very few positive examples. It is then very difficult for any genuine model to beat an average precision score of 0.5, because doing so requires turning more false negatives into true positives than true negatives into false positives. (See this ppt for a concrete example.)

I propose a first-step solution, which is to interpolate using the precision value of the next operating point (i.e. fill in gaps with a horizontal line extending from the operating point to the right; a sketch follows the list below). There are two justifications for this:

  1. If you demand a recall of 0.5, providing a precision estimate of the midpoint of 1 and x (or, more generally, interpolating between p_i and p_j for the operating points (r_i, p_i) and (r_j, p_j) immediately to the left and right of r = 0.5) is meaningless: that precision is impossible to obtain. Instead, we should back off to the lowest obtainable recall value r' such that r' >= 0.5.
  2. If we assume that y_scores are continuous and we have rounded versions of them, the best guess we can make for the 'original' y_scores is to order them randomly. With a large sample, a random ordering of y_scores approximates a horizontal line (with precision equal to the positive rate in the subsample).
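As a minimal sketch of the proposed interpolation (not the exact code in this PR), the step-function average precision can be computed from the output of precision_recall_curve as follows:

import numpy as np
from sklearn.metrics import precision_recall_curve

def step_average_precision(y_true, y_score):
    precision, recall, _ = precision_recall_curve(y_true, y_score)
    # recall is returned in decreasing order, so -np.diff(recall) >= 0.
    # Hold each precision value constant over the recall gap to its left,
    # i.e. use the operating point to the right of each gap.
    return np.sum(-np.diff(recall) * precision[:-1])

Applied to the examples below, this gives roughly 0.10 for both the constant and the random scorer.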

For more discussion of this, see for example the discussion on the LingPipe blog or the chapter in Manning et al.'s Information Retrieval book. The latter proposes a more extreme solution: choosing the highest p' over all r' >= r, but this may lead to unexpected behavior since many operating points would be ignored.

To recreate the problem, consider:

import numpy as np
from sklearn.metrics import average_precision_score, precision_recall_curve
import matplotlib.pyplot as plt

def plot_pr_curve(y_true, y_pred):
    # Plot the precision-recall curve and report the average precision
    # score in the legend.
    precision, recall, _ = precision_recall_curve(y_true, y_pred)
    plt.plot(recall, precision, alpha=0.5)
    plt.xlabel("Recall")
    plt.ylabel("Precision")
    plt.legend(["Average precision score: {0:0.5f}".format(
                average_precision_score(y_true, y_pred))])
    plt.show()

# Simulated labels with a 10% positive rate, plus two 'do-nothing' scorers:
# a constant score of 0.5 and uniformly random scores.
class_imbalance = 0.1
y_true = np.random.binomial(1, class_imbalance, 1000000)
y_random = np.random.random(len(y_true))
y_constant = 0.5 * np.ones(len(y_true))
plot_pr_curve(y_true, y_constant)
plot_pr_curve(y_true, y_random)

The current implementation gives average precision scores of 0.55... and 0.10... respectively, while my implementation gives 0.10... for both.

A problem with making this change is that naively plotting the recall and precision arrays returned by precision_recall_curve (with straight lines between points) yields a curve whose AUC is clearly different from the reported average_precision_score. The amended docstring should help to clarify the difference.
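For plotting, matplotlib's step plot keeps the picture consistent with a step-interpolated score. A minimal sketch, reusing y_true and y_constant from the snippet above:

precision, recall, _ = precision_recall_curve(y_true, y_constant)
# Hold each precision value constant from its operating point down to the
# next (lower) recall, so the plotted area matches the step interpolation.
plt.step(recall, precision, where='post', alpha=0.5)
plt.xlabel("Recall")
plt.ylabel("Precision")
plt.show()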

@amueller
Member

Was this replaced by #7356?

@varunagrawal
Contributor

@amueller the update in this PR is now redundant since the step function integral has already been added to master. Please close this PR.
