
Precision Recall numbers computed by Scikits are not interpolated (non-standard) #4577


Description

@karpathy

Hi,

Scikit Learn seems to implement Precision Recall curves (and Average Precision values / AUC under the PR curve) in a non-standard way, without documenting the discrepancy. The standard way of computing Precision Recall numbers is to interpolate the curve, as described here:
http://nlp.stanford.edu/IR-book/html/htmledition/evaluation-of-ranked-retrieval-results-1.html
The motivation is to:

  1. Smooth out the kinks and reduce the contribution of noise to the score
  2. In any practical application, if your PR curve ever went up, you would strictly prefer to set your threshold there rather than at the original operating point (achieving both higher precision and higher recall). Hence, people prefer to interpolate the curve, which better integrates out the threshold parameter and gives a more sensible estimate of the real performance (see the sketch after this list).
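For reference, here is a minimal sketch of the interpolated convention described above, built on top of `precision_recall_curve`. The helper name `interpolated_average_precision` is hypothetical and is not part of scikit-learn's API; it only illustrates the ceiling-interpolation p_interp(r) = max over r' >= r of p(r').

```python
import numpy as np
from sklearn.metrics import precision_recall_curve


def interpolated_average_precision(y_true, y_score):
    """Average precision with interpolated precision, i.e.
    p_interp(r) = max over all operating points with recall >= r of p(r'),
    following the IR-book / Pascal VOC convention.

    This is a sketch for comparison only, not scikit-learn's behaviour.
    """
    precision, recall, _ = precision_recall_curve(y_true, y_score)
    # precision_recall_curve returns points in order of decreasing recall,
    # so a running maximum over that order gives, at each point, the best
    # precision achievable at that recall or higher.
    interp_precision = np.maximum.accumulate(precision)
    # Integrate the interpolated step curve over recall
    # (recall is decreasing, hence the leading minus sign).
    return -np.sum(np.diff(recall) * interp_precision[:-1])
```

Since the interpolated precision is pointwise greater than or equal to the raw precision, the value computed this way is never smaller than the corresponding uninterpolated step-wise sum, which is exactly the discrepancy described below.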

This is also what the standard Pascal VOC evaluation code does, and they explain it in their write-up:
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.157.5766&rep=rep1&type=pdf

VL_FEAT also has options for interpolation:
http://www.vlfeat.org/matlab/vl_pr.html
as shown in their code here: https://github.com/vlfeat/vlfeat/blob/edc378a722ea0d79e29f4648a54bb62f32b22568/toolbox/plotop/vl_pr.m

The concern is that people using the scikit-learn version will see incorrectly reported LOWER performance than what they might see reported in other papers.
