Description
Hi,
scikit-learn seems to implement precision-recall curves (and Average Precision / area under the PR curve) in a non-standard way, without documenting the discrepancy. The standard way of computing precision-recall numbers is to interpolate the curve, as described here:
http://nlp.stanford.edu/IR-book/html/htmledition/evaluation-of-ranked-retrieval-results-1.html
The motivation is to:
- Smooth out the kinks and reduce the noise contribution to the score
- In any practical application, if the PR curve ever goes up as the threshold is lowered, you would strictly prefer to set your threshold at that point rather than at the original one, achieving both higher precision and higher recall. Hence, people prefer to interpolate the curve, which better integrates out the threshold parameter and gives a more sensible estimate of the real performance (see the sketch after this list).
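For concreteness, here is a minimal sketch of this interpolation (hypothetical helper names; it assumes the arrays returned by `sklearn.metrics.precision_recall_curve`, where recall comes back in decreasing order). Interpolated precision replaces the precision at recall r with the maximum precision at any recall r' >= r:

```python
import numpy as np
from sklearn.metrics import precision_recall_curve

def interpolated_pr(y_true, y_scores):
    """Interpolated PR curve: p_interp(r) = max over r' >= r of p(r')."""
    precision, recall, _ = precision_recall_curve(y_true, y_scores)
    # recall is in decreasing order, so the running maximum of precision
    # at index i covers exactly the points with recall >= recall[i].
    precision_interp = np.maximum.accumulate(precision)
    return precision_interp, recall

def interpolated_ap(precision_interp, recall):
    """Step-wise average precision over the interpolated curve.
    recall is decreasing, so -diff(recall) is the (positive) width
    of each recall step."""
    return -np.sum(np.diff(recall) * precision_interp[:-1])
```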
This is also what the standard Pascal VOC evaluation code does, and they explain it in their writeup:
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.157.5766&rep=rep1&type=pdf
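For reference, the VOC 2007 protocol used an 11-point interpolated AP: precision is sampled at recall levels 0.0, 0.1, ..., 1.0 and averaged. A rough sketch (the official devkit is MATLAB; `voc_11point_ap` and its input conventions are my own for illustration):

```python
import numpy as np

def voc_11point_ap(precision, recall):
    """VOC 2007-style 11-point interpolated AP. Expects the arrays
    returned by sklearn.metrics.precision_recall_curve."""
    ap = 0.0
    for r in np.linspace(0.0, 1.0, 11):
        # Interpolated precision at recall level r: the maximum
        # precision over all operating points with recall >= r.
        mask = recall >= r
        p = precision[mask].max() if mask.any() else 0.0
        ap += p / 11.0
    return ap
```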
VL_FEAT also has options for interpolation:
http://www.vlfeat.org/matlab/vl_pr.html
as shown in their code here: https://github.com/vlfeat/vlfeat/blob/edc378a722ea0d79e29f4648a54bb62f32b22568/toolbox/plotop/vl_pr.m
The concern is that people using the scikit-learn version will see LOWER reported performance than what they might see reported in other papers.
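To make the gap concrete, a small synthetic demonstration (the data and variable names here are made up for illustration):

```python
import numpy as np
from sklearn.metrics import average_precision_score, precision_recall_curve

rng = np.random.RandomState(0)
y_true = rng.randint(0, 2, size=200)
# informative but noisy scores, so the raw PR curve has kinks
y_scores = y_true * 0.5 + rng.rand(200)

precision, recall, _ = precision_recall_curve(y_true, y_scores)
p_interp = np.maximum.accumulate(precision)  # interpolated precision

ap_raw = average_precision_score(y_true, y_scores)
ap_interp = -np.sum(np.diff(recall) * p_interp[:-1])
print("scikit-learn AP: %.4f   interpolated AP: %.4f" % (ap_raw, ap_interp))
# Interpolation can only raise precision pointwise, so the interpolated
# AP is at least as high on the same scores.
```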