diff --git a/doc/modules/model_evaluation.rst b/doc/modules/model_evaluation.rst
index 4800569556758..c5384d49fa658 100644
--- a/doc/modules/model_evaluation.rst
+++ b/doc/modules/model_evaluation.rst
@@ -633,10 +633,25 @@ The :func:`precision_recall_curve` computes a precision-recall curve
 from the ground truth label and a score given by the classifier
 by varying a decision threshold.
 
-The :func:`average_precision_score` function computes the average precision
-(AP) from prediction scores. This score corresponds to the area under the
-precision-recall curve. The value is between 0 and 1 and higher is better.
-With random predictions, the AP is the fraction of positive samples.
+The :func:`average_precision_score` function computes the
+`average precision `_
+(AP) from prediction scores. The value is between 0 and 1 and higher is better.
+AP is defined as
+
+.. math::
+    \text{AP} = \sum_n (R_n - R_{n-1}) P_n
+
+where :math:`P_n` and :math:`R_n` are the precision and recall at the
+nth threshold. With random predictions, the AP is the fraction of positive
+samples.
+
+References [Manning2008]_ and [Everingham2010]_ present alternative variants of
+AP that interpolate the precision-recall curve. Currently,
+:func:`average_precision_score` does not implement any interpolated variant.
+References [Davis2006]_ and [Flach2015]_ describe why a linear interpolation of
+points on the precision-recall curve provides an overly-optimistic measure of
+classifier performance. This linear interpolation is used when computing area
+under the curve with the trapezoidal rule in :func:`auc`.
 
 Several functions allow you to analyze the precision, recall and F-measures
 score:
@@ -671,6 +686,24 @@ binary classification and multilabel indicator format.
   for an example of :func:`precision_recall_curve` usage to evaluate
   classifier output quality.
 
+
+.. topic:: References:
+
+  .. [Manning2008] C.D. Manning, P. Raghavan, H. Schütze, `Introduction to Information Retrieval
+     `_,
+     2008.
+  .. [Everingham2010] M. Everingham, L. Van Gool, C.K.I. Williams, J. Winn, A. Zisserman,
+     `The Pascal Visual Object Classes (VOC) Challenge
+     `_,
+     IJCV 2010.
+  .. [Davis2006] J. Davis, M. Goadrich, `The Relationship Between Precision-Recall and ROC Curves
+     `_,
+     ICML 2006.
+  .. [Flach2015] P.A. Flach, M. Kull, `Precision-Recall-Gain Curves: PR Analysis Done Right
+     `_,
+     NIPS 2015.
+
+
 Binary classification
 ^^^^^^^^^^^^^^^^^^^^^
diff --git a/examples/model_selection/plot_precision_recall.py b/examples/model_selection/plot_precision_recall.py
index dae720336dec8..633ceea85db53 100644
--- a/examples/model_selection/plot_precision_recall.py
+++ b/examples/model_selection/plot_precision_recall.py
@@ -61,9 +61,9 @@ in the threshold considerably reduces precision, with only a minor gain in
 recall.
 
-**Average precision** summarizes such a plot as the weighted mean of precisions
-achieved at each threshold, with the increase in recall from the previous
-threshold used as the weight:
+**Average precision** (AP) summarizes such a plot as the weighted mean of
+precisions achieved at each threshold, with the increase in recall from the
+previous threshold used as the weight:
 
 :math:`\\text{AP} = \\sum_n (R_n - R_{n-1}) P_n`
 
@@ -71,6 +71,11 @@ nth threshold. A pair :math:`(R_k, P_k)` is referred to as an
 *operating point*.
 
+AP and the trapezoidal area under the operating points
+(:func:`sklearn.metrics.auc`) are common ways to summarize a precision-recall
+curve that lead to different results. Read more in the
+:ref:`User Guide `.
+
 Precision-recall curves are typically used in binary classification to study
 the output of a classifier. In order to extend the precision-recall curve and
 average precision to multi-class or multi-label classification, it is necessary
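To make the comparison introduced above concrete, here is a minimal sketch (not
part of the patch) that computes both summaries for a toy problem. The labels
and scores are made up for illustration, and only the public
``average_precision_score``, ``precision_recall_curve`` and ``auc`` APIs are
assumed::

    import numpy as np
    from sklearn.metrics import auc, average_precision_score
    from sklearn.metrics import precision_recall_curve

    # Toy ground truth and classifier scores, for illustration only.
    y_true = np.array([0, 0, 1, 1, 1, 0, 1, 0])
    y_score = np.array([0.1, 0.4, 0.35, 0.8, 0.65, 0.9, 0.7, 0.2])

    precision, recall, _ = precision_recall_curve(y_true, y_score)

    # Step-wise (uninterpolated) summary of the curve.
    ap = average_precision_score(y_true, y_score)
    # Trapezoidal area, which linearly interpolates between operating points.
    trapezoid = auc(recall, precision)

    print("AP = %0.3f, trapezoidal AUC = %0.3f" % (ap, trapezoid))

The two numbers generally differ; the references cited in the documentation
above discuss why the linearly interpolated area can be overly optimistic.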
@@ -144,7 +149,7 @@ plt.ylabel('Precision')
 plt.ylim([0.0, 1.05])
 plt.xlim([0.0, 1.0])
-plt.title('2-class Precision-Recall curve: AUC={0:0.2f}'.format(
+plt.title('2-class Precision-Recall curve: AP={0:0.2f}'.format(
           average_precision))
 
 ###############################################################################
@@ -215,7 +220,7 @@ plt.ylim([0.0, 1.05])
 plt.xlim([0.0, 1.0])
 plt.title(
-    'Average precision score, micro-averaged over all classes: AUC={0:0.2f}'
+    'Average precision score, micro-averaged over all classes: AP={0:0.2f}'
     .format(average_precision["micro"]))
 
 ###############################################################################
diff --git a/sklearn/metrics/ranking.py b/sklearn/metrics/ranking.py
index 9755732a4f910..9eba8178bc956 100644
--- a/sklearn/metrics/ranking.py
+++ b/sklearn/metrics/ranking.py
@@ -40,7 +40,9 @@ def auc(x, y, reorder=False):
     """Compute Area Under the Curve (AUC) using the trapezoidal rule
 
     This is a general function, given points on a curve. For computing the
-    area under the ROC-curve, see :func:`roc_auc_score`.
+    area under the ROC-curve, see :func:`roc_auc_score`. For an alternative
+    way to summarize a precision-recall curve, see
+    :func:`average_precision_score`.
 
     Parameters
     ----------
@@ -68,7 +70,8 @@ def auc(x, y, reorder=False):
 
     See also
     --------
-    roc_auc_score : Computes the area under the ROC curve
+    roc_auc_score : Compute the area under the ROC curve
+    average_precision_score : Compute average precision from prediction scores
     precision_recall_curve :
         Compute precision-recall pairs for different probability thresholds
     """
@@ -108,6 +111,19 @@ def average_precision_score(y_true, y_score, average="macro",
                             sample_weight=None):
     """Compute average precision (AP) from prediction scores
 
+    AP summarizes a precision-recall curve as the weighted mean of precisions
+    achieved at each threshold, with the increase in recall from the previous
+    threshold used as the weight:
+
+    .. math::
+        \\text{AP} = \\sum_n (R_n - R_{n-1}) P_n
+
+    where :math:`P_n` and :math:`R_n` are the precision and recall at the nth
+    threshold [1]_. This implementation is not interpolated and is different
+    from computing the area under the precision-recall curve with the
+    trapezoidal rule, which uses linear interpolation and can be too
+    optimistic.
+
     Note: this implementation is restricted to the binary classification task
     or multilabel classification task.
 
@@ -149,17 +165,12 @@ def average_precision_score(y_true, y_score, average="macro",
     References
     ----------
     .. [1] `Wikipedia entry for the Average precision
-           `_
-    .. [2] `Stanford Information Retrieval book
-           `_
-    .. [3] `The PASCAL Visual Object Classes (VOC) Challenge
-           `_
+           `_
 
     See also
     --------
-    roc_auc_score : Area under the ROC curve
+    roc_auc_score : Compute the area under the ROC curve
 
     precision_recall_curve :
         Compute precision-recall pairs for different probability thresholds
 
@@ -190,7 +201,8 @@ def _binary_uninterpolated_average_precision(
 
 
 def roc_auc_score(y_true, y_score, average="macro", sample_weight=None):
-    """Compute Area Under the Curve (AUC) from prediction scores
+    """Compute Area Under the Receiver Operating Characteristic Curve (ROC AUC)
+    from prediction scores.
 
     Note: this implementation is restricted to the binary classification task
     or multilabel classification task in label indicator format.
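As a side note on the non-interpolated definition added to the
``average_precision_score`` docstring above, the weighted sum can be reproduced
directly from the output of ``precision_recall_curve``. A minimal sketch with
made-up labels and scores (not part of the patch), relying on the fact that the
recall values are returned in decreasing order::

    import numpy as np
    from sklearn.metrics import average_precision_score
    from sklearn.metrics import precision_recall_curve

    # Made-up binary labels and scores, for illustration only.
    y_true = np.array([0, 1, 1, 0, 1, 1, 0, 0, 1, 0])
    y_score = np.array([0.2, 0.9, 0.6, 0.3, 0.8, 0.5, 0.4, 0.1, 0.7, 0.35])

    precision, recall, _ = precision_recall_curve(y_true, y_score)

    # recall is returned in decreasing order, so -diff(recall) is R_n - R_{n-1}.
    manual_ap = -np.sum(np.diff(recall) * precision[:-1])

    assert np.isclose(manual_ap, average_precision_score(y_true, y_score))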
@@ -239,7 +251,7 @@ def roc_auc_score(y_true, y_score, average="macro", sample_weight=None):
     --------
     average_precision_score : Area under the precision-recall curve
-    roc_curve : Compute Receiver operating characteristic (ROC)
+    roc_curve : Compute Receiver operating characteristic (ROC) curve
 
     Examples
     --------
@@ -396,6 +408,12 @@ def precision_recall_curve(y_true, probas_pred, pos_label=None,
         Increasing thresholds on the decision function used to compute
         precision and recall.
 
+    See also
+    --------
+    average_precision_score : Compute average precision from prediction scores
+
+    roc_curve : Compute Receiver operating characteristic (ROC) curve
+
     Examples
     --------
     >>> import numpy as np
@@ -477,7 +495,7 @@ def roc_curve(y_true, y_score, pos_label=None, sample_weight=None,
     See also
     --------
-    roc_auc_score : Compute Area Under the Curve (AUC) from prediction scores
+    roc_auc_score : Compute the area under the ROC curve
 
     Notes
     -----
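For contrast with the precision-recall case, a short sketch (again with made-up
data, not part of the patch) showing that the ROC summaries referenced in the
see-also entries above agree with each other: the trapezoidal area under the
``roc_curve`` output matches ``roc_auc_score``, which is consistent with the
references above singling out precision-recall interpolation as the problematic
case::

    import numpy as np
    from sklearn.metrics import auc, roc_auc_score, roc_curve

    # Made-up labels and scores, for illustration only.
    y_true = np.array([0, 0, 1, 1])
    y_score = np.array([0.1, 0.4, 0.35, 0.8])

    fpr, tpr, _ = roc_curve(y_true, y_score)

    # The trapezoidal area under the ROC curve equals roc_auc_score here.
    assert np.isclose(auc(fpr, tpr), roc_auc_score(y_true, y_score))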