Describe the workflow you want to enable
Display the recall as a function of the predicted positive rate (PP), using sklearn.metrics.precision_recall_curve to compute the recall and PP as quantiles of the threshold scores. This is currently not possible to do consistently, as sklearn.metrics.precision_recall_curve drops many of the threshold values corresponding to recall = 1. This behavior was introduced recently. A sketch of the intended workflow is given below.
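A minimal sketch of the workflow, with synthetic placeholder data: the predicted positive rate can be recovered from the returned precision and recall, since TP = recall * P and TP + FP = TP / precision.

import numpy as np
from sklearn.metrics import precision_recall_curve

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=1000)  # synthetic labels, for illustration only
y_score = rng.random(1000)              # synthetic scores

precision, recall, thresholds = precision_recall_curve(y_true, y_score)

# Predicted positive rate at each threshold:
# TP = recall * P and TP + FP = TP / precision, so PP rate = recall * P / (precision * n).
n, P = len(y_true), y_true.sum()
with np.errstate(divide="ignore", invalid="ignore"):
    pp_rate = recall * P / (precision * n)

# One would then plot recall against pp_rate. When thresholds in the
# recall = 1 region are dropped, that part of the curve cannot be
# reconstructed consistently.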
Describe your proposed solution
Add a drop_intermediate parameter to sklearn.metrics.precision_recall_curve, similar to the one in sklearn.metrics.roc_curve, with a default value of False, and keep the extreme threshold values to avoid side effects.
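For comparison, roc_curve already exposes this parameter (there it defaults to True). The commented-out call below is the proposed, hypothetical signature, not an existing API:

import numpy as np
from sklearn.metrics import roc_curve

y_true = np.array([0, 0, 1, 1])
y_score = np.array([0.1, 0.4, 0.35, 0.8])

# Existing API: roc_curve lets the caller keep all thresholds.
fpr, tpr, thr = roc_curve(y_true, y_score, drop_intermediate=False)

# Proposed (hypothetical) API, defaulting to False for backward compatibility:
# precision, recall, thresholds = precision_recall_curve(
#     y_true, y_score, drop_intermediate=False
# )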
Describe alternatives you've considered, if relevant
No response
Additional context
The documentation of sklearn.metrics.precision_recall_curve states that n_thresholds = len(np.unique(probas_pred)). This is no longer the actual behavior of the function. The change may break code that relies on the documented behavior, hence the suggested default value of False.
Code to reproduce:
import numpy as np
from sklearn.metrics import precision_recall_curve

# 100 unique scores; the 50 highest-scored samples are the positives,
# so every threshold at or below 50 yields recall = 1.
scoresPredictor = np.arange(100)
groundTruth = np.concatenate((np.zeros(50), np.ones(50)))

precision_PR, recall_PR, thresholds_PR = precision_recall_curve(groundTruth, scoresPredictor)

print(len(np.unique(scoresPredictor)))  # 100
print(len(thresholds_PR))  # fewer than 100: all but one recall = 1 threshold are dropped
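With a recent scikit-learn release the first print gives 100 while the second gives 50: only the thresholds from the lowest positive score (50) upward are kept, whereas the documentation promises one threshold per unique score. The exact count may vary with the installed version.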