Control default behavior of PR curve #24976

@edPauwels

Description

Describe the workflow you want to enable

Display the recall as a function of the predicted positive rate (PP), using sklearn.metrics.precision_recall_curve to compute the recall and PP as quantiles of the threshold scores. This is currently not possible to do consistently, because sklearn.metrics.precision_recall_curve drops many of the threshold values corresponding to recall = 1. This behavior was introduced recently.
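For context, the workflow could be sketched as follows. The definition of `pp_rate` (fraction of samples scoring at or above each threshold) is my assumption about the intended quantile computation, not something stated in the report:

```python
import numpy as np
from sklearn.metrics import precision_recall_curve

# Toy data: 100 distinct scores, perfectly separating the two classes
y_true = np.concatenate((np.zeros(50), np.ones(50)))
y_score = np.arange(100)

precision, recall, thresholds = precision_recall_curve(y_true, y_score)

# Assumed definition: predicted-positive rate at threshold t is the
# fraction of samples with score >= t
pp_rate = np.array([(y_score >= t).mean() for t in thresholds])

# recall[:-1] aligns with thresholds (precision_recall_curve appends a
# final point at recall = 0 that has no associated threshold), so
# plotting (pp_rate, recall[:-1]) gives recall vs. PP rate.
```

If thresholds are silently dropped, `pp_rate` no longer covers the full range of quantiles, which is the inconsistency described above.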

Describe your proposed solution

Add a drop_intermediate parameter to sklearn.metrics.precision_recall_curve, similar to the one in sklearn.metrics.roc_curve, with a default value of False, and keep the extreme threshold values to avoid side effects.
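For reference, sklearn.metrics.roc_curve already exposes this parameter; comparing its two modes on the same toy data as the reproduction below illustrates the proposed semantics (opt-in dropping, full threshold list by default):

```python
import numpy as np
from sklearn.metrics import roc_curve

y_true = np.concatenate((np.zeros(50), np.ones(50)))
y_score = np.arange(100)

# With drop_intermediate=True (roc_curve's default), suboptimal
# thresholds that do not change the curve's shape are dropped
fpr_d, tpr_d, thr_d = roc_curve(y_true, y_score, drop_intermediate=True)

# With drop_intermediate=False, every distinct score is kept
fpr_f, tpr_f, thr_f = roc_curve(y_true, y_score, drop_intermediate=False)

print(len(thr_d), len(thr_f))  # the second count is at least as large
```

The proposal is that precision_recall_curve behave like the `drop_intermediate=False` case by default, with dropping available as an explicit opt-in.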

Describe alternatives you've considered, if relevant

No response

Additional context

The documentation of sklearn.metrics.precision_recall_curve states that n_thresholds = len(np.unique(probas_pred)). This is no longer the actual behavior of the function. Making the dropping behavior the default could cause a lot of backward incompatibility, hence the suggested default value of False.

Code to reproduce:

import numpy as np
from sklearn.metrics import precision_recall_curve

# 100 distinct scores, perfectly separating the two classes
scoresPredictor = np.arange(100)
groundTruth = np.concatenate((np.zeros(50), np.ones(50)))

precision_PR, recall_PR, thresholds_PR = precision_recall_curve(groundTruth, scoresPredictor)

# Per the documentation, these two lengths should match
print(len(np.unique(scoresPredictor)))  # 100
print(len(thresholds_PR))               # fewer than 100 on recent versions
