Control default behavior of PR curve #24976

Closed

edPauwels opened this issue Nov 18, 2022 · 2 comments
Labels: Needs Triage, New Feature

Comments


edPauwels commented Nov 18, 2022

Describe the workflow you want to enable

Display recall as a function of the predicted positive rate (PP), using sklearn.metrics.precision_recall_curve to compute recall and PP as quantiles of the threshold scores. This is currently not possible to do consistently, because sklearn.metrics.precision_recall_curve drops many of the threshold values corresponding to recall = 1. This behavior was introduced recently.
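
A minimal sketch of this workflow, assuming the usual precision_recall_curve return convention; the variable names and the pp_rate computation are illustrative, not from scikit-learn:

import numpy as np
from sklearn.metrics import precision_recall_curve

y_true = np.concatenate((np.zeros(50), np.ones(50)))
scores = np.arange(100, dtype=float)

precision, recall, thresholds = precision_recall_curve(y_true, scores)

# Predicted-positive rate at each threshold: the fraction of samples scored
# at or above it (one minus the quantile rank of the threshold).
pp_rate = np.array([(scores >= t).mean() for t in thresholds])

# precision_recall_curve appends one final point with recall = 0 and no
# threshold, so recall[:-1] pairs with thresholds (and hence with pp_rate).
# When thresholds are dropped, the matching stretch of the PP axis is lost.
recall_vs_pp = list(zip(pp_rate, recall[:-1]))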

Describe your proposed solution

Add a drop_intermediate parameter to sklearn.metrics.precision_recall_curve, similar to the one in sklearn.metrics.roc_curve, with default value False, and keep the extreme threshold values to avoid side effects.
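
For comparison, a short sketch: roc_curve already exposes drop_intermediate (defaulting to True there), while the commented-out call shows the proposed, hypothetical form for precision_recall_curve, not an API in any released version:

import numpy as np
from sklearn.metrics import roc_curve

y_true = np.concatenate((np.zeros(50), np.ones(50)))
scores = np.arange(100, dtype=float)

# Existing API: keep every threshold on the ROC curve.
fpr, tpr, thr = roc_curve(y_true, scores, drop_intermediate=False)

# Proposed API from this issue (hypothetical):
# precision, recall, thresholds = precision_recall_curve(
#     y_true, scores, drop_intermediate=False)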

Describe alternatives you've considered, if relevant

No response

Additional context

The documentation of sklearn.metrics.precision_recall_curve states that n_thresholds = len(np.unique(probas_pred)). This is no longer the actual behavior of the function. Because silently dropping thresholds can break a lot of existing code, the suggested default value is False.

Code to reproduce:

import numpy as np
from sklearn.metrics import precision_recall_curve

# 100 distinct scores; the lower half is labeled negative, the upper half positive.
scoresPredictor = np.arange(100)
groundTruth = np.concatenate((np.zeros(50), np.ones(50)))

precision_PR, recall_PR, thresholds_PR = precision_recall_curve(groundTruth, scoresPredictor)

# Per the documented n_thresholds = len(np.unique(probas_pred)),
# these two lengths should match.
print(len(np.unique(scoresPredictor)))
print(len(thresholds_PR))
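
On an affected version, the second print is noticeably smaller than the first because of the dropped thresholds, contradicting the documented n_thresholds = len(np.unique(probas_pred)).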

edPauwels added the Needs Triage and New Feature labels on Nov 18, 2022
betatim (Member) commented Nov 18, 2022

There is a PR, #24668, that implements the suggested fix. The issue that led to that PR is #21825.

However, the issue of values being dropped at recall = 1 was also reported in #23213 and fixed in #23214.

I don't know when the behaviour of dropping recall = 1 values was introduced.

Overall, I think one of the issues reported here has already been fixed, and for the drop_intermediate idea a PR already exists.

Could you test with a nightly build whether the problem has been resolved?

edPauwels (Author) commented Nov 18, 2022

The issue is fixed in 1.2.dev0 and 1.1.2; I should have checked. Since that was the main purpose of the request, I think the issue can be closed.
