FIX compute precision-recall at 100% recall #23214

Merged

Conversation

stephanecollot
Contributor

Reference Issues/PRs

Fixes #23213

What does this implement/fix? Explain your changes.

Remove the unnecessary dropping of curve points at high recall.

Any other comments?

Full disclosure: this PR modifies precision_recall_curve(), which is only used by _binary_uninterpolated_average_precision(), which in turn is only used by average_precision_score():

precision, recall, _ = precision_recall_curve(

I think average_precision_score() should not be impacted by this change, and it is tested 54 times in the unit tests.
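
(A quick sanity check of that claim, added here for illustration and not part of the PR: the points this change keeps all sit at recall = 1, so their recall steps are zero and the uninterpolated average-precision sum is unchanged. The data below come from the precision_recall_curve docstring example; the assertion mirrors the uninterpolated average-precision definition.)

    import numpy as np
    from sklearn.metrics import average_precision_score, precision_recall_curve

    y_true = np.array([0, 0, 1, 1])
    y_scores = np.array([0.1, 0.4, 0.35, 0.8])

    precision, recall, _ = precision_recall_curve(y_true, y_scores)

    # Uninterpolated average precision: precision weighted by the recall steps.
    # Extra points at recall == 1 have a zero recall step and change nothing.
    ap_from_curve = -np.sum(np.diff(recall) * precision[:-1])

    assert np.isclose(ap_from_curve, average_precision_score(y_true, y_scores))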

@glemaitre
Member

You will need to fix the failing tests; in most cases these should be docstring tests.
We also need an additional non-regression test: I think we should build a PR curve and check that the last point is equivalent to a decision rule that always predicts the positive class.
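
A minimal sketch of such a test, for illustration only (the data reuse the docstring example; the actual test added in this PR may differ). Since the returned recall is in decreasing order, the point equivalent to always predicting the positive class is the first element of the arrays: recall of 1 and precision equal to the prevalence of the positive class.

    import numpy as np
    from sklearn.metrics import precision_recall_curve

    y_true = np.array([0, 0, 1, 1])
    y_scores = np.array([0.1, 0.4, 0.35, 0.8])

    precision, recall, _ = precision_recall_curve(y_true, y_scores)

    # A decision rule that always predicts the positive class reaches recall 1
    # with precision equal to the fraction of positive samples (here 0.5).
    assert recall[0] == 1.0
    assert precision[0] == y_true.mean()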

@stephanecollot
Contributor Author

@glemaitre I added 3 commits:

  • fixing the existing unit tests
  • fixing the existing doctests
  • adding a non-regression test (I checked that it fails if I revert the change)

@glemaitre
Member

Please add an entry to the change log at doc/whats_new/v1.1.rst. Like the other entries there, please reference this pull request with :pr: and credit yourself (and other contributors if applicable) with :user:.
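
For illustration only (the exact wording, section, and position in doc/whats_new/v1.1.rst are up to the contributor, and the name and handle below are placeholders), an entry typically looks like:

    - |Fix| :func:`metrics.precision_recall_curve` no longer drops the points of
      the curve at 100% recall. :pr:`23214` by :user:`Author Name <github-handle>`.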

@glemaitre glemaitre changed the title Fixes precision recall at 100% recall FIX compute precision-recall at 100% recall Apr 26, 2022
@@ -855,11 +855,11 @@ def precision_recall_curve(y_true, probas_pred, *, pos_label=None, sample_weight
     >>> precision, recall, thresholds = precision_recall_curve(
     ...     y_true, y_scores)
     >>> precision
-    array([0.66666667, 0.5       , 1.        , 1.        ])
+    array([0.5       , 0.66666667, 0.5       , 1.        , 1.        ])
Member

We should update the docstring of thresholds to properly define n_thresholds.
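
(One possible definition, consistent with the docstring example in the diff above; the wording actually merged may differ.)

    thresholds : ndarray of shape (n_thresholds,)
        Increasing thresholds on the decision function used to compute precision
        and recall, where ``n_thresholds = len(np.unique(probas_pred))``.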

@glemaitre glemaitre left a comment

Apart from these 2 changes, LGTM

@glemaitre glemaitre left a comment

Thanks for the changes. LGTM

stephanecollot and others added 3 commits April 29, 2022 15:24
@stephanecollot
Contributor Author

stephanecollot commented Apr 29, 2022

It seems the documentation build is failing, but I cannot understand why:

Sphinx Warnings in affected files
doc/whats_new/v1.1.rst:105: WARNING: Unexpected indentation.

There is nothing wrong at line 105.

@glemaitre
Member

This is certainly not related to your change. We can ignore it.

@glemaitre
Member

I merged main into your branch. The problem was solved in main by the following PR: #23246

@jeremiedbb jeremiedbb left a comment

LGTM. Thanks @stephanecollot

@jeremiedbb jeremiedbb merged commit 32c53bc into scikit-learn:main May 2, 2022
@stephanecollot
Contributor Author

It went very smoothly; I'm happy I made this first contribution, thank you @glemaitre.
I will most probably continue to contribute by adding uncertainty bands on precision-recall curves soon.

@glemaitre
Member

@stephanecollot I started a POC on the subject: #21211

The idea is to use cross-validation to get uncertainty bounds; a sketch of that idea follows the list below. I will probably find time to carry on this work in the next release cycle. However, we will need to break the PR down into smaller PRs to facilitate the review process:

  1. Add the return_indices parameter to cross_validate
  2. Then add a new from_cv_results class method for each display
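
For illustration of the underlying idea only, here is a minimal sketch that computes one precision-recall curve per cross-validation fold and summarizes the spread with a percentile band. It does not use the return_indices / from_cv_results API described above; the estimator, dataset, recall grid, and percentiles are arbitrary placeholders.

    import numpy as np
    from sklearn.base import clone
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import precision_recall_curve
    from sklearn.model_selection import StratifiedKFold

    X, y = make_classification(n_samples=500, random_state=0)
    estimator = LogisticRegression(max_iter=1000)

    # One precision-recall curve per CV fold, evaluated on the held-out split.
    fold_curves = []
    for train_idx, test_idx in StratifiedKFold(n_splits=5).split(X, y):
        fitted = clone(estimator).fit(X[train_idx], y[train_idx])
        scores = fitted.predict_proba(X[test_idx])[:, 1]
        precision, recall, _ = precision_recall_curve(y[test_idx], scores)
        fold_curves.append((precision, recall))

    # Interpolate each fold onto a common recall grid (recall is returned in
    # decreasing order, hence the reversal) and take a 5th-95th percentile band.
    recall_grid = np.linspace(0.0, 1.0, 101)
    interpolated = np.array(
        [np.interp(recall_grid, r[::-1], p[::-1]) for p, r in fold_curves]
    )
    lower, upper = np.percentile(interpolated, [5, 95], axis=0)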

glemaitre added a commit to glemaitre/scikit-learn that referenced this pull request May 19, 2022
glemaitre added a commit that referenced this pull request May 19, 2022

Successfully merging this pull request may close these issues.

precision_recall_curve() is not returning the full curve at high recall