Implemented "precision at recall k" and "recall at precision k" #20877
Conversation
@@ -171,4 +173,6 @@
    "v_measure_score",
    "zero_one_loss",
    "brier_score_loss",
    "precision_at_recall_k",
I think that it would be better to call it `max_precision_at_recall_k` and `max_recall_at_precision_k` to make it obvious that this is the maximum that is taken.
When I hear `precision_at_recall_k` I think of a single number singled out from the precision-recall curve (given a line, if I constrain X=x_i, then Y=y_i). If we follow that logic, then the original name is better.
def recall_at_precision_k(y_true, y_prob, k, *, pos_label=None, sample_weight=None):
    """Computes maximum recall for the thresholds when precision is greater
We should make it fit on a single line:
Maximum recall for a precision greater than `k`.
I made a partial review. I will give further comments a bit later.
    true positives and ``fn`` the number of false negatives. The recall is
    intuitively the ability of the classifier to find all the positive samples.

    Read more in the :ref:`User Guide <precision_recall_f_measure_metrics>`.
We will need to add a section in the user guide documentation.
I think that we can add an example to show the meaning of the two metrics graphically on a precision-recall curve. We can then reuse the image of the example in the user guide.
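A rough sketch of what such an example could look like (the dataset, classifier, and the way the operating point is highlighted are illustrative assumptions, not the final example):

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_recall_curve
from sklearn.model_selection import train_test_split

# Toy binary classification problem (illustrative only).
X, y = make_classification(n_samples=1000, weights=[0.7], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
y_prob = LogisticRegression().fit(X_train, y_train).predict_proba(X_test)[:, 1]

precision, recall, _ = precision_recall_curve(y_test, y_prob)

# "Recall at precision k": the best recall among operating points
# whose precision is at least k.
k = 0.8
mask = precision >= k
idx = np.flatnonzero(mask)[np.argmax(recall[mask])]

fig, ax = plt.subplots()
ax.plot(recall, precision, label="precision-recall curve")
ax.axhline(k, linestyle="--", color="gray", label=f"precision = {k}")
ax.plot(recall[idx], precision[idx], "o", color="red",
        label="max recall at precision >= k")
ax.set_xlabel("Recall")
ax.set_ylabel("Precision")
ax.legend()
plt.show()
```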
    The precision is the ratio ``tp / (tp + fp)`` where ``tp`` is the number of
    true positives and ``fp`` the number of false positives. The precision is
    intuitively the ability of the classifier not to label as positive a sample
    that is negative.

    The recall is the ratio ``tp / (tp + fn)`` where ``tp`` is the number of
    true positives and ``fn`` the number of false negatives. The recall is
    intuitively the ability of the classifier to find all the positive samples.
I am thinking that we could avoid repeating this description.
    When ``pos_label=None``, if y_true is in {-1, 1} or {0, 1},
    ``pos_label`` is set to 1, otherwise an error will be raised.
Suggested change:
-    When ``pos_label=None``, if y_true is in {-1, 1} or {0, 1},
-    ``pos_label`` is set to 1, otherwise an error will be raised.
+    When `pos_label=None`, if y_true is in {-1, 1} or {0, 1},
+    `pos_label` is set to 1, otherwise an error will be raised.
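As an aside, the rule described in this paragraph could be sketched as follows (a hypothetical helper for illustration; the PR's actual validation code may differ):

```python
import numpy as np

def _resolve_pos_label(y_true, pos_label):
    # Hypothetical sketch of the documented rule: with pos_label=None,
    # labels in {-1, 1} or {0, 1} default to pos_label=1; anything else
    # is ambiguous and raises an error.
    if pos_label is not None:
        return pos_label
    classes = set(np.unique(y_true))
    if classes <= {-1, 1} or classes <= {0, 1}:
        return 1
    raise ValueError(
        f"y_true takes values in {sorted(classes)} and pos_label is not specified"
    )
```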
    Returns
    -------
    recall_at_precision_k : float
        Maximum recall when for the thresholds when precision is greater
There is something wrong with this sentence due to the doubled "when".
    See Also
    --------
    precision_recall_curve : Compute precision-recall curve.
    plot_precision_recall_curve : Plot Precision Recall Curve for binary
We should not link to `plot_precision_recall_curve` because it will be deprecated soon.
    precision_recall_curve : Compute precision-recall curve.
    plot_precision_recall_curve : Plot Precision Recall Curve for binary
        classifiers.
    PrecisionRecallDisplay : Precision Recall visualization.
In addition, we should add both the `.from_estimator` and `.from_predictions` methods.
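The `See Also` section could then look something like this (sketch):

```
See Also
--------
precision_recall_curve : Compute precision-recall curve.
PrecisionRecallDisplay.from_estimator : Plot precision-recall curve given
    an estimator and some data.
PrecisionRecallDisplay.from_predictions : Plot precision-recall curve given
    binary class predictions.
```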
    >>> k = 0.75
    >>> recall_at_precision_k(y_true, y_prob, k)
    1.0

You should remove this blank line.
    >>> y_true = np.array([0, 0, 1, 1, 1, 1])
    >>> y_prob = np.array([0.1, 0.8, 0.9, 0.3, 1.0, 0.95])
    >>> k = 0.75
    >>> recall_at_precision_k(y_true, y_prob, k)
It might be better to take a threshold for which the score is not 1.0
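For instance, a setup like the following (computed here with `precision_recall_curve`, assuming the metric keeps the "maximum recall among points with precision at least `k`" semantics) would yield a value below 1.0:

```python
import numpy as np
from sklearn.metrics import precision_recall_curve

y_true = np.array([0, 0, 1, 1, 1])
y_prob = np.array([0.3, 0.7, 0.9, 0.6, 0.8])
k = 0.8

precision, recall, _ = precision_recall_curve(y_true, y_prob)
# Best recall among operating points with precision >= k.
print(recall[precision >= k].max())  # 0.666...
```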
Just adding the "Request changes" flag to show that this PR has been reviewed.
Reference Issues/PRs
Fixes #20266
What does this implement/fix? Explain your changes.
This PR implements "precision at recall k" and "recall at precision k" functions in `sklearn.metrics`. As mentioned in issue #20266 by Ryanglambert, these metrics are commonly used, for example, in facebook/mmf.
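For readers following along, the core of the two metrics can be sketched on top of `precision_recall_curve` (an illustrative reimplementation, not the PR's exact code; it assumes the "at least `k`" reading discussed above and omits the `pos_label`/`sample_weight` handling):

```python
import numpy as np
from sklearn.metrics import precision_recall_curve

def recall_at_precision_k(y_true, y_prob, k):
    """Maximum recall over thresholds whose precision is at least `k`."""
    precision, recall, _ = precision_recall_curve(y_true, y_prob)
    return recall[precision >= k].max()

def precision_at_recall_k(y_true, y_prob, k):
    """Maximum precision over thresholds whose recall is at least `k`."""
    precision, recall, _ = precision_recall_curve(y_true, y_prob)
    return precision[recall >= k].max()

y_true = np.array([0, 0, 1, 1, 1])
y_prob = np.array([0.3, 0.7, 0.9, 0.6, 0.8])
print(recall_at_precision_k(y_true, y_prob, 0.8))  # 0.666...
print(precision_at_recall_k(y_true, y_prob, 0.9))  # 0.75
```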