
Implemented "precision at recall k" and "recall at precision k" #20877


Closed
wants to merge 5 commits

Conversation

shubhraneel (Contributor)

Reference Issues/PRs

Fixes #20266

What does this implement/fix? Explain your changes.

This PR implements the functions "precision at recall k" and "recall at precision k" in sklearn.metrics. As mentioned in issue #20266 by Ryanglambert, these metrics are commonly used, for example, in facebook/mmf.
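For context, here is a minimal sketch of what the two proposed functions compute, written against the public precision_recall_curve API; the function names and the >= threshold semantics follow this PR's proposal and are not part of released scikit-learn:

import numpy as np
from sklearn.metrics import precision_recall_curve

def precision_at_recall_k(y_true, y_prob, k):
    # Maximum precision over the operating points whose recall is at least `k`.
    precision, recall, _ = precision_recall_curve(y_true, y_prob)
    return precision[recall >= k].max()

def recall_at_precision_k(y_true, y_prob, k):
    # Maximum recall over the operating points whose precision is at least `k`.
    precision, recall, _ = precision_recall_curve(y_true, y_prob)
    return recall[precision >= k].max()

With the example data used in the doctests further down this thread, recall_at_precision_k(y_true, y_prob, 0.75) evaluates to 1.0, matching the quoted doctest.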

@@ -171,4 +173,6 @@
"v_measure_score",
"zero_one_loss",
"brier_score_loss",
"precision_at_recall_k",
Member
I think that it would be better to call it max_precision_at_recall_k and max_recall_at_precision_k to make it obvious that this is the maximum that is taken.


When I hear "precision_at_recall_k", I think of a single number picked off the precision-recall curve (given a curve, if I constrain X=x_i, then Y=y_i). Following that logic, I think the original name is the better one.
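A quick numeric illustration of the ambiguity, using the example data from this PR's doctests: precision is not monotonic along the curve, so several thresholds satisfy a recall constraint like recall >= 0.75, each with a different precision.

import numpy as np
from sklearn.metrics import precision_recall_curve

y_true = np.array([0, 0, 1, 1, 1, 1])
y_prob = np.array([0.1, 0.8, 0.9, 0.3, 1.0, 0.95])

precision, recall, thresholds = precision_recall_curve(y_true, y_prob)
mask = recall[:-1] >= 0.75  # drop the final (precision=1, recall=0) point
print(list(zip(thresholds[mask], precision[:-1][mask])))
# [(0.1, 0.666...), (0.3, 0.8), (0.8, 0.75), (0.9, 1.0)]

"Precision at recall 0.75" is therefore ambiguous until we say which operating point is taken; the PR takes the maximum (here 1.0), which is what the max_ prefix would spell out.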



def recall_at_precision_k(y_true, y_prob, k, *, pos_label=None, sample_weight=None):
"""Computes maximum recall for the thresholds when precision is greater
Member

We should make it fit on a single line:

Maximum recall for a precision greater than `k`.

@glemaitre (Member) left a comment

I made a partial review. I will give further comments a bit later.

true positives and ``fn`` the number of false negatives. The recall is
intuitively the ability of the classifier to find all the positive samples.

Read more in the :ref:`User Guide <precision_recall_f_measure_metrics>`.
Member

We will need to add a section in the user guide documentation.
I think that we can add an example to show the meaning of the two metrics graphically on a precision-recall curve. We can then reuse the image of the example in the user guide.
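Something along these lines could serve as a starting point for that example (a rough sketch; the data are the doctest values from this PR and the constraint k=0.75 is arbitrary):

import matplotlib.pyplot as plt
import numpy as np
from sklearn.metrics import precision_recall_curve

y_true = np.array([0, 0, 1, 1, 1, 1])
y_prob = np.array([0.1, 0.8, 0.9, 0.3, 1.0, 0.95])
k = 0.75

precision, recall, _ = precision_recall_curve(y_true, y_prob)
mask = recall >= k
best = precision[mask].max()

fig, ax = plt.subplots()
ax.plot(recall, precision, drawstyle="steps-post", label="precision-recall curve")
ax.axvline(k, linestyle="--", color="gray", label=f"recall >= {k}")
# The metric is the highest point of the curve right of the constraint.
ax.plot(recall[mask][precision[mask].argmax()], best, "o",
        label=f"max precision at recall {k}: {best:.2f}")
ax.set(xlabel="Recall", ylabel="Precision")
ax.legend()
plt.show()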

Comment on lines +2661 to +2668
The precision is the ratio ``tp / (tp + fp)`` where ``tp`` is the number of
true positives and ``fp`` the number of false positives. The precision is
intuitively the ability of the classifier not to label as positive a sample
that is negative.

The recall is the ratio ``tp / (tp + fn)`` where ``tp`` is the number of
true positives and ``fn`` the number of false negatives. The recall is
intuitively the ability of the classifier to find all the positive samples.
Member

I am thinking that we could avoid repeating this description.

Comment on lines +2685 to +2686
When ``pos_label=None``, if y_true is in {-1, 1} or {0, 1},
``pos_label`` is set to 1, otherwise an error will be raised.
Member

Suggested change
When ``pos_label=None``, if y_true is in {-1, 1} or {0, 1},
``pos_label`` is set to 1, otherwise an error will be raised.
When `pos_label=None`, if y_true is in {-1, 1} or {0, 1},
`pos_label` is set to 1, otherwise an error will be raised.

Returns
-------
recall_at_precision_k : float
Maximum recall when for the thresholds when precision is greater
Member

There is something wrong with this sentence due to the doubled "when".

See Also
--------
precision_recall_curve : Compute precision-recall curve.
plot_precision_recall_curve : Plot Precision Recall Curve for binary
Member

We should not link to plot_precision_recall_curve because it will be deprecated soon.

precision_recall_curve : Compute precision-recall curve.
plot_precision_recall_curve : Plot Precision Recall Curve for binary
classifiers.
PrecisionRecallDisplay : Precision Recall visualization.
Member

In addition, we should add both the .from_estimator and .from_predictions methods.
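For reference, the two entry points in question exist on PrecisionRecallDisplay since scikit-learn 1.0; a short usage sketch with a placeholder estimator and synthetic data:

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import PrecisionRecallDisplay
from sklearn.model_selection import train_test_split

X, y = make_classification(random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = LogisticRegression().fit(X_train, y_train)

# Build the display from a fitted estimator ...
PrecisionRecallDisplay.from_estimator(clf, X_test, y_test)
# ... or from precomputed scores for the positive class.
PrecisionRecallDisplay.from_predictions(y_test, clf.predict_proba(X_test)[:, 1])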

>>> k = 0.75
>>> recall_at_precision_k(y_true, y_prob, k)
1.0

Member

You should remove this blank line.

>>> y_true = np.array([0, 0, 1, 1, 1, 1])
>>> y_prob = np.array([0.1, 0.8, 0.9, 0.3, 1.0, 0.95])
>>> k = 0.75
>>> recall_at_precision_k(y_true, y_prob, k)
Member

It might be better to take a threshold for which the score is not 1.0
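For instance, keeping the same data but raising the precision constraint to 0.9 gives a non-trivial value (0.75, per the precision_recall_curve computation sketched earlier in this thread; the function itself never landed in scikit-learn):

>>> y_true = np.array([0, 0, 1, 1, 1, 1])
>>> y_prob = np.array([0.1, 0.8, 0.9, 0.3, 1.0, 0.95])
>>> k = 0.9
>>> recall_at_precision_k(y_true, y_prob, k)
0.75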

@glemaitre (Member) left a comment

Just adding the "Request changes" flag so that it is visible that this PR has been reviewed.

@lorentzenchr (Member)

I'm closing as it is superseded by #24671. On top of that, we need a decision in #20266.

Labels
module:metrics, Superseded (PR has been replaced by a newer PR)
Development

Successfully merging this pull request may close these issues.

Precision @ Recall K || Recall @ Precision K
5 participants