-
-
Notifications
You must be signed in to change notification settings - Fork 25.8k
ENH PrecisionRecallDisplay add option to plot chance level #26019
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR @Charlie-XIAO! This will certainly be a nice addition :) Here are just a couple of comments.
You will also have to solve the current conflicts with main.
Hi @ArturoAmorQ, thanks for your review! I have made your suggested changes and resolved conflicts with the main branch. The test cases have passed, but I haven't tested for the error messages I added yet. Please let me know if I should make any further changes and if I need to modify or test for those error messages. EDIT: Clearly I would need to add test cases for the error messages since Codecov did not pass. Will do soon. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here are a couple comments.
Hi @glemaitre, thanks for your review! I've made your suggested changes. Please let me know if there are any further modifications you want me to make. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be cool if we can add this plot_chance_level=True
inside one of the example where it makes sense.
Thanks for the review @glemaitre @betatim! I have made the suggested changes, except that I have not added the new test yet. I will do that as well as adding this new feature in some example ASAP. |
I have added this new feature into some of the examples @glemaitre. For instance, As you can see, then chance level line is always at the bottom. I think we may need to fix both axes to [0, 1], as suggested in Issue #25929. If you agree, I will open another PR for these visual changes (for both PR and ROC curve), similar to what I've done in PR #25972, then apply them to these examples. |
Yes we will need the other improvement but in another PR. |
Okay, will do after this PR is accepted (since I want to avoid too many merge conflicts). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM upton the 2 minors changes.
Hi @glemaitre, thanks for your review! One thing is that |
True, this is Python 3.10. So let's revert then. |
ping @betatim would you mind having a new look at this PR? This is good to be merged on my side. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left one comment about making it an error to request the chance level line without providing the data needed.
Otherwise this looks good to me. Thanks for the work!
Thanks for the additional exception. Merged. |
Nice. Thanks @Charlie-XIAO |
…arn#26019) Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
I am confused - what exactly does this line represent? The right-most point is the always-positive classifier as I understand it, which is indeed the baseline to beat in PR analysis - but it is not in any way random (and thus 'chance level' seems misleading in this context)? And how / why is the plotted horizontal extension of this point meaningful? Kull and Flach argue (see link above) that the comparison line to plot (analogous to the diagonal in the ROC curve) would be the F1 isometric curve associated with this classifier, which is hyperbolic and can also easily be computed: baserate = y_true.sum() / len(y_true)
F1_baseline = 2 * (baserate * 1) / (baserate + 1)
recall = np.arange(F1_baseline / (2 - F1_baseline), 1+1e-7, 0.05)
prec_baseline = F1_baseline * recall / (2 * recall - F1_baseline)
plt.plot(recall, prec_baseline, 'k--', label="$F_1$ baseline") |
Reference Issues/PRs
Towards #25929. Relevant PRs: #25972, #25987.
What does this implement/fix? Explain your changes.
This PR implements the following:
PrecisionRecallDisplay
class.Any other comments?
Unlike
RocCurveDisplay
in #25987, the chance level here depends ony
. Therefore, ifplot_chance_level=True
,plot
would require the prevalence of the positive label. It is okay if one usesfrom_estimator
orfrom_predictions
since we can compute the prevalence level fromy
that is originally required by these methods. Please let me know if I should approach differently.By the way, I
git grep
ed all files underexamples/
that includePrecisionRecallDisplay
, but none of them have plotted the chance level line, so I made no modifications.