ENH PrecisionRecallDisplay add option to plot chance level #26019

Merged
merged 31 commits into scikit-learn:main from Charlie-XIAO:pr-vis-enh on May 22, 2023

Conversation

@Charlie-XIAO Charlie-XIAO (Contributor) commented Mar 29, 2023

Reference Issues/PRs

Towards #25929. Relevant PRs: #25972, #25987.

What does this implement/fix? Explain your changes.

This PR implements the following:

  • Add the chance_level_ attribute to the PrecisionRecallDisplay class.
  • Add an option to plot the chance level line, and support passing a dict to alter its style (a usage sketch follows below).
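
A rough usage sketch of the two additions. The keyword for the style dict is assumed here to be chance_level_kw, mirroring RocCurveDisplay in #25987, so the exact name may differ from the final API:

  # Sketch only: chance_level_kw is assumed to mirror RocCurveDisplay (#25987).
  from sklearn.datasets import make_classification
  from sklearn.linear_model import LogisticRegression
  from sklearn.metrics import PrecisionRecallDisplay
  from sklearn.model_selection import train_test_split

  X, y = make_classification(weights=[0.7, 0.3], random_state=0)
  X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
  clf = LogisticRegression().fit(X_train, y_train)

  # Plot the PR curve together with the chance level line, restyled via a dict.
  disp = PrecisionRecallDisplay.from_estimator(
      clf, X_test, y_test,
      plot_chance_level=True,
      chance_level_kw={"color": "tab:red", "linestyle": ":"},
  )
  print(disp.chance_level_)  # prevalence of the positive label in y_test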

Any other comments?

Unlike RocCurveDisplay in #25987, the chance level here depends on y: if plot_chance_level=True, plot requires the prevalence of the positive label. This is not a problem when using from_estimator or from_predictions, since the prevalence can be computed from the y that these methods already require. Please let me know if I should approach this differently.
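
For clarity, a minimal sketch of what that prevalence computation amounts to (the actual helper used in the PR may look different):

  import numpy as np

  # The chance level of a precision-recall curve is the prevalence of the
  # positive label, i.e. the precision of a classifier that predicts the
  # positive class for every sample.
  y = np.array([0, 1, 1, 0, 1, 0, 0, 0])
  pos_label = 1
  chance_level = np.mean(y == pos_label)  # 3 positives out of 8 -> 0.375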

By the way, I git grepped all files under examples/ that use PrecisionRecallDisplay, and none of them plot the chance level line, so I made no modifications there.

@glemaitre glemaitre self-requested a review April 4, 2023 09:26
@ArturoAmorQ ArturoAmorQ (Member) left a comment

Thanks for the PR @Charlie-XIAO! This will certainly be a nice addition :) Here are just a couple of comments.

You will also have to resolve the current conflicts with main.

@Charlie-XIAO Charlie-XIAO (Contributor, Author) commented Apr 11, 2023

Hi @ArturoAmorQ, thanks for your review! I have made your suggested changes and resolved conflicts with the main branch. The test cases have passed, but I haven't tested for the error messages I added yet. Please let me know if I should make any further changes and if I need to modify or test for those error messages.

EDIT: Clearly I would need to add test cases for the error messages since Codecov did not pass. Will do soon.

@glemaitre glemaitre (Member) left a comment

Here are a couple comments.

@Charlie-XIAO Charlie-XIAO (Contributor, Author)

Hi @glemaitre, thanks for your review! I've made your suggested changes. Please let me know if there are any further modifications you want me to make.

@glemaitre glemaitre self-requested a review April 14, 2023 09:57
@glemaitre glemaitre (Member) left a comment

It would be cool if we could add plot_chance_level=True inside one of the examples where it makes sense.

@Charlie-XIAO Charlie-XIAO (Contributor, Author)

Thanks for the review @glemaitre @betatim! I have made the suggested changes, except that I have not added the new test yet. I will do that, as well as add this new feature to some examples, ASAP.

@Charlie-XIAO Charlie-XIAO (Contributor, Author)

I have added this new feature to some of the examples @glemaitre. For instance, in the updated plots the chance level line is always at the bottom, as you can see. I think we may need to fix both axes to [0, 1], as suggested in issue #25929. If you agree, I will open another PR for these visual changes (for both PR and ROC curves), similar to what I've done in PR #25972, and then apply them to these examples.
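
For reference, the visual fix being discussed amounts to pinning the axis limits, roughly like this sketch (the actual change would live inside the display classes in the follow-up PR):

  import matplotlib.pyplot as plt

  fig, ax = plt.subplots()
  # ... draw the PR curve and chance level line on `ax` first, e.g. via
  # PrecisionRecallDisplay.from_estimator(clf, X_test, y_test,
  #                                       plot_chance_level=True, ax=ax)

  # Pin both axes to [0, 1] so the chance level line is not squashed against
  # the bottom of an auto-scaled plot.
  ax.set_xlim(0, 1)
  ax.set_ylim(0, 1)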

@glemaitre glemaitre (Member)

> I think we may need to fix both axes to [0, 1]

Yes we will need the other improvement but in another PR.

@Charlie-XIAO Charlie-XIAO (Contributor, Author) commented Apr 14, 2023

> Yes we will need the other improvement but in another PR.

Okay, will do after this PR is accepted (since I want to avoid too many merge conflicts).

@glemaitre glemaitre self-requested a review May 4, 2023 09:06
@glemaitre glemaitre (Member) left a comment

LGTM up to the 2 minor changes.

@glemaitre glemaitre added this to the 1.3 milestone May 4, 2023
@glemaitre glemaitre added the Waiting for Second Reviewer label (First reviewer is done, need a second one!) May 4, 2023
@Charlie-XIAO Charlie-XIAO (Contributor, Author)

Hi @glemaitre, thanks for your review! One thing: Counter does not have a total() method here, and as suggested in the Counter docs, the common pattern for obtaining the total count is sum(c.values()), which is what is currently implemented. Please let me know if this is okay.
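
For context, a quick illustration of the two ways of getting the total count (Counter.total() only exists on Python 3.10+):

  from collections import Counter

  y = [0, 1, 1, 0, 1]
  counts = Counter(y)  # Counter({1: 3, 0: 2})

  # Portable across all supported Python versions:
  n_samples = sum(counts.values())  # 5

  # Equivalent, but only available from Python 3.10 onwards:
  # n_samples = counts.total()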

@glemaitre glemaitre (Member)

True, total() is Python 3.10 only. So let's revert then.

@glemaitre glemaitre (Member)

Ping @betatim: would you mind having another look at this PR? It is good to be merged on my side.

@Charlie-XIAO Charlie-XIAO requested a review from betatim May 16, 2023 02:37
@betatim betatim (Member) left a comment

I left one comment about making it an error to request the chance level line without providing the data needed.

Otherwise this looks good to me. Thanks for the work!

@betatim betatim merged commit 6238968 into scikit-learn:main May 22, 2023
@betatim betatim (Member) commented May 22, 2023

Thanks for the additional exception. Merged.

@glemaitre glemaitre (Member)

Nice. Thanks @Charlie-XIAO

@Charlie-XIAO Charlie-XIAO deleted the pr-vis-enh branch September 23, 2023 13:28
REDVM pushed a commit to REDVM/scikit-learn that referenced this pull request Nov 16, 2023
@e-pet e-pet commented Nov 26, 2024

I am confused: what exactly does this line represent? The right-most point is the always-positive classifier, as I understand it, which is indeed the baseline to beat in PR analysis, but it is not in any way random (so 'chance level' seems misleading in this context). And how/why is the plotted horizontal extension of this point meaningful? Kull and Flach argue (see link above) that the comparison line to plot (analogous to the diagonal in the ROC curve) would be the F1 isometric curve associated with this classifier, which is hyperbolic and can also easily be computed:

  import numpy as np
  import matplotlib.pyplot as plt

  # y_true: binary ground-truth labels (1 = positive class)
  baserate = y_true.sum() / len(y_true)
  F1_baseline = 2 * (baserate * 1) / (baserate + 1)  # F1 of the always-positive classifier
  # recall values from the point where precision = 1 up to recall = 1
  recall = np.arange(F1_baseline / (2 - F1_baseline), 1 + 1e-7, 0.05)
  prec_baseline = F1_baseline * recall / (2 * recall - F1_baseline)
  plt.plot(recall, prec_baseline, 'k--', label="$F_1$ baseline")

Labels: module:metrics, Waiting for Second Reviewer
Projects: None yet
5 participants