Skip to content

MNT Adds CurveDisplayMixin _get_response_values #20999

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 22 commits into from

Conversation

glemaitre
Copy link
Member

@glemaitre glemaitre commented Sep 9, 2021

Supersede #18212
Supersede #18589
closes #18589

This is a new PR that bring back to life #18589. Too much diff has been created since, so it is better to restart fresh.

In a subsequent PRs, I will introduce:

  • remove the file sklearn/metrics/_plot/base.py
  • _check_response_method in the following classes/functions: plot_partial_dependence/PartialDependenceDisplay
  • _get_response in the following classes/functions: plot_precision_recall_curve/PrecisionRecallDisplay, plot_roc_curve/RocCurveDisplay and most probably the CalibrationDisplay.
  • Finally, _get_response will be used in the scorer API.

Previous summary in #18589

Refactor the scorer such that they make use of _get_response already define in the plotting function.
This refactoring can also be beneficial for #16525.

Summary of what was done:

  • Create a _check_response_method. Its job is to return the method of a classifier or a regressor to later predict. If the method does not exist, it raises an error. This function was already existing indeed.
  • Create a _get_response. A function that returns the prediction depending on the response method. We take into account the pos_label. Thus, it will allow to not make any mistake in the future by forgetting to inverse the decision function or select the right column of probabilities for binary classification. We hard-coded this behaviour in a lot of different places and this function reduces the amount of redundant code.
  • The rest of the code is just to replace the pre-existing code and use these 2 new functions.
  • And as a bonus, several units tests that are directly testing the 2 functions.

@glemaitre
Copy link
Member Author

@rth @thomasjpfan @ogrisel Here comes the PR that should refactor the code of the _get_response. For the moment I did not find and replace where is used to only focus on the tools. Indeed, there is nothing different from the original PR but I am thinking that it might be easier to review first this part, and then I could open a subsequent PR to find and replace the places where to use these tools.

WDYT?

@glemaitre
Copy link
Member Author

If you need to be convinced regarding where these two functions will be used, you can have a quick look at the older PR: #18589

@glemaitre
Copy link
Member Author

glemaitre commented Sep 10, 2021

I added 2 examples of where the method will be used for the display. Be aware that the main point of moving _get_response outside of the _plot module is that I will be able to use it in the scorer. The second advantage is that we will make sure that we have a proper handling of the pos_label.

Copy link
Member

@ogrisel ogrisel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the scope of this helper should be clarified in its docstring and in the tests.

Maybe we might instead even want to introduce several helper methods specialized for different kinds of estimators (binary classifiers, multiclass, multilabel classifiers and regressors) and call specific helpers in different contexts with different expectations / needs.

raise ValueError(f"{estimator.__class__.__name__} should be a classifier")
y_pred, pos_label = estimator.predict(X), None

return y_pred, pos_label
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens for multiclass or multilabel classifiers with response_method in {'predict_proba', 'decision_function'}?

What happens for multitarget regressors?

We should at least raise an informative error message if this helper method is called in an unexpected context.

@glemaitre glemaitre changed the title MNT Refactor scorer using _get_response MNT Refactor scorer using _get_response_values Oct 26, 2021
Copy link
Member

@thomasjpfan thomasjpfan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the BinaryClassifierCurveDisplayMixin refactor is making this PR quite big. What do you think of breaking this PR into two smaller ones?

  1. Add _get_response_values and _check_estimator_target and update places that can use it.
  2. Follow up PR for BinaryClassifierCurveDisplayMixin.

@glemaitre
Copy link
Member Author

What do you think of breaking this PR into two smaller ones?

I will try to do that in a new PR. I will keep this one as is to facilitate the rebasing later.

@jeremiedbb jeremiedbb added this to the 1.1 milestone Mar 25, 2022
@thomasjpfan thomasjpfan changed the title MNT Refactor scorer using _get_response_values MNT Adds CurveDisplayMixin _get_response_values Apr 7, 2022
@jeremiedbb
Copy link
Member

Let's move that to 1.2

@glemaitre
Copy link
Member Author

glemaitre commented Apr 1, 2023

close in favor of #25969

@glemaitre glemaitre closed this Apr 1, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
No open projects
Status: Done
Development

Successfully merging this pull request may close these issues.

4 participants