MNT Adds CurveDisplayMixin _get_response_values #20999
Conversation
@rth @thomasjpfan @ogrisel Here comes the PR that should refactor the code. WDYT?
If you need to be convinced regarding where these two functions will be used, you can have a quick look at the older PR: #18589
I added 2 examples of where the method will be used for the display. Be aware that the main point of moving …
I think the scope of this helper should be clarified in its docstring and in the tests.
We might instead want to introduce several specialized helper methods for different kinds of estimators (binary, multiclass, and multilabel classifiers, and regressors) and call the specific helper in each context, with different expectations and needs.
raise ValueError(f"{estimator.__class__.__name__} should be a classifier") | ||
y_pred, pos_label = estimator.predict(X), None | ||
|
||
return y_pred, pos_label |
What happens for multiclass or multilabel classifiers with `response_method` in `{'predict_proba', 'decision_function'}`?
What happens for multitarget regressors?
We should at least raise an informative error message if this helper method is called in an unexpected context.
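For illustration, here is a minimal sketch of the kind of early, informative check being suggested. The helper name and the error wording are assumptions, not the actual implementation:

```python
from sklearn.utils.multiclass import type_of_target


def check_binary_context_sketch(y, response_method):
    """Hypothetical guard: fail early when the target is not binary."""
    y_type = type_of_target(y)
    if response_method in ("predict_proba", "decision_function") and y_type != "binary":
        raise ValueError(
            f"Expected a binary target, got {y_type!r}; "
            f"response_method={response_method!r} is only supported for "
            "binary classification by this helper."
        )
```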
I think the `BinaryClassifierCurveDisplayMixin` refactor is making this PR quite big. What do you think of breaking this PR into two smaller ones?

- Add `_get_response_values` and `_check_estimator_target` and update the places that can use them.
- Follow-up PR for `BinaryClassifierCurveDisplayMixin`.
I will try to do that in a new PR. I will keep this one as is to facilitate the rebasing later.
Let's move that to 1.2.
Closing in favor of #25969.
Supersedes #18212
Supersedes #18589
Closes #18589
This is a new PR that brings #18589 back to life. Too much diff has accumulated since then, so it is better to restart fresh.
In subsequent PRs, I will introduce the helpers from `sklearn/metrics/_plot/base.py`:

- `_check_response_method` in the following classes/functions: `plot_partial_dependence`/`PartialDependenceDisplay`;
- `_get_response` in the following classes/functions: `plot_precision_recall_curve`/`PrecisionRecallDisplay`, `plot_roc_curve`/`RocCurveDisplay`, and most probably the `CalibrationDisplay`;
- `_get_response` will be used in the scorer API.

Previous summary in #18589:
Refactor the scorers such that they make use of the `_get_response` already defined in the plotting functions. This refactoring can also be beneficial for #16525.
Summary of what was done:

- Create `_check_response_method`. Its job is to return the prediction method of a classifier or regressor that will later be used to predict. If the method does not exist, it raises an error. This function already existed.
- Create `_get_response`. A function that returns the predictions depending on the response method, taking `pos_label` into account. Thus, it will no longer be possible to make a mistake in the future by forgetting to invert the decision function or to select the right column of probabilities for binary classification. This behaviour was hard-coded in a lot of different places, and this function reduces the amount of redundant code.
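To make the behaviour described above concrete, here is a minimal, self-contained sketch of what such a helper does for a binary classifier. The function name, signature, and error message are illustrative assumptions, not the code added by this PR:

```python
import numpy as np


def get_response_sketch(estimator, X, response_method, pos_label=None):
    """Illustrative sketch only: return (y_pred, pos_label) for a fitted binary classifier.

    Mirrors the idea described above: pick the requested prediction method,
    then select the `predict_proba` column or flip the sign of
    `decision_function` so that the scores always refer to `pos_label`.
    """
    if response_method == "predict":
        return estimator.predict(X), None

    classes = estimator.classes_
    if len(classes) != 2:
        raise ValueError("This sketch only handles binary classification.")
    if pos_label is None:
        pos_label = classes[1]

    prediction_method = getattr(estimator, response_method)
    y_pred = prediction_method(X)

    if response_method == "predict_proba":
        # Select the column of probabilities corresponding to pos_label.
        col_idx = np.flatnonzero(classes == pos_label)[0]
        y_pred = y_pred[:, col_idx]
    elif response_method == "decision_function" and pos_label == classes[0]:
        # Invert the decision values when pos_label is the first (negative) class.
        y_pred = -y_pred

    return y_pred, pos_label
```

Sharing one such helper between the curve displays and, later, the scorers is what removes the duplicated `pos_label` handling mentioned above.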