MNT Adds CurveDisplayMixin _get_response_values #20999

glemaitre · 2021-09-09T16:13:08Z

Supersede #18212
Supersede #18589
closes #18589

This is a new PR that brings #18589 back to life. The diff has grown too large since then, so it is better to start fresh.

In subsequent PRs, I will:

Remove the file sklearn/metrics/_plot/base.py.
Introduce _check_response_method in the following classes/functions: plot_partial_dependence/PartialDependenceDisplay.
Introduce _get_response in the following classes/functions: plot_precision_recall_curve/PrecisionRecallDisplay, plot_roc_curve/RocCurveDisplay and most probably CalibrationDisplay.
Finally, use _get_response in the scorer API.

Previous summary in #18589

Refactor the scorers so that they make use of the _get_response function already defined for the plotting functions.
This refactoring can also be beneficial for #16525.

Summary of what was done:

Create _check_response_method. Its job is to return the method of a classifier or a regressor that will later be used to predict. If the method does not exist, it raises an error. This function actually already existed.
Create _get_response, a function that returns the predictions depending on the response method. It takes pos_label into account. This avoids future mistakes such as forgetting to negate the decision function or to select the right column of probabilities for binary classification. This behaviour was hard-coded in many different places, and this function reduces the amount of redundant code.
The rest of the code just replaces the pre-existing code with these 2 new functions.
As a bonus, several unit tests directly test the 2 functions.
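
To make the summary above concrete, here is a minimal, dependency-free sketch of the two helpers. It is not the code from this PR: the public-looking names check_response_method and get_response, the ToyClassifier class, and the use of plain Python lists (real estimators return NumPy arrays) are all assumptions made for illustration, and only the binary case is covered.

```python
def check_response_method(estimator, response_method):
    """Return the first prediction method named in `response_method`.

    Hypothetical stand-in for the private helper discussed above.
    `response_method` is a method name or a list of names tried in order.
    """
    if isinstance(response_method, str):
        candidates = [response_method]
    else:
        candidates = list(response_method)
    for name in candidates:
        method = getattr(estimator, name, None)
        if callable(method):
            return method
    raise AttributeError(
        f"{type(estimator).__name__} has none of the following methods: "
        f"{', '.join(candidates)}"
    )


def get_response(estimator, X, response_method, pos_label=None):
    """Return (y_pred, pos_label) for a binary classifier (sketch only)."""
    classes = list(estimator.classes_)
    if len(classes) != 2:
        raise ValueError("this sketch only handles binary classifiers")
    if pos_label is None:
        # Assumed default: the second class is the positive one.
        pos_label = classes[1]
    elif pos_label not in classes:
        raise ValueError(f"pos_label={pos_label!r} is not in {classes}")

    method = check_response_method(estimator, response_method)
    y_pred = method(X)
    if method.__name__ == "predict_proba":
        # Keep only the probability column of the positive class.
        column = classes.index(pos_label)
        y_pred = [row[column] for row in y_pred]
    elif method.__name__ == "decision_function":
        # Scores are relative to classes[1]; flip the sign otherwise.
        if pos_label == classes[0]:
            y_pred = [-score for score in y_pred]
    return y_pred, pos_label


class ToyClassifier:
    """Illustrative estimator; each sample is already a probability."""

    classes_ = ["neg", "pos"]

    def predict_proba(self, X):
        return [[1.0 - x, x] for x in X]

    def decision_function(self, X):
        return [x - 0.5 for x in X]
```

With this sketch, get_response(ToyClassifier(), X, "predict_proba", pos_label="neg") selects the first probability column, and the same call with "decision_function" returns the negated scores, which is the bookkeeping the helper is meant to centralize.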

glemaitre · 2021-09-09T16:48:30Z

@rth @thomasjpfan @ogrisel Here is the PR that refactors the code around _get_response. For the moment, I did not find and replace the places where it could be used, in order to focus only on the tools. There is nothing different from the original PR, but I think it might be easier to review this part first; I could then open a subsequent PR to find and replace the places where these tools should be used.

WDYT?

glemaitre · 2021-09-09T16:53:44Z

If you need to be convinced regarding where these two functions will be used, you can have a quick look at the older PR: #18589

glemaitre · 2021-09-10T08:23:25Z

I added 2 examples of where the method will be used for the display. Be aware that the main point of moving _get_response outside of the _plot module is that I will be able to use it in the scorer. The second advantage is that we will make sure that we have a proper handling of the pos_label.

ogrisel

I think the scope of this helper should be clarified in its docstring and in the tests.

We might even want to introduce several helper methods instead, each specialized for a different kind of estimator (binary classifiers, multiclass classifiers, multilabel classifiers and regressors), and call the specific helper in each context according to its expectations and needs.

sklearn/metrics/_plot/base.py

sklearn/utils/__init__.py

ogrisel · 2021-09-16T12:50:57Z

sklearn/utils/__init__.py

+            raise ValueError(f"{estimator.__class__.__name__} should be a classifier")
+        y_pred, pos_label = estimator.predict(X), None
+
+    return y_pred, pos_label


What happens for multiclass or multilabel classifiers with response_method in {'predict_proba', 'decision_function'}?

What happens for multitarget regressors?

We should at least raise an informative error message if this helper method is called in an unexpected context.
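
One way to picture the informative error requested here is a small guard that runs before any prediction. This is only a sketch under stated assumptions: the function name check_response_context and the ALLOWED_TARGET_TYPES policy are hypothetical and not part of the PR. The target-type strings follow the vocabulary returned by sklearn.utils.multiclass.type_of_target ("binary", "multiclass", "multilabel-indicator", "continuous", "continuous-multioutput"), but the guard itself takes the string as an argument and does not import scikit-learn.

```python
# Assumed policy for this sketch: target types accepted per response method.
ALLOWED_TARGET_TYPES = {
    "predict": {"binary", "multiclass", "continuous"},
    "predict_proba": {"binary"},
    "decision_function": {"binary"},
}


def check_response_context(estimator_name, target_type, response_method):
    """Raise an informative ValueError for unsupported combinations.

    Hypothetical helper: `target_type` is a string describing the target
    the estimator was fitted on.
    """
    if response_method not in ALLOWED_TARGET_TYPES:
        raise ValueError(
            f"Unknown response_method {response_method!r}; expected one of "
            f"{sorted(ALLOWED_TARGET_TYPES)}."
        )
    allowed = ALLOWED_TARGET_TYPES[response_method]
    if target_type not in allowed:
        raise ValueError(
            f"{estimator_name} was fitted on a {target_type!r} target, but "
            f"response_method={response_method!r} only supports target "
            f"types {sorted(allowed)}."
        )
```

Under this assumed policy, a multiclass classifier queried with "decision_function" and a multitarget regressor queried with "predict" both fail early with a message naming the estimator, the target type and the response method.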

sklearn/metrics/_plot/det_curve.py

thomasjpfan

I think the BinaryClassifierCurveDisplayMixin refactor is making this PR quite big. What do you think of breaking this PR into two smaller ones?

Add _get_response_values and _check_estimator_target and update places that can use it.
Follow up PR for BinaryClassifierCurveDisplayMixin.

glemaitre · 2021-11-02T10:11:05Z

What do you think of breaking this PR into two smaller ones?

I will try to do that in a new PR. I will keep this one as is to facilitate the rebasing later.

jeremiedbb · 2022-04-19T13:03:15Z

Let's move that to 1.2

glemaitre · 2023-04-01T17:06:48Z

Closing in favor of #25969.

MNT refactor code to use _get_response

88e71bc

github-actions bot added the module:utils label Sep 9, 2021

glemaitre added 3 commits September 9, 2021 18:22

reduce diff

b3b2ad6

Change the type of error and message

0a9d056

add _get_response and test

a7e44e0

glemaitre added the No Changelog Needed label Sep 9, 2021

Add information regarding the raised errors

da12422

glemaitre added 2 commits September 10, 2021 10:14

EXA add example of _check_response_method usage

92ed7af

EXA add example in DET curve

debcb25

glemaitre added 6 commits September 10, 2021 11:43

refactor plot function

372f6b1

iter

107bd1a

iter

3c140aa

iter

34d1eee

iter

6caa49c

iter

d274ffa

This was referenced Sep 13, 2021

Inconsistence between CalibrationDisplay and over Display #21027

Closed

TST add unit tests for current _get_response #21041

Merged

ogrisel reviewed Sep 16, 2021


glemaitre added 6 commits October 26, 2021 14:42

Merge remote-tracking branch 'origin/main' into is/18212_again

e2d54f3

TST fix calibration test

6f17901

iter

35f020f

iter

24e7bf5

rename base class to mixin

e1af257

add doc + renaming

3c507c0

glemaitre changed the title ~~MNT Refactor scorer using _get_response~~ MNT Refactor scorer using _get_response_values Oct 26, 2021

iter

4d9366b

glemaitre added 2 commits October 27, 2021 14:04

Merge remote-tracking branch 'origin/main' into is/18212_again

c5a2838

iter

69266c8

thomasjpfan reviewed Nov 1, 2021


glemaitre mentioned this pull request Nov 3, 2021

MNT refactor _get_response_values #21538

Closed

jeremiedbb added this to the 1.1 milestone Mar 25, 2022

thomasjpfan changed the title ~~MNT Refactor scorer using _get_response_values~~ MNT Adds CurveDisplayMixin _get_response_values Apr 7, 2022

jeremiedbb modified the milestones: 1.1, 1.2 Apr 19, 2022

lorentzenchr mentioned this pull request Oct 24, 2022

MNT Refactor scorer using _get_response #18589

Closed

glemaitre modified the milestones: 1.2, 1.3 Nov 16, 2022

glemaitre mentioned this pull request Mar 24, 2023

MAINT Introduce BinaryClassifierCurveDisplayMixin #25969

Merged

2 tasks

glemaitre closed this Apr 1, 2023
