
MAINT Refactor: use _get_response_values in CalibratedClassifierCV #27796


Merged · 7 commits into scikit-learn:main · Nov 24, 2023

Conversation

lucyleeow (Member)

What does this implement/fix? Explain your changes.

Refactors CalibratedClassifierCV to use _get_response_values, removing two similar functions in calibration.py.

I realised when working on #27700 that _get_response_values could also be used in CalibratedClassifierCV.

Any other comments?

I think pos_label in _get_response_values is not needed to ensure ordering in the binary case, as we are always passing self.classes around.

@glemaitre you may have suggestions for improvements; I am happy to change.
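For illustration, here is a rough sketch of the idea behind the refactor. The real _get_response_values is a private helper in sklearn.utils._response with a richer signature; the function below is a simplified, hypothetical stand-in that only mimics the method dispatch plus the reshape discussed in the review:

    import numpy as np

    def get_response_and_reshape_sketch(estimator, X):
        # Hypothetical stand-in: dispatch to the first prediction method the
        # estimator provides, then normalise 1D output to a column vector.
        for name in ("decision_function", "predict_proba"):
            method = getattr(estimator, name, None)
            if method is not None:
                predictions = np.asarray(method(X))
                if predictions.ndim == 1:
                    # e.g. binary decision_function: (n_samples,) -> (n_samples, 1)
                    predictions = predictions[:, np.newaxis]
                return predictions
        raise AttributeError(
            "estimator has neither decision_function nor predict_proba"
        )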

github-actions bot commented Nov 17, 2023

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit fe00537.

glemaitre self-requested a review on November 17, 2023 10:21.
glemaitre (Member) left a comment

It makes sense to me.

I would not bother adding a function for the reshaping, but would directly add the if statement in the code.

    @@ -788,9 +743,8 @@ def predict_proba(self, X):
            proba : array, shape (n_samples, n_classes)
                The predicted probabilities. Can be exact zeros.
            """
    +       predictions = _get_response_and_reshape(self.estimator, X)
glemaitre (Member)

I assume that I would remove the function, directly call _get_response_values, and just add:

    if predictions.ndim == 1:
        predictions = predictions[:, np.newaxis]

glemaitre (Member)

Apparently using .ndim will also solve the last failure. It seems that we get a 1D array in cases other than binary for some reason.

lucyleeow (Member, Author)

So that last failure (test_setting_request_on_sub_estimator_removes_error) fails because we use ConsumingClassifier, which will always output shape (n_samples,) (no matter the target type of y):

    def decision_function(self, X, sample_weight="default", metadata="default"):
        record_metadata_not_default(
            self, "predict_proba", sample_weight=sample_weight, metadata=metadata
        )
        return np.zeros(shape=(len(X),))

I think this isn't a problem for any other metaestimator because only CalibratedClassifierCV will call a 'predict' method during fit. Do you think we need to do anything about this @glemaitre ?
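For context, a minimal sketch of the behaviour described above, using a hypothetical mock (not the actual test class) whose decision_function always returns shape (n_samples,), even with more than two classes:

    import numpy as np

    class MockClassifier:
        # Hypothetical stand-in for ConsumingClassifier: decision_function
        # returns shape (n_samples,) regardless of the number of classes.
        def fit(self, X, y):
            self.classes_ = np.unique(y)
            return self

        def decision_function(self, X):
            return np.zeros(shape=(len(X),))

    X = np.ones((4, 2))
    y = np.array([0, 1, 2, 0])  # three classes, yet the output is still 1D

    pred = MockClassifier().fit(X, y).decision_function(X)
    print(pred.shape)  # (4,) -- a .ndim check reshapes this even when n_classes > 2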

lucyleeow (Member, Author)

How do you decide between reshape and newaxis? Having a quick look at https://stackoverflow.com/questions/28385666/numpy-use-reshape-or-newaxis-to-add-dimensions, I would lean towards reshape?

glemaitre (Member)

I find np.newaxis more explicit than the -1 and 1, whose meaning might not be trivial to understand at first. However, if you are more comfortable with reshape, go with it. This is purely cosmetic and a personal preference.
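For reference, a quick check that the two spellings are equivalent:

    import numpy as np

    predictions = np.zeros(5)  # shape (n_samples,)

    # Both turn a 1D array into an (n_samples, 1) column; the choice is stylistic.
    via_reshape = predictions.reshape(-1, 1)
    via_newaxis = predictions[:, np.newaxis]

    assert via_reshape.shape == via_newaxis.shape == (5, 1)
    assert np.array_equal(via_reshape, via_newaxis)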

    -    n_classes = len(classes)
    -    pred_method, method_name = _get_prediction_method(estimator)
    -    predictions = _compute_predictions(pred_method, method_name, X_test, n_classes)
    +    predictions = _get_response_and_reshape(estimator, X_test)
glemaitre (Member)

Here as well, the if statement is quite minimal.

glemaitre self-requested a review on November 22, 2023 10:29.
lucyleeow (Member, Author)

So codecov highlights that line 369 in calibration.py is not really tested. This is because test_calibration_prefit uses MultinomialNB, which only has a predict_proba and always returns predictions with ndim > 1. I am happy to try and improve coverage here, even though this PR shouldn't really have changed it, or I can leave it for a separate PR.

glemaitre (Member) left a comment

I checked codecov and the lines that are reported as partially covered. I ran the tests and this is only a false positive.

Comment on lines 463 to 464:

    # Reshape in the binary case
    if len(self.classes_) == 2:
glemaitre (Member)

Suggested change:

    -    # Reshape in the binary case
    -    if len(self.classes_) == 2:
    +    if len(self.classes_) == 2:
    +        # reshape from (n_samples,) to (n_samples, 1) for binary case

lucyleeow (Member, Author)

Hmm, actually maybe I should use predictions.ndim here for consistency.

Comment on lines 750 to 751:

    # Reshape binary output from `(n_samples,)` to `(n_samples, 1)`
    if predictions.ndim == 1:
glemaitre (Member)

Suggested change:

    -    # Reshape binary output from `(n_samples,)` to `(n_samples, 1)`
    -    if predictions.ndim == 1:
    +    if predictions.ndim == 1:
    +        # Reshape binary output from `(n_samples,)` to `(n_samples, 1)`

glemaitre (Member)

> So codecov highlights that line 369 in calibration.py is not really tested.

This is weird because with a `print` I see that the following tests pass through this line:

  • test_calibration_prefit
  • test_calibration_dict_pipeline
  • test_calibration_attributes
  • test_calibration_votingclassifier

lucyleeow (Member, Author)

lucyleeow commented Nov 22, 2023

Really? I tried with a print and I didn't get any tests in test_calibration.py.

Ah, never mind, I tested wrong. Will ignore then!

glemaitre (Member) left a comment

LGTM on my side. Thanks @lucyleeow

lucyleeow added the Waiting for Second Reviewer (First reviewer is done, need a second one!) label on Nov 23, 2023.
glemaitre (Member)

Hmm, actually there is an error that seems to be linked to a shape where we are missing a reshape. Since the tests were passing before, I would check the last commit.

lucyleeow (Member, Author)

lucyleeow commented Nov 24, 2023

So when we use cross_val_predict to get the predictions, we can't use `if predictions.ndim == 1` because predict_proba will give ndim = 2. Reverted this one back to `if len(self.classes_) == 2:`.
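To see why the ndim check misfires here, a small sketch (dataset and estimator chosen arbitrarily for illustration) comparing the shapes cross_val_predict returns for a binary problem:

    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_predict

    X, y = make_classification(n_samples=100, random_state=0)  # binary target

    # decision_function yields (n_samples,) for binary problems...
    scores = cross_val_predict(LogisticRegression(), X, y, method="decision_function")
    print(scores.shape)  # (100,)

    # ...but predict_proba is always 2D, even with two classes, so a
    # `predictions.ndim == 1` test cannot detect the binary case here.
    proba = cross_val_predict(LogisticRegression(), X, y, method="predict_proba")
    print(proba.shape)  # (100, 2)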

glemaitre (Member)

glemaitre commented Nov 24, 2023

> So when we use cross_val_predict to get the predictions, we can't use `if predictions.ndim == 1` because predict_proba will give ndim = 2. Reverted this one back to `if len(self.classes_) == 2:`.

OK, this makes sense. This is a viable solution then.

LGTM. Adding a flag to get a second review.

thomasjpfan (Member) left a comment

LGTM

thomasjpfan merged commit 3287570 into scikit-learn:main on Nov 24, 2023.
lucyleeow deleted the refact_calbclass branch on November 24, 2023 22:57.
lucyleeow (Member, Author)

Thanks for the review!

Labels: No Changelog Needed · Refactor (Code refactor) · Waiting for Second Reviewer (First reviewer is done, need a second one!)