MAINT Refactor: use `_get_response_values` in `CalibratedClassifierCV` (#27796)
Conversation
It makes sense to me. I would not bother adding a function for the reshaping; just add the `if` statement directly in the code.
sklearn/calibration.py (outdated)

@@ -788,9 +743,8 @@ def predict_proba(self, X):
    proba : array, shape (n_samples, n_classes)
        The predicted probabilities. Can be exact zeros.
    """
    predictions = _get_response_and_reshape(self.estimator, X)
I assume that I would remove the function, call `_get_response_values` directly, and just add:

if predictions.ndim == 1:
    predictions = predictions[:, np.newaxis]
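A minimal sketch of the reshaping discussed above (the array values are made up for illustration):

```python
import numpy as np

# Made-up binary decision_function output: shape (n_samples,)
predictions = np.array([0.2, -1.5, 3.0])

# Promote a 1D response to an (n_samples, 1) column, as suggested above
if predictions.ndim == 1:
    predictions = predictions[:, np.newaxis]

print(predictions.shape)  # (3, 1)
```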
Apparently using `.ndim` will also solve the last failure. It seems that we get a 1D array in cases other than binary for some reason.
So that last failure (`test_setting_request_on_sub_estimator_removes_error`) fails because we use `ConsumingClassifier`, which will always output shape `(n_samples,)` (no matter the target type of `y`):

scikit-learn/sklearn/tests/metadata_routing_common.py
Lines 251 to 255 in 24086ae

def decision_function(self, X, sample_weight="default", metadata="default"):
    record_metadata_not_default(
        self, "predict_proba", sample_weight=sample_weight, metadata=metadata
    )
    return np.zeros(shape=(len(X),))

I think this isn't a problem for any other metaestimator because only `CalibratedClassifierCV` will call a 'predict' method during `fit`. Do you think we need to do anything about this @glemaitre ?
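To make the failure mode concrete, here is a simplified, hypothetical stand-in for the mock estimator described above (class name and data invented; the real `ConsumingClassifier` also records metadata):

```python
import numpy as np

class AlwaysFlatClassifier:
    """Hypothetical mock whose decision_function always returns shape
    (n_samples,), regardless of how many classes are in y."""

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        return self

    def decision_function(self, X):
        # Always 1D, even for a multiclass target
        return np.zeros(shape=(len(X),))

X = np.arange(12).reshape(4, 3)
y = np.array([0, 1, 2, 1])  # three classes, yet the output stays 1D
clf = AlwaysFlatClassifier().fit(X, y)
print(clf.decision_function(X).shape)  # (4,)
```

This is why an `ndim` check catches the case while a `len(self.classes_) == 2` check would not.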
How do you decide between `reshape` and `newaxis`? Having a quick look at https://stackoverflow.com/questions/28385666/numpy-use-reshape-or-newaxis-to-add-dimensions I would lean towards `reshape`?
I find `np.newaxis` more explicit than the `-1` and `1`, whose meaning might not be trivial to understand at first. However, if you are more comfortable with `reshape`, go with it. This is just cosmetic and a personal preference.
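For reference, the two spellings being debated produce identical results, so the choice is indeed purely stylistic:

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])

# Two equivalent ways to turn (n_samples,) into an (n_samples, 1) column
col_reshape = a.reshape(-1, 1)
col_newaxis = a[:, np.newaxis]

assert col_reshape.shape == col_newaxis.shape == (3, 1)
assert np.array_equal(col_reshape, col_newaxis)
```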
sklearn/calibration.py (outdated)

    n_classes = len(classes)
    pred_method, method_name = _get_prediction_method(estimator)
    predictions = _compute_predictions(pred_method, method_name, X_test, n_classes)
    predictions = _get_response_and_reshape(estimator, X_test)
Here as well, the if statement is quite minimal.
So codecov highlights that line 369 in
I checked the lines that codecov reports as partially covered. I ran the tests and this is only a false positive.
sklearn/calibration.py (outdated)

Suggested change:

-    # Reshape in the binary case
-    if len(self.classes_) == 2:
+    if len(self.classes_) == 2:
+        # reshape from (n_samples,) to (n_samples, 1) for binary case
Hmm, actually maybe I should use `predictions.ndim` here for consistency.
sklearn/calibration.py (outdated)

Suggested change:

-    # Reshape binary output from `(n_samples,)` to `(n_samples, 1)`
-    if predictions.ndim == 1:
+    if predictions.ndim == 1:
+        # Reshape binary output from `(n_samples,)` to `(n_samples, 1)`
This is weird because with a `print` I see that the following tests pass through this line:

Ah nevermind, I tested wrong. Will ignore then!
LGTM on my side. Thanks @lucyleeow
Uhm, actually there is an error that should be linked to the shape where we are missing a
So when we use
OK, this makes sense. This is a viable solution then. LGTM. Adding a flag to get a second review.
LGTM
Thanks for the review!
What does this implement/fix? Explain your changes.

Refactors `CalibratedClassifierCV` to use `_get_response_values`, removing 2 similar functions in `calibration.py`.

Realised when working on #27700 that `_get_response_values` could also be used in `CalibratedClassifierCV`.
Any other comments?
I think `pos_label` in `_get_response_values` for binary cases is not needed to ensure order, as we are always passing `self.classes` around.

@glemaitre you may have suggestions for improvements, I am happy to change.
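A toy illustration of the ordering argument above (array values and the variable names are invented): when the class ordering travels alongside the predictions, column `j` of the probability matrix always corresponds to `classes[j]`, so no separate `pos_label` lookup is required to identify a column.

```python
import numpy as np

# Hypothetical example: columns of `proba` follow the order of `classes`
classes = np.array(["neg", "pos"])
proba = np.array([[0.8, 0.2], [0.3, 0.7]])

# Probability of an arbitrary label, recovered purely from `classes`
label = "pos"
col = np.flatnonzero(classes == label)[0]
print(proba[:, col])  # [0.2 0.7]
```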