FIX Add error when LeaveOneOut used in CalibratedClassifierCV #29545
Conversation
    assert np.all(proba[:, :i] > 0)
    assert np.all(proba[:, i + 1 :] > 0)
else:
    # Check `proba` are all 1/n_classes
Note that `proba` was 1/n_classes here because the original test data was unique (it consisted of 10 samples belonging to 10 classes); this was not really related to the train subset not containing all classes. I think the estimator ended up being overfit and the calibrator did not respond well to the low predicted values, calibrating them all to the same value. Note that `proba` was the same value even before normalization of the probabilities.
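For context, here is a rough, hypothetical sketch of the kind of degenerate setup described above (10 samples, each in its own class, with an easily overfit base estimator). It uses a plain KFold splitter rather than LeaveOneOut, which this PR now rejects, and whether the calibrated probabilities collapse exactly to 1/n_classes depends on the base estimator and calibration method:

import numpy as np
from sklearn.calibration import CalibratedClassifierCV
from sklearn.model_selection import KFold
from sklearn.tree import DecisionTreeClassifier

# 10 samples, each belonging to its own class, so every CV train split
# necessarily misses several classes and the tree overfits its train subset.
rng = np.random.RandomState(7)
X = rng.randn(10, 5)
y = np.arange(10)

cal_clf = CalibratedClassifierCV(
    DecisionTreeClassifier(random_state=7),
    method="sigmoid",
    cv=KFold(n_splits=3),
    ensemble=True,
)
cal_clf.fit(X, y)

# Inspect whether the per-class calibrators map every decision value to
# (roughly) the same output, i.e. rows close to 1 / n_classes.
print(cal_clf.predict_proba(X))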
I wonder if we even have to check `ensemble=False`, as we use `cross_val_predict` to get the predictions used to calibrate, and at predict time only one estimator is fit using all the data.
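For illustration, here is a minimal sketch of that idea using only the public API (this is not scikit-learn's internal code, and the dataset and estimator are arbitrary placeholders): with ensemble=False, out-of-fold predictions in the spirit of cross_val_predict feed a single calibrator, and the estimator used at predict time is refit on all the data.

from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict

X, y = make_classification(
    n_samples=200, n_classes=3, n_informative=6, random_state=0
)

# ensemble=False: one calibrator fit on out-of-fold predictions,
# one final estimator refit on the full training set for predict().
cal = CalibratedClassifierCV(LogisticRegression(max_iter=1000), cv=5, ensemble=False)
cal.fit(X, y)

# The out-of-fold predictions feeding the calibrator are conceptually the
# same kind of predictions that cross_val_predict returns.
oof_proba = cross_val_predict(
    LogisticRegression(max_iter=1000), X, y, cv=5, method="predict_proba"
)
print(oof_proba.shape, cal.predict_proba(X).shape)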
sklearn/calibration.py (Outdated)
if isinstance(self.cv, LeaveOneOut):
    raise ValueError(
        "LeaveOneOut cross-validation does not allow "
        "all classes to be present in test splits."
    )
Let's be explicit by asking people to use an alternative cross-validation strategy.
Amended. Hopefully not too long now?
Nope, this is good.
@@ -441,27 +456,30 @@ def test_calibration_prob_sum(ensemble):
def test_calibration_less_classes(ensemble):
    # Test to check calibration works fine when train set in a test-train
This is a use case where I'm really wondering whether it is valid :). But it is already here, so let's go with it.
Regarding the scope of the PR, I'm happy to already have this in the codebase. I opened a related PR regarding the cross-validation and the fact that we can get a subset of classes in training. I'm not sure this is something we should allow, but it requires much more discussion and I might be overlooking some aspects.
I'll add this PR to the 1.5.2 milestone.
LGTM
Reference Issues/PRs
closes #29000
What does this implement/fix? Explain your changes.
Adds an error when LeaveOneOut is used in CalibratedClassifierCV, since LeaveOneOut cross-validation does not allow all classes to be present in test splits.
Any other comments?
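One possible alternative for users hitting the new error, shown purely for illustration (the amended error message may suggest different wording or strategies), is a splitter whose test folds can contain every class, e.g. StratifiedKFold:

from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold

X, y = make_classification(n_samples=100, random_state=0)
cal = CalibratedClassifierCV(
    LogisticRegression(max_iter=1000), cv=StratifiedKFold(n_splits=5)
)
cal.fit(X, y)
print(cal.predict_proba(X[:3]))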