[WIP] Make it possible to pass an arbitrary classifier as method for CalibratedClassifierCV #22010
base: main
Conversation
Note that I had to update the settings of the existing tests listed below to make them pass with this new method.
Other tests pass with this new method without changes. I don't think this is a problem specific to this new method, as these tests were quite arbitrary in the first place, both in the choice of the 10% margin and in the requirement that the Brier score should decrease. For example, the existing test
To avoid having to increase the dataset sizes in the test, maybe you can try a more regularized model:

gbdt_calibrator = HistGradientBoostingClassifier(monotonic_cst=[1], max_leaf_nodes=5, max_iter=10)

Another advantage would be to reduce the test duration. If that does not work, then it's fine to increase the test data size as you did.

EDIT: alternatively you can use [...]. Then [...]
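For reference, a minimal sketch of that suggestion, assuming the API proposed in this PR where `method` also accepts a classifier instance (the dataset and base estimator below are purely illustrative):

```python
# Sketch only: passing an estimator as `method` is what this WIP PR
# proposes; it is not part of released scikit-learn.
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.ensemble import HistGradientBoostingClassifier
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=1000, random_state=42)

# Strongly regularized gradient boosting used as the calibration model,
# constrained to be monotonically increasing in its single input feature.
gbdt_calibrator = HistGradientBoostingClassifier(
    monotonic_cst=[1], max_leaf_nodes=5, max_iter=10
)

calibrated = CalibratedClassifierCV(GaussianNB(), method=gbdt_calibrator)
calibrated.fit(X, y)
```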
# If binary classification, only return proba of the positive class
if probas.shape[1] == 2:
    return probas[:, 1]
return probas
We need a new test to cover the multiclass case, for instance using `LogisticRegression(C=1e12)`.
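A rough sketch of what such a test could look like, assuming the branch's proposed `method=<classifier>` API (the test name, dataset, and assertions are illustrative, not part of the PR):

```python
# Hypothetical test sketch; it relies on this PR's proposed API and on
# multiclass support being wired up in the branch.
import numpy as np

from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB


def test_calibration_multiclass_classifier_method():
    X, y = make_classification(
        n_samples=2000, n_classes=3, n_informative=6, random_state=42
    )
    calibrated = CalibratedClassifierCV(
        GaussianNB(), method=LogisticRegression(C=1e12)
    )
    calibrated.fit(X, y)
    proba = calibrated.predict_proba(X)
    # Predicted probabilities should cover all three classes and sum to one.
    assert proba.shape == (X.shape[0], 3)
    assert np.allclose(proba.sum(axis=1), 1.0)
```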
That would require unplugging the One-vs-Rest reduction logic typically used for sigmoid and isotonic calibration.
# Discard samples with null weights
mask = sample_weight > 0
X, y, sample_weight = X[mask], y[mask], sample_weight[mask]
Why do we need this? I would just pass the weights unchanged to the underlying estimator.

"""

def __init__(self, method):
    self.method = clone(method)
I would rather move the call to `clone` to the `fit` method to be consistent with other meta-estimators in scikit-learn (even though this one is not meant to be directly used by scikit-learn users).
I would rename `method` to `estimator` and then in `fit` do:

self.estimator_ = clone(self.estimator)

and then fit that instead.
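Putting those suggestions together, a minimal sketch of what the wrapper could look like (the class name `_ClassifierCalibrator` and the exact signatures are illustrative, not the PR's actual code):

```python
from sklearn.base import BaseEstimator, clone


class _ClassifierCalibrator(BaseEstimator):
    """Hypothetical calibrator that wraps an arbitrary classifier."""

    def __init__(self, estimator):
        # Store the unfitted estimator as-is, like other scikit-learn
        # meta-estimators do.
        self.estimator = estimator

    def fit(self, X, y, sample_weight=None):
        # Clone at fit time so repeated fits start from a fresh estimator,
        # and pass the weights through unchanged.
        self.estimator_ = clone(self.estimator)
        if sample_weight is not None:
            self.estimator_.fit(X, y, sample_weight=sample_weight)
        else:
            self.estimator_.fit(X, y)
        return self

    def predict(self, X):
        probas = self.estimator_.predict_proba(X)
        # If binary classification, only return proba of the positive class.
        if probas.shape[1] == 2:
            return probas[:, 1]
        return probas
```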
Could you please add a new test that checks the approximate equivalence of fitting with [...]?
@aperezlebel Once you have addressed @ogrisel's comments, you can ping me for a review.
Reference Issues/PRs
Fixes #21280
What does this implement/fix? Explain your changes.
Makes it possible to pass an arbitrary classifier as the `method` of `CalibratedClassifierCV`, in addition to the built-in `sigmoid` and `isotonic` options.

Any other comments?
This PR replaces #21992 as issue #21280 was updated.
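For illustration, a sketch of the kind of usage this change aims to enable; the exact API is still being discussed in the review above:

```python
# Sketch of the intended usage; passing an estimator as `method` is what
# this WIP PR adds and is not available in released scikit-learn.
from sklearn.calibration import CalibratedClassifierCV
from sklearn.linear_model import LogisticRegression
from sklearn.svm import LinearSVC

# Released behaviour: method is one of the strings "sigmoid" or "isotonic".
clf_sigmoid = CalibratedClassifierCV(LinearSVC(), method="sigmoid")

# Proposed behaviour: method may also be an arbitrary classifier instance.
clf_custom = CalibratedClassifierCV(LinearSVC(), method=LogisticRegression())
```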