[WIP] Make it possible to pass an arbitrary classifier as method for CalibratedClassifierCV #22010
base: main
Conversation
Note that I had to update the settings of the existing tests listed below to make them pass with this new method.
Other tests pass with this new method without changes. I don't think this is a problem specific to this new method, as these tests were quite arbitrary in the first place, both in the choice of the 10% margin and in the requirement that the Brier score should decrease. For example, the existing test
To avoid having to increase the dataset sizes in the test, maybe you can try a more regularized model:

gbdt_calibrator = HistGradientBoostingClassifier(monotonic_cst=[1], max_leaf_nodes=5, max_iter=10)

Another advantage would be to reduce the test duration. If that does not work, then it's fine to increase the test data size as you did.

EDIT: alternatively you can use [...]. Then [...]
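For reference, a minimal sketch of that suggestion, assuming the API proposed in this PR where `method` also accepts a classifier instance (the dataset and base estimator below are purely illustrative):

```python
# Sketch only: passing an estimator as `method` is what this WIP PR
# proposes; it is not part of released scikit-learn.
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.ensemble import HistGradientBoostingClassifier
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=1000, random_state=42)

# Strongly regularized gradient boosting used as the calibration model,
# constrained to be monotonically increasing in its single input feature.
gbdt_calibrator = HistGradientBoostingClassifier(
    monotonic_cst=[1], max_leaf_nodes=5, max_iter=10
)

calibrated = CalibratedClassifierCV(GaussianNB(), method=gbdt_calibrator)
calibrated.fit(X, y)
```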
# If binary classification, only return proba of the positive class
if probas.shape[1] == 2:
    return probas[:, 1]
return probas
We need a new test to cover the multiclass case, for instance using `LogisticRegression(C=1e12)`.
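A rough sketch of what such a test could look like, assuming the branch's proposed `method=<classifier>` API (the test name, dataset, and assertions are illustrative, not part of the PR):

```python
# Hypothetical test sketch; it relies on this PR's proposed API and on
# multiclass support being wired up in the branch.
import numpy as np

from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB


def test_calibration_multiclass_classifier_method():
    X, y = make_classification(
        n_samples=2000, n_classes=3, n_informative=6, random_state=42
    )
    calibrated = CalibratedClassifierCV(
        GaussianNB(), method=LogisticRegression(C=1e12)
    )
    calibrated.fit(X, y)
    proba = calibrated.predict_proba(X)
    # Predicted probabilities should cover all three classes and sum to one.
    assert proba.shape == (X.shape[0], 3)
    assert np.allclose(proba.sum(axis=1), 1.0)
```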
That would require unplugging the One-vs-Rest reduction logic typically used for sigmoid and isotonic calibration.
# Discard samples with null weights
mask = sample_weight > 0
X, y, sample_weight = X[mask], y[mask], sample_weight[mask]
Why do we need this? I would just pass the weights unchanged to the underlying estimator.

"""

def __init__(self, method):
    self.method = clone(method)
I would rather move the call to `clone` to the `fit` method to be consistent with other meta-estimators in scikit-learn (even though this one is not meant to be directly used by scikit-learn users).
I would rename `method` to `estimator` and then in `fit` do:

self.estimator_ = clone(self.estimator)

and then fit that instead.
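Putting those suggestions together, a minimal sketch of what the wrapper could look like (the class name `_ClassifierCalibrator` and the exact signatures are illustrative, not the PR's actual code):

```python
from sklearn.base import BaseEstimator, clone


class _ClassifierCalibrator(BaseEstimator):
    """Hypothetical calibrator that wraps an arbitrary classifier."""

    def __init__(self, estimator):
        # Store the unfitted estimator as-is, like other scikit-learn
        # meta-estimators do.
        self.estimator = estimator

    def fit(self, X, y, sample_weight=None):
        # Clone at fit time so repeated fits start from a fresh estimator,
        # and pass the weights through unchanged.
        self.estimator_ = clone(self.estimator)
        if sample_weight is not None:
            self.estimator_.fit(X, y, sample_weight=sample_weight)
        else:
            self.estimator_.fit(X, y)
        return self

    def predict(self, X):
        probas = self.estimator_.predict_proba(X)
        # If binary classification, only return proba of the positive class.
        if probas.shape[1] == 2:
            return probas[:, 1]
        return probas
```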
Could you please add a new test that checks the approximate equivalence of fitting with [...]?
@aperezlebel Once you have addressed @ogrisel's comments, you can ping me for a review.
Reference Issues/PRs
Fixes #21280
What does this implement/fix? Explain your changes.
Makes it possible to pass an arbitrary classifier as the `method` of `CalibratedClassifierCV`, in addition to the built-in `sigmoid` and `isotonic` options.

Any other comments?
This PR replaces #21992 as issue #21280 was updated.
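For illustration, a sketch of the kind of usage this change aims to enable; the exact API is still being discussed in the review above:

```python
# Sketch of the intended usage; passing an estimator as `method` is what
# this WIP PR adds and is not available in released scikit-learn.
from sklearn.calibration import CalibratedClassifierCV
from sklearn.linear_model import LogisticRegression
from sklearn.svm import LinearSVC

# Released behaviour: method is one of the strings "sigmoid" or "isotonic".
clf_sigmoid = CalibratedClassifierCV(LinearSVC(), method="sigmoid")

# Proposed behaviour: method may also be an arbitrary classifier instance.
clf_custom = CalibratedClassifierCV(LinearSVC(), method=LogisticRegression())
```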