
Make it possible to pass an arbitrary probabilistic classifier as method for CalibratedClassifierCV #21280


Open
ogrisel opened this issue Oct 8, 2021 · 4 comments · May be fixed by #22010

ogrisel (Member) commented Oct 8, 2021

Describe the workflow you want to enable

In addition to method="sigmoid" and method="isotonic", it would be great to be able to pass any scikit-learn compatible classifier with a predict_proba method to CalibratedClassifierCV.

In particular, I would like to be able to pass method=HistGradientBoostingClassifier(monotonic_cst=[1]) (or #13649) to calibrate a model with a non-parametric method (similar to isotonic regression) but with an adjustable overfitting/underfitting tradeoff, controlled via max_leaf_nodes and max_iter (something that is not possible with isotonic calibration).
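
For illustration (not part of the original issue), here is a minimal sketch of what that workflow amounts to today, done manually with a hold-out split standing in for the cross-validation that CalibratedClassifierCV would handle; the data, base estimator and hyperparameter values are assumptions:

```python
# Manual sketch of the requested workflow with the current API; the data,
# base estimator, and hyperparameter values are illustrative assumptions.
from sklearn.datasets import make_classification
from sklearn.ensemble import HistGradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=5000, random_state=0)
X_train, X_cal, y_train, y_cal = train_test_split(X, y, random_state=0)

# 1. Fit the model that needs calibrating.
base = LinearSVC(random_state=0).fit(X_train, y_train)

# 2. Fit the "calibrator": a monotonic gradient boosting classifier mapping the
#    base model's decision scores to probabilities, with max_leaf_nodes and
#    max_iter as the overfitting/underfitting knobs.
scores_cal = base.decision_function(X_cal).reshape(-1, 1)
calibrator = HistGradientBoostingClassifier(
    monotonic_cst=[1], max_leaf_nodes=5, max_iter=50, random_state=0
).fit(scores_cal, y_cal)

# 3. Calibrated probabilities for the positive class.
calibrated_proba = calibrator.predict_proba(scores_cal)[:, 1]
```

With the requested feature, steps 2 and 3 would be handled internally by something like CalibratedClassifierCV(base_estimator, method=HistGradientBoostingClassifier(monotonic_cst=[1])).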

Describe your proposed solution

The sklearn.calibration._fit_calibrator private helper would need to accept and clone scikit-learn regressors instead of raising ValueError.

Of course, one would need to add new tests and extend at least one of the calibration examples to demonstrate this new capability.
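
As a rough illustration (not from the original issue), the change amounts to a new dispatch branch when building the per-fold calibrator; the existing branches below are paraphrased from sklearn.calibration and the real helper's signature and internals may differ:

```python
# Hypothetical sketch: how the calibrator construction inside
# sklearn.calibration._fit_calibrator could dispatch on an estimator-valued
# ``method`` instead of only accepting the two string values.
from sklearn.base import clone
from sklearn.isotonic import IsotonicRegression


def _make_calibrator(method):
    """Return a fresh, unfitted calibrator for the given ``method`` value."""
    if method == "isotonic":
        return IsotonicRegression(out_of_bounds="clip")
    if method == "sigmoid":
        # Private Platt-scaling estimator used internally by scikit-learn.
        from sklearn.calibration import _SigmoidCalibration
        return _SigmoidCalibration()
    # Proposed new branch: clone any estimator passed by the user instead of
    # raising a ValueError for unsupported values of ``method``.
    return clone(method)
```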

Edit: the first version of this issue used HistGradientBoostingRegressor instead of HistGradientBoostingClassifier

aperezlebel (Contributor) commented

take

ogrisel changed the title from "Make it possible to pass an arbitrary regressor as method for CalibratedClassifierCV" to "Make it possible to pass an arbitrary probabilistic classifier as method for CalibratedClassifierCV" on Dec 16, 2021
ogrisel (Member, Author) commented Dec 16, 2021

This means that method="sigmoid" should be mathematically equivalent to method=LogisticRegression(C=None), up to numerical rounding errors.
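
For illustration (not part of the original comment), this equivalence can be sanity-checked with the current API by fitting an essentially unpenalized logistic regression on the held-out decision scores; the data, base estimator and the large-C stand-in for C=None are assumptions:

```python
# Illustrative check: built-in Platt scaling vs. a near-unpenalized logistic
# regression on the 1-D decision scores. Expect close, not identical, results
# (scikit-learn's sigmoid calibration uses Platt's label smoothing).
import numpy as np
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=5000, random_state=0)
X_train, X_cal, y_train, y_cal = train_test_split(X, y, random_state=0)

base = LinearSVC(random_state=0).fit(X_train, y_train)

# Built-in Platt scaling on the prefit classifier.
sigmoid = CalibratedClassifierCV(base, method="sigmoid", cv="prefit").fit(X_cal, y_cal)

# "Manual" Platt scaling via logistic regression on the held-out scores.
scores_cal = base.decision_function(X_cal).reshape(-1, 1)
platt = LogisticRegression(C=1e10).fit(scores_cal, y_cal)  # large C ~ no penalty

diff = np.abs(
    sigmoid.predict_proba(X_cal)[:, 1] - platt.predict_proba(scores_cal)[:, 1]
)
print(diff.max())
```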

ogrisel (Member, Author) commented Dec 16, 2021

We could even try: method=BaggingClassifier(HistGradientBoostingClassifier(monotonic_cst=[1]), n_estimators=100).

ogrisel (Member, Author) commented Dec 17, 2021

As discussed yesterday IRL with @A-pl, another motivation for this PR is that it would allow a more natural and principled way to calibrate multiclass classifiers with method=LogisticRegression(C=None), which can naturally minimize the validation negative log-likelihood with a multinomial target variable, while the current code with method="sigmoid" uses a hackish one-vs-rest + renormalize strategy.
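
For illustration (not part of the original comment), the multinomial-style recalibration described above can already be done by hand, which is roughly what the proposed method=LogisticRegression(...) would automate; the dataset, base model and large-C stand-in for an unpenalized fit are assumptions:

```python
# Manual multinomial recalibration: one multinomial logistic regression fit on
# the base model's log-probabilities, minimizing the multiclass negative
# log-likelihood directly instead of one sigmoid per class + renormalization.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_cal, y_train, y_cal = train_test_split(X, y, stratify=y, random_state=0)

base = RandomForestClassifier(random_state=0).fit(X_train, y_train)

log_proba_cal = np.log(np.clip(base.predict_proba(X_cal), 1e-12, None))
recalibrator = LogisticRegression(C=1e10, max_iter=1000).fit(log_proba_cal, y_cal)

calibrated_proba = recalibrator.predict_proba(log_proba_cal)
print(calibrated_proba[:3].round(3))
```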
