
CalibratedClassifierCV should return an estimator calibrated via CV, not an ensemble of calibrated estimators #16145


Closed
aesuli opened this issue Jan 17, 2020 · 5 comments · Fixed by #17856

Comments

@aesuli
Contributor

aesuli commented Jan 17, 2020

The current cross validation procedure adopted in the CalibratedClassifierCV does not follow the cross validation procedure described in the original Platt paper:
[Platt99] Probabilistic Outputs for Support Vector Machines and Comparisons to Regularized Likelihood Methods, J. Platt, (1999)

I also checked the other papers cited in the references for the CalibratedClassifierCV class, and none of them describes the cross validation process it implements.

CalibratedClassifierCV currently fits and calibrates an estimator for each fold (calibration is performed on the test part of the fold).
All the estimators fit at each fold are kept in a list.
At prediction time, every estimator makes a prediction and the average of the returned values is the final prediction.

The estimator produced by CalibratedClassifierCV is thus an ensemble and not a single estimator calibrated on the whole training set via CV.
When cross validation is used, the original base_estimator is not used to make predictions.
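For illustration, a small snippet showing the current behaviour (LinearSVC is used here only as an example base estimator):

# One calibrated clone of the base estimator is stored per CV fold, and
# predict_proba averages the probabilities returned by all stored clones.
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=300, random_state=0)
calibrated = CalibratedClassifierCV(LinearSVC(random_state=0), cv=3)
calibrated.fit(X, y)
print(len(calibrated.calibrated_classifiers_))  # 3: one calibrated estimator per fold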

Platt99 describes a cross validation procedure that fits an estimator on each fold and saves the predictions for the test fold.
Then the predictions from all the folds are concatenated into a single list, and the calibration parameters for the base_estimator are determined from that list.

Cross validation should only be a means to calibrate the base_estimator on the same data it has been fit on, not a way to fit a different estimator.

The procedure described in Platt99 is what one would expect from a proper application of cross validation, as the cross validation only determines the parameters of the calibration and does not fit the final estimator.
It is also more efficient, as it does not store an estimator for each fold and requires a single predict call at prediction time.
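A rough sketch of the Platt99-style procedure described above (this is not the current scikit-learn code; LogisticRegression stands in here for fitting Platt's A/B sigmoid, and LinearSVC is only an example base estimator):

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=300, random_state=0)

# Collect out-of-fold decision values for the whole training set.
scores = np.zeros(len(y))
for train_idx, test_idx in StratifiedKFold(n_splits=3).split(X, y):
    fold_est = LinearSVC(random_state=0).fit(X[train_idx], y[train_idx])
    scores[test_idx] = fold_est.decision_function(X[test_idx])
    # The per-fold estimator is discarded; only its test-fold predictions are kept.

# Fit the sigmoid on the concatenated out-of-fold scores.
sigmoid = LogisticRegression().fit(scores.reshape(-1, 1), y)

# A single base estimator, fit on the whole training set, makes the predictions.
base_estimator = LinearSVC(random_state=0).fit(X, y)
calibrated_proba = sigmoid.predict_proba(
    base_estimator.decision_function(X).reshape(-1, 1))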

@glemaitre
Member

Then the predictions from all the folds are concatenated into a single list, and the calibration parameters for the base_estimator are determined from that list.

In other words, we should use cross_val_predict instead of cross_val_score.
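For instance, a minimal sketch of the difference (LinearSVC is only an example base estimator):

from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_predict, cross_val_score
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=300, random_state=0)
est = LinearSVC(random_state=0)

# One out-of-fold decision value per training sample: the concatenated list Platt99 needs.
oof_scores = cross_val_predict(est, X, y, cv=3, method='decision_function')
print(oof_scores.shape)  # (300,)

# One score per fold: not usable for calibration.
print(cross_val_score(est, X, y, cv=3).shape)  # (3,)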

The procedure described in Platt99 is what one would expect from a proper application of cross validation, as the cross validation only determines the parameters of the calibration and does not fit the final estimator.
It is also more efficient, as it does not store an estimator for each fold and requires a single predict call at prediction time.

IMO, using the ensemble will be more stable and powerful than using a single learner. I assume it should attenuate the variations linked to the model parameters found on the different splits.

ping @GaelVaroquaux @agramfort, you are better placed than I am to comment on this.

@agramfort
Member

@aesuli I think I understand what you are asking. Before suggesting any change to the way we do things, two questions:

  • you suggest that CalibratedClassifierCV should not change the base_estimator. If so, how do you end a pipeline with such an estimator? Where do you get your pretrained model from? Can I still do this with fit(X, y) from scratch?
  • in libsvm (https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/svm/src/libsvm/svm.cpp#L2099) I see a for loop over folds when fitting with probability=True, which is what we tried to replicate while offering more than Platt scaling / sigmoid as the calibration method. Are you suggesting that what we implement is not exactly what libsvm does? Thanks.

@aesuli
Contributor Author

aesuli commented Jan 19, 2020

@glemaitre right! cross_val_predict can be the way to get CV predictions for the whole training set and then compute calibration from them.

@agramfort

  • you suggest that CalibratedClassifierCV should not change the base_estimator. If so, how do you end a pipeline with such an estimator? Where do you get your pretrained model from? Can I still do this with fit(X, y) from scratch?
    I meant that CalibratedClassifierCV should not alter the form of the base_estimator.
    If the base estimator is a linear SVM, it should fit a single linear SVM on the whole training set, not an ensemble of linear SVMs fit on the folds.

The fit(X, y) of CalibratedClassifierCV could be something like this (rough pseudocode):

from sklearn.model_selection import cross_val_predict

def fit(self, X, y):
    if self.cv != 'prefit':  # Platt99-style CV calibration
        # Fit the single base_estimator on the whole training set.
        self.base_estimator.fit(X, y)
        # Out-of-fold probability predictions for the whole training set.
        cv_predictions = cross_val_predict(self.base_estimator, X, y,
                                           cv=self.cv, method='predict_proba')
        self.calibration_params = self.estimate_calibration(cv_predictions, y)
    else:  # prefit: base_estimator is assumed to be already fit on different data
        predictions = self.base_estimator.predict_proba(X)
        self.calibration_params = self.estimate_calibration(predictions, y)
    return self

def predict_proba(self, X):
    predictions = self.base_estimator.predict_proba(X)
    return self.apply_calibration_model(predictions, self.calibration_params)

Yes, the current CalibratedClassifierCV implementation differs from libsvm.
svm.cpp performs CV only to get predictions, which are all stored together in dec_values.
dec_values is populated at various points of the code, depending on the composition of the training set for a fold (see here and here).
At the end of each fold, the model fitted for that fold is destroyed.
The calibration parameters are then computed on all the predictions for the training set.

@agramfort
Member

agramfort commented Jan 19, 2020 via email

@aesuli
Contributor Author

aesuli commented Jan 20, 2020

OK, I'll work on this in the next few days.
