CalibratedClassifierCV should return an estimator calibrated via CV, not an ensemble of calibrated estimators #16145

Closed
@aesuli

Description

The cross-validation procedure currently implemented in CalibratedClassifierCV does not follow the cross-validation procedure described in the original Platt paper:
[Platt99] Probabilistic Outputs for Support Vector Machines and Comparisons to Regularized Likelihood Methods, J. Platt, (1999)

I also checked the other papers cited in the references for the CalibratedClassifierCV class, and none of them describes the cross-validation process it implements.

CalibratedClassifierCV currently fits and calibrates a separate estimator for each fold (the estimator is fit on the training part of the fold and calibrated on the test part).
All the estimators fit on the folds are kept in a list.
At prediction time, every estimator makes a prediction, and the average of the returned values is the final prediction.

The estimator produced by CalibratedClassifierCV is thus an ensemble, not a single estimator calibrated on the whole training set via CV.
When cross-validation is used, the original base_estimator is never used to make predictions.
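
To make the behaviour described above concrete, here is a minimal pure-Python sketch of a per-fold ensemble. It is not scikit-learn's code: `fit_scorer`, `fit_sigmoid`, `fit_ensemble`, `predict_ensemble`, the toy 1-D data and the interleaved fold assignment are all hypothetical names and choices made for illustration, and a plain gradient-descent sigmoid fit stands in for the real calibrator.

```python
import math

def fit_scorer(xs, ys):
    """Toy base estimator (assumption): score = x - midpoint of the class means."""
    pos = [x for x, y in zip(xs, ys) if y == 1]
    neg = [x for x, y in zip(xs, ys) if y == 0]
    mid = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
    return lambda x: x - mid

def fit_sigmoid(scores, ys, lr=0.1, steps=2000):
    """Fit p = 1 / (1 + exp(-(a*s + b))) by gradient descent on the log loss."""
    a, b, n = 1.0, 0.0, len(scores)
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for s, y in zip(scores, ys):
            p = 1 / (1 + math.exp(-(a * s + b)))
            grad_a += (p - y) * s
            grad_b += p - y
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return lambda s: 1 / (1 + math.exp(-(a * s + b)))

def fit_ensemble(xs, ys, k=3):
    """One (scorer, calibrator) pair per fold, all kept in a list."""
    members = []
    for fold in range(k):
        train = [i for i in range(len(xs)) if i % k != fold]
        test = [i for i in range(len(xs)) if i % k == fold]
        scorer = fit_scorer([xs[i] for i in train], [ys[i] for i in train])
        # Calibrate this fold's scorer on this fold's held-out part only.
        sigmoid = fit_sigmoid([scorer(xs[i]) for i in test], [ys[i] for i in test])
        members.append((scorer, sigmoid))
    return members

def predict_ensemble(members, x):
    """Average the calibrated outputs of every per-fold pair."""
    return sum(sigmoid(scorer(x)) for scorer, sigmoid in members) / len(members)

# Toy, overlapping 1-D data (made up for this sketch).
XS = [0.0, 2.0, 0.4, 2.4, 0.8, 2.8, 1.2, 3.2, 1.6, 3.6, 2.2, 1.4]
YS = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1]

members = fit_ensemble(XS, YS, k=3)
print(len(members), predict_ensemble(members, 3.5))
```

Note that no scorer in `members` was fit on the full data set, and each prediction costs `k` scorer evaluations.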

Platt99 describes a cross-validation procedure that fits an estimator on each fold and saves its predictions for the test part of that fold.
The predictions from all the folds are then concatenated into a single list, and the calibration parameters for the base_estimator are determined from that list.
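
The procedure described above can be sketched in pure Python as follows. This is an illustration under stated assumptions, not Platt's exact algorithm or a proposed patch: `fit_scorer`, `fit_sigmoid` and `fit_platt_cv` are hypothetical helpers, the data and fold assignment are made up, and the sigmoid is fit with plain gradient descent on 0/1 targets as a simplification.

```python
import math

def fit_scorer(xs, ys):
    """Toy base estimator (assumption): score = x - midpoint of the class means."""
    pos = [x for x, y in zip(xs, ys) if y == 1]
    neg = [x for x, y in zip(xs, ys) if y == 0]
    mid = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
    return lambda x: x - mid

def fit_sigmoid(scores, ys, lr=0.1, steps=2000):
    """Fit p = 1 / (1 + exp(-(a*s + b))) by gradient descent on the log loss."""
    a, b, n = 1.0, 0.0, len(scores)
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for s, y in zip(scores, ys):
            p = 1 / (1 + math.exp(-(a * s + b)))
            grad_a += (p - y) * s
            grad_b += p - y
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return lambda s: 1 / (1 + math.exp(-(a * s + b)))

def fit_platt_cv(xs, ys, k=3):
    """Return one scorer fit on all data plus one calibrator fit via CV."""
    out_of_fold = [0.0] * len(xs)
    for fold in range(k):
        train = [i for i in range(len(xs)) if i % k != fold]
        fold_scorer = fit_scorer([xs[i] for i in train], [ys[i] for i in train])
        # Save held-out scores; the per-fold scorer itself is discarded.
        for i in range(len(xs)):
            if i % k == fold:
                out_of_fold[i] = fold_scorer(xs[i])
    # A single calibration fit on the pooled out-of-fold scores.
    sigmoid = fit_sigmoid(out_of_fold, ys)
    # The base estimator is fit once on the whole training set.
    final_scorer = fit_scorer(xs, ys)
    return final_scorer, sigmoid

# Toy, overlapping 1-D data (made up for this sketch).
XS = [0.0, 2.0, 0.4, 2.4, 0.8, 2.8, 1.2, 3.2, 1.6, 3.6, 2.2, 1.4]
YS = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1]

final_scorer, sigmoid = fit_platt_cv(XS, YS, k=3)
print(sigmoid(final_scorer(3.5)))
```

Here only one scorer and one sigmoid survive fitting, so a prediction is a single scorer call followed by a single sigmoid evaluation.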

Cross-validation should be only a means to calibrate the base_estimator on the same data it was fit on, not a way to fit a different estimator.

The procedure described in Platt99 is what one would expect from a proper application of cross-validation: the cross-validation only determines the parameters of the calibration and does not produce the estimator that is used for prediction.
It is also more efficient, as it does not store the estimators for each fold and requires a single predict call at prediction time.
