CalibratedClassifierCV should return an estimator calibrated via CV, not an ensemble of calibrated estimators #16145
Comments
In other words, we should use cross_val_predict to obtain the predictions on which the calibrator is fit.
IMO, using the ensemble will be more stable and powerful than using a single learner. I assume that it should attenuate changes linked to the model parameters found for different splits. Ping @GaelVaroquaux @agramfort, you are better placed than me to comment. |
@aesuli I think I understand what you are asking. Before suggesting any change to the way we do things, two questions:
|
@glemaitre right!
The fit(X, y) of CalibratedClassifierCV could be something like (rough pseudocode):
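Something along these lines, for instance (a sketch only; _SigmoidCalibration is scikit-learn's private Platt-scaling helper, and the attribute names here are assumptions, not an actual patch):

```python
from sklearn.base import clone
from sklearn.calibration import _SigmoidCalibration  # private helper, assumed here
from sklearn.model_selection import cross_val_predict

def fit(self, X, y):
    # Sketch of CalibratedClassifierCV.fit under the proposed scheme.
    # Out-of-fold decision values: each sample is predicted by a model
    # that never saw it during training.
    df = cross_val_predict(self.base_estimator, X, y,
                           cv=self.cv, method="decision_function")
    # Fit the sigmoid (Platt) calibrator once, on the pooled predictions.
    self.calibrator_ = _SigmoidCalibration().fit(df, y)
    # Refit the base estimator on the full training set; this single
    # model is the one used at predict time.
    self.base_estimator_ = clone(self.base_estimator).fit(X, y)
    return self
```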
Yes, the current CalibratedClassifierCV implementation differs from libsvm. |
Got it. We could expose this strategy with a parameter, e.g. ensemble=True/False or something.
I also see that it would not reduce fit time but increase it, as you would fit on the full data too.
The predict time would be faster, though.
I would be curious to see if it improves performance too. I would imagine that the cross_val_predict strategy has more variance due to the absence of an ensemble.
Can you give it a try?
… |
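For concreteness, usage under such a flag might look like this (the ensemble parameter is the proposal above, not an option that existed at the time of this discussion):

```python
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=500, random_state=0)

# ensemble=False (hypothetical here): calibrate on pooled out-of-fold
# predictions, then refit a single base estimator on all of X, y.
clf = CalibratedClassifierCV(LinearSVC(), method="sigmoid", cv=5,
                             ensemble=False)
clf.fit(X, y)
proba = clf.predict_proba(X)
```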
OK, I'll work on this in the next few days. |
The current cross-validation procedure adopted in CalibratedClassifierCV does not follow the cross-validation procedure described in the original Platt paper: [Platt99] Probabilistic Outputs for Support Vector Machines and Comparisons to Regularized Likelihood Methods, J. Platt, (1999).

I also checked the other papers cited in the references for the CalibratedClassifierCV class, and none of them describes the cross-validation process it implements.

CalibratedClassifierCV currently fits and calibrates an estimator for each fold (calibration is performed on the test part of the fold). All the estimators fit at each fold are kept in a list. At prediction time, every estimator makes a prediction and the average of the returned values is the final prediction. The estimator produced by CalibratedClassifierCV is thus an ensemble, not a single estimator calibrated on the whole training set via CV. When using cross-validation, the original base_estimator is never used to make predictions.

Platt99 describes a cross-validation procedure that fits an estimator on each fold and saves the predictions for the test part of that fold. The predictions from all the folds are then concatenated into a single list, and the calibration parameters for base_estimator are determined using that list. Cross-validation should only be a means to calibrate base_estimator on the same data it has been fit on, not to fit a different estimator.

The procedure described in Platt99 is what one would expect from a proper application of cross-validation, as the cross-validation only determines the parameters of the calibration and does not fit the estimator. It is also more efficient, as it does not store the estimators for each fold and requires only a single predict call at prediction time.
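For illustration, the Platt99 procedure can be sketched with public scikit-learn primitives (assumptions: LinearSVC as the base estimator, and a plain logistic regression on the decision values standing in for Platt's sigmoid fit, which in the paper uses regularized, smoothed targets):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=500, random_state=0)

base_estimator = LinearSVC()

# 1. Per-fold fitting: cross_val_predict fits one model per fold and
#    returns the concatenated out-of-fold decision values.
df = cross_val_predict(base_estimator, X, y, cv=3,
                       method="decision_function")

# 2. Platt scaling: fit a sigmoid (here, a plain logistic regression)
#    on the pooled out-of-fold predictions.
calibrator = LogisticRegression().fit(df.reshape(-1, 1), y)

# 3. Fit base_estimator once on the whole training set; the per-fold
#    models are discarded, so only this single model is kept.
base_estimator.fit(X, y)

# Prediction: a single decision_function call plus the sigmoid,
# instead of averaging over an ensemble of per-fold models.
proba = calibrator.predict_proba(
    base_estimator.decision_function(X).reshape(-1, 1))[:, 1]
```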