-
-
Notifications
You must be signed in to change notification settings - Fork 25.8k
Added libsvm-like calibration procedure as described in #16145 #16167
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR @aesuli , made a few minimal comments
From a quick glance I have the impression that you have based your changes on a not-so-up-to-date version of the package. There are a bunch of changes proposed here that are actually a regression (i.e. outdated style, etc.).
Thanks @NicolasHug for the review. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a first pass (I haven't checked the actual code, only tests and docs)
Co-Authored-By: Nicolas Hug <contact@nicolas-hug.com>
This comment has been minimized.
This comment has been minimized.
Thanks for your patience @aesuli . I understand the need for I think we need to decouple 2 different things which are currently mixed in
For now, 1. is done in I think we should separate them. The training of the calibrator could just be a single function (I think). And we could have a new class that does 1., i.e. just chaining, not fitting. Here's a simplified example of what I have in mind: class _CalibratedClassiferPipeline:
# Simple pipeline chaining classifier and its calibrator.
# the calibrated_classifiers_ attribute would be a list of
# _CalibratedClassifierPipeline now. We can remove _CalibratedClassifier.
# That class has no fit method. Fitting of the classifier is done in
# CalibratedClassifierCV, and fitting of the regressor is done in
# _calibrate.
def __init__(self, fitted_classifier, fitted_calibrator):
pass
def predict_proba(self, X):
df = self.fitted_classifier.decision_function(X) # or predict_proba if it doesn't exist
return self.fitted_calibrator.predict_proba(df)
def predict(self, X):
pass # similar stuff
def _fit_calibrator(fitted_classifier, method, y, X=None, df=None):
# Take a fitted classifier as input, fit a calibrator, and return the
# corresponding pipeline.
# df is mutually exclusive with X
# y should always be passed
# This should handle the multiclass case too
if df is None:
df = fitted_classifier.decision_function(X) # or predict_proba
calibrator = _get_regressor(method)
calibrator.fit(df, y)
return _CalibratedClassiferPipeline(fitted_classifier, calibrator) Sorry I couldn't get to this earlier. LMK what you think |
@NicolasHug Ok I'll work on this. |
Hi @aesuli, are you still working on this? Thanks! |
@lucyleeow excuse me for my silence. I see that you correctly guessed that I cannot find time to work on this. Thank you for keeping this alive. |
Ah no problem. I started with the refactoring suggested by @NicolasHug - I think you were right to abandon, it was really complicated! Please let me know if you want to take over/be involved. Otherwise I am happy to finish, will credit you if it eventually gets merged! |
Reference Issues/PRs
Fixes #16145
What does this implement/fix? Explain your changes.
Implements the Platt99 cross-validation procedure, also adopted by libsvm, which is different from the ensemble-based one implemented in
CalibratedClassifierCV
until now.As suggested, I added an
ensemble
parameter to the class, so that both procedures are available.By default it will use the ensemble method, so that existing code would run exactly the same.
By setting
ensemble=False
it uses the Platt99/libsvm cross-validation procedure.