Added libsvm-like calibration procedure as described in #16145 (#16167)

Closed
aesuli wants to merge 14 commits

Conversation

Contributor

@aesuli aesuli commented Jan 21, 2020

Reference Issues/PRs

Fixes #16145

What does this implement/fix? Explain your changes.

Implements the Platt99 cross-validation procedure, also adopted by libsvm, which differs from the ensemble-based procedure that CalibratedClassifierCV has implemented until now.
As suggested, I added an ensemble parameter to the class, so that both procedures are available.
By default it uses the ensemble method, so existing code runs exactly as before.
Setting ensemble=False selects the Platt99/libsvm cross-validation procedure.
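As a minimal usage sketch of the proposed switch (assuming a scikit-learn release where the ensemble parameter has landed in CalibratedClassifierCV; it was merged in a follow-up PR for 0.24):

```python
# Sketch, not code from this PR: the two calibration procedures side by side.
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=200, random_state=0)

# Default: an ensemble of (classifier, calibrator) pairs, one per CV fold.
ens = CalibratedClassifierCV(LinearSVC(), method="sigmoid", cv=3).fit(X, y)

# Platt99/libsvm procedure: pool out-of-fold decision values, fit a single
# calibrator, and refit one classifier on all of the data.
single = CalibratedClassifierCV(
    LinearSVC(), method="sigmoid", cv=3, ensemble=False
).fit(X, y)

print(single.predict_proba(X).shape)  # (200, 2)
```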

Member

@NicolasHug NicolasHug left a comment

Thanks for the PR @aesuli , made a few minimal comments

From a quick glance I have the impression that you based your changes on a not-so-up-to-date version of the package. A number of the changes proposed here are actually regressions (e.g. outdated style).

@aesuli
Contributor Author

aesuli commented Jan 22, 2020

Thanks @NicolasHug for the review.
I made the corrections you suggested.

Member

@NicolasHug NicolasHug left a comment

a first pass (I haven't checked the actual code, only tests and docs)

@levnikmyskin

This comment has been minimized.

@NicolasHug
Member

Thanks for your patience @aesuli .

I understand the need for predictions_in_X, but this shows that the current _CalibratedClassifier design isn't ideal for this PR. Changing the semantics of X is something we want to avoid. Since it's a private class, we should be able to modify it without trouble.

I think we need to decouple 2 different things which are currently mixed in _CalibratedClassifier:

  1. the chaining of the classifier and its corresponding calibrator, for prediction
  2. the training of the calibrator

For now, 1. is done in _CalibratedClassifier.predict_proba and 2. is done in _CalibratedClassifier.fit.

I think we should separate them. The training of the calibrator could just be a single function (I think). And we could have a new class that does 1., i.e. just chaining, not fitting. Here's a simplified example of what I have in mind:

class _CalibratedClassifierPipeline:
    # Simple pipeline chaining a classifier and its calibrator.

    # The calibrated_classifiers_ attribute would be a list of
    # _CalibratedClassifierPipeline now. We can remove _CalibratedClassifier.
    # This class has no fit method. Fitting of the classifier is done in
    # CalibratedClassifierCV, and fitting of the regressor is done in
    # _fit_calibrator.

    def __init__(self, fitted_classifier, fitted_calibrator):
        self.fitted_classifier = fitted_classifier
        self.fitted_calibrator = fitted_calibrator

    def predict_proba(self, X):
        # Use decision_function, or predict_proba if it doesn't exist.
        df = self.fitted_classifier.decision_function(X)
        return self.fitted_calibrator.predict_proba(df)

    def predict(self, X):
        pass  # similar stuff, chaining through the calibrator


def _fit_calibrator(fitted_classifier, method, y, X=None, df=None):
    # Take a fitted classifier as input, fit a calibrator, and return the
    # corresponding pipeline.
    # df is mutually exclusive with X; y should always be passed.
    # This should handle the multiclass case too.

    if df is None:
        df = fitted_classifier.decision_function(X)  # or predict_proba

    calibrator = _get_regressor(method)
    calibrator.fit(df, y)

    return _CalibratedClassifierPipeline(fitted_classifier, calibrator)

Sorry I couldn't get to this earlier. LMK what you think
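As a side note to the sketch above: for method='sigmoid', the calibrator returned by the hypothetical _get_regressor would perform Platt scaling on the pooled out-of-fold decision values. A simplified NumPy illustration of that fit, using plain gradient descent (Platt 1999 and libsvm use a Newton-type solver; all names here are illustrative, not PR code):

```python
import numpy as np

def fit_platt_sigmoid(df, y, n_iter=2000, lr=0.1):
    """Fit p(y=1 | f) = 1 / (1 + exp(A*f + B)) on decision values `df`
    with labels `y` in {0, 1}.

    Simplified gradient-descent stand-in for the Newton-based fit used by
    Platt (1999) and libsvm; for illustration only.
    """
    A, B = 0.0, 0.0
    t = y.astype(float)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(A * df + B))
        # Gradient of the mean log-loss with respect to A and B.
        gA = np.mean((t - p) * df)
        gB = np.mean(t - p)
        A -= lr * gA
        B -= lr * gB
    return A, B

# Toy decision values: negatives around -2, positives around +2.
rng = np.random.RandomState(0)
df = np.concatenate([rng.randn(50) - 2.0, rng.randn(50) + 2.0])
y = np.concatenate([np.zeros(50), np.ones(50)])
A, B = fit_platt_sigmoid(df, y)
p = 1.0 / (1.0 + np.exp(A * df + B))
```

In the non-ensemble procedure, `df` would be the decision values collected from each held-out CV fold, so the sigmoid is fit on predictions the classifier has not trained on.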

@aesuli
Contributor Author

aesuli commented May 19, 2020

@NicolasHug Ok I'll work on this.

@lucyleeow
Member

Hi @aesuli, are you still working on this? Thanks!

@cmarmo added the Superseded (PR has been replaced by a newer PR) and Stalled labels and removed the Superseded label on Jun 30, 2020
@aesuli
Contributor Author

aesuli commented Jul 1, 2020

@lucyleeow apologies for my silence. I see that you correctly guessed that I cannot find the time to work on this. Thank you for keeping it alive.

@aesuli aesuli closed this Jul 1, 2020
@lucyleeow
Member

Ah, no problem. I started on the refactoring suggested by @NicolasHug, and I think you were right to abandon it: it was really complicated! Please let me know if you want to take over or be involved; otherwise I am happy to finish it, and will credit you if it eventually gets merged!


Successfully merging this pull request may close these issues.

CalibratedClassifierCV should return an estimator calibrated via CV, not an ensemble of calibrated estimators
5 participants