Added libsvm-like calibration procedure as described in #16145 (#16167)

Closed
aesuli wants to merge 14 commits

Conversation

Contributor

@aesuli aesuli commented Jan 21, 2020

Reference Issues/PRs

Fixes #16145

What does this implement/fix? Explain your changes.

Implements the Platt99 cross-validation procedure, also adopted by libsvm, which differs from the ensemble-based procedure that CalibratedClassifierCV has implemented until now.
As suggested, I added an ensemble parameter to the class, so that both procedures are available.
By default it uses the ensemble method, so existing code runs exactly as before.
Setting ensemble=False selects the Platt99/libsvm cross-validation procedure.
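As a minimal usage sketch of the proposed switch (assuming a scikit-learn release where the ensemble parameter has landed in CalibratedClassifierCV; it was merged in a follow-up PR for 0.24):

```python
# Sketch, not code from this PR: the two calibration procedures side by side.
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=200, random_state=0)

# Default: an ensemble of (classifier, calibrator) pairs, one per CV fold.
ens = CalibratedClassifierCV(LinearSVC(), method="sigmoid", cv=3).fit(X, y)

# Platt99/libsvm procedure: pool out-of-fold decision values, fit a single
# calibrator, and refit one classifier on all of the data.
single = CalibratedClassifierCV(
    LinearSVC(), method="sigmoid", cv=3, ensemble=False
).fit(X, y)

print(single.predict_proba(X).shape)  # (200, 2)
```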

Member

@NicolasHug NicolasHug left a comment

Thanks for the PR @aesuli , made a few minimal comments

From a quick glance I have the impression that you based your changes on a not-so-up-to-date version of the package. A number of the changes proposed here are actually regressions (e.g. outdated style).

@aesuli
Contributor Author

aesuli commented Jan 22, 2020

Thanks @NicolasHug for the review.
I made the corrections you suggested.

Member

@NicolasHug NicolasHug left a comment

a first pass (I haven't checked the actual code, only tests and docs)

@levnikmyskin

This comment has been minimized.

@NicolasHug
Member

Thanks for your patience @aesuli .

I understand the need for predictions_in_X, but this shows that the current _CalibratedClassifier design isn't ideal for this PR. Changing the semantics of X is something we want to avoid. Since it's a private class, we should be able to modify it without trouble.

I think we need to decouple 2 different things which are currently mixed in _CalibratedClassifier:

  1. the chaining of the classifier and its corresponding calibrator, for prediction
  2. the training of the calibrator

For now, 1. is done in _CalibratedClassifier.predict_proba and 2. is done in _CalibratedClassifier.fit.

I think we should separate them. The training of the calibrator could just be a single function (I think). And we could have a new class that does 1., i.e. just chaining, not fitting. Here's a simplified example of what I have in mind:

class _CalibratedClassifierPipeline:
    # Simple pipeline chaining a classifier and its calibrator.

    # The calibrated_classifiers_ attribute would be a list of
    # _CalibratedClassifierPipeline now. We can remove _CalibratedClassifier.
    # This class has no fit method. Fitting of the classifier is done in
    # CalibratedClassifierCV, and fitting of the regressor is done in
    # _fit_calibrator.

    def __init__(self, fitted_classifier, fitted_calibrator):
        self.fitted_classifier = fitted_classifier
        self.fitted_calibrator = fitted_calibrator

    def predict_proba(self, X):
        # Use decision_function, or predict_proba if it doesn't exist.
        df = self.fitted_classifier.decision_function(X)
        return self.fitted_calibrator.predict_proba(df)

    def predict(self, X):
        pass  # similar stuff, chaining through the calibrator


def _fit_calibrator(fitted_classifier, method, y, X=None, df=None):
    # Take a fitted classifier as input, fit a calibrator, and return the
    # corresponding pipeline.
    # df is mutually exclusive with X; y should always be passed.
    # This should handle the multiclass case too.

    if df is None:
        df = fitted_classifier.decision_function(X)  # or predict_proba

    calibrator = _get_regressor(method)
    calibrator.fit(df, y)

    return _CalibratedClassifierPipeline(fitted_classifier, calibrator)

Sorry I couldn't get to this earlier. LMK what you think
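As a side note to the sketch above: for method='sigmoid', the calibrator returned by the hypothetical _get_regressor would perform Platt scaling on the pooled out-of-fold decision values. A simplified NumPy illustration of that fit, using plain gradient descent (Platt 1999 and libsvm use a Newton-type solver; all names here are illustrative, not PR code):

```python
import numpy as np

def fit_platt_sigmoid(df, y, n_iter=2000, lr=0.1):
    """Fit p(y=1 | f) = 1 / (1 + exp(A*f + B)) on decision values `df`
    with labels `y` in {0, 1}.

    Simplified gradient-descent stand-in for the Newton-based fit used by
    Platt (1999) and libsvm; for illustration only.
    """
    A, B = 0.0, 0.0
    t = y.astype(float)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(A * df + B))
        # Gradient of the mean log-loss with respect to A and B.
        gA = np.mean((t - p) * df)
        gB = np.mean(t - p)
        A -= lr * gA
        B -= lr * gB
    return A, B

# Toy decision values: negatives around -2, positives around +2.
rng = np.random.RandomState(0)
df = np.concatenate([rng.randn(50) - 2.0, rng.randn(50) + 2.0])
y = np.concatenate([np.zeros(50), np.ones(50)])
A, B = fit_platt_sigmoid(df, y)
p = 1.0 / (1.0 + np.exp(A * df + B))
```

In the non-ensemble procedure, `df` would be the decision values collected from each held-out CV fold, so the sigmoid is fit on predictions the classifier has not trained on.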

@aesuli
Contributor Author

aesuli commented May 19, 2020

@NicolasHug Ok I'll work on this.

@lucyleeow
Member

Hi @aesuli, are you still working on this? Thanks!

@cmarmo added the Superseded (PR has been replaced by a newer PR) and Stalled labels and removed the Superseded label on Jun 30, 2020
@aesuli
Contributor Author

aesuli commented Jul 1, 2020

@lucyleeow apologies for my silence. I see that you correctly guessed that I cannot find the time to work on this. Thank you for keeping it alive.

@aesuli aesuli closed this Jul 1, 2020
@lucyleeow
Member

Ah, no problem. I started on the refactoring suggested by @NicolasHug, and I think you were right to abandon it: it was really complicated! Please let me know if you want to take over or be involved; otherwise I am happy to finish it, and will credit you if it eventually gets merged!


Successfully merging this pull request may close these issues.

CalibratedClassifierCV should return an estimator calibrated via CV, not an ensemble of calibrated estimators
5 participants