
Refactor CalibratedClassifierCV #17803


Closed
lucyleeow wants to merge 9 commits

Conversation

lucyleeow
Member

@lucyleeow lucyleeow commented Jul 1, 2020

Reference Issues/PRs

First step to implement #16145
Follows from #16167, implements the refactoring suggested.

What does this implement/fix? Explain your changes.

Add ensemble option to CalibratedClassifierCV.
ensemble=True would be the default and corresponds to the current implementation.
ensemble=False would be the same as the libsvm implementation (used when probability=True):

When cv is not 'prefit':

  • Concatenate the predictions from each cv split and use them all to train the calibrator.
  • The classifiers fit on each cv split would be discarded.
  • At prediction time, use a classifier fit on all the data.

Note that ensemble=False is the procedure described in the original Platt paper (see #16145 (comment)); a rough sketch is given below.
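For concreteness, a minimal sketch of what the ensemble=False path could look like (binary, isotonic case only; fit_non_ensemble is an illustrative helper, not the actual implementation):

    from sklearn.base import clone
    from sklearn.isotonic import IsotonicRegression
    from sklearn.model_selection import cross_val_predict

    def fit_non_ensemble(base_estimator, X, y, cv=5):
        # Sketch of `ensemble=False`; y is assumed to be encoded as 0/1.
        # Out-of-fold predictions for every sample, concatenated across cv splits.
        df = cross_val_predict(base_estimator, X, y, cv=cv,
                               method="decision_function")
        # A single calibrator is trained on all the out-of-fold predictions.
        calibrator = IsotonicRegression(out_of_bounds="clip").fit(df, y)
        # The per-split classifiers are discarded; at prediction time we use
        # the classifier refit on all of the data.
        clf = clone(base_estimator).fit(X, y)
        return clf, calibrator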

Performs refactoring suggested by @NicolasHug (#16167 (comment)) before implementing the new option for easier reviewing. Details in next comment.

Any other comments?

cc @ogrisel @agramfort

Member

@NicolasHug NicolasHug left a comment

Thanks @lucyleeow, haven't reviewed in detail yet.

Do you think it would make sense to open another PR that branches off this one and that implements the new calibration procedure? It might make it easier to judge whether the refactoring that I proposed is appropriate


return df, idx_pos_class
Output of the `decision_function` method of the `clf_fitted` is used for
calibration. If this method does not exist, `predict_proba` method used.
Member

Suggested change
calibration. If this method does not exist, `predict_proba` method used.
calibration. If this method does not exist, `predict_proba` method is used.

Comment on lines 401 to 404
if method == 'isotonic':
calibrator = IsotonicRegression(out_of_bounds='clip')
elif method == 'sigmoid':
calibrator = _SigmoidCalibration()
Member

Shouldn't this block be removed? It looks like it's also inside the loop below.

Member Author

Thanks, whoops!

elif self.method == 'sigmoid':
calibrator = _SigmoidCalibration()
else:
raise ValueError('method should be "sigmoid" or '
Member

@NicolasHug NicolasHug Jul 1, 2020

I think I prefer validating the input here instead of having an if/elif with no else clause. In general it's better to validate the input right before it is used.



class _CalibratedClassiferPipeline:
"""Pipeline chaining a fitted classifier and it's fitted calibrators.
Member

Suggested change
"""Pipeline chaining a fitted classifier and it's fitted calibrators.
"""Pipeline-like chaining a fitted classifier and its fitted calibrators.

Comment on lines 432 to 433
'_SigmoidCalibration'). Number of calibrators equals the number of
classes. However, if there are 2 classes, list contains only one
Member

Suggested change
'_SigmoidCalibration'). Number of calibrators equals the number of
classes. However, if there are 2 classes, list contains only one
'_SigmoidCalibration'). The number of calibrators equals the number of
classes. However, if there are 2 classes, the list contains only one


def predict_proba(self, X):
"""Posterior probabilities of classification
"""Calculate posterior (calibrated) probabilities.
Member

I'm wondering whether "posterior" is a term that makes sense here. Should we just talk about calibrated probabilities instead?

Member Author

Yes, sounds good to me! I was just keeping the old term.

calibrator.fit(this_df, Y[:, k], sample_weight)
self.calibrators_.append(calibrator)
calibrated_classifiers = []
for idx, this_df in zip(idx_pos_class, df.T):
Member

I'd suggest using class_idx instead of just idx. (k is commonly used to denote a class, so it'd be fine too, though maybe more cryptic.)

Member Author

Ah, I didn't realise k was the convention, otherwise I would have kept it. I will use class_idx for others who, like me, aren't familiar with it.

----------
.. [1] Obtaining calibrated probability estimates from decision trees
and naive Bayesian classifiers, B. Zadrozny & C. Elkan, ICML 2001
idx_pos_class : array-like, shape (n_classes,)
Member

Suggested change
idx_pos_class : array-like, shape (n_classes,)
pos_class_indices : array-like, shape (n_classes,)

otherwise the name idx suggests it's a single value

Though it seems that in the binary case this isn't the pos_class index?

Member Author

@lucyleeow lucyleeow Jul 1, 2020

Yes, in the binary case we only output the probability of clf_fitted.classes_[1]:

    elif hasattr(clf_fitted, "predict_proba"):
        df = clf_fitted.predict_proba(X)
        if len(label_encoder_.classes_) == 2:
            df = df[:, 1:]

though this can be considered the 'positive class'? (the other prob is calculated later as 1 - prob)

idx_pos_class indicates which classes are 'present' in a particular train/test subset
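Roughly, it's computed like this (paraphrasing the existing code; label_encoder_ is fitted on all of the training labels):

    # Classes this fold's classifier actually saw, mapped to their indices in
    # the full set of classes known to the label encoder.
    idx_pos_class = label_encoder_.transform(clf_fitted.classes_)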

@lucyleeow
Member Author

lucyleeow commented Jul 1, 2020

(thanks for the review, you beat me to it while I was writing this long text)

@NicolasHug suggested the following refactoring (copied over):

I think we need to decouple 2 different things which are currently mixed in _CalibratedClassifier:
1. the chaining of the classifier and its corresponding calibrator, for prediction
2. the training of the calibrator

with suggested pseudocode: #16167 (comment)

Details of implementation in this PR:

  • _get_predictions performs the same role as the old _preproc. It returns the predictions for X and the indices of the classes present in X.
  • _fit_calibrator performs the same role as the old _CalibratedClassifier.fit(). It fits the calibrator(s) and returns a _CalibratedClassiferPipeline instance.
  • New class _CalibratedClassiferPipeline. An instance consists of a fitted classifier and a list of fitted calibrators, for every cv split (or a single calibrator once the ensemble option is implemented). _CalibratedClassiferPipeline.predict_proba() performs the same role as the old _CalibratedClassifier.predict_proba(): it returns the calibrated probas (for one cv fold), shape (n_samples, n_classes). See the sketch after this list.
  • As before, CalibratedClassifierCV.predict_proba averages the probas from each cv fold.
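A rough sketch of the idea (simplified, names like n_classes are illustrative; the real code additionally maps class indices through a label encoder to handle folds with missing classes, and handles NaN/edge cases):

    import numpy as np

    class _CalibratedClassiferPipeline:
        """Sketch only: chain one fitted classifier with its fitted calibrators."""

        def __init__(self, clf_fitted, calibrators_fitted, n_classes):
            self.clf_fitted = clf_fitted
            self.calibrators_fitted = calibrators_fitted
            self.n_classes = n_classes

        def predict_proba(self, X):
            # Uncalibrated predictions: prefer decision_function, else predict_proba.
            if hasattr(self.clf_fitted, "decision_function"):
                df = self.clf_fitted.decision_function(X)
            else:
                df = self.clf_fitted.predict_proba(X)
                if self.n_classes == 2:
                    df = df[:, 1:]
            if df.ndim == 1:
                df = df[:, np.newaxis]

            # One calibrator per column of `df` (a single one in the binary case).
            proba = np.zeros((X.shape[0], self.n_classes))
            for class_idx, (this_df, calibrator) in enumerate(
                    zip(df.T, self.calibrators_fitted)):
                if self.n_classes == 2:
                    class_idx = 1  # only the positive-class column was calibrated
                proba[:, class_idx] = calibrator.predict(this_df)

            if self.n_classes == 2:
                proba[:, 0] = 1.0 - proba[:, 1]
            else:
                proba /= proba.sum(axis=1, keepdims=True)
            return proba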

Classes
I changed one thing not related to the refactoring: how we keep track of classes. We need to keep track of classes in the case where y[test] contains fewer classes than y[train] (original PR). Changes:

This is the only place that le is used and AFAICT LabelBinarizer.fit(y).classes_ would be the same as LabelEncoder.fit(y).classes_.
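Quick illustration of that equivalence:

    import numpy as np
    from sklearn.preprocessing import LabelBinarizer, LabelEncoder

    y = np.array(["b", "a", "c", "a"])
    print(LabelBinarizer().fit(y).classes_)  # ['a' 'b' 'c']
    print(LabelEncoder().fit(y).classes_)    # ['a' 'b' 'c']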

  • The PR scale_params for linear models #779 added classes to _CalibratedClassifier and instantiates a new LabelEncoder at every step. I'm not sure if this is the best solution. I think it would be best to pass the fitted LabelEncoder instance instead of passing classes and fitting a new LabelEncoder instance every time.

(Edited for succinctness)

@lucyleeow
Member Author

Do you think it would make sense to open another PR that branches off this one and that implements the new calibration procedure? It might make it easier to judge whether the refactoring that I proposed is appropriate

Happy to do this but could you describe the steps? Locally, I know how to make a branch (e.g., 'add_ensemble') that branches off this branch (e.g., 'refactor'). Would I then just push 'add_ensemble' to make a new PR? Is the purpose such that you can look at the diff between 'add_ensemble' and 'refactor'?

@NicolasHug
Member

Would I then just push 'add_ensemble' to make a new PR?

Yes, the new branch would be just like any other branch. It would not branch from master, unlike most branches we use, but other than that it's really just a branch.

Is the purpose such that you can look at the diff between 'add_ensemble' and 'refactor' ?

Honestly I'm not sure my suggestion of having 2 PRs makes sense. The thing is, right now it's hard to review the refactoring because we cannot really tell whether it will help implement the ensemble option. So I suggested having 2 PRs so that we can take a look at the "final" implementation as well, but maybe we should just go for that right away? It's really up to you what your preferred workflow would be.

@lucyleeow
Member Author

No problem @NicolasHug, happy to make a branch from this branch. I can practice my git!

@lucyleeow
Member Author

The thing is, right now it's hard to review the refactoring because we cannot really tell whether it will help implementing the ensemble option.

I think you were right here. The refactoring needs to be re-jigged for it to work/help the implementation (so it doesn't use 'predictions_in_X' - from #16167). However, even with re-jigging, I can't think of a good way. This is what I've got, but I think it uses more memory than the current implementation (by saving the outputs from each cv fold) and I don't think it's very elegant:

def _get_predictions(base_estimator, X, cv, ensemble,
                     base_estimator_supports_sw, label_encoder_):
    # Returns a list of prediction tuples of the form
    # (`df`, `pos_class_indices`, `test_indices`).
    # For `ensemble=True` the list would contain 1 tuple; for `ensemble=False`
    # the list would contain a tuple for each cv split.
    # Note: the cv loop is done here.
    # The `test_indices` are required, since we need `sample_weight[test]` to
    # fit the calibrator and the cv loop is inside this function.


class CalibratedClassifierCV(BaseEstimator, ClassifierMixin,
                             MetaEstimatorMixin):
    def fit(self, X, y, sample_weight=None):
        # Checking.
        # Call `_get_predictions`, loop through the list of prediction tuples
        # and call `_fit_calibrator` on each.
        # Return a list of `_CalibratedClassiferPipeline` instances.
Any suggestions?

@NicolasHug
Member

Sorry I don't understand why _get_predictions needs cv or base_estimator_supports_sw: these are fitting-related parameters, and my understanding is that _get_predictions only predicts?

@lucyleeow
Member Author

Yes, I guess they don't. I just wanted to avoid repetition but it is not necessary. We can do, as with the original PR:

class CalibratedClassifierCV:
    def fit(self, X, y, sample_weight=None):
        ...
        if ensemble:
            ...  # do the cv for loop as before
        else:
            ...  # use `cross_val_predict` and fit classifier (some repetition here)

I guess my question is: did you just want an implementation without the use of predictions_in_X? I think we can do this by changing how _fit_calibrator() works, though this would get messy.

@NicolasHug
Member

We can do, as with the original PR:...

I think this is what I had in mind, yes. From what I can tell, what we need to change is mostly _CalibratedClassifier, not so much CalibratedClassifierCV. But it's easy to miss things when you're not the one implementing, so you'll be the judge.

I guess my question is did you just want an implementation without the use of predictions_in_X?

yes

I think we can do this by changing how _fit_calibrator() works - though this would get messy.

My suggestion (pasted below) doesn't work? My "vision" is that when ensemble is True you pass X and you call _fit_calibrator inside the manual CV loop. And when ensemble is False, _fit_calibrator is called only once, and you pass df directly (df would basically be the output of cross_val_predict). X and df are mutually exclusive.

def _fit_calibrator(fitted_classifier, method, y, X=None, df=None):
    # Take a fitted classifier as input, fit a calibrator, and return the
    # corresponding pipeline.
    # df is mutually exclusive with X
    # y should always be passed
    # This should handle the multiclass case too

    if df is None:
        df = fitted_classifier.decision_function(X)  # or predict_proba

    calibrator = _get_regressor(method)  # isotonic or sigmoid regressor
    calibrator.fit(df, y)
    return _CalibratedClassiferPipeline(fitted_classifier, [calibrator])
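For completeness, a very rough sketch of the two call sites in CalibratedClassifierCV.fit under that scheme (illustrative only; cv handling, sample weights and multiclass details are glossed over):

    from sklearn.base import clone
    from sklearn.model_selection import check_cv, cross_val_predict

    def fit(self, X, y, sample_weight=None):
        # Sketch of a CalibratedClassifierCV.fit method, not the actual code.
        cv = check_cv(self.cv, y, classifier=True)
        self.calibrated_classifiers_ = []
        if self.ensemble:
            # Manual cv loop: one classifier and one calibrator pipeline per split.
            for train, test in cv.split(X, y):
                clf = clone(self.base_estimator).fit(X[train], y[train])
                self.calibrated_classifiers_.append(
                    _fit_calibrator(clf, self.method, y[test], X=X[test]))
        else:
            # Out-of-fold predictions for every sample, a single calibrator, and
            # a classifier refit on the full data.
            df = cross_val_predict(self.base_estimator, X, y, cv=cv,
                                   method="decision_function")
            clf = clone(self.base_estimator).fit(X, y)
            self.calibrated_classifiers_ = [
                _fit_calibrator(clf, self.method, y, df=df)]
        return self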

@lucyleeow lucyleeow changed the title from "Add libsvm-like calibration procedure to CalibratedClassifierCV" to "Refactor CalibratedClassifierCV" on Jul 7, 2020
@lucyleeow
Member Author

lucyleeow commented Jul 7, 2020

Thanks, I was making it more difficult than it needs to be. First attempt here: #17856

I will amend documentation and add tests once the implementation is okay.

Member

@ogrisel ogrisel left a comment

Overall I like the refactoring but I think we could do a better job at preserving previous attribute names for the calibrated classifiers as it could be considered semi-public API (from a duck-typing point of view) and would not cost us much from a maintenance point of view. See the details in the review comments below:

@@ -311,145 +326,150 @@ def _more_tags(self):
}


class _CalibratedClassifier:
"""Probability calibration with isotonic regression or sigmoid.
def _get_predictions(clf_fitted, X, label_encoder_):
Member

Nitpick renaming suggestion: _compute_predictions, to emphasize that this is actually calling the pred_method (rather than just retrieving some precomputed predictions).

Member Author

Changing this in #17856 as it's easier.

classes. However, if there are 2 classes, the list contains only one
fitted calibrator.
"""
def __init__(self, clf_fitted, calibrators_fitted, label_encoder_):
Member

I think I would prefer to use the names:

  • clf_fitted => base_estimator
  • calibrators_fitted => calibrators
  • label_encoder_ => label_encoder

and just mention in the docstring that all of them are expected to be fitted.

Also we could add a property named calibrators_ that would just return calibrators and label_encoder_ pointing to label_encoder to ensure backward compat with the previous attribute names, maybe with a deprecation warning.

While this is a private class, calibrated classifier instances are stored in a public attribute of the public CalibratedClassifierCV class, therefore could be considered semi-public API, at least from a duck-typing point of view.
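For example, something along these lines (illustrative only, assuming the new attribute is just named calibrators):

    import warnings

    class _CalibratedClassifier:
        def __init__(self, base_estimator, calibrators):
            self.base_estimator = base_estimator
            self.calibrators = calibrators

        @property
        def calibrators_(self):
            # Keep the old attribute name working while nudging users to the new one.
            warnings.warn(
                "`calibrators_` is deprecated; use `calibrators` instead.",
                FutureWarning,
            )
            return self.calibrators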

Member Author

Thanks, @NicolasHug reminded me to keep the attribute names constant and it's updated in #17856.
I've also removed the attribute label_encoder.

Also we could add a property named calibrators_ that would just return calibrators

This may be tricky/confusing because for multiclass there is more than one calibrator per classifier, e.g., for 3 classes each pair would be (1 classifier, 3 calibrators). I guess we could return a list of tuples, each tuple containing the calibrators for a 'pair'?

Member Author

Actually I realised that _CalibratedClassifier has a calibrators attribute, which can be accessed via calibrated_classifiers_.calibrators

@NicolasHug NicolasHug added the Superseded (PR has been replaced by a newer PR) label on Jul 31, 2020
@lucyleeow lucyleeow deleted the IS/16145 branch October 30, 2020 01:11