
API Rename base_estimator in CalibratedClassifierCV #22054

Merged on May 30, 2022 (12 commits)
10 changes: 5 additions & 5 deletions doc/modules/grid_search.rst
@@ -602,17 +602,17 @@ parameters of composite or nested estimators such as
>>> from sklearn.datasets import make_moons
>>> X, y = make_moons()
>>> calibrated_forest = CalibratedClassifierCV(
... base_estimator=RandomForestClassifier(n_estimators=10))
... estimator=RandomForestClassifier(n_estimators=10))
>>> param_grid = {
... 'base_estimator__max_depth': [2, 4, 6, 8]}
... 'estimator__max_depth': [2, 4, 6, 8]}
>>> search = GridSearchCV(calibrated_forest, param_grid, cv=5)
>>> search.fit(X, y)
GridSearchCV(cv=5,
estimator=CalibratedClassifierCV(...),
param_grid={'base_estimator__max_depth': [2, 4, 6, 8]})
param_grid={'estimator__max_depth': [2, 4, 6, 8]})

Here, ``<estimator>`` is the parameter name of the nested estimator,
in this case ``base_estimator``.
in this case ``estimator``.
If the meta-estimator is constructed as a collection of estimators as in
`pipeline.Pipeline`, then ``<estimator>`` refers to the name of the estimator,
see :ref:`pipeline_nested_parameters`. In practice, there can be several
@@ -625,7 +625,7 @@ levels of nesting::
... ('model', calibrated_forest)])
>>> param_grid = {
... 'select__k': [1, 2],
... 'model__base_estimator__max_depth': [2, 4, 6, 8]}
... 'model__estimator__max_depth': [2, 4, 6, 8]}
>>> search = GridSearchCV(pipe, param_grid, cv=5).fit(X, y)

Please refer to :ref:`pipeline` for performing parameter searches over
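The `<estimator>__<parameter>` keys in the grid-search example are resolved one `__`-separated component at a time, so `model__estimator__max_depth` reaches `max_depth` on the forest nested two levels down. The plain-Python sketch below mimics that routing. `ToyEstimator` is a hypothetical stand-in, not scikit-learn's `BaseEstimator`, and it looks sub-estimators up as attributes, whereas `Pipeline` resolves step names such as `model` from its list of steps.

```python
from typing import Any, Dict


class ToyEstimator:
    """Hypothetical stand-in for an estimator; parameters are plain attributes."""

    def __init__(self, **params: Any) -> None:
        for name, value in params.items():
            setattr(self, name, value)

    def set_params(self, **params: Any) -> "ToyEstimator":
        nested: Dict[str, Dict[str, Any]] = {}
        for key, value in params.items():
            # "model__estimator__max_depth" -> ("model", "estimator__max_depth")
            head, sep, tail = key.partition("__")
            if sep:
                nested.setdefault(head, {})[tail] = value
            else:
                setattr(self, key, value)
        # Forward the remaining key to each named sub-estimator.
        for head, sub_params in nested.items():
            getattr(self, head).set_params(**sub_params)
        return self


forest = ToyEstimator(n_estimators=10, max_depth=None)
calibrated_forest = ToyEstimator(estimator=forest)  # formerly base_estimator=...
pipe = ToyEstimator(select=ToyEstimator(k=1), model=calibrated_forest)

pipe.set_params(select__k=2, model__estimator__max_depth=4)
print(forest.max_depth)  # 4
```

In this toy, the old key `model__base_estimator__max_depth` would fail with an `AttributeError`, because the calibrated wrapper has no `base_estimator` attribute.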
8 changes: 8 additions & 0 deletions doc/whats_new/v1.2.rst
@@ -41,6 +41,14 @@ Changelog
:pr:`123456` by :user:`Joe Bloggs <joeongithub>`.
where 123456 is the *pull request* number, not the issue number.

:mod:`sklearn.calibration`
..........................

- |API| Rename `base_estimator` to `estimator` in
:class:`CalibratedClassifierCV` to improve readability and consistency. The
parameter `base_estimator` is deprecated and will be removed in 1.4.
:pr:`22054` by :user:`Kevin Roice <kevroi>`.

:mod:`sklearn.cluster`
......................

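User code that builds search grids for `CalibratedClassifierCV` has to rename `base_estimator` key components to `estimator`, as the grid-search documentation in this PR does. The helper below is a hypothetical migration sketch, not part of scikit-learn. It renames every `base_estimator` component it finds, so it assumes that each such component in the grid belongs to a `CalibratedClassifierCV`; other meta-estimators are not checked.

```python
from typing import Any, Dict, List


def migrate_param_grid(grid: Dict[str, List[Any]]) -> Dict[str, List[Any]]:
    """Hypothetical helper: rename `base_estimator` key components to `estimator`.

    Assumes every `base_estimator` component refers to a CalibratedClassifierCV.
    """
    migrated: Dict[str, List[Any]] = {}
    for key, values in grid.items():
        parts = ["estimator" if p == "base_estimator" else p for p in key.split("__")]
        new_key = "__".join(parts)
        if new_key in migrated:
            # The grid already used both spellings for the same parameter.
            raise ValueError(f"{key!r} collides with existing key {new_key!r}")
        migrated[new_key] = values
    return migrated


old_grid = {"select__k": [1, 2], "model__base_estimator__max_depth": [2, 4, 6, 8]}
print(migrate_param_grid(old_grid))
# {'select__k': [1, 2], 'model__estimator__max_depth': [2, 4, 6, 8]}
```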
104 changes: 63 additions & 41 deletions sklearn/calibration.py
@@ -72,17 +72,19 @@ class CalibratedClassifierCV(ClassifierMixin, MetaEstimatorMixin, BaseEstimator)
for model fitting and calibration are disjoint.

The calibration is based on the :term:`decision_function` method of the
`base_estimator` if it exists, else on :term:`predict_proba`.
`estimator` if it exists, else on :term:`predict_proba`.

Read more in the :ref:`User Guide <calibration>`.

Parameters
----------
base_estimator : estimator instance, default=None
estimator : estimator instance, default=None
The classifier whose output need to be calibrated to provide more
accurate `predict_proba` outputs. The default classifier is
a :class:`~sklearn.svm.LinearSVC`.

.. versionadded:: 1.2

method : {'sigmoid', 'isotonic'}, default='sigmoid'
The method to use for calibration. Can be 'sigmoid' which
corresponds to Platt's method (i.e. a logistic regression model) or
@@ -108,7 +110,7 @@ class CalibratedClassifierCV(ClassifierMixin, MetaEstimatorMixin, BaseEstimator)
Refer to the :ref:`User Guide <cross_validation>` for the various
cross-validation strategies that can be used here.

If "prefit" is passed, it is assumed that `base_estimator` has been
If "prefit" is passed, it is assumed that `estimator` has been
fitted already and all data is used for calibration.

.. versionchanged:: 0.22
@@ -130,7 +132,7 @@ class CalibratedClassifierCV(ClassifierMixin, MetaEstimatorMixin, BaseEstimator)
Determines how the calibrator is fitted when `cv` is not `'prefit'`.
Ignored if `cv='prefit'`.

If `True`, the `base_estimator` is fitted using training data, and
If `True`, the `estimator` is fitted using training data, and
calibrated using testing data, for each `cv` fold. The final estimator
is an ensemble of `n_cv` fitted classifier and calibrator pairs, where
`n_cv` is the number of cross-validation folds. The output is the
@@ -139,39 +141,46 @@ class CalibratedClassifierCV(ClassifierMixin, MetaEstimatorMixin, BaseEstimator)
If `False`, `cv` is used to compute unbiased predictions, via
:func:`~sklearn.model_selection.cross_val_predict`, which are then
used for calibration. At prediction time, the classifier used is the
`base_estimator` trained on all the data.
`estimator` trained on all the data.
Note that this method is also internally implemented in
:mod:`sklearn.svm` estimators with the `probabilities=True` parameter.

.. versionadded:: 0.24

base_estimator : estimator instance
This parameter is deprecated. Use `estimator` instead.

.. deprecated:: 1.2
The parameter `base_estimator` is deprecated in 1.2 and will be
removed in 1.4. Use `estimator` instead.

Attributes
----------
classes_ : ndarray of shape (n_classes,)
The class labels.

n_features_in_ : int
Number of features seen during :term:`fit`. Only defined if the
underlying base_estimator exposes such an attribute when fit.
underlying estimator exposes such an attribute when fit.

.. versionadded:: 0.24

feature_names_in_ : ndarray of shape (`n_features_in_`,)
Names of features seen during :term:`fit`. Only defined if the
underlying base_estimator exposes such an attribute when fit.
underlying estimator exposes such an attribute when fit.

.. versionadded:: 1.0

calibrated_classifiers_ : list (len() equal to cv or 1 if `cv="prefit"` \
or `ensemble=False`)
The list of classifier and calibrator pairs.

- When `cv="prefit"`, the fitted `base_estimator` and fitted
- When `cv="prefit"`, the fitted `estimator` and fitted
calibrator.
- When `cv` is not "prefit" and `ensemble=True`, `n_cv` fitted
`base_estimator` and calibrator pairs. `n_cv` is the number of
`estimator` and calibrator pairs. `n_cv` is the number of
cross-validation folds.
- When `cv` is not "prefit" and `ensemble=False`, the `base_estimator`,
- When `cv` is not "prefit" and `ensemble=False`, the `estimator`,
fitted on all the data, and fitted calibrator.

.. versionchanged:: 0.24
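With `ensemble=True`, `calibrated_classifiers_` holds one classifier/calibrator pair per cross-validation fold, and the predicted probabilities of all pairs are averaged. With `ensemble=False` or `cv="prefit"` there is a single pair. The plain-Python sketch below shows only that averaging step. Each pair is modeled as a hypothetical callable that returns calibrated class probabilities for one sample.

```python
from typing import Callable, List, Sequence

# Hypothetical model of one fitted (classifier, calibrator) pair:
# it maps a single sample to calibrated class probabilities.
Pair = Callable[[Sequence[float]], List[float]]


def ensemble_predict_proba(pairs: List[Pair], x: Sequence[float]) -> List[float]:
    """Average the calibrated probabilities of all pairs for one sample."""
    if not pairs:
        raise ValueError("at least one fitted pair is required")
    per_pair = [pair(x) for pair in pairs]
    n_classes = len(per_pair[0])
    return [sum(p[k] for p in per_pair) / len(per_pair) for k in range(n_classes)]


# Three folds (ensemble=True) versus a single pair (ensemble=False).
fold_pairs: List[Pair] = [
    lambda x: [0.25, 0.75],
    lambda x: [0.5, 0.5],
    lambda x: [0.75, 0.25],
]
print(ensemble_predict_proba(fold_pairs, [0.0, 1.0]))      # [0.5, 0.5]
print(ensemble_predict_proba(fold_pairs[:1], [0.0, 1.0]))  # [0.25, 0.75]
```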
@@ -204,9 +213,9 @@ class CalibratedClassifierCV(ClassifierMixin, MetaEstimatorMixin, BaseEstimator)
>>> X, y = make_classification(n_samples=100, n_features=2,
... n_redundant=0, random_state=42)
>>> base_clf = GaussianNB()
>>> calibrated_clf = CalibratedClassifierCV(base_estimator=base_clf, cv=3)
>>> calibrated_clf = CalibratedClassifierCV(base_clf, cv=3)
>>> calibrated_clf.fit(X, y)
CalibratedClassifierCV(base_estimator=GaussianNB(), cv=3)
CalibratedClassifierCV(...)
>>> len(calibrated_clf.calibrated_classifiers_)
3
>>> calibrated_clf.predict_proba(X)[:5, :]
@@ -224,12 +233,9 @@ class CalibratedClassifierCV(ClassifierMixin, MetaEstimatorMixin, BaseEstimator)
>>> base_clf = GaussianNB()
>>> base_clf.fit(X_train, y_train)
GaussianNB()
>>> calibrated_clf = CalibratedClassifierCV(
... base_estimator=base_clf,
... cv="prefit"
... )
>>> calibrated_clf = CalibratedClassifierCV(base_clf, cv="prefit")
>>> calibrated_clf.fit(X_calib, y_calib)
CalibratedClassifierCV(base_estimator=GaussianNB(), cv='prefit')
CalibratedClassifierCV(...)
>>> len(calibrated_clf.calibrated_classifiers_)
1
>>> calibrated_clf.predict_proba([[-0.5, 0.5]])
@@ -238,18 +244,20 @@ class CalibratedClassifierCV(ClassifierMixin, MetaEstimatorMixin, BaseEstimator)

def __init__(
self,
base_estimator=None,
estimator=None,
*,
method="sigmoid",
cv=None,
n_jobs=None,
ensemble=True,
base_estimator="deprecated",
):
self.base_estimator = base_estimator
self.estimator = estimator
self.method = method
self.cv = cv
self.n_jobs = n_jobs
self.ensemble = ensemble
self.base_estimator = base_estimator

def fit(self, X, y, sample_weight=None, **fit_params):
"""Fit the calibrated model.
@@ -282,25 +290,39 @@ def fit(self, X, y, sample_weight=None, **fit_params):
for sample_aligned_params in fit_params.values():
check_consistent_length(y, sample_aligned_params)

if self.base_estimator is None:
# TODO(1.4): Remove when base_estimator is removed
if self.base_estimator != "deprecated":
if self.estimator is not None:
raise ValueError(
"Both `base_estimator` and `estimator` are set. Only set "
"`estimator` since `base_estimator` is deprecated."
)
warnings.warn(
"`base_estimator` was renamed to `estimator` in version 1.2 and "
"will be removed in 1.4.",
FutureWarning,
)
estimator = self.base_estimator
else:
estimator = self.estimator

if estimator is None:
# we want all classifiers that don't expose a random_state
# to be deterministic (and we don't want to expose this one).
base_estimator = LinearSVC(random_state=0)
else:
base_estimator = self.base_estimator
estimator = LinearSVC(random_state=0)

self.calibrated_classifiers_ = []
if self.cv == "prefit":
# `classes_` should be consistent with that of base_estimator
check_is_fitted(self.base_estimator, attributes=["classes_"])
self.classes_ = self.base_estimator.classes_
# `classes_` should be consistent with that of estimator
check_is_fitted(self.estimator, attributes=["classes_"])
self.classes_ = self.estimator.classes_

pred_method, method_name = _get_prediction_method(base_estimator)
pred_method, method_name = _get_prediction_method(estimator)
n_classes = len(self.classes_)
predictions = _compute_predictions(pred_method, method_name, X, n_classes)

calibrated_classifier = _fit_calibrator(
base_estimator,
estimator,
predictions,
y,
self.classes_,
@@ -315,10 +337,10 @@ def fit(self, X, y, sample_weight=None, **fit_params):
n_classes = len(self.classes_)

# sample_weight checks
fit_parameters = signature(base_estimator.fit).parameters
fit_parameters = signature(estimator.fit).parameters
supports_sw = "sample_weight" in fit_parameters
if sample_weight is not None and not supports_sw:
estimator_name = type(base_estimator).__name__
estimator_name = type(estimator).__name__
warnings.warn(
f"Since {estimator_name} does not appear to accept sample_weight, "
"sample weights will only be used for the calibration itself. This "
@@ -351,7 +373,7 @@ def fit(self, X, y, sample_weight=None, **fit_params):
parallel = Parallel(n_jobs=self.n_jobs)
self.calibrated_classifiers_ = parallel(
delayed(_fit_classifier_calibrator_pair)(
clone(base_estimator),
clone(estimator),
X,
y,
train=train,
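The `sample_weight` check earlier in `fit` decides support from the signature of the wrapped estimator's `fit` method via `inspect.signature`; when the parameter is missing, the weights are only used for calibration. A standalone sketch of that test, with two hypothetical classifier classes:

```python
from inspect import signature
from typing import Any


def supports_sample_weight(estimator: Any) -> bool:
    """Return True if estimator.fit declares a `sample_weight` parameter."""
    return "sample_weight" in signature(estimator.fit).parameters


class WeightedClf:  # hypothetical classifier whose fit accepts weights
    def fit(self, X, y, sample_weight=None):
        return self


class UnweightedClf:  # hypothetical classifier whose fit does not
    def fit(self, X, y):
        return self


print(supports_sample_weight(WeightedClf()))    # True
print(supports_sample_weight(UnweightedClf()))  # False
```

Because the test is purely signature-based, a `fit(self, X, y, **kwargs)` that forwards weights internally would be reported as not supporting them.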
@@ -365,7 +387,7 @@ def fit(self, X, y, sample_weight=None, **fit_params):
for train, test in cv.split(X, y)
)
else:
this_estimator = clone(base_estimator)
this_estimator = clone(estimator)
_, method_name = _get_prediction_method(this_estimator)
fit_params = (
{"sample_weight": sample_weight}
@@ -402,7 +424,7 @@ def fit(self, X, y, sample_weight=None, **fit_params):
)
self.calibrated_classifiers_.append(calibrated_classifier)

first_clf = self.calibrated_classifiers_[0].base_estimator
first_clf = self.calibrated_classifiers_[0].estimator
if hasattr(first_clf, "n_features_in_"):
self.n_features_in_ = first_clf.n_features_in_
if hasattr(first_clf, "feature_names_in_"):
@@ -418,7 +440,7 @@ def predict_proba(self, X):
Parameters
----------
X : array-like of shape (n_samples, n_features)
The samples, as accepted by `base_estimator.predict_proba`.
The samples, as accepted by `estimator.predict_proba`.

Returns
-------
@@ -446,7 +468,7 @@ def predict(self, X):
Parameters
----------
X : array-like of shape (n_samples, n_features)
The samples, as accepted by `base_estimator.predict`.
The samples, as accepted by `estimator.predict`.

Returns
-------
@@ -570,7 +592,7 @@ def _get_prediction_method(clf):
return method, "predict_proba"
else:
raise RuntimeError(
"'base_estimator' has no 'decision_function' or 'predict_proba' method."
"'estimator' has no 'decision_function' or 'predict_proba' method."
)
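As the class docstring states, calibration is based on `decision_function` when the estimator has it, else on `predict_proba`, and `_get_prediction_method` raises the `RuntimeError` above when neither exists. A dependency-free sketch of that selection order, using hypothetical classifier classes:

```python
from typing import Any, Callable, Tuple


def get_prediction_method(clf: Any) -> Tuple[Callable[..., Any], str]:
    """Prefer decision_function, else predict_proba, else fail."""
    for name in ("decision_function", "predict_proba"):
        if hasattr(clf, name):
            return getattr(clf, name), name
    raise RuntimeError(
        "'estimator' has no 'decision_function' or 'predict_proba' method."
    )


class MarginClf:  # hypothetical: exposes both methods
    def decision_function(self, X):
        return [0.0 for _ in X]

    def predict_proba(self, X):
        return [[0.5, 0.5] for _ in X]


class ProbaClf:  # hypothetical: exposes predict_proba only
    def predict_proba(self, X):
        return [[0.5, 0.5] for _ in X]


print(get_prediction_method(MarginClf())[1])  # decision_function
print(get_prediction_method(ProbaClf())[1])   # predict_proba
```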


@@ -669,7 +691,7 @@ class _CalibratedClassifier:

Parameters
----------
base_estimator : estimator instance
estimator : estimator instance
Fitted classifier.

calibrators : list of fitted estimator instances
@@ -687,8 +709,8 @@ class _CalibratedClassifier:
non-parametric approach based on isotonic regression.
"""

def __init__(self, base_estimator, calibrators, *, classes, method="sigmoid"):
self.base_estimator = base_estimator
def __init__(self, estimator, calibrators, *, classes, method="sigmoid"):
self.estimator = estimator
self.calibrators = calibrators
self.classes = classes
self.method = method
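Because `_CalibratedClassifier` is private, its attribute is renamed from `base_estimator` to `estimator` directly, with no deprecation shim. Downstream code that inspects `calibrated_classifiers_` (as `fit` does with `first_clf`) and must run on scikit-learn versions from both sides of this change could read either name. The helper and stand-in classes below are hypothetical; relying on a private class remains fragile in any case.

```python
from typing import Any


def fitted_classifier(pair: Any) -> Any:
    """Hypothetical helper: return the classifier held by a calibrated pair.

    Pairs after this rename expose `estimator`; earlier ones `base_estimator`.
    """
    if hasattr(pair, "estimator"):
        return pair.estimator
    return pair.base_estimator


class NewPair:  # stand-in for the renamed private class
    def __init__(self, estimator: Any) -> None:
        self.estimator = estimator


class OldPair:  # stand-in for the pre-rename private class
    def __init__(self, base_estimator: Any) -> None:
        self.base_estimator = base_estimator


print(fitted_classifier(NewPair("clf_a")), fitted_classifier(OldPair("clf_b")))
# clf_a clf_b
```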
@@ -710,11 +732,11 @@ def predict_proba(self, X):
The predicted probabilities. Can be exact zeros.
"""
n_classes = len(self.classes)
pred_method, method_name = _get_prediction_method(self.base_estimator)
pred_method, method_name = _get_prediction_method(self.estimator)
predictions = _compute_predictions(pred_method, method_name, X, n_classes)

label_encoder = LabelEncoder().fit(self.classes)
pos_class_indices = label_encoder.transform(self.base_estimator.classes_)
pos_class_indices = label_encoder.transform(self.estimator.classes_)

proba = np.zeros((_num_samples(X), n_classes))
for class_idx, this_pred, calibrator in zip(
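In `_CalibratedClassifier.predict_proba`, `LabelEncoder` maps the classes seen by one fold's estimator (`self.estimator.classes_`) to their column positions among all classes, so a fold that never saw some class still fills a full-width probability array. The plain-Python sketch below mimics only that column placement, sorting classes as `LabelEncoder` does. The helper name `place_columns` is hypothetical, the values it places stand in for the per-class calibrator outputs, and it does not reproduce the rest of the method.

```python
from typing import Dict, List, Sequence


def place_columns(
    all_classes: Sequence[str],
    fold_classes: Sequence[str],
    fold_proba: List[List[float]],
) -> List[List[float]]:
    """Hypothetical helper: scatter one fold's columns into the full layout.

    Classes are sorted, as LabelEncoder sorts them, and each class seen by
    the fold's estimator is mapped to its column; unseen classes stay at 0.
    """
    column: Dict[str, int] = {c: i for i, c in enumerate(sorted(set(all_classes)))}
    pos_class_indices = [column[c] for c in fold_classes]
    full = [[0.0] * len(column) for _ in fold_proba]
    for full_row, fold_row in zip(full, fold_proba):
        for col, p in zip(pos_class_indices, fold_row):
            full_row[col] = p
    return full


# The fold's estimator never saw class "b".
print(place_columns(["a", "b", "c"], ["a", "c"], [[0.25, 0.75], [0.5, 0.5]]))
# [[0.25, 0.0, 0.75], [0.5, 0.0, 0.5]]
```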