FIX CalibratedClassifierCV to handle correctly sample_weight when ensemble=False #20638


Closed
21 commits
- fe5d1a6 — Fixing the usage of sample_weights in CalibratedClassifierCV with (JulienB-78, Jul 30, 2021)
- e42eac2 — replace custom error by brier score (JulienB-78, Jul 30, 2021)
- 3b962cc — removing unnecessary newline (JulienB-78, Jul 30, 2021)
- 23b32e5 — Reintroducing the old test. Reduce diff size of the bug fix. Add chan… (JulienB-78, Jul 31, 2021)
- 4d2e271 — Merge remote-tracking branch 'upstream/main' into fix_calibrationclas… (JulienB-78, Jul 31, 2021)
- ea8acd3 — Update sklearn/tests/test_calibration.py (JulienB-78, Aug 8, 2021)
- 40fc953 — Update doc/whats_new/v1.0.rst (JulienB-78, Aug 8, 2021)
- bd4045e — Merge remote-tracking branch 'upstream/main' into fix_calibrationclas… (JulienB-78, Aug 8, 2021)
- 916eeb3 — Merge branch 'fix_calibrationclassifier_with_weights' of github.com:J… (JulienB-78, Aug 8, 2021)
- ee92ef3 — Merge remote-tracking branch 'upstream/main' into fix_calibrationclas… (JulienB-78, Sep 22, 2021)
- da6970a — Added standardscaler before using SVC (JulienB-78, Sep 22, 2021)
- 421b992 — Merge remote-tracking branch 'upstream/main' into fix_calibrationclas… (JulienB-78, Sep 23, 2021)
- 4f93e38 — edited whats_new/v1.0.rst (JulienB-78, Sep 23, 2021)
- 57d3aa6 — correcting whats_new/v1.0.rst (JulienB-78, Sep 23, 2021)
- 1d0187b — correcting whats_new/v1.0.rst (JulienB-78, Sep 23, 2021)
- 81e8bd3 — Update sklearn/tests/test_calibration.py (JulienB-78, Sep 26, 2021)
- cf99fba — Update sklearn/tests/test_calibration.py (JulienB-78, Sep 26, 2021)
- e719952 — Update sklearn/tests/test_calibration.py (JulienB-78, Sep 26, 2021)
- 71de6f7 — Update sklearn/tests/test_calibration.py (JulienB-78, Sep 26, 2021)
- 9b021c3 — Update sklearn/tests/test_calibration.py (JulienB-78, Sep 26, 2021)
- 734870c — correcting indentation (JulienB-78, Sep 26, 2021)
17 changes: 17 additions & 0 deletions doc/whats_new/v1.0.rst
@@ -2,6 +2,23 @@

.. currentmodule:: sklearn

.. _changes_1_0_1:

Version 1.0.1
=============

**In Development**

Changelog
---------

:mod:`sklearn.calibration`
..........................

- |Fix| Fixed :class:`calibration.CalibratedClassifierCV` to correctly handle
  `sample_weight` when `ensemble=False`.
  :pr:`20638` by :user:`Julien Bohné <JulienB-78>`.

.. _changes_1_0:

Version 1.0.0
6 changes: 6 additions & 0 deletions sklearn/calibration.py
@@ -351,6 +351,11 @@ def fit(self, X, y, sample_weight=None):
else:
this_estimator = clone(base_estimator)
_, method_name = _get_prediction_method(this_estimator)
fit_params = (
{"sample_weight": sample_weight}
if sample_weight is not None and supports_sw
else None
)
pred_method = partial(
cross_val_predict,
estimator=this_estimator,
@@ -359,6 +364,7 @@
cv=cv,
method=method_name,
n_jobs=self.n_jobs,
fit_params=fit_params,
)
predictions = _compute_predictions(
pred_method, method_name, X, n_classes
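The effect of the patched `fit_params` can be sketched with a minimal reproduction (a sketch, not the PR's own test: `LogisticRegression` stands in for the `LinearSVC` setup, and the weight values are chosen for illustration). With `ensemble=False`, `sample_weight` is now forwarded to the cross-validated fits that produce the calibrator's training predictions, so weighted and unweighted results differ:

```python
import numpy as np
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression

# Imbalanced two-class blobs, as in the tests added by this PR.
X, y = make_blobs(n_samples=(100, 1000), center_box=(-1, 1), random_state=42)
# Up-weight the minority class (illustrative weights).
sample_weight = np.where(y == 0, 0.9, 0.1)

clf = CalibratedClassifierCV(LogisticRegression(), method="sigmoid", ensemble=False)
proba_weighted = clf.fit(X, y, sample_weight=sample_weight).predict_proba(X)[:, 1]
proba_unweighted = clf.fit(X, y).predict_proba(X)[:, 1]

# With the fix, the weights reach both the final estimator and the
# cross-validated predictions used to fit the calibrator.
print(np.linalg.norm(proba_weighted - proba_unweighted))
```

Before the fix, the two probability vectors were identical with `ensemble=False` because the inner `cross_val_predict` fits silently ignored the weights.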
68 changes: 68 additions & 0 deletions sklearn/tests/test_calibration.py
@@ -166,6 +166,12 @@ def test_sample_weight(data, method, ensemble):
X_train, y_train, sw_train = X[:n_samples], y[:n_samples], sample_weight[:n_samples]
X_test = X[n_samples:]

scaler = StandardScaler()

Review comment (Member), suggested change:
    # FIXME: ideally we should create a `Pipeline` with the `StandardScaler`
    # followed by the `LinearSVC`. However, `Pipeline` does not expose
    # `sample_weight` and it will be silently ignored.
    scaler = StandardScaler()

X_train = scaler.fit_transform(
    X_train
)  # compute mean, std and transform training data as well

Review comment (Member): remove the comment

Review comment (Member): can now remove this comment as well

X_test = scaler.transform(X_test)

base_estimator = LinearSVC(random_state=42)
calibrated_clf = CalibratedClassifierCV(
base_estimator, method=method, ensemble=ensemble
@@ -182,6 +188,68 @@
assert diff > 0.1


@pytest.mark.parametrize("method", ["sigmoid", "isotonic"])
@pytest.mark.parametrize("ensemble", [True, False])
def test_sample_weight_class_imbalanced(method, ensemble):
"""Use an imbalanced dataset to check that `sample_weight` is taken into
account in the calibration estimator."""
X, y = make_blobs((100, 1000), center_box=(-1, 1), random_state=42)

# Compute weights to compensate for the imbalance of the dataset
weights = np.array([0.9, 0.1])
sample_weight = weights[(y == 1).astype(int)]
Review comment (Member), suggested change:
    - sample_weight = weights[(y == 1).astype(int)]
    + sample_weight = weights[(y == 1).astype(np.int64)]


X_train, X_test, y_train, y_test, sw_train, sw_test = train_test_split(
X, y, sample_weight, stratify=y, random_state=42
)
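The fancy-indexing construction above maps each label to its per-class weight: `(y == 1)` becomes a 0/1 index into the two-element `weights` array. A tiny standalone sketch (values chosen for illustration):

```python
import numpy as np

y = np.array([0, 1, 1, 0, 1])
weights = np.array([0.9, 0.1])  # weight for class 0, weight for class 1

# (y == 1).astype(int) turns each label into a 0/1 index into `weights`.
sample_weight = weights[(y == 1).astype(int)]
print(sample_weight)  # → [0.9 0.1 0.1 0.9 0.1]
```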

# FIXME: ideally we should create a `Pipeline` with the `StandardScaler`
# followed by the `LinearSVC`. However, `Pipeline` does not expose
# `sample_weight` and it will be silently ignored.
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

base_estimator = LinearSVC(random_state=42)
calibrated_clf = CalibratedClassifierCV(
base_estimator, method=method, ensemble=ensemble
)
calibrated_clf.fit(X_train, y_train, sample_weight=sw_train)
predictions = calibrated_clf.predict_proba(X_test)[:, 1]

assert brier_score_loss(y_test, predictions, sample_weight=sw_test) < 0.2
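The `brier_score_loss` used in the assertion above is the (weighted) mean squared difference between predicted probabilities and binary outcomes; the 0.2 threshold is a loose sanity bound, not a tight one. A small worked example with made-up values:

```python
import numpy as np
from sklearn.metrics import brier_score_loss

y_true = np.array([0, 1, 1, 0])
proba = np.array([0.1, 0.8, 0.7, 0.3])
w = np.array([0.9, 0.1, 0.1, 0.9])  # made-up per-sample weights

# Weighted average of (proba - y_true) ** 2:
# (0.9*0.01 + 0.1*0.04 + 0.1*0.09 + 0.9*0.09) / (0.9+0.1+0.1+0.9) = 0.0515
score = brier_score_loss(y_true, proba, sample_weight=w)
print(round(score, 6))  # → 0.0515
```

Lower is better; a perfectly confident, always-correct classifier scores 0.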


@pytest.mark.parametrize("method", ["sigmoid", "isotonic"])
def test_sample_weight_class_imbalanced_ensemble_equivalent(method):
X, y = make_blobs((100, 1000), center_box=(-1, 1), random_state=42)
Review comment (Member): could you add a small docstring mentioning what we try to achieve here?

# Compute weights to compensate for the imbalance of the dataset
sample_weight = 9 * (y == 0) + 1
Review comment (Member): could you make the same change

Review comment (Member), suggested change:
    - sample_weight = 9 * (y == 0) + 1
    + weights = np.array([0.9, 0.1])
    + sample_weight = weights[(y == 1).astype(np.int64)]


X_train, X_test, y_train, y_test, sw_train, sw_test = train_test_split(
X, y, sample_weight, stratify=y, random_state=42
)

scaler = StandardScaler()

Review comment (Member): you can add a comment as before

X_train = scaler.fit_transform(
    X_train
)  # compute mean, std and transform training data as well

Review comment (Member): remove the comment

X_test = scaler.transform(X_test)

Review comment (Member) on lines +234 to +238, suggested change:
    - scaler = StandardScaler()
    - X_train = scaler.fit_transform(
    -     X_train
    - )  # compute mean, std and transform training data as well
    - X_test = scaler.transform(X_test)
    + # FIXME: ideally we should create a `Pipeline` with the `StandardScaler`
    + # followed by the `LinearSVC`. However, `Pipeline` does not expose
    + # `sample_weight` and it will be silently ignored.
    + scaler = StandardScaler()
    + X_train = scaler.fit_transform(X_train)
    + X_test = scaler.transform(X_test)


predictions = []
for ensemble in [True, False]:
base_estimator = LinearSVC(random_state=42)
calibrated_clf = CalibratedClassifierCV(
base_estimator, method=method, ensemble=ensemble
)
calibrated_clf.fit(X_train, y_train, sample_weight=sw_train)
predictions.append(calibrated_clf.predict_proba(X_test)[:, 1])

diff = np.linalg.norm(predictions[0] - predictions[1])
assert diff < 1.5
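The near-equivalence this test asserts can be sketched outside the test suite (a sketch under stated assumptions: `LogisticRegression` stands in for the scaled `LinearSVC` setup, the weights are illustrative, and no tight tolerance is claimed):

```python
import numpy as np
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression

X, y = make_blobs(n_samples=(100, 1000), center_box=(-1, 1), random_state=42)
sample_weight = np.where(y == 0, 0.9, 0.1)

preds = []
for ensemble in (True, False):
    clf = CalibratedClassifierCV(
        LogisticRegression(), method="sigmoid", ensemble=ensemble
    )
    clf.fit(X, y, sample_weight=sample_weight)
    preds.append(clf.predict_proba(X)[:, 1])

# ensemble=True averages per-fold calibrated classifiers, while
# ensemble=False calibrates one refit classifier on out-of-fold
# predictions; with weights handled in both paths they roughly agree.
print(np.linalg.norm(preds[0] - preds[1]))
```

The two modes are not expected to match exactly, which is why the test uses a generous norm bound rather than element-wise closeness.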


@pytest.mark.parametrize("method", ["sigmoid", "isotonic"])
@pytest.mark.parametrize("ensemble", [True, False])
def test_parallel_execution(data, method, ensemble):