FIX CalibratedClassifierCV to handle correctly sample_weight when ensemble=False #20638
Conversation
ensemble=False + updating the corresponding test
There are currently 4 check failures. Am I supposed to fix them?

The failing check asks you to add an entry to the change log.
It looks good. I am thinking that we could add an additional check
add stratify=y when splitting train and test sets Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
…sifier_with_weights
…ulienB-78/scikit-learn into fix_calibrationclassifier_with_weights
The pull request is still open and seems stuck. I asked questions about a month ago regarding some requests that were made and received no answer (summer break, I guess). What's next?
I will tag this PR for 1.0.1 to integrate it in the next bug fix release.
doc/whats_new/v1.0.rst
Outdated
@@ -146,6 +146,10 @@ Changelog
   as `base_estimator` in :class:`calibration.CalibratedClassifierCV`.
   :pr:`20087` by :user:`Clément Fauchereau <clement-f>`.

- |Fix| Fixed :class:`calibration.CalibratedClassifierCV` to handle correctly
v1.0.1.rst is not created yet.
Should I wait for its creation to add:

- |Fix| Fixed :class:`calibration.CalibratedClassifierCV` to handle correctly
  `sample_weight` when `ensemble=False`.
  :pr:`20638` by :user:`Julien Bohné <JulienB-78>`.

or is there another way?
> v1.0.1.rst is not created yet.

It should be in the `v1.0.rst` file; it will only be a new section. You can have a look at the `0.24.rst` file to see the `0.24.2` section.
I have created the 1.0.1 section.
Some builds fail but I guess it is unrelated. Should I just wait?
Indeed, I relaunched the build. It was due to an error during the install of the Docker image; a random failure on the Azure side, I assume.
…sifier_with_weights
Added a test to check that results are similar whether ensemble is False or True
…sifier_with_weights
OK, so here are a couple more comments, and I indeed found a quite important underlying bug. I will open a subsequent issue.
X, y = make_blobs((100, 1000), center_box=(-1, 1), random_state=42)

# Compute weigths to compensate the unbalance of the dataset
sample_weight = 9 * (y == 0) + 1
Could you make the same change?
@pytest.mark.parametrize("method", ["sigmoid", "isotonic"])
def test_sample_weight_class_imbalanced_ensemble_equivalent(method):
    X, y = make_blobs((100, 1000), center_box=(-1, 1), random_state=42)
Could you add a small docstring mentioning what we try to achieve here?
        X, y, sample_weight, stratify=y, random_state=42
    )

    scaler = StandardScaler()
you can add a comment as before
    scaler = StandardScaler()
    X_train = scaler.fit_transform(
        X_train
    )  # compute mean, std and transform training data as well
remove the comment
    scaler = StandardScaler()
    X_train = scaler.fit_transform(
        X_train
    )  # compute mean, std and transform training data as well
remove the comment
Can you remove this comment as well?
Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
@@ -166,6 +166,12 @@ def test_sample_weight(data, method, ensemble):
    X_train, y_train, sw_train = X[:n_samples], y[:n_samples], sample_weight[:n_samples]
    X_test = X[n_samples:]

    scaler = StandardScaler()
Suggested change:

-    scaler = StandardScaler()
+    # FIXME: ideally we should create a `Pipeline` with the `StandardScaler`
+    # followed by the `LinearSVC`. However, `Pipeline` does not expose
+    # `sample_weight` and it will be silently ignored.
+    scaler = StandardScaler()
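As a side note on this FIXME: when fitting a `Pipeline` directly (outside `CalibratedClassifierCV`), weights can still reach a specific step through the `<step>__sample_weight` fit-parameter syntax. A minimal sketch, assuming scikit-learn is installed and using an illustrative toy dataset:

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

rng = np.random.RandomState(42)
X = rng.normal(size=(100, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
# Upweight class 0 ten-fold, as in the tests above.
sample_weight = np.where(y == 0, 10.0, 1.0)

pipe = make_pipeline(StandardScaler(), LinearSVC())
# Route the weights to the final step; a bare `sample_weight` kwarg
# would not reach `LinearSVC`.
pipe.fit(X, y, linearsvc__sample_weight=sample_weight)
```

The silent-ignore problem the FIXME describes arises because a meta-estimator calling `pipeline.fit(X, y)` has no such routing available for the inner steps.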
X, y = make_blobs((100, 1000), center_box=(-1, 1), random_state=42)

# Compute weigths to compensate the unbalance of the dataset
sample_weight = 9 * (y == 0) + 1
Suggested change:

-    sample_weight = 9 * (y == 0) + 1
+    weights = np.array([0.9, 0.1])
+    sample_weight = weights[(y == 1).astype(np.int64)]
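The suggested replacement uses NumPy fancy indexing to map each class label to a per-class weight. A small sketch of what it computes, using a toy label vector (the 0.9/0.1 values come from the suggestion above):

```python
import numpy as np

y = np.array([0, 0, 1, 1, 1])
weights = np.array([0.9, 0.1])  # weight for class 0, weight for class 1
# `(y == 1).astype(np.int64)` is 0 for class-0 samples and 1 for class-1
# samples, so indexing `weights` with it picks each sample's weight.
sample_weight = weights[(y == 1).astype(np.int64)]
print(sample_weight)  # → [0.9 0.9 0.1 0.1 0.1]
```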
    # Compute weights to compensate for the unbalance of the dataset
    weights = np.array([0.9, 0.1])
    sample_weight = weights[(y == 1).astype(int)]
Suggested change:

-    sample_weight = weights[(y == 1).astype(int)]
+    sample_weight = weights[(y == 1).astype(np.int64)]
    # Compute weights to compensate for the unbalance of the dataset
    weights = np.array([0.9, 0.1])
    sample_weight = weights[(y == 1).astype(int)]
Suggested change:

-    sample_weight = weights[(y == 1).astype(int)]
+    sample_weight = weights[(y == 1).astype(np.int64)]
    scaler = StandardScaler()
    X_train = scaler.fit_transform(
        X_train
    )  # compute mean, std and transform training data as well
    X_test = scaler.transform(X_test)
Suggested change:

-    scaler = StandardScaler()
-    X_train = scaler.fit_transform(
-        X_train
-    )  # compute mean, std and transform training data as well
-    X_test = scaler.transform(X_test)
+    # FIXME: ideally we should create a `Pipeline` with the `StandardScaler`
+    # followed by the `LinearSVC`. However, `Pipeline` does not expose
+    # `sample_weight` and it will be silently ignored.
+    scaler = StandardScaler()
+    X_train = scaler.fit_transform(X_train)
+    X_test = scaler.transform(X_test)
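The pattern kept by this suggestion — fit the scaler on the training split only, then reuse its statistics on the test split — avoids leaking test-set statistics into preprocessing. A pure-NumPy sketch of the same idea, on made-up data:

```python
import numpy as np

rng = np.random.RandomState(0)
X_train = rng.normal(loc=5.0, scale=2.0, size=(50, 3))
X_test = rng.normal(loc=5.0, scale=2.0, size=(10, 3))

# Statistics come from the training split only...
mean, std = X_train.mean(axis=0), X_train.std(axis=0)
X_train_scaled = (X_train - mean) / std
# ...and are reused, unchanged, on the test split, mirroring
# scaler.fit_transform(X_train) followed by scaler.transform(X_test).
X_test_scaled = (X_test - mean) / std
```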
@JulienB-78 While reviewing your PR, I found some other bugs and, to find those bugs, I wrote a couple of tests that were better suited to test the … I will close this PR since we need your fix at the same time as the fix in the sigmoid calibration; you will get acknowledged in the other PR. We will most probably have to open another PR to check the behaviour with the …
Reference Issues/PRs

Fixes #20610

What does this implement/fix? Explain your changes.

`sample_weight` is now passed to `cross_val_predict` via the `fit_params` dict such that the weights are used when training the classifier for each cv split.

The test checking that the calibration is correct when passing `sample_weight` has been updated to verify that the calibration is actually effective. Previously, the test was only checking that the predictions of `CalibratedClassifierCV` with and without `sample_weight` were different, but not that the calibration was actually working.

Any other comments?
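The fix described above boils down to forwarding the weights through `cross_val_predict`. A minimal sketch of that pattern, assuming scikit-learn is installed; the estimator choice is illustrative, and the argument was renamed from `fit_params` to `params` in recent scikit-learn releases, so the sketch tries both:

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict

X, y = make_blobs((100, 1000), center_box=(-1, 1), random_state=42)
# Upweight the minority class, as in the PR's tests.
sample_weight = np.where(y == 0, 10.0, 1.0)

clf = LogisticRegression()
# The weights are forwarded to `clf.fit` on every CV split, which is
# what the fix makes CalibratedClassifierCV(ensemble=False) do.
try:
    predictions = cross_val_predict(
        clf, X, y, cv=5, method="decision_function",
        params={"sample_weight": sample_weight},
    )
except TypeError:  # scikit-learn < 1.4 uses `fit_params` instead
    predictions = cross_val_predict(
        clf, X, y, cv=5, method="decision_function",
        fit_params={"sample_weight": sample_weight},
    )
print(predictions.shape)  # (1100,)
```

The resulting out-of-fold decision values are what the single calibrator is fitted on when `ensemble=False`.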