FIX CalibratedClassifierCV to handle correctly sample_weight when ensemble=False #20638
Conversation
ensemble=False + updating the corresponding test
There are currently 4 check failures. Am I supposed to fix them?

The failing check asks you to add an entry to the change log.
It looks good. I am thinking that we could add an additional check
add stratify=y when splitting train and test sets Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
…sifier_with_weights
…ulienB-78/scikit-learn into fix_calibrationclassifier_with_weights
The pull request is still open and seems stuck. I asked questions about a month ago regarding some requests that were made and received no answer (summer break, I guess). What's next?
I will tag this PR for 1.0.1 to integrate it in the next bug fix release.
doc/whats_new/v1.0.rst
Outdated
@@ -146,6 +146,10 @@ Changelog
   as `base_estimator` in :class:`calibration.CalibratedClassifierCV`.
   :pr:`20087` by :user:`Clément Fauchereau <clement-f>`.

- |Fix| Fixed :class:`calibration.CalibratedClassifierCV` to handle correctly
v1.0.1.rst is not created yet.
Should I wait for its creation to add:

- |Fix| Fixed :class:`calibration.CalibratedClassifierCV` to handle correctly
  `sample_weight` when `ensemble=False`.
  :pr:`20638` by :user:`Julien Bohné <JulienB-78>`.

or is there another way?
> v1.0.1.rst is not created yet.

It should be in the `v1.0.rst` file; it will only be a new section. You can have a look at the `0.24.rst` file to see the `0.24.2` section.
I have created the 1.0.1 section.
Some builds fail but I guess it is unrelated. Should I just wait?
Indeed, I relaunched the build. It was due to an error during the install of the Docker image; a random failure on the Azure side, I assume.
…sifier_with_weights
Added a test to check that results are similar whether ensemble is False or True
…sifier_with_weights
OK, so here are a couple more comments, and I indeed found a quite important underlying bug. I will open a subsequent issue.
X, y = make_blobs((100, 1000), center_box=(-1, 1), random_state=42)

# Compute weigths to compensate the unbalance of the dataset
sample_weight = 9 * (y == 0) + 1
Could you make the same change?
@pytest.mark.parametrize("method", ["sigmoid", "isotonic"])
def test_sample_weight_class_imbalanced_ensemble_equivalent(method):
    X, y = make_blobs((100, 1000), center_box=(-1, 1), random_state=42)
Could you add a small docstring mentioning what we try to achieve here?
        X, y, sample_weight, stratify=y, random_state=42
    )

    scaler = StandardScaler()
you can add a comment as before
    scaler = StandardScaler()
    X_train = scaler.fit_transform(
        X_train
    )  # compute mean, std and transform training data as well
remove the comment
    scaler = StandardScaler()
    X_train = scaler.fit_transform(
        X_train
    )  # compute mean, std and transform training data as well
remove the comment
Can you remove this comment as well?
Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
@@ -166,6 +166,12 @@ def test_sample_weight(data, method, ensemble):
    X_train, y_train, sw_train = X[:n_samples], y[:n_samples], sample_weight[:n_samples]
    X_test = X[n_samples:]

    scaler = StandardScaler()
Suggested change:

-    scaler = StandardScaler()
+    # FIXME: ideally we should create a `Pipeline` with the `StandardScaler`
+    # followed by the `LinearSVC`. However, `Pipeline` does not expose
+    # `sample_weight` and it will be silently ignored.
+    scaler = StandardScaler()
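As a side note on this FIXME: when fitting a `Pipeline` directly (outside `CalibratedClassifierCV`), weights can still reach a specific step through the `<step>__sample_weight` fit-parameter syntax. A minimal sketch, assuming scikit-learn is installed and using an illustrative toy dataset:

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

rng = np.random.RandomState(42)
X = rng.normal(size=(100, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
# Upweight class 0 ten-fold, as in the tests above.
sample_weight = np.where(y == 0, 10.0, 1.0)

pipe = make_pipeline(StandardScaler(), LinearSVC())
# Route the weights to the final step; a bare `sample_weight` kwarg
# would not reach `LinearSVC`.
pipe.fit(X, y, linearsvc__sample_weight=sample_weight)
```

The silent-ignore problem the FIXME describes arises because a meta-estimator calling `pipeline.fit(X, y)` has no such routing available for the inner steps.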
X, y = make_blobs((100, 1000), center_box=(-1, 1), random_state=42)

# Compute weigths to compensate the unbalance of the dataset
sample_weight = 9 * (y == 0) + 1
Suggested change:

-    sample_weight = 9 * (y == 0) + 1
+    weights = np.array([0.9, 0.1])
+    sample_weight = weights[(y == 1).astype(np.int64)]
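The suggested replacement uses NumPy fancy indexing to map each class label to a per-class weight. A small sketch of what it computes, using a toy label vector (the 0.9/0.1 values come from the suggestion above):

```python
import numpy as np

y = np.array([0, 0, 1, 1, 1])
weights = np.array([0.9, 0.1])  # weight for class 0, weight for class 1
# `(y == 1).astype(np.int64)` is 0 for class-0 samples and 1 for class-1
# samples, so indexing `weights` with it picks each sample's weight.
sample_weight = weights[(y == 1).astype(np.int64)]
print(sample_weight)  # → [0.9 0.9 0.1 0.1 0.1]
```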
    # Compute weights to compensate for the unbalance of the dataset
    weights = np.array([0.9, 0.1])
    sample_weight = weights[(y == 1).astype(int)]
Suggested change:

-    sample_weight = weights[(y == 1).astype(int)]
+    sample_weight = weights[(y == 1).astype(np.int64)]
    # Compute weights to compensate for the unbalance of the dataset
    weights = np.array([0.9, 0.1])
    sample_weight = weights[(y == 1).astype(int)]
Suggested change:

-    sample_weight = weights[(y == 1).astype(int)]
+    sample_weight = weights[(y == 1).astype(np.int64)]
    scaler = StandardScaler()
    X_train = scaler.fit_transform(
        X_train
    )  # compute mean, std and transform training data as well
    X_test = scaler.transform(X_test)
Suggested change:

-    scaler = StandardScaler()
-    X_train = scaler.fit_transform(
-        X_train
-    )  # compute mean, std and transform training data as well
-    X_test = scaler.transform(X_test)
+    # FIXME: ideally we should create a `Pipeline` with the `StandardScaler`
+    # followed by the `LinearSVC`. However, `Pipeline` does not expose
+    # `sample_weight` and it will be silently ignored.
+    scaler = StandardScaler()
+    X_train = scaler.fit_transform(X_train)
+    X_test = scaler.transform(X_test)
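The pattern kept by this suggestion — fit the scaler on the training split only, then reuse its statistics on the test split — avoids leaking test-set statistics into preprocessing. A pure-NumPy sketch of the same idea, on made-up data:

```python
import numpy as np

rng = np.random.RandomState(0)
X_train = rng.normal(loc=5.0, scale=2.0, size=(50, 3))
X_test = rng.normal(loc=5.0, scale=2.0, size=(10, 3))

# Statistics come from the training split only...
mean, std = X_train.mean(axis=0), X_train.std(axis=0)
X_train_scaled = (X_train - mean) / std
# ...and are reused, unchanged, on the test split, mirroring
# scaler.fit_transform(X_train) followed by scaler.transform(X_test).
X_test_scaled = (X_test - mean) / std
```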
@JulienB-78 While reviewing your PR, I found some other bugs and, to find those bugs, I wrote a couple of tests that were better suited to test the … I will close this PR since we need your fix at the same time as the fix in the sigmoid calibration; you will get acknowledged in the other PR. We will most probably have to open another PR to check the behaviour with the …
Reference Issues/PRs

Fixes #20610

What does this implement/fix? Explain your changes.

`sample_weight` is now passed to `cross_val_predict` via the `fit_params` dict such that the weights are used when training the classifier for each cv split.

The test checking that the calibration is correct when passing `sample_weight` has been updated to verify that the calibration is actually effective. Previously, the test was only checking that the predictions of `CalibratedClassifierCV` with and without `sample_weight` were different, but not that the calibration was actually working.

Any other comments?
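The fix described above boils down to forwarding the weights through `cross_val_predict`. A minimal sketch of that pattern, assuming scikit-learn is installed; the estimator choice is illustrative, and the argument was renamed from `fit_params` to `params` in recent scikit-learn releases, so the sketch tries both:

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict

X, y = make_blobs((100, 1000), center_box=(-1, 1), random_state=42)
# Upweight the minority class, as in the PR's tests.
sample_weight = np.where(y == 0, 10.0, 1.0)

clf = LogisticRegression()
# The weights are forwarded to `clf.fit` on every CV split, which is
# what the fix makes CalibratedClassifierCV(ensemble=False) do.
try:
    predictions = cross_val_predict(
        clf, X, y, cv=5, method="decision_function",
        params={"sample_weight": sample_weight},
    )
except TypeError:  # scikit-learn < 1.4 uses `fit_params` instead
    predictions = cross_val_predict(
        clf, X, y, cv=5, method="decision_function",
        fit_params={"sample_weight": sample_weight},
    )
print(predictions.shape)  # (1100,)
```

The resulting out-of-fold decision values are what the single calibrator is fitted on when `ensemble=False`.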