
[MRG] Add class_weight parameter to CalibratedClassifierCV #17541


Open · wants to merge 48 commits into main

Conversation

@amascia commented Jun 9, 2020

Reference Issues/PRs

To the best of my knowledge, no existing issues/PRs address this feature.

What does this implement/fix? Explain your changes.

This PR adds a class_weight parameter to the CalibratedClassifierCV class, complementing the existing sample_weight parameter of its fit method.
Passing class_weight='balanced' to CalibratedClassifierCV balances the dataset on which the calibration method (sigmoid or isotonic) is trained, provided no sample_weight is passed.
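As an illustrative sketch (not the PR's actual internal code): when class_weight="balanced" is passed and no sample_weight is given, per-sample weights can be derived from the class frequencies with the existing compute_sample_weight utility:

```python
# Illustrative only: how class_weight="balanced" could translate into
# per-sample weights when the user passes no sample_weight explicitly.
import numpy as np
from sklearn.utils.class_weight import compute_sample_weight

y = np.array([0, 0, 0, 0, 1, 1])  # imbalanced labels: 4 negatives, 2 positives
sw = compute_sample_weight("balanced", y)
# Balanced weight for class c is n_samples / (n_classes * n_samples_c):
# negatives get 6 / (2 * 4) = 0.75, positives get 6 / (2 * 2) = 1.5.
print(sw)
```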

Any other comments?

Please find attached 4 scripts that test this added parameter: 2 for binary classification (with cv='prefit' and cv=2) and 2 for multiclass classification (with cv='prefit' and cv=2).

class_weighted_calibratedClassifierCV.zip

@adrinjalali
Member

Overall this looks okay. Try merging with the master branch; some of the failing tests should then pass.

You can also parameterize and refactor the test.

It would also be nice if you could add a section to the user guide or add an example to the example gallery regarding the effect of this parameter.
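As a hypothetical illustration of the parameterization suggestion, the tests could be collapsed into one pytest-parameterized function (the test name is invented for this sketch, and the proposed class_weight="balanced" option is emulated by passing balanced per-sample weights through the existing sample_weight argument):

```python
import numpy as np
import pytest
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.utils.class_weight import compute_sample_weight


@pytest.mark.parametrize("method", ["sigmoid", "isotonic"])
@pytest.mark.parametrize("cv", [2, 3])
def test_calibration_with_balanced_weights(method, cv):
    # Hypothetical test: emulates class_weight="balanced" by passing
    # balanced per-sample weights explicitly to fit.
    X, y = make_classification(n_samples=200, weights=[0.8, 0.2], random_state=0)
    sw = compute_sample_weight("balanced", y)
    clf = CalibratedClassifierCV(
        LogisticRegression(max_iter=1000), method=method, cv=cv
    )
    clf.fit(X, y, sample_weight=sw)
    proba = clf.predict_proba(X)
    assert proba.shape == (len(y), 2)
    assert np.allclose(proba.sum(axis=1), 1.0)  # rows are probability vectors
```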

@amascia
Author

amascia commented Jul 17, 2020

Hi,
Thank you for your comments. I've merged my branch with sklearn's master branch and updated the calibration section of the user guide with this feature. Please tell me if there is anything else to do.

Member

@adrinjalali left a comment

Tests are failing.

@amascia requested a review from adrinjalali August 3, 2020 12:48
@adrinjalali
Member

@joshuacwnewton would you feel comfortable reviewing this one?

Contributor

@joshuacwnewton left a comment

Please note: I'm not a core maintainer. I'm leaving my review to help move things along, but two core maintainers will need to approve this PR for it to be merged. Thanks for your patience!

Thanks for the contribution, @amascia! This is a great start. I've left a few suggestions below. 🙂

I've also noticed that some continuous integration tests are failing for this PR. To fix some of the tests, please see my comments below. However, for some of the other tests, I think the FutureWarnings come from #16474, which added a @_deprecate_positional_args decorator to CalibratedClassifierCV. In other words, I believe those other failing tests are unrelated to your changes.

@amascia
Author

amascia commented Sep 1, 2020

I have updated the PR with your comments, thank you @joshuacwnewton and @adrinjalali

Member

@jjerphan left a comment

LGTM.
Thank you, @amascia. This is a nice first time contribution.

@cmarmo
Contributor

cmarmo commented Dec 16, 2021

@lucyleeow might be interested in reviewing?

Member

@lucyleeow left a comment

This is a useful addition, thanks! Just had some questions.

@amascia requested review from lucyleeow and cmarmo January 10, 2022 10:39
Contributor

@cmarmo left a comment

Thanks @amascia. I have some comments about documentation. I'm afraid I don't have "acceptance clearance".
Perhaps @adrinjalali, who first checked your PR, might want to finalize the review? Thanks!

@cmarmo added this to the 1.1 milestone Jan 10, 2022
@ogrisel
Member

ogrisel commented Jan 13, 2022

Thanks to both the author and the reviewers for the PR. It looks very clean. However, it's not obvious to me why passing class_weight="balanced" to the calibrator is a good idea from a mathematical point of view. So before considering merging this PR, I would really like to see an empirical study showing that passing class_weight="balanced" to the calibrator (especially with cv="prefit") actually improves the shape of the calibration curve (with both uniform and quantile binning) and the Brier score of a binary classifier itself fitted with class_weight="balanced".

To make the study interesting, the dataset should be large enough, have a severe class imbalance (e.g. 1 to 10 or more between the positive and negative class), and not be too easy. One way to generate such a dataset would be something like:

from sklearn.datasets import make_classification

X, y = make_classification(
    n_samples=100_000,
    weights=[0.95, 0.05],  # ~20x more negative than positive samples
    n_features=10,
    n_informative=10,
    n_redundant=0,
    n_repeated=0,
    random_state=0,
)
X = X[:, :8]  # hide some informative features to make the problem more difficult

Then one could evaluate estimators such as RandomForestClassifier or make_pipeline(SplineTransformer(), LogisticRegression()).
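A minimal sketch of such a study, under the assumption that the proposed class_weight="balanced" option is emulated by passing balanced per-sample weights (via the existing compute_sample_weight utility) to the calibrator's fit; a smaller sample size and a plain LogisticRegression are used here only to keep the sketch fast:

```python
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import brier_score_loss
from sklearn.model_selection import train_test_split
from sklearn.utils.class_weight import compute_sample_weight

# Imbalanced dataset in the spirit of the one proposed above, but smaller.
X, y = make_classification(
    n_samples=20_000, weights=[0.95, 0.05], n_features=10,
    n_informative=10, n_redundant=0, n_repeated=0, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0
)

scores = {}
for balanced in (False, True):
    cal = CalibratedClassifierCV(
        LogisticRegression(max_iter=1000), method="sigmoid", cv=2
    )
    # Emulate class_weight="balanced" with explicit per-sample weights.
    sw = compute_sample_weight("balanced", y_train) if balanced else None
    cal.fit(X_train, y_train, sample_weight=sw)
    proba = cal.predict_proba(X_test)[:, 1]
    scores[balanced] = brier_score_loss(y_test, proba)
    print(f"balanced reweighting={balanced}: Brier score={scores[balanced]:.4f}")
```

A full study would additionally plot calibration curves and repeat the comparison over several CV splits and estimators.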

Ideally, this study can be done with a notebook shared on gist.github.com.

If we find situations where passing class_weight="balanced" to the calibrator improves the shape of the calibration curve or calibration-sensitive metrics such as the log loss or the Brier score (or threshold-sensitive "hard" metrics such as f1_score), then we can probably proceed to merge the PR as is, maybe after converting the notebook into an example.
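For reference, calibration_curve supports both binning strategies mentioned above; a quick self-contained check on synthetic, well-calibrated scores (the data here is invented purely for illustration):

```python
import numpy as np
from sklearn.calibration import calibration_curve

rng = np.random.default_rng(0)
y_prob = rng.uniform(size=5_000)  # synthetic predicted probabilities
# Draw labels so that P(y=1 | y_prob) = y_prob, i.e. perfectly calibrated.
y_true = (rng.uniform(size=5_000) < y_prob).astype(int)

for strategy in ("uniform", "quantile"):
    frac_pos, mean_pred = calibration_curve(
        y_true, y_prob, n_bins=10, strategy=strategy
    )
    # For a well-calibrated model the two arrays track each other closely.
    print(strategy, float(np.abs(frac_pos - mean_pred).max()))
```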

If on the other hand we find out that passing class_weight="balanced" is always detrimental to calibration, I think I would still be in favor of accepting the PR, but we should consider documenting in the docstring that passing class_weight to the calibrated classifier is rarely a good way to improve the calibration of classifiers for imbalanced classification problems.

In my experience, passing class_weight="balanced" to the raw classifiers themselves (without CalibratedClassifierCV) does not seem to make a significant change in the Brier score on such imbalanced datasets with the classifiers proposed above (either the effect is within the CV sampling noise or it is actually detrimental!). So we are already misleading our users in other parts of scikit-learn with this class_weight="balanced" option.

Somewhat related discussion: #13227 (in particular the example).

@adrinjalali
Member

Anybody working on the study? Removing it from the milestone for now, but happy to be pinged for reviews once the study is done.

@ogrisel could you please write why you think it might not be mathematically sane?

@adrinjalali removed this from the 1.1 milestone Apr 7, 2022
@daustria

Is this stalled PR available? I would like to try to pick up the work: carrying out the study described above and porting @amascia's changes to the new codebase (I've already taken a look and it seems they can go in without many modifications).

@lucyleeow
Member

lucyleeow commented Aug 17, 2024

Hi @daustria, please take a look at #17541 (comment). This PR is stalled, but there is uncertainty about whether to add this feature, which needs to be decided first. Thank you.

@daustria

daustria commented Aug 17, 2024

Yes, I saw the comment, and I was thinking of trying my hand at making such a study. Unless maybe it is not something for volunteers to do?

Edit: no longer working on this, feel free to take over the work.

@lucyleeow
Member

Please go ahead, look forward to seeing your findings!

9 participants