[MRG] Add class_weight parameter to CalibratedClassifierCV #17541
Conversation
Overall this looks okay. Try merging with the master branch, and some of the tests should pass. You can also parameterize and refactor the test. It would also be nice if you could add a section to the user guide or an example to the example gallery regarding the effect of this parameter.
Hi,
Tests are failing.
@joshuacwnewton would you feel comfortable reviewing this one?
Please note: I'm not a core maintainer. I'm leaving my review to help move things along, but two core maintainers will need to approve this PR for it to be merged. Thanks for your patience!
Thanks for the contribution, @amascia! This is a great start. I've left a few suggestions below. 🙂
I've also noticed that some continuous integration tests are failing for this PR. To fix some of them, please see my comments below. However, for some of the other tests, I think the FutureWarnings come from #16474, which added a `@_deprecate_positional_args` decorator to `CalibratedClassifierCV`. In other words, I believe those other failing tests are unrelated to your changes.
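For readers unfamiliar with that decorator: the sketch below is a simplified, illustrative version of what a positional-argument deprecation decorator does. It is not scikit-learn's private `_deprecate_positional_args` implementation, which differs in detail.

```python
import functools
import inspect
import warnings


def deprecate_positional_args(func):
    """Simplified sketch: warn when arguments after the first are passed positionally."""
    arg_names = list(inspect.signature(func).parameters)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        if len(args) > 1:
            # Names of the parameters that were passed positionally.
            extra = arg_names[1:len(args)]
            warnings.warn(
                f"Pass {', '.join(extra)} as keyword arguments. "
                "Passing them positionally is deprecated.",
                FutureWarning,
            )
        return func(*args, **kwargs)

    return wrapper


@deprecate_positional_args
def fit(X, y=None, sample_weight=None):
    return X, y, sample_weight


fit([[0.0]], [0])    # emits a FutureWarning under this sketch
fit([[0.0]], y=[0])  # no warning
```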
I have updated the PR with your comments; thank you @joshuacwnewton and @adrinjalali.
LGTM.
Thank you, @amascia. This is a nice first-time contribution.
@lucyleeow might be interested in reviewing?
This is a useful addition, thanks! Just had some questions.
Thanks @amascia. I have some comments about documentation. I'm afraid I don't have "acceptance clearance".
Perhaps @adrinjalali, who first checked your PR, might want to finalize the review? Thanks!
Thanks to both the author and the reviewers for the PR. It looks very clean. However, it's not obvious to me why passing `class_weight="balanced"` to the calibrator is beneficial, so I think this deserves an empirical study. To make the study interesting, the dataset should be large enough, with a severe class imbalance (e.g. 1 to 10 or more between the positive and negative classes), and not too easy. One way to generate such a dataset would be to use something like:

```python
X, y = make_classification(
    n_samples=100_000,
    weights=[0.95, 0.05],  # ~20x more negative than positive samples
    n_features=10,
    n_informative=10,
    n_redundant=0,
    n_repeated=0,
)
X = X[:, :8]  # hide some informative features to make the problem more difficult
```

Then one could evaluate estimators such as `CalibratedClassifierCV` fit with and without `class_weight="balanced"`. Ideally, this study can be done with a notebook shared on gist.github.com. If we find situations where passing `class_weight="balanced"` improves the results, that would make a convincing case for the feature. If on the other hand we find out that passing `class_weight="balanced"` does not help, that would be an argument against adding it.

In my experience, it is not clear that reweighting the data used to fit the calibrator is mathematically sane.

Somewhat related discussion: #13227 (in particular the example).
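For anyone picking this up, here is a minimal sketch of such a study. Since the `class_weight` parameter is not yet merged, it emulates the proposed behaviour by passing balanced per-sample weights (via `compute_sample_weight`) to `fit`; the estimator and metric choices are illustrative only, not prescribed by the comment above.

```python
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import brier_score_loss
from sklearn.model_selection import train_test_split
from sklearn.utils.class_weight import compute_sample_weight

# Imbalanced dataset as suggested above.
X, y = make_classification(
    n_samples=100_000,
    weights=[0.95, 0.05],
    n_features=10,
    n_informative=10,
    n_redundant=0,
    n_repeated=0,
    random_state=0,
)
X = X[:, :8]  # hide some informative features to make the problem harder
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0
)

base = LogisticRegression(max_iter=1_000)

# Calibration without any reweighting.
unweighted = CalibratedClassifierCV(base, method="sigmoid", cv=5)
unweighted.fit(X_train, y_train)

# Calibration with balanced per-sample weights, emulating the
# class_weight="balanced" option proposed in this PR.
balanced = CalibratedClassifierCV(base, method="sigmoid", cv=5)
balanced.fit(
    X_train, y_train,
    sample_weight=compute_sample_weight("balanced", y_train),
)

for name, model in [("unweighted", unweighted), ("balanced", balanced)]:
    proba = model.predict_proba(X_test)[:, 1]
    print(name, "Brier score:", brier_score_loss(y_test, proba))
```

Calibration curves on the held-out set (e.g. via `sklearn.calibration.calibration_curve`) would make the comparison more informative than a single scalar score.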
Anybody working on the study? Removing from the milestone for now, but happy to be pinged for reviews once the study is done. @ogrisel, could you please write why you think it might not be mathematically sane?
Is this stalled PR available? I would like to try to pick up the work, doing the study described above and porting @amascia's changes to the new codebase (I've already taken a look, and it seems they can go in without many modifications).
Hi @daustria, please take a look at #17541 (comment). This PR is stalled, but there is uncertainty about whether to add this feature, which needs to be decided first. Thank you.
Yes, I saw the comment, and I was thinking of trying my hand at making such a study. Unless maybe it is not something for volunteers to do? Edit: no longer working on this; feel free to take over the work.
Please go ahead; looking forward to seeing your findings!
Reference Issues/PRs
To the best of my knowledge, no issues/PRs are addressing this added feature.
What does this implement/fix? Explain your changes.
This PR adds a `class_weight` parameter to the `CalibratedClassifierCV` class to supplement the already existing `sample_weight` parameter of its `fit` method. Passing `class_weight='balanced'` to `CalibratedClassifierCV` will balance the dataset on which the calibration method (sigmoid or isotonic) is trained (if no `sample_weight` is passed).
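For illustration, here is a hedged sketch of the proposed usage next to the equivalent effect achievable with the existing API. Note that the `class_weight` argument shown in the comment is what this PR adds and is not part of released scikit-learn.

```python
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC
from sklearn.utils.class_weight import compute_sample_weight

X, y = make_classification(n_samples=1_000, weights=[0.9, 0.1], random_state=0)

# Proposed usage (hypothetical, added by this PR):
# CalibratedClassifierCV(LinearSVC(), class_weight="balanced").fit(X, y)

# Equivalent effect with the existing API: compute balanced per-sample
# weights explicitly and pass them to fit.
calibrated = CalibratedClassifierCV(LinearSVC())
calibrated.fit(X, y, sample_weight=compute_sample_weight("balanced", y))
```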
Any other comments?
Please find attached 4 scripts that test this added parameter: 2 for binary classification with `cv='prefit'` and `cv=2`, and 2 for multiclass classification with `cv='prefit'` and `cv=2`.
class_weighted_calibratedClassifierCV.zip
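The attached scripts are not reproduced here. As a hedged sketch, the four cases they describe (binary and multiclass, each with `cv="prefit"` and `cv=2`) could be folded into a single parameterized test, as suggested earlier in the review; this assumes the `class_weight` parameter from this PR is available.

```python
import pytest
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression


@pytest.mark.parametrize("n_classes", [2, 3])
@pytest.mark.parametrize("cv", ["prefit", 2])
def test_calibration_class_weight(n_classes, cv):
    X, y = make_classification(
        n_classes=n_classes, n_informative=6, random_state=0
    )
    base = LogisticRegression()
    if cv == "prefit":
        base.fit(X, y)
    # class_weight is the hypothetical parameter added by this PR.
    calibrated = CalibratedClassifierCV(base, cv=cv, class_weight="balanced")
    calibrated.fit(X, y)
    assert calibrated.predict_proba(X).shape == (X.shape[0], n_classes)
```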