ENH Adds class_weight to HistGradientBoostingClassifier #22014

thomasjpfan
Member

Reference Issues/PRs

Fixes #14735

What does this implement/fix? Explain your changes.

This PR adds a class_weight parameter to HistGradientBoostingClassifier.

Any other comments?

A _finalize_sample_weight method is added to BaseHistGradientBoosting. HistGradientBoostingClassifier uses it to modify the sample weights based on class_weight, or to return them unchanged.
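To illustrate the idea, here is a minimal pure-Python sketch of how class weights can be folded into per-sample weights. It is not the code from this PR: the helper names `balanced_class_weight` and `finalize_sample_weight_sketch` are hypothetical, and the "balanced" heuristic is assumed to be the usual scikit-learn one, `n_samples / (n_classes * count(class))`.

```python
from collections import Counter


def balanced_class_weight(y):
    """Hypothetical helper: the 'balanced' heuristic,
    n_samples / (n_classes * number of samples in the class)."""
    counts = Counter(y)
    n_samples = len(y)
    n_classes = len(counts)
    return {label: n_samples / (n_classes * count) for label, count in counts.items()}


def finalize_sample_weight_sketch(y, sample_weight=None, class_weight=None):
    """Hypothetical stand-in for the step described above: return the
    sample weights unchanged when class_weight is None, otherwise
    multiply each sample weight by the weight of its class."""
    if sample_weight is None:
        sample_weight = [1.0] * len(y)
    if class_weight is None:
        return list(sample_weight)
    if class_weight == "balanced":
        class_weight = balanced_class_weight(y)
    # Assumption: classes missing from a user-supplied dict get weight 1.
    return [w * class_weight.get(label, 1.0) for w, label in zip(sample_weight, y)]


y = [0, 0, 0, 1]
weights = finalize_sample_weight_sketch(y, class_weight="balanced")
print(weights)
```

With the "balanced" setting, the total weight of each class ends up equal, which is the point of the heuristic.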

@thomasjpfan thomasjpfan changed the title ENH Adds class_weights to HistGradientBoostingClassifier ENH Adds class_weight to HistGradientBoostingClassifier Dec 17, 2021
@ogrisel
Member

ogrisel commented Jan 14, 2022

Thanks for the PR. While reviewing #17541 I ran some experiments on a synthetic imbalanced dataset with RFs, nonlinear preprocessing + logistic regression, or even just logistic regression. Each time, class_weight="balanced" seemed to either change nothing or actually hurt performance. This held for threshold-insensitive metrics (e.g. ROC AUC and average precision), for log loss and Brier score, and for threshold-sensitive metrics (balanced accuracy, F1 score, Matthews correlation coefficient and others).

So while I am not opposed to implementing this for scikit-learn's HGBDT, I think we should warn users more clearly that class_weight="balanced" is not necessarily a good idea.

Furthermore, there is a weird pitfall with class_weight="balanced" documented in #10233, although it probably cannot affect HGBDT as they are likely to be underfitting to the point of just learning a constant intercept.

Member

@ogrisel ogrisel left a comment

+1 for this PR. The concerns expressed above are probably better dealt with in their own independent PR.

@cmarmo cmarmo added this to the 1.2 milestone May 16, 2022
Member

@jeremiedbb jeremiedbb left a comment

I synced with main and updated to target 1.2. LGTM, thanks @thomasjpfan

@jeremiedbb jeremiedbb merged commit 7fda68d into scikit-learn:main Sep 16, 2022
Development

Successfully merging this pull request may close these issues.

Implement class_weight in HistGradientBoostingClassifier