
Implement class_weight in HistGradientBoostingClassifier #14735


Closed

plpxsk opened this issue Aug 22, 2019 · 6 comments · Fixed by #22014

Comments

@plpxsk
Contributor

plpxsk commented Aug 22, 2019

Just a reminder/note to implement class_weight [="balanced"] for this new algorithm.

Looking forward to seeing this implemented.

If anyone has interim suggestions for dealing with imbalanced data with this algorithm, please include them below.

Cheers.

@Sandy4321

Great idea, they really need to do this. In the meantime you can use:
https://imbalanced-learn.org/stable/auto_examples/applications/plot_impact_imbalanced_classes.html
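
For reference, a minimal sketch of the resampling approach from that link, assuming the third-party imbalanced-learn package is installed (names follow its documented API):

```python
# Minimal sketch of the linked imbalanced-learn approach: resample the
# training data before fitting (assumes imbalanced-learn is installed).
from imblearn.pipeline import Pipeline
from imblearn.under_sampling import RandomUnderSampler
from sklearn.ensemble import HistGradientBoostingClassifier

pipe = Pipeline([
    ("undersample", RandomUnderSampler(random_state=0)),
    ("clf", HistGradientBoostingClassifier(random_state=0)),
])
# pipe.fit(X, y)  # resampling is applied only during fit, never at predict time
```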

@NicolasHug
Member

Sample weight support will be available in 0.23, which should be out soon, so you can get class weights through that.
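
A minimal sketch of that route, assuming scikit-learn >= 0.23 (compute_sample_weight derives the same per-sample weights that class_weight="balanced" would):

```python
# Minimal sketch: emulate class_weight="balanced" via sample_weight.
# Requires scikit-learn >= 0.23 for sample_weight support here;
# on scikit-learn < 1.0 you would also need:
#   from sklearn.experimental import enable_hist_gradient_boosting
import numpy as np
from sklearn.ensemble import HistGradientBoostingClassifier
from sklearn.utils.class_weight import compute_sample_weight

rng = np.random.RandomState(0)
X = rng.randn(1000, 5)
y = (rng.rand(1000) < 0.1).astype(int)  # imbalanced: ~10% positives

# Per-sample weights inversely proportional to class frequency,
# i.e. n_samples / (n_classes * np.bincount(y)) for each sample's class.
sw = compute_sample_weight(class_weight="balanced", y=y)

clf = HistGradientBoostingClassifier(random_state=0)
clf.fit(X, y, sample_weight=sw)
```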

> Just a reminder

> they really need to do this

Let's avoid that kind of phrasing, please.

@Sandy4321

Great news, thanks. By the way, will this upcoming solution have the same problem that LightGBM and XGBoost have (microsoft/LightGBM#2107), where using scale_pos_weight to adjust for imbalance spoils the predicted probabilities?

@Sandy4321

As written in the LightGBM docs (https://lightgbm.readthedocs.io/en/latest/Parameters.html), it will "result in poor estimates of the individual class probabilities":

is_unbalance, default = false, type = bool, aliases: unbalance, unbalanced_sets

used only in binary and multiclassova applications

set this to true if training data are unbalanced

Note: while enabling this should increase the overall performance metric of your model, it will also result in poor estimates of the individual class probabilities

@Sandy4321

Or consider performing probability calibration (https://scikit-learn.org/stable/modules/calibration.html), as suggested in the LGBMClassifier docs:

https://lightgbm.readthedocs.io/en/latest/pythonapi/lightgbm.LGBMClassifier.html
class_weight (dict, 'balanced' or None, optional (default=None)) – Weights associated with classes in the form {class_label: weight}. Use this parameter only for multi-class classification task; for binary classification task you may use is_unbalance or scale_pos_weight parameters. Note, that the usage of all these parameters will result in poor estimates of the individual class probabilities. You may want to consider performing probability calibration (https://scikit-learn.org/stable/modules/calibration.html) of your model. The ‘balanced’ mode uses the values of y to automatically adjust weights inversely proportional to class frequencies in the input data as n_samples / (n_classes * np.bincount(y)). If None, all classes are supposed to have weight one. Note, that these weights will be multiplied with sample_weight (passed through the fit method) if sample_weight is specified
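
A minimal sketch of that calibration step, using scikit-learn's CalibratedClassifierCV on top of the balanced-weight setup from earlier in the thread:

```python
# Minimal sketch: fit with balanced sample weights, then calibrate the
# resulting probabilities with scikit-learn's CalibratedClassifierCV.
from sklearn.calibration import CalibratedClassifierCV
from sklearn.ensemble import HistGradientBoostingClassifier
from sklearn.utils.class_weight import compute_sample_weight

def fit_calibrated(X, y):
    sw = compute_sample_weight(class_weight="balanced", y=y)
    calib = CalibratedClassifierCV(
        HistGradientBoostingClassifier(random_state=0),
        method="sigmoid",  # Platt scaling; "isotonic" needs more data
        cv=3,
    )
    # sample_weight is forwarded to the underlying estimator's fit
    calib.fit(X, y, sample_weight=sw)
    return calib  # calib.predict_proba(X) gives calibrated probabilities
```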

@zeromh

zeromh commented Dec 10, 2021

Came here to look for this Issue and add my +1!

@NicolasHug wrote:

> Sample weight support will be available in 0.23, which should be out soon, so you can get class weights through that.

Happy to have sample_weight support, but that doesn't really help for my use case. Class weights generally need to be tuned like a hyperparameter, and I can't do that with, say, GridSearchCV because there is no class_weight param.
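
One workaround is a small wrapper estimator (hypothetical, not part of scikit-learn) that exposes class_weight as a constructor parameter and converts it to sample_weight inside fit, which makes it searchable with GridSearchCV:

```python
# Hypothetical wrapper (illustrative, not a scikit-learn API): expose
# class_weight as a tunable parameter by converting it to sample_weight.
from sklearn.base import BaseEstimator, ClassifierMixin
from sklearn.ensemble import HistGradientBoostingClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.utils.class_weight import compute_sample_weight

class HGBWithClassWeight(BaseEstimator, ClassifierMixin):
    def __init__(self, class_weight=None, max_iter=100):
        self.class_weight = class_weight
        self.max_iter = max_iter

    def fit(self, X, y):
        # class_weight=None yields uniform weights
        sw = compute_sample_weight(class_weight=self.class_weight, y=y)
        self.model_ = HistGradientBoostingClassifier(max_iter=self.max_iter)
        self.model_.fit(X, y, sample_weight=sw)
        self.classes_ = self.model_.classes_
        return self

    def predict(self, X):
        return self.model_.predict(X)

    def predict_proba(self, X):
        return self.model_.predict_proba(X)

# Now class_weight is searchable like any other hyperparameter:
search = GridSearchCV(
    HGBWithClassWeight(),
    param_grid={"class_weight": [None, "balanced", {0: 1, 1: 5}, {0: 1, 1: 10}]},
    scoring="average_precision",
)
# search.fit(X, y)
```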
