MIN_CAT_SUPPORT in HGBT #19008

Open
lorentzenchr opened this issue Dec 15, 2020 · 2 comments
Labels
help wanted · Moderate · module:ensemble · Needs Benchmarks

Comments

@lorentzenchr
Member

Background

One leftover from the categorical support in HistGradientBoostingRegressor and HistGradientBoostingClassifier (merged in #18394) is the constant MIN_CAT_SUPPORT in splitting.pyx; see #18394 (comment). It acts as a smoothing parameter for categorical variables and is currently fixed at 10.
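To make "smoothing parameter" concrete, here is a minimal sketch. It assumes the constant enters as an additive term in the denominator of a per-category gradient-to-hessian ratio that is used as a sort key before searching for a split. The helper name `smoothed_category_scores` and the toy numbers are hypothetical, and this is an illustration of the idea rather than the code in splitting.pyx.

```python
import numpy as np


def smoothed_category_scores(sum_gradients, sum_hessians, min_cat_support=10.0):
    """Hypothetical helper: per-category sort key with additive smoothing.

    Assumption: key = sum_gradients / (sum_hessians + min_cat_support).
    A larger constant pulls the keys of rare categories toward zero.
    """
    sum_gradients = np.asarray(sum_gradients, dtype=float)
    sum_hessians = np.asarray(sum_hessians, dtype=float)
    return sum_gradients / (sum_hessians + min_cat_support)


# Toy data: two frequent categories and one rare category (index 2)
# whose mean gradient is extreme only because it has very few samples.
sum_gradients = np.array([50.0, -40.0, 4.0])
sum_hessians = np.array([100.0, 100.0, 2.0])  # equals the sample count if hessians are constant

raw = smoothed_category_scores(sum_gradients, sum_hessians, min_cat_support=0.0)
smoothed = smoothed_category_scores(sum_gradients, sum_hessians, min_cat_support=10.0)

# Without smoothing the rare category sorts last; with smoothing it moves
# between the two frequent categories.
print(np.argsort(raw), np.argsort(smoothed))
```

Under this assumption, the constant mostly affects categories whose total hessian is of the same order as the constant or smaller, which is why its value could matter for high-cardinality features.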

Proposal

I propose to investigate the impact of MIN_CAT_SUPPORT. If the impact turns out to be large, it would be desirable to expose it as a user-facing option.
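Since MIN_CAT_SUPPORT is a compile-time constant, a real benchmark would need to rebuild scikit-learn for each candidate value and compare model quality on real datasets. As a cheap first look, the sketch below only checks how sensitive the category ordering is to the constant on synthetic data. It assumes the same additive-smoothing sort key as a working hypothesis; the function names `category_order` and `sweep`, the candidate values, and the data are all hypothetical.

```python
import numpy as np


def category_order(sum_gradients, sum_hessians, min_cat_support):
    """Order of categories under the assumed smoothed sort key."""
    keys = sum_gradients / (sum_hessians + min_cat_support)
    return np.argsort(keys, kind="stable")


def sweep(sum_gradients, sum_hessians, candidates, reference=10.0):
    """For each candidate value, count positions whose category differs
    from the ordering obtained with the reference value."""
    ref_order = category_order(sum_gradients, sum_hessians, reference)
    result = {}
    for value in candidates:
        order = category_order(sum_gradients, sum_hessians, value)
        result[value] = int(np.sum(order != ref_order))
    return result


# Synthetic histogram: 20 categories with very uneven sample counts.
rng = np.random.default_rng(0)
n_categories = 20
counts = rng.integers(1, 200, size=n_categories).astype(float)
# Sum of `count` unit-variance gradients has standard deviation sqrt(count).
sum_gradients = rng.normal(size=n_categories) * np.sqrt(counts)

moved = sweep(sum_gradients, counts, candidates=[0.0, 1.0, 10.0, 100.0])
print(moved)
```

A changed ordering does not by itself imply a change in predictive performance, so this kind of check can only suggest which candidate values are worth a full benchmark.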

Additional context

AFAIK, very few reproducible results are known to date. For references, see the comment linked above.

@lorentzenchr added the New Feature, Moderate, help wanted, and Needs Benchmarks labels on Dec 15, 2020
@lorentzenchr
Member Author

@thomasjpfan @NicolasHug @ogrisel might be interested, at least in knowing about this issue ;-)

@ogrisel
Member

ogrisel commented Dec 15, 2020

Thanks @lorentzenchr, I would also be very interested in the results of such a study of the impact of this magic constant.

4 participants