-
-
Notifications
You must be signed in to change notification settings - Fork 25.8k
ENH Uses _openmp_effective_n_threads to get the number of threads in HistGradientBoosting* #20477
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ENH Uses _openmp_effective_n_threads to get the number of threads in HistGradientBoosting* #20477
Conversation
8c788e4
to
2d66bc6
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks so much for fixing this!
Just a few suggestions:
Co-authored-by: Olivier Grisel <olivier.grisel@gmail.com>
I made a slight change at bec6f51 where the caller needs to pass in |
Co-authored-by: Olivier Grisel <olivier.grisel@gmail.com>
61775f7
to
645cb9b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm. Just a few remarks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks @thomasjpfan !
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @thomasjpfan and sorry for the late review.
I think we should document this in our parallelization docs: https://scikit-learn.org/stable/computing/parallelism.html#openmp-based-parallelism
Also, for consistency, shouldn't we start using _openmp_effective_n_threads
in all other estimators as well? This would make the documentation easier to write, which is usually a good indicator that it's something reasonable to do
avoids performance problems caused by over-subscription when using those | ||
classes in a docker container for instance. :pr:`20477` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
when using those classes in a docker container for instance
This is unrelated to docker, isn't it?
Some docker images will set CPU quotas, but some don't. I think that the over-generalization to all docker containers is confusing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, this only affects linux machines right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, this only affects linux machines right?
This will affect most machines because: the docker deamon needs a Linux kernel (to use cgroups and other features of it); this kernel generally is the host's OS's, or a virtual machine's running Linux.
I think Docker started developing support for Windows-native images, but this is rather niche.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My comment about linux was dissociated from the one about docker.
Basically, we should probably clarify that this change does not affect Windows or OSX users.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that it still affect Windows or OSX users because a Linux VM has to be used for Docker in those cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not talking about docker.
I am saying that as far as I understand, this entry will not affect users using scikit-learn on Windows or OSX.
(as long as they don't use docker, which is probably the vast majority of them).
@@ -262,6 +262,13 @@ Changelog | |||
:mod:`sklearn.ensemble` | |||
....................... | |||
|
|||
- |Enhancement| :class:`~sklearn.ensemble.HistGradientBoostingClassifier` and | |||
:class:`~sklearn.ensemble.HistGradientBoostingRegressor` take cgroups quotas |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might be worth adding a link to https://en.wikipedia.org/wiki/Cgroups for further context. I'm not sure we can expect readers to know what cgroups is
…HistGradientBoosting* (scikit-learn#20477)
Reference Issues/PRs
Fixes #16016
What does this implement/fix? Explain your changes.
This PR uses
_openmp_effective_n_threads
to assignnum_threads
inprange
.TreeGrower
,Splitter
, andHistogramBuilder
,_BinMapper
,BaseLoss
,n_threads
is set during__init__
._openmp_effective_n_threads
is queried and passed in because its environment can be different, this logic is in:_predict_iterations
.