ENH Uses _openmp_effective_n_threads to get the number of threads in HistGradientBoosting* #20477

thomasjpfan · 2021-07-06T15:30:15Z

Reference Issues/PRs

Fixes #16016

What does this implement/fix? Explain your changes.

This PR uses _openmp_effective_n_threads to assign num_threads in prange.

For objects that use OpenMP during fit (TreeGrower, Splitter, HistogramBuilder, _BinMapper, and BaseLoss), n_threads is set during __init__.
For the predictor, _openmp_effective_n_threads is queried at prediction time and passed in, because the prediction environment can differ from the fit environment; this logic is in _predict_iterations.
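To illustrate the split between fit time and predict time described above, here is a minimal pure-Python sketch. It is not the code from this PR: `effective_n_threads`, `Grower`, and `predict_iterations` are hypothetical names, and the stand-in helper only caps the request by `os.cpu_count()`, whereas the real private helper also accounts for OpenMP settings and cgroups quotas.

```python
import os


def effective_n_threads(n_threads=None):
    """Hypothetical stand-in for the private scikit-learn helper.

    Assumption: this sketch only caps the request by os.cpu_count();
    it does not reproduce the real helper's semantics.
    """
    available = os.cpu_count() or 1
    if n_threads is None:
        return available
    if n_threads < 1:
        raise ValueError("n_threads must be a positive integer or None")
    return min(n_threads, available)


class Grower:
    """Fit-time object: the thread count is resolved once, in __init__."""

    def __init__(self, n_threads=None):
        self.n_threads = effective_n_threads(n_threads)


def predict_iterations(predictors, X, n_threads):
    """Predict-time helper: n_threads is resolved by the caller and passed in."""
    return [predict(X, n_threads) for predict in predictors]


if __name__ == "__main__":
    grower = Grower()  # resolved in the fit environment
    n_threads = effective_n_threads()  # resolved again in the predict environment
    print(grower.n_threads, predict_iterations([lambda X, n: sum(X)], [1, 2, 3], n_threads))
```

Resolving the value again at prediction time matters when a fitted model is unpickled on a machine or container with a different CPU allowance than the one used for fitting.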

ogrisel

Thanks so much for fixing this!

Just a few suggestions:

doc/whats_new/v1.0.rst

sklearn/ensemble/_hist_gradient_boosting/binning.py

sklearn/ensemble/_hist_gradient_boosting/gradient_boosting.py

sklearn/ensemble/_hist_gradient_boosting/_binning.pyx

Co-authored-by: Olivier Grisel <olivier.grisel@gmail.com>

thomasjpfan · 2021-07-07T16:37:07Z

I made a slight change at bec6f51 where the caller needs to pass in n_threads to _predict_iterations. I made this change to reduce the number of calls to _openmp_effective_n_threads in _staged_raw_predict, which loops through all the predictors.
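A rough sketch of the idea behind that change, under stated assumptions: `resolve_n_threads`, `predict_iterations`, and `staged_raw_predict` below are hypothetical pure-Python stand-ins (not the scikit-learn implementations), and the call counter exists only to show that the thread count is resolved once rather than once per iteration.

```python
import os

calls = {"resolve": 0}


def resolve_n_threads():
    """Hypothetical stand-in for _openmp_effective_n_threads (counts its calls)."""
    calls["resolve"] += 1
    return os.cpu_count() or 1


def predict_iterations(predictors, x, n_threads):
    """Hypothetical: sum the contribution of each predictor of one iteration."""
    return sum(p(x, n_threads) for p in predictors)


def staged_raw_predict(iterations, x):
    """Yield the cumulative raw prediction after each boosting iteration."""
    n_threads = resolve_n_threads()  # resolved once, before the loop
    raw = 0.0
    for predictors in iterations:
        raw += predict_iterations(predictors, x, n_threads)
        yield raw


if __name__ == "__main__":
    toy = [[lambda x, n: x], [lambda x, n: 2 * x], [lambda x, n: 3 * x]]
    print(list(staged_raw_predict(toy, 1.0)), calls["resolve"])
```

Passing `n_threads` as an argument keeps the helper free of environment queries, so the caller decides how often the lookup happens.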

doc/whats_new/v1.0.rst

Co-authored-by: Olivier Grisel <olivier.grisel@gmail.com>

jjerphan

LGTM.

jeremiedbb

lgtm. Just a few remarks

doc/whats_new/v1.0.rst

sklearn/ensemble/_hist_gradient_boosting/binning.py

sklearn/ensemble/_hist_gradient_boosting/gradient_boosting.py

sklearn/ensemble/_hist_gradient_boosting/grower.py

jeremiedbb

LGTM, thanks @thomasjpfan !

NicolasHug

Thanks @thomasjpfan and sorry for the late review.

I think we should document this in our parallelization docs: https://scikit-learn.org/stable/computing/parallelism.html#openmp-based-parallelism

Also, for consistency, shouldn't we start using _openmp_effective_n_threads in all other estimators as well? This would make the documentation easier to write, which is usually a good indicator that it's something reasonable to do.

NicolasHug · 2021-07-12T07:14:08Z

doc/whats_new/v1.0.rst

+  avoids performance problems caused by over-subscription when using those
+  classes in a docker container for instance. :pr:`20477`


> when using those classes in a docker container for instance

This is unrelated to docker, isn't it?
Some docker images will set CPU quotas, but some don't. I think that the over-generalization to all docker containers is confusing.

Also, this only affects linux machines right?

> Also, this only affects linux machines right?

This will affect most machines because the Docker daemon needs a Linux kernel (to use cgroups and other kernel features); this kernel is generally either the host OS's or that of a virtual machine running Linux.

I think Docker started developing support for Windows-native images, but this is rather niche.

My comment about linux was dissociated from the one about docker.

Basically, we should probably clarify that this change does not affect Windows or OSX users.

I think that it still affects Windows or OSX users because a Linux VM has to be used for Docker in those cases.

I am not talking about docker.

I am saying that as far as I understand, this entry will not affect users using scikit-learn on Windows or OSX.
(as long as they don't use docker, which is probably the vast majority of them).

NicolasHug · 2021-07-12T07:25:25Z

doc/whats_new/v1.0.rst

@@ -262,6 +262,13 @@ Changelog
 :mod:`sklearn.ensemble`
 .......................

+- |Enhancement| :class:`~sklearn.ensemble.HistGradientBoostingClassifier` and
+  :class:`~sklearn.ensemble.HistGradientBoostingRegressor` take cgroups quotas


Might be worth adding a link to https://en.wikipedia.org/wiki/Cgroups for further context. I'm not sure we can expect readers to know what cgroups are.
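For readers unfamiliar with cgroups CPU quotas, here is a small illustrative sketch of how a quota translates into a usable CPU count. Under cgroup v1, the CFS bandwidth controller exposes a quota and a period in microseconds (`cpu.cfs_quota_us` and `cpu.cfs_period_us`), with a quota of -1 meaning no limit. The function name `cpus_from_cfs_quota` is hypothetical, rounding up is an assumption of this sketch, and this is not the code scikit-learn uses.

```python
import math


def cpus_from_cfs_quota(quota_us, period_us, host_cpus):
    """Return the usable CPU count implied by a CFS quota (illustrative only).

    A non-positive quota (for example -1) or period is treated as "no limit".
    Assumption: fractional allowances are rounded up to at least one CPU.
    """
    if quota_us <= 0 or period_us <= 0:
        return host_cpus
    return max(1, min(host_cpus, math.ceil(quota_us / period_us)))


if __name__ == "__main__":
    # 200 ms of CPU time per 100 ms period on an 8-core host: two CPUs' worth.
    print(cpus_from_cfs_quota(200_000, 100_000, host_cpus=8))
    # No quota configured: all host CPUs are usable.
    print(cpus_from_cfs_quota(-1, 100_000, host_cpus=8))
```

Starting as many threads as the host has cores while the quota allows far fewer is the over-subscription scenario the changelog entry refers to.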

ENH Uses _openmp_effective_n_threads to get the number of threads

2d66bc6

thomasjpfan force-pushed the hist_gradient_threads branch from 8c788e4 to 2d66bc6 Compare July 6, 2021 15:30

thomasjpfan added the module:ensemble label Jul 6, 2021

github-actions bot added the cython label Jul 6, 2021

DOC Adds PR number

52f60c2

ogrisel approved these changes Jul 7, 2021

doc/whats_new/v1.0.rst (outdated, resolved)

sklearn/ensemble/_hist_gradient_boosting/binning.py (resolved)

sklearn/ensemble/_hist_gradient_boosting/gradient_boosting.py (outdated, resolved)

ogrisel reviewed Jul 7, 2021

sklearn/ensemble/_hist_gradient_boosting/gradient_boosting.py (resolved)

ogrisel mentioned this pull request Jul 7, 2021

Use _openmp_effective_n_threads to take cgroups CPU quotas into account into account in Elkan and Minibatch KMeans #20483

Closed

jjerphan reviewed Jul 7, 2021

sklearn/ensemble/_hist_gradient_boosting/_binning.pyx (outdated, resolved)

thomasjpfan changed the title ~~ENH Uses _openmp_effective_n_threads to get the number of threads~~ ENH Uses _openmp_effective_n_threads to get the number of threads in HistGradientBoosting* Jul 7, 2021

thomasjpfan and others added 3 commits July 7, 2021 12:11

Update doc/whats_new/v1.0.rst

c392533

Co-authored-by: Olivier Grisel <olivier.grisel@gmail.com>

CLN Address comments

6313149

CLN Adds comment for prediciton vs fit time

bec6f51

ogrisel reviewed Jul 7, 2021

doc/whats_new/v1.0.rst (outdated, resolved)

Update doc/whats_new/v1.0.rst

645cb9b

Co-authored-by: Olivier Grisel <olivier.grisel@gmail.com>

thomasjpfan force-pushed the hist_gradient_threads branch from 61775f7 to 645cb9b Compare July 7, 2021 18:47

Merge remote-tracking branch 'upstream/main' into hist_gradient_threads

569bca6

jjerphan approved these changes Jul 8, 2021

jeremiedbb reviewed Jul 8, 2021

CLN Address comments

1feb57f

jeremiedbb approved these changes Jul 9, 2021

jeremiedbb merged commit 99410b1 into scikit-learn:main Jul 9, 2021

ogrisel mentioned this pull request Jul 9, 2021

Make HistGradientBoostingRegressor/Classifer use _openmp_effective_n_threads to set the default maximum number of threads to use #16016

Closed

NicolasHug reviewed Jul 12, 2021

samronsin pushed a commit to samronsin/scikit-learn that referenced this pull request Nov 30, 2021

ENH Uses _openmp_effective_n_threads to set the number of threads in …

ae513e8

…HistGradientBoosting* (scikit-learn#20477)

ogrisel mentioned this pull request Jan 25, 2022

[MRG] ENH add option to set the number of OpenMP threads in HistGBDT #16070

Closed

ogrisel mentioned this pull request Feb 21, 2022

Unexpected slowness of code execution in the JupyterHub deployment (OpenMP oversubscription) INRIA/scikit-learn-mooc#586

Closed
