
Oversubscription in HistGradientBoosting with pytest-xdist #15078


Closed
rth opened this issue Sep 24, 2019 · 11 comments

rth (Member) commented Sep 24, 2019

When running tests with pytest-xdist on a machine with 12 physical CPU cores, the use of OpenMP in HistGradientBoosting seems to lead to significant oversubscription. For instance,

pytest sklearn/ensemble/_hist_gradient_boosting/gradient_boosting.py  -v

takes 0.85s for me. This runs two doctests, training a GBDT classifier and a regressor on the iris and boston datasets respectively.

  • Running this with 2 parallel processes (-n 2) takes 56s (and 50 threads are created).
  • Running with 2 processes and OMP_NUM_THREADS=2 takes 0.52s.

While I understand the case of catastrophic oversubscription where N_CPU_THREADS**2 threads are created on a machine with many cores, here we only create 2*N_CPU_THREADS threads (compared to 1*N_CPU_THREADS) and still get a 10x slowdown.

Can someone reproduce it? Here I am using scikit-learn master and a conda env on Linux with the latest numpy, scipy, nomkl, python=3.7.

Because pytest-xdist uses its own parallelism system (not sure what it does exactly), I guess this won't be addressed by threadpoolctl #14979?

Edit: Originally reported in https://github.com/tomMoral/loky/issues/224
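
For reference, a minimal sketch of capping the OpenMP pool programmatically with threadpoolctl from inside a test process (assuming threadpoolctl is installed; the 2-thread limit is an arbitrary illustrative value), roughly equivalent to exporting OMP_NUM_THREADS before starting pytest:

from threadpoolctl import threadpool_limits

from sklearn.experimental import enable_hist_gradient_boosting  # noqa
from sklearn.ensemble import HistGradientBoostingClassifier
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)

# Limit only the OpenMP pools (BLAS pools are left untouched) for the
# duration of the fit; 2 is an arbitrary illustrative value.
with threadpool_limits(limits=2, user_api="openmp"):
    HistGradientBoostingClassifier().fit(X, y)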

ogrisel (Member) commented Oct 2, 2019

There is something weird. On my laptop (2 cores, 4 hyperthreads):

  • pytest -v sklearn/ensemble/_hist_gradient_boosting/gradient_boosting.py => 0.70s (no xdist)
  • pytest -v -n 1 sklearn/ensemble/_hist_gradient_boosting/gradient_boosting.py => 1.46s (1 xdist worker)
  • pytest -v -n 2 sklearn/ensemble/_hist_gradient_boosting/gradient_boosting.py => 11s to 48s (2 xdist workers)
  • OMP_NUM_THREADS=2 pytest -v -n 2 sklearn/ensemble/_hist_gradient_boosting/gradient_boosting.py => 1.15s
  • OMP_NUM_THREADS=4 pytest -v -n 2 sklearn/ensemble/_hist_gradient_boosting/gradient_boosting.py => between 7.8s and 34s

So this seems to be a really extreme case of over-subscription.

ogrisel (Member) commented Oct 2, 2019

I have also tried to use -d --tx 2*popen//python=python instead of -n 2 and I get similar results.

ogrisel (Member) commented Oct 2, 2019

I have also tried to add -k Classifier to only run the classification test, and even with OMP_NUM_THREADS=4 and 2 xdist workers the run time does not go beyond 1.5s (because there is only 1 worker running the test; the other is idle).

ogrisel (Member) commented Oct 2, 2019

I don't see any easy way to work around this issue. I think we should just set the OMP_NUM_THREADS=1 / OPENBLAS_NUM_THREADS=1 / MKL_NUM_THREADS=1 environment variables in the CI configuration.
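
A minimal sketch of what that could look like, assuming a hypothetical conftest.py at the repository root (not the actual scikit-learn CI config); the variables need to be set before sklearn, OpenMP or BLAS are first loaded in each xdist worker:

import os

# Hypothetical conftest.py sketch: cap all thread pools to one thread per
# xdist worker. setdefault keeps an explicit user setting intact.
os.environ.setdefault("OMP_NUM_THREADS", "1")
os.environ.setdefault("OPENBLAS_NUM_THREADS", "1")
os.environ.setdefault("MKL_NUM_THREADS", "1")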

rth (Member, Author) commented Oct 2, 2019

Thanks for investigating @ogrisel and @tomMoral !

I know it's not really a big issue, more some evidence of how things can go wrong when using different parallelism mechanisms in seemingly innocuous settings. It's also interesting that this happens with OpenMP in gradient boosting but not with threading in BLAS.

Closing as "won't fix", with the solution being to use a smaller OMP_NUM_THREADS as indicated above.

rth closed this as completed Oct 2, 2019

ogrisel (Member) commented Oct 2, 2019

The fact that it is so catastrophic even on a small number of cores is intriguing though. @jeremiedbb @NicolasHug maybe you have an idea why this is happening more specifically for HistGradientBoostingClassifier/Regressor?

I wonder why we don't have a similarly scaled over-subscription problem with MKL or OpenBLAS thread pools.

NicolasHug (Member) commented:

I can reproduce everything on my laptop.

from sklearn.experimental import enable_hist_gradient_boosting  # noqa
from sklearn.ensemble import HistGradientBoostingRegressor
from sklearn.ensemble import HistGradientBoostingClassifier
from sklearn.ensemble._hist_gradient_boosting.utils import (
    get_equivalent_estimator)
from sklearn.datasets import load_boston, load_iris


def test_reg():

    X, y = load_boston(return_X_y=True)
    est = HistGradientBoostingRegressor()
    # est = get_equivalent_estimator(est, lib='lightgbm')
    est.fit(X, y)
    est.score(X, y)

def test_classif():

    X, y = load_iris(return_X_y=True)
    est = HistGradientBoostingClassifier(loss='categorical_crossentropy')
    # est = get_equivalent_estimator(est, lib='lightgbm')
    est.fit(X, y)
    est.score(X, y)

As noted by Olivier, commenting out one of the tests makes it run fast.

Note that the LightGBM estimators also show a slow-down with -n 2, but a much smaller one (about 10s).

ogrisel (Member) commented Oct 2, 2019

I observed a huge variability across oversubscribed runs, typically between 10s and 50s with -n 2 and OMP_NUM_THREADS=4 on my laptop with 2 cores 4 hyperthreads.

thomasjpfan (Member) commented Oct 2, 2019

I am considering using a pytest.mark to label tests as "serial" and run pytest with and without -n.

Reference: pytest-dev/pytest-xdist#385 (comment)

Edit: For the CI
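
A minimal sketch of that idea, assuming a hypothetical "serial" marker name and illustrative commands (not an agreed-upon convention):

# conftest.py: register the custom marker so pytest does not warn about it
def pytest_configure(config):
    config.addinivalue_line(
        "markers", "serial: OpenMP-heavy tests to run outside xdist workers")

# in a test module: tag the test that should not run under xdist workers
import pytest

@pytest.mark.serial
def test_hist_gradient_boosting_fit():
    ...

# the CI would then run two passes (illustrative commands):
#   pytest -n 2 -m "not serial" sklearn/    # parallel pass
#   pytest -m serial sklearn/               # serial pass, full thread pools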

NicolasHug (Member) commented:

Regardless of oversubscription, I observed a pretty high variability in the hist estimators too. Typically when running one of the benchmarks (higgs for example), the first few runs are always significantly slower than the subsequent ones :/

It's not the case for the LightGBM estimators.

ogrisel (Member) commented Oct 2, 2019

I am considering using a pytest.mark to label tests as "serial" and run pytest with and without -n.
Reference: pytest-dev/pytest-xdist#385 (comment)

That's a possibility indeed.

Note that I have just encountered another issue with pytest-xdist in this PR #14264.
