Oversubscription in HistGradientBoosting with pytest-xdist #15078
There is something weird. On my laptop (2 cores, 4 hyperthreads):
So this seems to be a really extreme case of over-subscription.
I have also tried to use …
I have also tried to add …
I don't see any easy way to work around this issue. I think we should just set the OMP_NUM_THREADS=1 / OPENBLAS_NUM_THREADS=1 / MKL_NUM_THREADS=1 environment variables in the CI configuration.
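For illustration, the same limits could also be applied from a `conftest.py` rather than the CI configuration. A minimal sketch (hypothetical, not the actual change), assuming `conftest.py` is imported before the OpenMP/BLAS runtimes initialize their thread pools, which pytest normally ensures:

```python
# conftest.py (sketch): cap native thread pools before any test imports
# scikit-learn, so each pytest-xdist worker uses a single OpenMP/BLAS thread.
import os

for var in ("OMP_NUM_THREADS", "OPENBLAS_NUM_THREADS", "MKL_NUM_THREADS"):
    # setdefault so an explicit value from the environment or CI config still wins
    os.environ.setdefault(var, "1")
```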
Thanks for investigating @ogrisel and @tomMoral! I know it's not really a big issue, more some evidence about how things can go wrong when using different parallelism mechanisms in seemingly innocuous settings. It's also interesting that this happens with OpenMP in gradient boosting but not with threading in BLAS. Closing as "won't fix", with the solution being to use a smaller …
The fact that it is so catastrophic even on a small number of cores is intriguing, though. @jeremiedbb @NicolasHug maybe you have an idea why this is happening more specifically for HistGradientBoostingClassifier/Regressor? I wonder why we don't have a similarly scaled over-subscription problem with MKL or OpenBLAS thread pools.
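One way to start comparing the two kinds of thread pools is to inspect them with threadpoolctl (the library referenced in #14979 below). A small sketch, assuming threadpoolctl is installed:

```python
# Sketch: list the native thread pools (OpenMP, OpenBLAS, MKL) loaded in the
# current process and how many threads each would use. Running this inside a
# pytest-xdist worker shows what the OpenMP pool defaults to there.
from pprint import pprint

from threadpoolctl import threadpool_info
from sklearn.experimental import enable_hist_gradient_boosting  # noqa
from sklearn.ensemble import HistGradientBoostingClassifier  # noqa: F401

pprint(threadpool_info())  # one dict per loaded runtime, with a 'num_threads' entry
```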
I can reproduce everything on my laptop.

```python
from sklearn.experimental import enable_hist_gradient_boosting  # noqa
from sklearn.ensemble import HistGradientBoostingRegressor
from sklearn.ensemble import HistGradientBoostingClassifier
from sklearn.ensemble._hist_gradient_boosting.utils import (
    get_equivalent_estimator)
from sklearn.datasets import load_boston, load_iris


def test_reg():
    X, y = load_boston(return_X_y=True)
    est = HistGradientBoostingRegressor()
    # est = get_equivalent_estimator(est, lib='lightgbm')
    est.fit(X, y)
    est.score(X, y)


def test_classif():
    X, y = load_iris(return_X_y=True)
    est = HistGradientBoostingClassifier(loss='categorical_crossentropy')
    # est = get_equivalent_estimator(est, lib='lightgbm')
    est.fit(X, y)
    est.score(X, y)
```

As noted by Olivier, commenting out one of the tests will make it run fast. Note that LightGBM estimators also have a big slow-down with …
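For what it's worth, a small harness along these lines could be used to time the configurations discussed above; the file name and the exact configurations are assumptions for illustration, not taken from the issue:

```python
# Sketch: time the reproduction file sequentially, with 2 xdist workers, and
# with 2 workers plus a single OpenMP thread per worker. Assumes the snippet
# above is saved as test_repro.py and that pytest/pytest-xdist are installed.
import os
import subprocess
import time

CONFIGS = [
    ([], {}),                                  # sequential baseline
    (["-n", "2"], {}),                         # pytest-xdist, 2 workers
    (["-n", "2"], {"OMP_NUM_THREADS": "1"}),   # 2 workers, 1 OpenMP thread each
]

for extra_args, env_update in CONFIGS:
    env = {**os.environ, **env_update}
    tic = time.perf_counter()
    subprocess.run(["pytest", "test_repro.py", *extra_args], env=env, check=True)
    print(extra_args, env_update, f"{time.perf_counter() - tic:.2f}s")
```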
I observed a huge variability across oversubscribed runs, typically between 10s and 50s with …
I am considering using a … Reference: pytest-dev/pytest-xdist#385 (comment). Edit: For the CI …
Regardless of oversubscription, I observed a pretty high variability in the hist estimators too. Typically when running one of the benchmarks (higgs for example), the first few runs are always significantly slower than the subsequent ones :/ It's not the case for the LGBM estimators.
That's a possibility indeed. Note that I have just encountered another issue with pytest-xdist in this PR #14264.
When running tests with pytest-xdist on a machine with 12 physical CPUs, the use of OpenMP in HistGradientBoosting seems to lead to significant over-subscription:

- Running the tests without pytest-xdist takes 0.85s for me. This runs 2 docstring examples that train a GBDT classifier and regressor on the iris and boston datasets respectively.
- Running them with pytest-xdist (`-n 2`) takes 56s (and 50 threads are created).
- With `OMP_NUM_THREADS=2` it takes 0.52s.

While I understand the case of catastrophic oversubscription when `N_CPU_THREADS**2` threads are created on a machine with many cores, here we only create `2*N_CPU_THREADS` threads (presumably 2 workers each starting one OpenMP thread per logical core, hence the ~50 threads observed) as compared to `1*N_CPU_THREADS`, and we still get a 10x slowdown.

Can someone reproduce it? This is using scikit-learn master and a conda env on Linux with the latest `numpy scipy nomkl python=3.7`.

Because pytest-xdist uses its own parallelism system (not sure what it does exactly), I guess this won't be addressed by threadpoolctl #14979?
Edit: Originally reported in https://github.com/tomMoral/loky/issues/224
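For reference, threadpoolctl can cap the OpenMP pool within a single process; the caveat is exactly the one raised above, namely that pytest-xdist runs separate worker processes, so the limit (or the environment variables) would have to be applied in each worker, e.g. from a fixture in `conftest.py`. A sketch of the per-process part, assuming threadpoolctl is installed:

```python
# Sketch: limit the OpenMP thread pool to one thread around a single fit.
# Under pytest-xdist this would need to run in every worker process.
from threadpoolctl import threadpool_limits
from sklearn.experimental import enable_hist_gradient_boosting  # noqa
from sklearn.ensemble import HistGradientBoostingClassifier
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)
with threadpool_limits(limits=1, user_api="openmp"):
    HistGradientBoostingClassifier().fit(X, y)
```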