Oversubscription in HistGradientBoosting with pytest-xdist #15078
There is something weird. On my laptop (2 cores, 4 hyperthreads):
So this seems to be a really extreme case of over-subscription.
I have also tried to use …
I have also tried to add …
I don't see any easy way to work around this issue. I think we should just set the OMP_NUM_THREADS=1 / OPENBLAS_NUM_THREADS=1 / MKL_NUM_THREADS=1 environment variables in the CI configuration.
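For illustration, the same limits could also be applied from a `conftest.py` rather than the CI configuration. A minimal sketch (hypothetical, not the actual change), assuming `conftest.py` is imported before the OpenMP/BLAS runtimes initialize their thread pools, which pytest normally ensures:

```python
# conftest.py (sketch): cap native thread pools before any test imports
# scikit-learn, so each pytest-xdist worker uses a single OpenMP/BLAS thread.
import os

for var in ("OMP_NUM_THREADS", "OPENBLAS_NUM_THREADS", "MKL_NUM_THREADS"):
    # setdefault so an explicit value from the environment or CI config still wins
    os.environ.setdefault(var, "1")
```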
Thanks for investigating @ogrisel and @tomMoral! I know it's not really a big issue, more some evidence about how things can go wrong when using different parallelism mechanisms in seemingly innocuous settings. It's also interesting that this happens with OpenMP in gradient boosting but not with threading in BLAS. Closing as "won't fix", with the solution being to use a smaller …
The fact that it is so catastrophic even on a small number of cores is intriguing, though. @jeremiedbb @NicolasHug maybe you have an idea why this is happening more specifically for HistGradientBoostingClassifier/Regressor? I wonder why we don't have a similarly scaled over-subscription problem with MKL or OpenBLAS thread pools.
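One way to start comparing the two kinds of thread pools is to inspect them with threadpoolctl (the library referenced in #14979 below). A small sketch, assuming threadpoolctl is installed:

```python
# Sketch: list the native thread pools (OpenMP, OpenBLAS, MKL) loaded in the
# current process and how many threads each would use. Running this inside a
# pytest-xdist worker shows what the OpenMP pool defaults to there.
from pprint import pprint

from threadpoolctl import threadpool_info
from sklearn.experimental import enable_hist_gradient_boosting  # noqa
from sklearn.ensemble import HistGradientBoostingClassifier  # noqa: F401

pprint(threadpool_info())  # one dict per loaded runtime, with a 'num_threads' entry
```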
I can reproduce everything on my laptop.

```python
from sklearn.experimental import enable_hist_gradient_boosting  # noqa
from sklearn.ensemble import HistGradientBoostingRegressor
from sklearn.ensemble import HistGradientBoostingClassifier
from sklearn.ensemble._hist_gradient_boosting.utils import (
    get_equivalent_estimator)
from sklearn.datasets import load_boston, load_iris


def test_reg():
    X, y = load_boston(return_X_y=True)
    est = HistGradientBoostingRegressor()
    # est = get_equivalent_estimator(est, lib='lightgbm')
    est.fit(X, y)
    est.score(X, y)


def test_classif():
    X, y = load_iris(return_X_y=True)
    est = HistGradientBoostingClassifier(loss='categorical_crossentropy')
    # est = get_equivalent_estimator(est, lib='lightgbm')
    est.fit(X, y)
    est.score(X, y)
```

As noted by Olivier, commenting out one of the tests will make it run fast. Note that LightGBM estimators also have a big slow-down with …
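For what it's worth, a small harness along these lines could be used to time the configurations discussed above; the file name and the exact configurations are assumptions for illustration, not taken from the issue:

```python
# Sketch: time the reproduction file sequentially, with 2 xdist workers, and
# with 2 workers plus a single OpenMP thread per worker. Assumes the snippet
# above is saved as test_repro.py and that pytest/pytest-xdist are installed.
import os
import subprocess
import time

CONFIGS = [
    ([], {}),                                  # sequential baseline
    (["-n", "2"], {}),                         # pytest-xdist, 2 workers
    (["-n", "2"], {"OMP_NUM_THREADS": "1"}),   # 2 workers, 1 OpenMP thread each
]

for extra_args, env_update in CONFIGS:
    env = {**os.environ, **env_update}
    tic = time.perf_counter()
    subprocess.run(["pytest", "test_repro.py", *extra_args], env=env, check=True)
    print(extra_args, env_update, f"{time.perf_counter() - tic:.2f}s")
```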
I observed a huge variability across oversubscribed runs, typically between 10s and 50s with …
I am considering using a … Reference: pytest-dev/pytest-xdist#385 (comment). Edit: For the CI …
Regardless of oversubscription, I observed a pretty high variability in the hist estimators too. Typically when running one of the benchmarks (higgs for example), the first few runs are always significantly slower than the subsequent ones :/ It's not the case for the LGBM estimators.
That's a possibility indeed. Note that I have just encountered another issue with pytest-xdist in this PR #14264.
When running tests with pytest-xdist on a machine with 12 physical CPUs, the use of OpenMP in HistGradientBoosting seems to lead to significant over-subscription:

- Running the tests without pytest-xdist takes 0.85s for me. This runs 2 docstring examples that train a GBDT classifier and regressor on the iris and boston datasets respectively.
- Running them with pytest-xdist (`-n 2`) takes 56s (and 50 threads are created).
- With `OMP_NUM_THREADS=2` it takes 0.52s.

While I understand the case of catastrophic oversubscription when `N_CPU_THREADS**2` threads are created on a machine with many cores, here we only create `2*N_CPU_THREADS` threads (presumably 2 workers each starting one OpenMP thread per logical core, hence the ~50 threads observed) as compared to `1*N_CPU_THREADS`, and we still get a 10x slowdown.

Can someone reproduce it? This is using scikit-learn master and a conda env on Linux with the latest `numpy scipy nomkl python=3.7`.

Because pytest-xdist uses its own parallelism system (not sure what it does exactly), I guess this won't be addressed by threadpoolctl #14979?
Edit: Originally reported in https://github.com/tomMoral/loky/issues/224
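For reference, threadpoolctl can cap the OpenMP pool within a single process; the caveat is exactly the one raised above, namely that pytest-xdist runs separate worker processes, so the limit (or the environment variables) would have to be applied in each worker, e.g. from a fixture in `conftest.py`. A sketch of the per-process part, assuming threadpoolctl is installed:

```python
# Sketch: limit the OpenMP thread pool to one thread around a single fit.
# Under pytest-xdist this would need to run in every worker process.
from threadpoolctl import threadpool_limits
from sklearn.experimental import enable_hist_gradient_boosting  # noqa
from sklearn.ensemble import HistGradientBoostingClassifier
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)
with threadpool_limits(limits=1, user_api="openmp"):
    HistGradientBoostingClassifier().fit(X, y)
```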