Description
As reported on the forum, the execution of some cells is significantly slower than expected (40s or more instead of ~2s):
https://mooc-forums.inria.fr/moocsl/t/cross-validation-accuracy-reproducibility/7379/3
which discusses the cross-validation of an HGB Classifier model in Exercise M1.05.
I found the following problem in the configuration of the JupyterHub server:
- There are 4 cores on the machine according to `threadpoolctl.threadpool_info()`.
- However, the CFS quota (`/sys/fs/cgroup/cpu/cpu.cfs_quota_us`) is set to 1x the CFS period (`/sys/fs/cgroup/cpu/cpu.cfs_period_us`), which means that only 1 CPU is usable per container.
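For reference, a minimal diagnostic along these lines (assuming cgroup v1, which is what the paths above correspond to) makes the mismatch visible:

```python
# Diagnostic sketch: compare the cores visible to the native thread pools
# with the CFS quota actually granted to the container (cgroup v1 paths).
import threadpoolctl

# Each native thread pool (OpenMP, OpenBLAS, ...) is sized for the 4 visible cores.
print(threadpoolctl.threadpool_info())

with open("/sys/fs/cgroup/cpu/cpu.cfs_quota_us") as f:
    quota_us = int(f.read())
with open("/sys/fs/cgroup/cpu/cpu.cfs_period_us") as f:
    period_us = int(f.read())

# quota / period is the number of CPUs the container may actually use: 1 here.
print(f"CFS quota allows {quota_us / period_us:.1f} CPU(s) per container")
```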
I think we should allow for at least 2 CPUs per container in the Kubernetes CFS config (or even 4), even though we know they will be underused most of the time. And we should set the following environment variables accordingly:

`OMP_NUM_THREADS=2`
`OPENBLAS_NUM_THREADS=2`
`LOKY_CPU_COUNT=2`

but if `cfs_quota_us` is left at 1x `cfs_period_us`, then we should instead set:

`OMP_NUM_THREADS=1`
`OPENBLAS_NUM_THREADS=1`
`LOKY_CPU_COUNT=1`

in the environment config to avoid any potential oversubscription problem.
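Until the container config is changed, the same limits can also be applied from a notebook, as a stop-gap. A minimal sketch, assuming the single-CPU quota stays in place; the point is that the variables need to be set before the libraries that spawn the thread pools are imported:

```python
# Stop-gap for the current 1-CPU quota: cap the native thread pools from the
# notebook itself. These variables have to be set before numpy / scikit-learn
# (and therefore OpenMP / OpenBLAS) are imported for the first time, so that
# the runtimes pick them up when they initialize.
import os

os.environ["OMP_NUM_THREADS"] = "1"
os.environ["OPENBLAS_NUM_THREADS"] = "1"
os.environ["LOKY_CPU_COUNT"] = "1"

# Imports only after the environment is configured.
import numpy as np
from sklearn.ensemble import HistGradientBoostingClassifier
```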
I have also observed that the anti-oversubscription protection for HGB Classifier implemented in scikit-learn/scikit-learn#20477 and released as part of scikit-learn 1.0 is not working as expected, because setting `OMP_NUM_THREADS=1` at the beginning of the notebook or using `threadpoolctl.threadpool_limits(limits=1)` can change the duration from ~40s to ~6s in my tests. So OpenMP oversubscription is the main culprit here.
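For concreteness, the `threadpool_limits` workaround was applied roughly as follows. This is only a sketch: the data is a synthetic stand-in generated with `make_classification`, and the estimator parameters are assumptions, not the exact exercise setup.

```python
# Sketch of the runtime workaround: limit all native thread pools (OpenMP
# included) with threadpoolctl around the cross-validation call.
from sklearn.datasets import make_classification
from sklearn.ensemble import HistGradientBoostingClassifier
from sklearn.model_selection import cross_validate
from threadpoolctl import threadpool_limits

# Stand-in data; the real exercise uses its own dataset.
X, y = make_classification(n_samples=10_000, n_features=20, random_state=0)
model = HistGradientBoostingClassifier(random_state=0)

# Capping every pool to a single thread avoids the oversubscription:
# ~40s down to ~6s in my tests on the hub.
with threadpool_limits(limits=1):
    cv_results = cross_validate(model, X, y, cv=5)

print(cv_results["fit_time"])
```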
I would not have expected this because `sklearn.utils._openmp_helpers._openmp_effective_n_threads()` returns 1 (as expected), and I have checked that `_openmp_effective_n_threads` is called where appropriate in `HistGradientBoostingClassifier.fit` in the source code of the version of scikit-learn deployed on JupyterHub...
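For reference, the check I ran (this is a private scikit-learn helper, so the snippet is only meant as a diagnostic):

```python
# Diagnostic: the private helper that sizes the OpenMP pool inside
# HistGradientBoostingClassifier.fit does honour the CFS quota and reports
# a single thread, which makes the observed slowdown surprising.
from sklearn.utils._openmp_helpers import _openmp_effective_n_threads

print(_openmp_effective_n_threads())  # -> 1 on the hub, as expected
```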