Description
As reported on the forum, the execution of some cells is significantly slower than expected (40s or more instead of ~2s):
https://mooc-forums.inria.fr/moocsl/t/cross-validation-accuracy-reproducibility/7379/3
which discusses the cross-validation of an HGB Classifier model in Exercise M1.05.
I found the following problem in the configuration of the JupyterHub server:
- There are 4 cores on the machine according to `threadpoolctl.threadpool_info()`.
- However, the CFS quota (`/sys/fs/cgroup/cpu/cpu.cfs_quota_us`) is set to 1x the CFS period (`/sys/fs/cgroup/cpu/cpu.cfs_period_us`), which means that only 1 CPU is usable per container.
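For reference, a minimal diagnostic along these lines (assuming cgroup v1, which is what the paths above correspond to) makes the mismatch visible:

```python
# Diagnostic sketch: compare the cores visible to the native thread pools
# with the CFS quota actually granted to the container (cgroup v1 paths).
import threadpoolctl

# Each native thread pool (OpenMP, OpenBLAS, ...) is sized for the 4 visible cores.
print(threadpoolctl.threadpool_info())

with open("/sys/fs/cgroup/cpu/cpu.cfs_quota_us") as f:
    quota_us = int(f.read())
with open("/sys/fs/cgroup/cpu/cpu.cfs_period_us") as f:
    period_us = int(f.read())

# quota / period is the number of CPUs the container may actually use: 1 here.
print(f"CFS quota allows {quota_us / period_us:.1f} CPU(s) per container")
```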
I think we should allow for at least 2 CPUs per container in the Kubernetes CFS config (or even 4), even though we know they will be underused most of the time. And we should set the following environment variables accordingly:

`OMP_NUM_THREADS=2`
`OPENBLAS_NUM_THREADS=2`
`LOKY_CPU_COUNT=2`

but if `cfs_quota_us` is left at 1x `cfs_period_us`, then we should instead set:

`OMP_NUM_THREADS=1`
`OPENBLAS_NUM_THREADS=1`
`LOKY_CPU_COUNT=1`

in the environment config to avoid any potential oversubscription problem.
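Until the container config is changed, the same limits can also be applied from a notebook, as a stop-gap. A minimal sketch, assuming the single-CPU quota stays in place; the point is that the variables need to be set before the libraries that spawn the thread pools are imported:

```python
# Stop-gap for the current 1-CPU quota: cap the native thread pools from the
# notebook itself. These variables have to be set before numpy / scikit-learn
# (and therefore OpenMP / OpenBLAS) are imported for the first time, so that
# the runtimes pick them up when they initialize.
import os

os.environ["OMP_NUM_THREADS"] = "1"
os.environ["OPENBLAS_NUM_THREADS"] = "1"
os.environ["LOKY_CPU_COUNT"] = "1"

# Imports only after the environment is configured.
import numpy as np
from sklearn.ensemble import HistGradientBoostingClassifier
```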
I have also observed that the anti-oversubscription protection for HGB Classifier implemented in scikit-learn/scikit-learn#20477 and released as part of scikit-learn 1.0 is not working as expected, because setting `OMP_NUM_THREADS=1` at the beginning of the notebook or using `threadpoolctl.threadpool_limits(limits=1)` can change the duration from ~40s to ~6s in my tests. So OpenMP oversubscription is the main culprit here.
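For concreteness, the `threadpool_limits` workaround was applied roughly as follows. This is only a sketch: the data is a synthetic stand-in generated with `make_classification`, and the estimator parameters are assumptions, not the exact exercise setup.

```python
# Sketch of the runtime workaround: limit all native thread pools (OpenMP
# included) with threadpoolctl around the cross-validation call.
from sklearn.datasets import make_classification
from sklearn.ensemble import HistGradientBoostingClassifier
from sklearn.model_selection import cross_validate
from threadpoolctl import threadpool_limits

# Stand-in data; the real exercise uses its own dataset.
X, y = make_classification(n_samples=10_000, n_features=20, random_state=0)
model = HistGradientBoostingClassifier(random_state=0)

# Capping every pool to a single thread avoids the oversubscription:
# ~40s down to ~6s in my tests on the hub.
with threadpool_limits(limits=1):
    cv_results = cross_validate(model, X, y, cv=5)

print(cv_results["fit_time"])
```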
I would not have expected this because `sklearn.utils._openmp_helpers._openmp_effective_n_threads()` returns 1 (as expected), and I have checked that `_openmp_effective_n_threads` is called where appropriate in `HistGradientBoostingClassifier.fit` in the source code of the version of scikit-learn deployed on JupyterHub...
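For reference, the check I ran (this is a private scikit-learn helper, so the snippet is only meant as a diagnostic):

```python
# Diagnostic: the private helper that sizes the OpenMP pool inside
# HistGradientBoostingClassifier.fit does honour the CFS quota and reports
# a single thread, which makes the observed slowdown surprising.
from sklearn.utils._openmp_helpers import _openmp_effective_n_threads

print(_openmp_effective_n_threads())  # -> 1 on the hub, as expected
```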