Parallelization limited to 2 CPUs when importing torch before joblib on a slurm cluster #1420

Closed
aperezlebel opened this issue Apr 14, 2023 · 7 comments

Comments

aperezlebel commented Apr 14, 2023

Hello, I encountered an issue when importing torch before joblib on a Slurm cluster: instead of using all available CPUs, joblib only uses two, regardless of the total CPU count. On my laptop, both import orders work as expected.

The problem occurs with all backends. With the "threading" backend, the reduced load is spread across the CPUs, each sitting at a few percent of utilization instead of 100%.

1. Normal behavior:

from joblib import Parallel, delayed
import torch

def heavy_func():
    for _ in range(10000):
        [i for i in range(10000)]


n_jobs = 40
Parallel(n_jobs=n_jobs)(delayed(heavy_func)() for _ in range(2*n_jobs))

(Screenshot: CPU usage with all cores fully loaded)

2. Problematic behavior:

import torch
from joblib import Parallel, delayed


def heavy_func():
    for _ in range(10000):
        [i for i in range(10000)]


n_jobs = 40
Parallel(n_jobs=n_jobs)(delayed(heavy_func)() for _ in range(2*n_jobs))

(Screenshot: CPU usage with only two cores loaded)
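
For reference, a minimal diagnostic sketch (an editorial addition, not part of the original report), assuming the limitation comes from the parent process's CPU affinity being narrowed when torch initializes its OpenMP runtime, which is the explanation that emerges later in this thread. On Linux, the affinity set should shrink right after the import:

import os

# Number of CPUs the process is allowed to run on before torch is imported.
print("CPUs before importing torch:", len(os.sched_getaffinity(0)))

import torch  # noqa: E402  (imported here on purpose, after the first check)

# If the OpenMP runtime narrowed the affinity mask, this number drops.
print("CPUs after importing torch:", len(os.sched_getaffinity(0)))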

Versions

python                    3.10.10         he550d4f_0_cpython    conda-forge
joblib                    1.2.0              pyhd8ed1ab_0    conda-forge
pytorch                   2.0.0              py3.10_cpu_0    pytorch
slurm                     20.11.9
ogrisel (Contributor) commented Apr 18, 2023

Thanks for the report, this is indeed quite unexpected.

I am not sure I can reproduce this on my local machine. EDIT: I had missed "On my laptop, both work as expected."

Can you please try the following:

import torch
from joblib import Parallel, delayed
import os


def heavy_func():
    for _ in range(10000):
        [i for i in range(10000)]
    return os.getpid()


n_jobs = 40
results = Parallel(n_jobs=n_jobs)(delayed(heavy_func)() for _ in range(2*n_jobs))
print(len(set(results)))
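
(The printed count is the number of distinct worker processes that executed at least one task; with 40 busy worker processes it should be close to 40.)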

aperezlebel (Author) commented:

This gave me different numbers each time I ran the script.

When torch is imported first, over 5 runs I got: 27, 23, 20, 17, 16.

When joblib is imported first, over 5 runs I got: 4, 2, 4, 3, 4.

> I am not sure I can reproduce on my local machine

The example in my first message was run on the Margaret cluster. Maybe you can try to reproduce the result on a Margaret node?

wondey-sh commented:

Hi, I encountered the same issue on an Ubuntu 20.04 machine:

  • python 3.9.16 h2782a2a_0_cpython conda-forge
  • pytorch 1.13.1 py3.9_cuda11.7_cudnn8.5.0_0 pytorch
  • joblib 1.3.0 pyhd8ed1ab_1 conda-forge

The issue occurs with my own code, and I can also reproduce it with @aperezlebel's code.

wurining commented Nov 9, 2023

Hi, same here.

  • python 3.8.17
  • torch 1.12.1
  • joblib 1.3.2

Importing joblib before torch works around it.

tomMoral (Contributor) commented Nov 9, 2023 via email

wurining commented Nov 9, 2023

@tomMoral Thank you for the reference, it is useful.

My llvm-openmp version is 16.0.6, and setting KMP_AFFINITY=disabled also makes it run correctly.

pytorch/pytorch#99625 (comment)
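
A minimal sketch of that workaround (an editorial addition, not taken verbatim from the thread): the variable should be set before torch initializes its OpenMP runtime, either in the shell (KMP_AFFINITY=disabled python script.py) or at the very top of the script, before importing torch.

import os

# Must be set before torch (and its bundled OpenMP runtime) is imported,
# because the runtime reads KMP_AFFINITY once at initialization.
os.environ["KMP_AFFINITY"] = "disabled"

import torch  # noqa: E402
from joblib import Parallel, delayed  # noqa: E402


def heavy_func():
    for _ in range(10000):
        [i for i in range(10000)]


n_jobs = 40
Parallel(n_jobs=n_jobs)(delayed(heavy_func)() for _ in range(2 * n_jobs))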

lesteve closed this as completed Feb 19, 2025
lesteve (Member) commented Feb 19, 2025

Let's close this, as it looks like there is a workaround.
