`nogil` multi-threading is slower than multi-threading with gil for CPU bound #118749

Venkat2811 · 2024-05-08T00:31:29Z

Bug report

Bug description:

Hello Team,

Thanks for the great work so far and recent nogil efforts. I wanted to explore performance of asyncio with CPU-GPU bound multi-threading in nogil setup. So, started with simple benchmark as seen below:

import sys
from concurrent.futures import ThreadPoolExecutor
def fib(n):
    if n < 2: return 1
    return fib(n-1) + fib(n-2)

threads = 8
if len(sys.argv) > 1:
    threads = int(sys.argv[1])

with ThreadPoolExecutor(max_workers=threads) as executor:
    for _ in range(threads):
        executor.submit(lambda: print(fib(34)))

Original Source

Results:

python 3.13.a06 with gil:
htop shows 2 running tasks, and only one core is utilized for 100%

$ time /usr/bin/python3.13 fib.py      
9227465
9227465
9227465
9227465
9227465
9227465
9227465
9227465
/usr/bin/python3.13 fib.py  6,46s user 0,04s system 100% cpu 6,447 total

libpython3.13-nogil amd64 3.13.0~a6-1+jammy2 source nogil:
htop shows 9 running tasks, and 8 cores close to 100% utilization, but slower.

$ time /usr/bin/python3.13-nogil -X gil=0 fib.py
9227465
9227465
9227465
9227465
9227465
9227465
9227465
9227465
/usr/bin/python3.13-nogil -X gil=0 fib.py  168,81s user 0,06s system 788% cpu 21,410 total

My CPU: AMD Ryzen 7 5800X 8-Core Processor
OS: Ubuntu 22.04.4 LTS

So looks like there is overhead when using multiple cores. Is this expected with this version ? Are results similar with Intel & M1 CPUs as well ?

Results documented here in version 3.9.12 on Intel is better.

CPython versions tested on:

3.13

Operating systems tested on:

Linux

The text was updated successfully, but these errors were encountered:

corona10 · 2024-05-08T00:45:05Z

FYI, the performance issue is not a surprise because we turned off the specialization, and deferred reference counting is not yet implemented for free-threading version.

But worth to checkout if there is unexpected bottleneck here.

colesbury · 2024-05-08T16:11:46Z

Please use 3.13 beta 1. There were a number of scaling bottlenecks fixed between 3.13 alpha 6 and beta 1 (#118527).

On my machine, I see a speedup in the free-threaded build vs. the default bulid (2.4s vs. 8.2s)

corona10 · 2024-05-08T16:30:36Z

With ed2b0fb
number of thread: os.cpu_count()
task: fib(5)

Venkat2811 · 2024-05-09T07:41:47Z

Thanks for confirming that things have been fixed in beta1 @colesbury

I'm waiting for beta1 build to be available on deadsnakes. I tried building from source, but unfortunately after trying several --with-ssl solutions, still getting not found errors.

Venkat2811 · 2024-05-10T00:16:18Z

@colesbury I installed 3.13.0b1 via pyenv.

env PYTHON_CONFIGURE_OPTS='--disable-gil' pyenv install 3.13.0b1

Is this the way to create free-threaded build ?

On my M1 MBP:
--disable-gil vs default build: 15.5s vs 5.26s

On my AMD Ryzen 7 5800X 8-Core Processor, Ubuntu:
--disable-gil vs default build: 10.5s vs 6.4s

It has improved since 3.13.a06 for sure

stonebig · 2024-05-13T19:35:42Z

cpython-3.13.0b1 on Windows 11, for 20 threads, on a 4 cores/8 threads intel cpu: (faster is better)

free-threading = 12 s
standard = 19 s

FeldrinH · 2024-05-19T12:38:53Z

This is a slightly different benchmark, but there are some potentially useful data points about the speedup with free-threading here: winpython/winpython#1339.

ArtemIsmagilov · 2024-07-30T23:32:30Z

CPython3.13-b3 without GIL is slower in 2x than with GIL.

System Ubuntu24.
python3.13.b03-nogil was downloaded from apt.

The program actually works in multi-core mode with nogil.

I'm doing true concurrency testing for Python3.13.0b3. And according to my results, the result is deplorable.
Let me know if I'm not testing multithreading correctly. What could be the problem with reduced productivity?

from array import array
from concurrent.futures import ThreadPoolExecutor
from time import time

array_size = 100_000
a = array('b', [0 for i in range(array_size)])


def write_by_index(array, indx):
    array[indx] = 1


start = time()
with ThreadPoolExecutor(max_workers=6) as executor:
    for index in range(array_size ):
        executor.submit(write_by_index, a, index)
end = time() - start
print(end)

My results.

python3.13 main.py 
>>>1.0930209159851074

python3.13-nogil main.py
>>>1.9819214344024658

Next test where array_size = 10_000_000

python3.13 main.py 
>>>117.05215215682983

python3.13-nogil main.py 
>>>211.70072531700134

PS.

There's a funny moment here. It seems I was using global data, which may have entailed locking overhead, mutexes, if I can judge correctly. I tried testing with local data with my scope. This is where I got a 3x performance increase without GIL.

from concurrent.futures import ThreadPoolExecutor
from time import time

array_size = 1_000_000
count_tasks = 100


def write_by_index(size):
    # some working with local data
    array = [0 for i in range(size)]
    return array


start = time()
with ThreadPoolExecutor(max_workers=6) as executor:
    for index in range(count_tasks):
        executor.submit(write_by_index, array_size)
end = time() - start
print(end)

time python3.13 main.py
>>>2.7294483184814453

real	0m2.782s
user	0m2.798s
sys	0m0.095s

time python3.13-nogil main.py
>>>1.062140703201294

real	0m1.154s
user	0m5.511s
sys	0m0.674s

ArtemIsmagilov · 2024-07-31T06:14:52Z

It looks like it really works, you just need to wait for stable versions. It is also necessary to know well what exactly is happening under the hood in order to use this feature and thus get the maximum benefit.
issue
pep

kumaraditya303 · 2025-06-10T10:54:53Z

On 3.14, the scaling issue is fixed, with free-threading it takes 1.92 seconds and with GIL is takes 4.05 seconds for me locally, please try 3.14 and report if there are still performance issues.

Venkat2811 added the type-bug An unexpected behavior, bug, or error label May 8, 2024

Eclips4 added the topic-free-threading label May 8, 2024

Eclips4 mentioned this issue Jun 4, 2024

3.130b1 Performance Issue with Free Threading build #120040

Closed

kumaraditya303 closed this as not planned Won't fix, can't repro, duplicate, stale Jun 10, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

`nogil` multi-threading is slower than multi-threading with gil for CPU bound #118749

`nogil` multi-threading is slower than multi-threading with gil for CPU bound #118749

Venkat2811 commented May 8, 2024 •

edited by github-actions bot

Loading

corona10 commented May 8, 2024 •

edited

Loading

Uh oh!

colesbury commented May 8, 2024

Uh oh!

corona10 commented May 8, 2024 •

edited

Loading

Uh oh!

Venkat2811 commented May 9, 2024

Uh oh!

Venkat2811 commented May 10, 2024 •

edited

Loading

Uh oh!

stonebig commented May 13, 2024 •

edited

Loading

Uh oh!

FeldrinH commented May 19, 2024

Uh oh!

ArtemIsmagilov commented Jul 30, 2024 •

edited

Loading

Uh oh!

ArtemIsmagilov commented Jul 31, 2024 •

edited

Loading

Uh oh!

kumaraditya303 commented Jun 10, 2025

Uh oh!

Uh oh!

nogil multi-threading is slower than multi-threading with gil for CPU bound #118749

nogil multi-threading is slower than multi-threading with gil for CPU bound #118749

Comments

Venkat2811 commented May 8, 2024 • edited by github-actions bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Bug report

Bug description:

CPython versions tested on:

Operating systems tested on:

corona10 commented May 8, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

colesbury commented May 8, 2024

Uh oh!

corona10 commented May 8, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Venkat2811 commented May 9, 2024

Uh oh!

Venkat2811 commented May 10, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

stonebig commented May 13, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

FeldrinH commented May 19, 2024

Uh oh!

ArtemIsmagilov commented Jul 30, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

ArtemIsmagilov commented Jul 31, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

kumaraditya303 commented Jun 10, 2025

Uh oh!

`nogil` multi-threading is slower than multi-threading with gil for CPU bound #118749

`nogil` multi-threading is slower than multi-threading with gil for CPU bound #118749

Venkat2811 commented May 8, 2024 •

edited by github-actions bot

Loading

corona10 commented May 8, 2024 •

edited

Loading

corona10 commented May 8, 2024 •

edited

Loading

Venkat2811 commented May 10, 2024 •

edited

Loading

stonebig commented May 13, 2024 •

edited

Loading

ArtemIsmagilov commented Jul 30, 2024 •

edited

Loading

ArtemIsmagilov commented Jul 31, 2024 •

edited

Loading