threading primitives are subject to reference count contention #134761

Open
ZeroIntensity opened this issue May 26, 2025 · 0 comments
Labels
interpreter-core (Objects, Python, Grammar, and Parser dirs) performance Performance or resource usage topic-free-threading type-feature A feature request or enhancement

ZeroIntensity commented May 26, 2025

Feature or enhancement

Proposal:

On the free-threaded build, threading's concurrency primitives incur a lot of extra overhead when a single object is used from multiple threads, due to reference count contention. For example:

import threading
import time

lock = threading.Lock()  # module-level: every thread touches this one object

def scale():
    a = time.perf_counter()
    for _ in range(10000000):
        lock.locked()
    b = time.perf_counter()
    print(b - a, "s")

threads = [threading.Thread(target=scale) for _ in range(8)]
for thread in threads:
    thread.start()

vs. a variant where each thread creates its own lock:

import threading
import time

def scale():
    lock = threading.Lock()  # created inside the thread, so never shared
    a = time.perf_counter()
    for _ in range(10000000):
        lock.locked()
    b = time.perf_counter()
    print(b - a, "s")

threads = [threading.Thread(target=scale) for _ in range(8)]
for thread in threads:
    thread.start()
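
For convenience, the two scripts can be folded into one file that runs both variants back to back and joins the threads before reporting. This is only a sketch: the names `scale_shared`, `scale_private`, `run`, `ITERATIONS` and `N_THREADS` are illustrative and not part of the original scripts, and the iteration count is reduced so it finishes quickly. The shared lock is still read as a global and the private lock as a local, matching the scripts above.

```python
import threading
import time

ITERATIONS = 1_000_000  # the scripts above use 10_000_000; raise to match
N_THREADS = 8

shared_lock = threading.Lock()  # global: the same object is used by every thread


def scale_shared(results, index):
    """Time ITERATIONS calls to locked() on the global, shared lock."""
    start = time.perf_counter()
    for _ in range(ITERATIONS):
        shared_lock.locked()
    results[index] = time.perf_counter() - start


def scale_private(results, index):
    """Time ITERATIONS calls to locked() on a lock owned by this thread."""
    lock = threading.Lock()
    start = time.perf_counter()
    for _ in range(ITERATIONS):
        lock.locked()
    results[index] = time.perf_counter() - start


def run(target):
    """Run target in N_THREADS threads and return the per-thread timings."""
    results = [0.0] * N_THREADS
    threads = [
        threading.Thread(target=target, args=(results, i))
        for i in range(N_THREADS)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results


if __name__ == "__main__":
    for name, target in (("shared", scale_shared), ("private", scale_private)):
        times = run(target)
        print(f"{name}: mean {sum(times) / len(times):.3f} s per thread")
```

On a build with the GIL enabled the two variants should not be expected to differ in the same way, since the contention described here is specific to the free-threaded build.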

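Comparing the two on a 3.15t release build (the first eight timings come from the per-thread-lock script, the last eight from the shared-lock script):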

0.38904138099997 s
0.39082639699995525 s
0.4013638610001635 s
0.40917961700006344 s
0.526825904000134 s
0.5402126970000154 s
0.540466712999887 s
0.5586060919999909 s
3.425866439999936 s
3.5953266010001244 s
3.6094701500001065 s
3.667731437000157 s
4.458146230000011 s
4.466017671000145 s
4.499206339000011 s
4.50090869099995 s

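That's roughly an 8 to 9 times slowdown (about 4.0 s per thread on average with the shared lock versus about 0.47 s with per-thread locks), solely due to reference count contention.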
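
The size of the gap can be checked directly from the timings listed above. The snippet below is a small sanity check, not part of the original report; the variable names are illustrative, and the two lists are simply the first eight and last eight numbers copied from the output.

```python
from statistics import mean

# First eight timings from the output above (seconds per thread).
fast = [
    0.38904138099997, 0.39082639699995525, 0.4013638610001635,
    0.40917961700006344, 0.526825904000134, 0.5402126970000154,
    0.540466712999887, 0.5586060919999909,
]
# Last eight timings from the output above (seconds per thread).
slow = [
    3.425866439999936, 3.5953266010001244, 3.6094701500001065,
    3.667731437000157, 4.458146230000011, 4.466017671000145,
    4.499206339000011, 4.50090869099995,
]

ratio = mean(slow) / mean(fast)       # how many times longer the slow run takes
saving = 1 - mean(fast) / mean(slow)  # fraction of time the fast run avoids

print(f"mean fast: {mean(fast):.2f} s, mean slow: {mean(slow):.2f} s")
print(f"slow run takes {ratio:.1f}x as long")
print(f"fast run takes {saving:.0%} less time")
```

Read this way, a figure near 90% describes how much less time the fast run needs, while the slowdown itself is a multiple of the fast run's time.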

We can significantly reduce this overhead by enabling deferred reference counting (DRC) on these objects. This is already done for threading.local, and the same can be done for Lock and RLock. It would also be nice to cover primitives like Event, but those are implemented in Python, so that would require a (private) API exposing DRC to Python code.

Has this already been discussed elsewhere?

This is a minor feature, which does not need previous discussion elsewhere

Links to previous discussion of this feature:

N/A

Linked PRs

@ZeroIntensity ZeroIntensity added type-feature A feature request or enhancement topic-free-threading labels May 26, 2025
@ZeroIntensity ZeroIntensity added the performance Performance or resource usage label May 26, 2025
@picnixz picnixz added the interpreter-core (Objects, Python, Grammar, and Parser dirs) label May 26, 2025