GC performance regression in free threaded build #132917


Closed
gptsarthak opened this issue Apr 25, 2025 · 13 comments
Labels
interpreter-core (Objects, Python, Grammar, and Parser dirs) · performance (Performance or resource usage) · topic-free-threading · type-bug (An unexpected behavior, bug, or error)

Comments

@gptsarthak commented Apr 25, 2025

Bug report

Bug description:

I've identified a significant performance regression when using Python's free-threaded mode with shared list appends. In my test case, simply appending to a shared list causes a 10-15x performance decrease compared to normal Python operation.

Test Case:

import itertools
import time

def performance_test(n_options=5, n_items=5, iterations=50):
    list = []  # shared list in the enclosing scope (note: shadows the built-in name)
    
    def expensive_operation():
        # Create lists of tuples
        data = []
        for _ in range(n_options):
            data.append([(f"a{i}", f"b{i}") for i in range(n_items)])
        
        # Generate all combinations and create result tuples
        results = []
        for combo in itertools.product(*data):
            result = tuple((x[0], x[1], f"name_{i}") for i, x in enumerate(combo))
            results.append(result)
        
        # Commenting the following line solves the performance regression in free-threaded mode
        list.append(results)
        return results

    start = time.time()
    for _ in range(iterations):
        result = expensive_operation()
    
    duration = time.time() - start
    print(f"n_options={n_options}, n_items={n_items}, iterations={iterations}")
    print(f"Time: {duration:.4f}s, Combinations: {len(result)}")
    return duration

if __name__ == "__main__":
    print("Python Performance Regression Test")
    print("-" * 40)
    performance_test()

Results:

  • Standard Python 3.13: 0.1290s
  • Free-threaded Python 3.13t: 2.1643s
  • Free-threaded Python 3.14.0a7: 2.1923s
  • Free-threaded Python 3.13t with list.append commented out: 0.1332s

The regression appears to be caused by contention on the per-list locks and reference count fields when appending to a shared list in free-threaded mode.

CPython versions tested on:

3.14

Operating systems tested on:

Linux

Linked PRs

@gptsarthak gptsarthak added the type-bug An unexpected behavior, bug, or error label Apr 25, 2025
@picnixz picnixz added performance Performance or resource usage interpreter-core (Objects, Python, Grammar, and Parser dirs) topic-free-threading labels Apr 25, 2025
@colesbury (Contributor)

Yes, this is expected and not a bug.

@picnixz picnixz added pending The issue will be closed if no feedback is provided and removed type-bug An unexpected behavior, bug, or error labels Apr 25, 2025
@corona10 (Member)

Closing this issue since this is expected.

@corona10 corona10 closed this as not planned (won't fix, can't repro, duplicate, stale) Apr 25, 2025
@colesbury (Contributor)

To add a bit more context: if you have an inherently serial operation, like incrementing a counter or appending to a list, it's going to be slower when run with multiple threads on multiple CPUs than if you have some sort of coarse-grained serialization (like the GIL) or run it on a single CPU.
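
As a purely illustrative sketch (this is not the reproducer from this issue, and the thread count and item counts are arbitrary), the difference between contending on one shared list and letting each thread build its own list looks roughly like this:

import threading

def append_shared(n_threads=4, n_items=100_000):
    shared = []

    def worker():
        for i in range(n_items):
            shared.append(i)  # every thread serializes on the same list

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return len(shared)

def append_local(n_threads=4, n_items=100_000):
    parts = [[] for _ in range(n_threads)]

    def worker(part):
        for i in range(n_items):
            part.append(i)  # each thread writes only to its own list

    threads = [threading.Thread(target=worker, args=(p,)) for p in parts]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # one serial merge at the end instead of contention on every append
    return len([x for part in parts for x in part])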

@srinivasreddy (Contributor) commented Apr 25, 2025

@colesbury I have been reading the GIL-removal PEP, and this issue has been closed as "not planned". Do we have any plans to at least partially fix the performance, especially in cases like list.append(results)?

@colesbury (Contributor)

No

@terryjreedy (Member)

Is this issue specific to .append or does it apply to .extend and slice insertions?

Is this documented, or should it be documented and warned against, somewhere other than here?

@colesbury (Contributor)

I don't think we need to specifically warn against it any more than we would warn against building a list via successive list.insert(0, item).

This is an inherently serial operation. If you try to perform a serial operation from multiple CPUs, you introduce lots of extra communication and everything gets slower. That's normal and not really specific to Python, just like quadratic behavior of some data structures is normal and not specific to Python.
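
As a rough illustration of that analogy (sizes here are arbitrary), repeated insert(0, ...) shifts every existing element on each call, while append() is amortized constant time:

import timeit

print(timeit.timeit("lst.insert(0, None)", setup="lst = []", number=50_000))
print(timeit.timeit("lst.append(None)", setup="lst = []", number=50_000))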

@gptsarthak (Author)

@colesbury Thank you for the response. I understand the general principle about serial operations being slower with multiple threads. However, I want to clarify that my MRE doesn't use any threading at all - it's entirely single-threaded code.

The performance regression I'm observing is occurring in single-threaded code simply by running it in free-threaded Python mode. This suggests that there's a significant overhead for thread safety mechanisms even when no actual threading is used.

This seems important to document, since while experienced concurrent programmers will understand the underlying reasons, many Python users might not expect their unthreaded code to suddenly run 10-15x slower.

Would it be worth mentioning this behavior in the free-threading documentation - that even single-threaded code using mutable shared data structures may experience performance regressions due to the thread safety overhead?

Also, is there any scope for optimizing free-threaded mode under these conditions in future releases?

@colesbury (Contributor)

Sorry, I didn't pay close enough attention to the repro and focused on "shared list appends". Thanks for following up.

This appears to be some sort of performance regression related to GC. The list.append(results) isn't really slow itself; it just holds onto the generated combinations, which leads to the GC being scheduled. Without that list.append(), the objects are freed after each call to expensive_operation().

You can verify this by adding import gc; gc.disable() at the top.
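
For example, the top of the reproducer would become (the rest of the script is unchanged); if the free-threaded timings then drop back near the default-build numbers, the slowdown is coming from GC scheduling rather than from list.append() itself:

import gc
gc.disable()  # rule out GC scheduling as the cause of the slowdown

import itertools
import time

# ... performance_test() and the rest of the original script unchanged ...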

@colesbury colesbury reopened this Apr 26, 2025
@colesbury colesbury removed the pending The issue will be closed if no feedback is provided label Apr 26, 2025
@colesbury colesbury changed the title from "15x Performance regression with shared list appends in free-threaded mode" to "GC performance regression in free threaded build" Apr 26, 2025
@colesbury (Contributor)

cc @nascheme, in case you want to take a look at this

@picnixz picnixz added the type-bug An unexpected behavior, bug, or error label Apr 26, 2025
@nascheme (Member)

Yeah, the difference in performance is almost purely due to the different behavior of the cyclic GC. With the list.append() you are quickly increasing the number of live container objects, and the GC runs many times. The free-threaded build doesn't have a generational collector, so it needs to do a full collection each time the threshold for the increase in live objects is exceeded. I need to investigate further to confirm, but I suspect that since you are using tuples, the default-build GC is untracking them (since they don't contain containers), so the need to perform full GC collections is greatly reduced. Needs more investigation to confirm that's all that's happening and there is not some other issue.
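
If that suspicion is right, the untracking can be observed on the default build with gc.is_tracked() (illustrative snippet, not part of the original report):

import gc

t = tuple(str(i) for i in range(3))  # built at runtime, so tracked initially
print(gc.is_tracked(t))              # True
gc.collect()
print(gc.is_tracked(t))              # False on the default build: the tuple
                                     # contains no containers and gets untracked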

nascheme added a commit that referenced this issue May 5, 2025
For the free-threaded build, check the process resident set size (RSS)
increase before triggering a full automatic garbage collection.  If the RSS
has not increased 10% since the last collection then it is deferred.
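
A rough Python sketch of that heuristic (the names here are made up for illustration; the real change is in the C garbage collector):

def should_run_full_gc(current_rss: int, rss_at_last_gc: int) -> bool:
    # Defer the full collection unless the resident set size has grown by
    # at least 10% since the previous collection.
    return current_rss >= rss_at_last_gc * 1.10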
Yhg1s added a commit that referenced this issue May 6, 2025
Fix data race detected by tsan
(https://github.com/python/cpython/actions/runs/14857021107/job/41712717208?pr=133502):
young.count can be modified by other threads even while the gcstate is
locked.

This is the simplest fix to (potentially) unblock beta 1, although this
particular code path seems like it could just be an atomic swap followed by
an atomic add, without having the lock at all.
nascheme added a commit to nascheme/cpython that referenced this issue May 7, 2025
nascheme added a commit to nascheme/cpython that referenced this issue May 8, 2025
nascheme added a commit that referenced this issue May 8, 2025
On Linux, use /proc/self/status for mem usage info. Using smaps_rollup is quite a lot slower and we can get similar info from /proc/self/status.
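
For reference, reading the resident set size from /proc/self/status looks roughly like this in Python (the actual change is in C; the function name here is hypothetical):

def read_vmrss_kib() -> int:
    with open("/proc/self/status") as f:
        for line in f:
            if line.startswith("VmRSS:"):
                return int(line.split()[1])  # VmRSS is reported in kB
    return 0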
miss-islington pushed a commit to miss-islington/cpython that referenced this issue May 8, 2025
…133544)

On Linux, use /proc/self/status for mem usage info. Using smaps_rollup is quite a lot slower and we can get similar info from /proc/self/status.
(cherry picked from commit 751db4e)

Co-authored-by: Neil Schemenauer <nas-github@arctrix.com>
nascheme added a commit that referenced this issue May 8, 2025
… (gh-133718)

On Linux, use /proc/self/status for mem usage info. Using smaps_rollup is quite a lot slower and we can get similar info from /proc/self/status.
(cherry picked from commit 751db4e)

Co-authored-by: Neil Schemenauer <nas-github@arctrix.com>
miss-islington pushed a commit to miss-islington/cpython that referenced this issue May 27, 2025
…thonGH-134692)

(cherry picked from commit ac539e7)

Co-authored-by: Kumar Aditya <kumaraditya@python.org>
kumaraditya303 added a commit that referenced this issue May 27, 2025
…H-134692) (#134802)

gh-132917: fix data race on `last_mem` in free-threading gc  (GH-134692)
(cherry picked from commit ac539e7)

Co-authored-by: Kumar Aditya <kumaraditya@python.org>
@colesbury (Contributor)

Oh, seems to be fixed in 3.14 as well.
