
Memory keeps increasing with fixed-size dict during multi-threaded set/delete in 3.13.3t #133136


Open
Yiling-J opened this issue Apr 29, 2025 · 9 comments
Labels
interpreter-core (Objects, Python, Grammar, and Parser dirs), performance (Performance or resource usage), topic-free-threading, type-bug (An unexpected behavior, bug, or error)

Comments


Yiling-J commented Apr 29, 2025

Bug report

Bug description:

import random
import time
from threading import Lock, Thread

import psutil


class Cache:
    def __init__(self, cap):
        self.m = {}
        self.mu = Lock()
        self.cap = cap

    def get(self, key):
        with self.mu:
            try:
                v = self.m[key]
            except KeyError:
                return 0, False
            return v, True

    def set(self, key, value):
        with self.mu:
            self.m[key] = value
            if len(self.m) > self.cap:
                self.m.popitem()

def set_key(cache):
    c = 0
    while True:
        key = random.randint(0, 1000000000)
        cache.set(key, key)
        c += 1
        if c == 100000:
            c = 0
            print(".", end="", flush=True)

def run():
    cache = Cache(500000)

    for _ in range(4):
        thread = Thread(target=set_key, args=[cache])
        thread.start()

    process = psutil.Process()
    while True:
        time.sleep(10)
        with cache.mu:
            print(f"rss: {process.memory_info().rss}, dict len: {len(cache.m)}")

if __name__ == "__main__":
    run()

Platform: macOS-14.6.1-x86_64-i386-64bit-Mach-O

Python 3.13.3t

uv run --python 3.13.3t mem.py

............................................................rss: 384696320, dict len: 500000
.......................................................rss: 636743680, dict len: 500000
.......................................................rss: 888782848, dict len: 500000
.........................................................rss: 1140793344, dict len: 500000
......................................................rss: 1392799744, dict len: 500000

Python 3.13.3

uv run --python 3.13.3 mem.py

....................................................................................................rss: 159633408, dict len: 500000
...............................................................................................................rss: 159813632, dict len: 500000
.........................................................................................................rss: 159981568, dict len: 500000
...........................................................................................................rss: 160149504, dict len: 500000
......................................................................................................rss: 160301056, dict len: 500000

CPython versions tested on:

3.13

Operating systems tested on:

macOS

Linked PRs

Yiling-J added the type-bug (An unexpected behavior, bug, or error) label on Apr 29, 2025

Zheaoli commented Apr 29, 2025

This bug can be reproduced in my Linux environment. I think it may take some time to figure out.

picnixz added the performance (Performance or resource usage) and interpreter-core (Objects, Python, Grammar, and Parser dirs) labels on May 1, 2025

Zheaoli commented May 6, 2025

For now, I think this may be related to GC timing:

import random
import time
from threading import Lock, Thread

import resource
import gc

class Cache:
    def __init__(self, cap):
        self.m = {}
        self.mu = Lock()
        self.cap = cap

    def get(self, key):
        with self.mu:
            try:
                v = self.m[key]
            except KeyError:
                return 0, False
            return v, True

    def set(self, key, value):
        with self.mu:
            self.m[key] = value
            if len(self.m) > self.cap:
                self.m.popitem()

def set_key(cache):
    c = 0
    while True:
        key = random.randint(0, 1000000000)
        cache.set(key, key)
        c += 1
        if c == 100000:
            c = 0
            print(".", end="", flush=True)

def run():
    cache = Cache(500000)

    for _ in range(4):
        thread = Thread(target=set_key, args=[cache])
        thread.start()

    while True:
        time.sleep(1)
        with cache.mu:
            usage = resource.getrusage(resource.RUSAGE_SELF)

            rss_kb = usage.ru_maxrss
            print(f"Process RSS: {rss_kb:.2f} KB")
            # gc.collect()

if __name__ == "__main__":
    run()

If we run gc.collect() manually, the memory usage stays normal. Otherwise, it grows by about 40 MB at regular intervals.


Zheaoli commented May 6, 2025

It seems to run normally in v3.13.0a3.

After running the loop for 10 minutes, memory stays below 200 MB.

This is a good signal. I think I can bisect to find the bad commit.


Yiling-J commented May 7, 2025

@Zheaoli Thanks for investigating the issue. I tested by manually triggering GC every 10 seconds and every 1 second. With the 10-second interval, the RSS stabilized around rss: 893140992, dict len: 500000; with the 1-second interval, it stabilized around rss: 490344448, dict len: 500000. I also tested 3.13.0t without manually triggering GC; the memory usage still increases, though more slowly than on 3.13.3t.
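
For reference, this is roughly what I mean by manually triggering GC: a background thread that calls gc.collect() on a fixed interval (a minimal sketch; the interval value and thread wiring are illustrative, not the exact code I ran):

import gc
import time
from threading import Thread

def gc_trigger(interval):
    # Force a full collection on a schedule; in my tests this keeps the
    # retired dict-keys memory from accumulating on the free-threaded build.
    while True:
        time.sleep(interval)
        gc.collect()

# Started alongside the set_key threads in run(), e.g.:
# Thread(target=gc_trigger, args=[10], daemon=True).start()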


Zheaoli commented May 7, 2025

Got it. I think this bug was introduced into the codebase in #116336.

But I don't have a clear idea of how to fix it. Would you mind lending a hand? @colesbury


Zheaoli commented May 12, 2025

@colesbury PTAL when you get time


Zheaoli commented May 20, 2025

I think this issue is still related to the GC. I counted how many times free_keys_object is called: the count on the normal build is much higher than on the free-threaded build.

@Yiling-J
Author

I think this issue is still related to the GC. I counted how many times free_keys_object is called: the count on the normal build is much higher than on the free-threaded build.

This seems like a reasonable explanation for the issue. Hopefully, @colesbury can take a look when he has time, or if he's too busy, perhaps someone else familiar with free-threading could also help investigate.


colesbury commented Jun 3, 2025

I think we need better heuristics for QSBR here. The repeated insertions and popitem() calls lead to the dictionary keys object getting reallocated every so often. The PyDictKeysObjects here are large, so keeping a bunch of them around means a lot of extra memory usage in this example.

We "buffer" QSBR frees in two ways:

  1. We only advance the global QSBR counter every 10 calls to _Py_qsbr_deferred_advance. This is to reduce contention on the shared counter for frequent calls, but that's probably too infrequent here given the size of the keys objects.

uint64_t seq = _Py_qsbr_deferred_advance(tstate->qsbr);

  2. We only process the delayed free requests every WORK_ITEMS_PER_CHUNK (254) calls, again to limit work for frequent calls.

cpython/Objects/obmalloc.c

Lines 1208 to 1210 in 1ffe913

if (buf->wr_idx == WORK_ITEMS_PER_CHUNK) {
    _PyMem_ProcessDelayed((PyThreadState *)tstate);
}

In addition to periodic processing every 10 or 254 calls, I think we probably want to take into account the amount of memory that the QSBR subsystem is holding.
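
A rough model of what that could look like (Python pseudocode only, not the actual obmalloc.c logic; QSBR_DEFERRED_MEMORY_LIMIT and the other bookkeeping names are hypothetical):

WORK_ITEMS_PER_CHUNK = 254
QSBR_DEFERRED_MEMORY_LIMIT = 1 << 20  # hypothetical byte threshold, not a real CPython constant

class DelayedFreeBuffer:
    def __init__(self):
        self.items = []          # memory blocks waiting for a safe reclamation point
        self.deferred_bytes = 0  # total size of the memory currently held back

    def add(self, block, size):
        self.items.append((block, size))
        self.deferred_bytes += size
        # Flush on item count (current behavior) *or* on total held memory (proposed).
        if (len(self.items) >= WORK_ITEMS_PER_CHUNK
                or self.deferred_bytes >= QSBR_DEFERRED_MEMORY_LIMIT):
            self.process_delayed()

    def process_delayed(self):
        # Stand-in for _PyMem_ProcessDelayed(): free every block whose QSBR
        # goal sequence has been observed by all threads.
        self.items.clear()
        self.deferred_bytes = 0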

cc @mpage, @Yhg1s, @nascheme, @DinoV

colesbury added a commit to colesbury/cpython that referenced this issue Jun 3, 2025
The free threading build uses QSBR to delay the freeing of dictionary
keys and list arrays when the objects are accessed by multiple threads,
in order to allow concurrent reads to proceed without holding the object
lock. The requests are processed in batches to reduce execution
overhead, but for large memory blocks this can lead to excess memory
usage.

Take into account the size of the memory block when deciding when to
process QSBR requests.