gh-113055: Fix memory leak of obmalloc state #113218

Closed
wants to merge 2 commits

Conversation

nascheme
Member

@nascheme nascheme commented Dec 16, 2023

Free the obmalloc arenas, radix tree nodes and allarenas array when the interpreter state is freed.

}
// free the array containing pointers to all arenas
PyMem_RawFree(allarenas);
#if WITH_PYMALLOC_RADIX_TREE
Contributor

@neonene neonene Dec 17, 2023

_freeze_module defaults to being a 32-bit application on Windows, so the USE_INTERIOR_NODES flag is not defined for that build.

Make it work for --without-pymalloc and for WITH_PYMALLOC_RADIX_TREE=0.
@neonene
Contributor

neonene commented Dec 17, 2023

With 4471ac7, my debug build crashes:

>_testembed_d test_repeated_init_exec "import _testcapi"  // or _ctypes
--- Loop #1 ---
--- Loop #2 ---  // Crash

@nascheme
Member Author

With 4471ac7, my debug build crashes: ...

I suspect this is a use-after-free bug. If I print out how many obmalloc blocks are still in use when free_arenas() is called, there are a lot, like thousands. I can't reproduce the crash right now, but it may have something to do with the dict keys freelist.

@nascheme
Member Author

I don't know if this is the same bug, but I can make it crash by running the following:

./Programs/_testembed test_repeated_init_exec "$(cat crash.py)"

Where the crash.py file contains:

import unittest

unittest.main('test.test_dict', verbosity=1, exit=False)
unittest.main('test.test_enum', verbosity=1, exit=False)
unittest.main('test.test_set', verbosity=1, exit=False)

Part of the stack trace:

#0  PyDict_SetItem (op=op@entry=0x7ffff6dbab40, key=0x7ffff6e4d2b0, value=value@entry=0x7ffff6e447e0) at ./Include/object.h:1180
#1  0x000055555565fd43 in PyDict_SetItemString (v=v@entry=0x7ffff6dbab40, key=key@entry=0x7ffff72c34e6 "resolution", 
    item=item@entry=0x7ffff6e447e0) at Objects/dictobject.c:4053
#2  0x00007ffff72c1dc5 in _datetime_exec (module=module@entry=0x7ffff6e39670) at ./Modules/_datetimemodule.c:6850
#3  0x00007ffff72c26a0 in PyInit__datetime () at ./Modules/_datetimemodule.c:6966
#4  0x0000555555742fa8 in _PyImport_LoadDynamicModuleWithSpec (spec=spec@entry=0x7ffff6e447a0, fp=fp@entry=0x0) at ./Python/importdl.c:170
#5  0x0000555555740c81 in _imp_create_dynamic_impl (module=module@entry=0x7ffff7cc1440, spec=0x7ffff6e447a0, file=<optimized out>)
    at Python/import.c:3749
#6  0x0000555555740cf3 in _imp_create_dynamic (module=0x7ffff7cc1440, args=0x7ffff6e44748, nargs=1) at Python/clinic/import.c.h:485
#7  0x00005555556697ae in cfunction_vectorcall_FASTCALL (func=0x7ffff7cc3880, args=0x7ffff6e44748, nargsf=<optimized out>, 
    kwnames=<optimized out>) at Objects/methodobject.c:425
#8  0x000055555562faec in _PyVectorcall_Call (tstate=tstate@entry=0x555555a84e28 <_PyRuntime+493384>, 
    func=0x555555669752 <cfunction_vectorcall_FASTCALL>, callable=callable@entry=0x7ffff7cc3880, tuple=tuple@entry=0x7ffff6e44730, 
    kwargs=kwargs@entry=0x7ffff6e4d180) at Objects/call.c:273
#9  0x000055555562fc78 in _PyObject_Call (tstate=0x555555a84e28 <_PyRuntime+493384>, callable=callable@entry=0x7ffff7cc3880, 
    args=args@entry=0x7ffff6e44730, kwargs=kwargs@entry=0x7ffff6e4d180) at Objects/call.c:348
#10 0x000055555562fcb3 in PyObject_Call (callable=callable@entry=0x7ffff7cc3880, args=args@entry=0x7ffff6e44730, 
    kwargs=kwargs@entry=0x7ffff6e4d180) at Objects/call.c:373
#11 0x00005555557035d7 in _PyEval_EvalFrameDefault (tstate=0x555555a84e28 <_PyRuntime+493384>, frame=0x7ffff7fb8b60, throwflag=<optimized out>)
    at Python/generated_cases.c.h:1225
#12 0x00005555557126fe in _PyEval_EvalFrame (tstate=tstate@entry=0x555555a84e28 <_PyRuntime+493384>, frame=<optimized out>, 
    throwflag=throwflag@entry=0) at ./Include/internal/pycore_ceval.h:115
#13 0x00005555557127e0 in _PyEval_Vector (tstate=0x555555a84e28 <_PyRuntime+493384>, func=<optimized out>, locals=locals@entry=0x0, 
    args=<optimized out>, argcount=<optimized out>, kwnames=<optimized out>) at Python/ceval.c:1785
#14 0x000055555562e6f1 in _PyFunction_Vectorcall (func=<optimized out>, stack=<optimized out>, nargsf=<optimized out>, kwnames=<optimized out>)
    at Objects/call.c:413
#15 0x000055555562e8bd in _PyObject_VectorcallTstate (tstate=tstate@entry=0x555555a84e28 <_PyRuntime+493384>, 
    callable=callable@entry=0x7ffff75002c0, args=args@entry=0x7fffffffc470, nargsf=nargsf@entry=2, kwnames=kwnames@entry=0x0)
    at ./Include/internal/pycore_call.h:168

The op argument to PyDict_SetItem has an invalid ob_type pointer.

@neonene
Contributor

neonene commented Dec 18, 2023

I also hit a crash when importing the datetime module twice. The globals-to-fix.tsv file shows extension modules that seem related:

Modules/_ctypes/stgdict.c - PyCStgDict_Type -
Modules/_cursesmodule.c - PyCursesWindow_Type -
Modules/_datetimemodule.c - PyDateTime_DateTimeType -

Maybe a related attempt and crash: #44470 (comment)

@ericsnowcurrently
Member

I also hit a crash when importing the datetime module twice.

I'd expect any single-phase init module to potentially have problems with multiple init/fini cycles, since such legacy modules have no direct opportunity to finalize their global state. Regarding datetime specifically, see gh-71587 (and related issues).

@ericsnowcurrently
Member

Likewise, _testcapi (from #113218 (comment)) is single-phase init. That may be the problem.

Member

@ericsnowcurrently ericsnowcurrently left a comment

LGTM (with a caveat for my lack of familiarity with the radix tree stuff)

The only thing keeping me from marking "Approve" is remaining uncertainty about new crashes for embedded CPython across init/fini cycles.

Comment on lines +2675 to +2682
for (uint i = 0; i < maxarenas; ++i) {
    // free each obmalloc memory arena
    struct arena_object *ao = &allarenas[i];
    _PyObject_Arena.free(_PyObject_Arena.ctx,
                         (void *)ao->address, ARENA_SIZE);
}
// free the array containing pointers to all arenas
PyMem_RawFree(allarenas);
Member

I've double-checked that this is correct. I didn't see any other allocations for arenas (outside the radix tree stuff).

Comment on lines +2685 to +2686
// Free the middle and bottom nodes of the radix tree. These are allocated
// by arena_map_mark_used() but not freed when arenas are freed.
Member

Why aren't they freed when arenas are freed?

Member Author

Better performance and simpler code. It might not be too expensive to free them, though: we would have to track how many children each node has and then free the node when the count reaches zero.
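
For illustration, the bookkeeping this would require might look roughly like the sketch below. The n_children counters and the helper itself are hypothetical additions; only the node types, the index macros, and PyMem_RawFree come from obmalloc.

/* Hypothetical sketch, not part of obmalloc: free interior radix tree nodes
 * as soon as their last child goes away.  Assumes an n_children counter
 * added to arena_map_mid_t and arena_map_bot_t, maintained wherever
 * arena_map_mark_used() creates or clears entries. */
static void
arena_map_prune(void *arena_base)
{
    int i1 = MAP_TOP_INDEX(arena_base);
    int i2 = MAP_MID_INDEX(arena_base);
    arena_map_mid_t *mid = arena_map_root.ptrs[i1];
    arena_map_bot_t *bot = mid->ptrs[i2];
    if (--bot->n_children == 0) {       /* no arenas left under this bottom node */
        PyMem_RawFree(bot);
        mid->ptrs[i2] = NULL;
        if (--mid->n_children == 0) {   /* no bottom nodes left under this mid node */
            PyMem_RawFree(mid);
            arena_map_root.ptrs[i1] = NULL;
        }
    }
}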

Comment on lines +2683 to +2702
#if WITH_PYMALLOC_RADIX_TREE
#ifdef USE_INTERIOR_NODES
// Free the middle and bottom nodes of the radix tree. These are allocated
// by arena_map_mark_used() but not freed when arenas are freed.
for (int i1 = 0; i1 < MAP_TOP_LENGTH; i1++) {
    arena_map_mid_t *mid = arena_map_root.ptrs[i1];
    if (mid == NULL) {
        continue;
    }
    for (int i2 = 0; i2 < MAP_MID_LENGTH; i2++) {
        arena_map_bot_t *bot = arena_map_root.ptrs[i1]->ptrs[i2];
        if (bot == NULL) {
            continue;
        }
        PyMem_RawFree(bot);
    }
    PyMem_RawFree(mid);
}
#endif
#endif
Member

I'm not as familiar with the radix tree stuff. I do see the corresponding allocations in arena_map_get() (via arena_map_mark_used()) though. The logic you have here makes sense relative to that.

@ericsnowcurrently
Member

The only thing keeping me from marking "Approve" is remaining uncertainty about new crashes for embedded CPython across init/fini cycles.

We certainly don't want regressions there. However, I'd expect the parts causing crashes to already be problematic across init/fini cycles.

@nascheme
Member Author

Likewise, _testcapi (from #113218 (comment)) is single-phase init. That may be the problem.

It seems to me that we are not ready to be moving the obmalloc internal state to be per-interpreter. With so many modules allocating memory from obmalloc that outlives the interpreter, we are stuck in a no-win situation: either we leak the obmalloc arenas on init/fini interpreter cycles, or we free them and risk use-after-free crashes.

I think my free_arenas() function is only safe to call if the obmalloc used block count is zero. We are very far from that, and any real Python program likely has no hope of getting there without a massive effort to fix all of the extensions it uses. So, I don't see what the point of moving the obmalloc state actually was. I think we should move it back so we don't have this leak and then work on fixing modules with global state.
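
For illustration, the guard that safety condition implies might look like the sketch below. free_arenas() is the function added by this PR (its exact signature here is assumed), and _PyInterpreterState_GetAllocatedBlocks() is the existing per-interpreter block counter.

// Hypothetical helper, not part of this PR: only tear down the interpreter's
// obmalloc arenas when obmalloc reports zero live blocks; otherwise leak them
// rather than risk a use-after-free.
static void
maybe_free_arenas(PyInterpreterState *interp)
{
    Py_ssize_t live_blocks = _PyInterpreterState_GetAllocatedBlocks(interp);
    if (live_blocks == 0) {
        free_arenas(interp);    // signature assumed; the PR frees the arenas,
                                // radix tree nodes and the allarenas array here
    }
    // else: some extension still owns obmalloc memory; leaking is the
    // safer choice until those extensions are fixed
}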

Is it possible that mimalloc and the nogil changes will help solve this? Maybe we should leave obmalloc alone (i.e. revert to 3.11 behaviour) and then phase it out when mimalloc is fully working.

@nascheme
Member Author

An idea for an improvement: we could check allocated blocks per arena and free only the arenas that have zero used blocks. That should be safe and would free at least some memory. Then, if you want to avoid leaks on init/fini cycles, you just have to fix all the extensions so they don't leak.
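
A rough sketch of that idea, reusing fields that already exist on struct arena_object (nfreepools/ntotalpools); the helper name and where it would be called from are assumptions, not part of the PR.

// Hypothetical sketch: free only the arenas in which every pool is free,
// i.e. arenas with zero blocks still in use.  A real version would also
// have to unlink the arena from obmalloc's usable/unused arena lists and
// unmark it in the radix tree.
static void
free_unused_arenas(void)
{
    for (uint i = 0; i < maxarenas; ++i) {
        struct arena_object *ao = &allarenas[i];
        if (ao->address == 0) {
            continue;                   // slot is not backed by a live arena
        }
        if (ao->nfreepools == ao->ntotalpools) {
            // every pool in this arena is free, so no live block can point
            // into it; returning it to the system allocator is safe
            _PyObject_Arena.free(_PyObject_Arena.ctx,
                                 (void *)ao->address, ARENA_SIZE);
            ao->address = 0;            // mark the arena slot as unused
        }
    }
}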

@tim-one
Member

tim-one commented Dec 18, 2023

It seems to me that we are not ready to be moving the obmalloc internal state to be per-interpreter.

Indeed, even having that as "a goal" makes scant sense to me, no more than, e.g., that different interpreters should need to use their own system malloc(), or ...

If a sub-interpreter wants to end its life, it should be responsible for freeing the memory it allocated.

Is it possible that mimalloc and the nogil changes will help solve this? Maybe we should leave obmalloc alone (i.e. revert to 3.11 behaviour) and then phase it out when mimalloc is fully working.

mimalloc will help with nogil, but I doubt it will help at all in out-guessing sub-interpreters that won't/can't free the memory they allocated.

@nascheme
Member Author

mimalloc will help with nogil, but I doubt it will help at all in out-guessing sub-interpreters that won't/can't free the memory they allocated.

My understanding is that mimalloc is thread safe, so we could have one instance of mimalloc per process, shared between threads and sub-interpreters. However, to avoid leaking, you would need some kind of cross-interpreter or cross-thread reference counting system, so memory gets freed when it's no longer used. E.g. like what nogil's biased reference counting does.

@ericsnowcurrently
Member

I made the obmalloc state per-interpreter to avoid a global lock under a per-interpreter GIL.

@neonene
Contributor

neonene commented Dec 19, 2023

For fixing memory leaks from sub-interpreters (issue: #110411), the current free_arenas() looks safe, assuming it is correct that no legacy extension is allowed to be loaded by a sub-interpreter (see PEP 554 and the C API docs).

@ericsnowcurrently
Member

For fixing memory leaks from sub-interpreters (issue: #110411), the current free_arenas() looks safe, assuming it is correct that no legacy extension is allowed to be loaded by a sub-interpreter (see PEP 554 and the C API docs).

That's correct for isolated subinterpreters (those that have their own GIL). It's already a non-issue for legacy subinterpreters, which share the main interpreter's GIL as well as the main interpreter's obmalloc state.

The caveat for isolated subinterpreters is that the only way to identify a single-phase init module is to call its "init" function and check whether it returned a module object (rather than a PyModuleDef). For an isolated subinterpreter we immediately clear/free the module object and error out. However, there is never an opportunity to clear out the extension's global state, if any.
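
For illustration, that check amounts to something like the following, simplified from the import machinery; the helper name, the typedef, and the error handling are illustrative, not CPython's actual code.

// Rough sketch of the single-phase vs multi-phase check described above.
typedef PyObject *(*modinitfunc)(void);   // signature of a PyInit_<name> function

static int
uses_single_phase_init(modinitfunc initfunc)
{
    PyObject *result = initfunc();
    if (result == NULL) {
        return -1;                           // the init function raised an error
    }
    if (PyObject_TypeCheck(result, &PyModuleDef_Type)) {
        // multi-phase init: the function only returned its PyModuleDef,
        // no module (and no global state) has been created yet
        return 0;
    }
    // single-phase init: a fully initialized module object already exists,
    // and any global state the extension set up is already live
    Py_DECREF(result);
    return 1;
}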

As to the crashes with this PR, relative to global init/fini, that's still an issue. Here are some possible fixes:

  • dlclose() all extension module files during runtime fini (or interp fini in some cases)
  • use something like dlmopen() to load extensions in isolated symbol namespaces
  • track extension modules that store objects in global variables and don't clean them up, and do not clear obmalloc state for the corresponding interpreter
  • go back to 3.11 obmalloc, but use a PyMutex lock around obmalloc for thread-safety
  • switch to mimalloc sooner rather than later
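
For the PyMutex option above, a minimal sketch of what "a lock around obmalloc" could mean, assuming the PyMutex API added for the free-threading work; the wrapper, the mutex name, and where it would be installed are assumptions, not a worked-out design.

// Hypothetical wrapper, not CPython code: serialize entry into a single
// runtime-global obmalloc (3.11-style state) with one process-wide PyMutex.
// _PyObject_Malloc is obmalloc's allocation entry point; the other entry
// points (free/realloc/calloc) would be wrapped the same way.
static PyMutex obmalloc_mutex;   // zero-initialized, i.e. unlocked

static void *
locked_object_malloc(void *ctx, size_t nbytes)
{
    PyMutex_Lock(&obmalloc_mutex);
    void *ptr = _PyObject_Malloc(ctx, nbytes);
    PyMutex_Unlock(&obmalloc_mutex);
    return ptr;
}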

@neonene
Contributor

neonene commented Dec 20, 2023

the only way to identify a single-phase init module is to call its "init" function and check if it returned a module object

Sorry for not knowing much about the internals, but is there any chance isolated subinterpreters could be excluded from the suggested approaches, by receiving the module identification from the main interpreter and by using the current free_arenas() function?

@neonene
Contributor

neonene commented Dec 22, 2023

  • track extension modules that store objects in global variables and don't clean them up, and do not clear obmalloc state for the corresponding interpreter

I would prefer this, leaving aside the issue of isolated subinterpreters.

@nascheme
Member Author

nascheme commented Dec 22, 2023

I tried a slightly different approach in gh-113412. There is a slight overhead for get_state() vs 3.11, but I don't think that matters too much. If the Py_RTFLAGS_USE_MAIN_OBMALLOC flag is set (which I think is the default), the behaviour is like 3.11 in that there is one obmalloc state for all interpreters.

If you want to use isolated sub-interpreters, which I think requires that the Py_RTFLAGS_USE_MAIN_OBMALLOC flag is cleared, then you will need to make sure the sub-interpreters use obmalloc correctly. E.g. don't share memory allocated from obmalloc between interpreters, and don't hang on to memory after the interpreter has been freed.
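
Roughly, the shape of that dispatch is sketched below (simplified, not the exact code from gh-113412, and assuming the 3.12-style obmalloc field on the interpreter state).

// Simplified sketch of the get_state() dispatch described above: interpreters
// created with Py_RTFLAGS_USE_MAIN_OBMALLOC share the main interpreter's
// obmalloc state, as in 3.11; only isolated interpreters that clear the flag
// get (and must clean up) their own state.
static inline OMState *
get_state(void)
{
    PyInterpreterState *interp = _PyInterpreterState_GET();
    if (interp->feature_flags & Py_RTFLAGS_USE_MAIN_OBMALLOC) {
        // shared, 3.11-style behaviour: one obmalloc state for the process
        interp = _PyInterpreterState_Main();
    }
    return &interp->obmalloc;
}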

@nascheme
Member Author

For Python 3.13, GH-113412 was merged as an alternative to this. This fix could possibly be applied to 3.12, but it needs more polish and testing first.

@nascheme nascheme added the 3.12 only security fixes label Oct 15, 2024
@nascheme
Member Author

Closing, this is not going to get applied.

@nascheme nascheme closed this Apr 28, 2025