gh-113055: Fix memory leak of obmalloc state #113218
Conversation
Free the obmalloc arenas, radix tree nodes and allarenas array when the interpreter state is freed.
Objects/obmalloc.c (outdated diff)

    }
    // free the array containing pointers to all arenas
    PyMem_RawFree(allarenas);
#if WITH_PYMALLOC_RADIX_TREE
_freeze_module defaults to being built as a 32-bit application on Windows, which does not define the USE_INTERIOR_NODES flag during the build.
Make this work for --without-pymalloc and for WITH_PYMALLOC_RADIX_TREE=0.
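A rough sketch of the kind of preprocessor layering that note asks for, so the new teardown code compiles out cleanly under --without-pymalloc and when the radix tree is disabled; the exact guards and their placement in obmalloc.c may differ from this:

#ifdef WITH_PYMALLOC
#if WITH_PYMALLOC_RADIX_TREE
#ifdef USE_INTERIOR_NODES
    /* ... free the radix tree's middle and bottom nodes ... */
#endif
#endif  /* WITH_PYMALLOC_RADIX_TREE */
    /* ... free each arena and the allarenas array ... */
#endif  /* WITH_PYMALLOC */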
4471ac7 says my debug build crashes:
I suspect this is a use-after-free bug. If I print out how many obmalloc blocks are still in use when
I don't know if this is the same bug, but I can make it crash by running the following:
Where the
Part of the stack trace:
The
I also hit a crash when importing

cpython/Tools/c-analyzer/cpython/globals-to-fix.tsv, lines 379 to 381 (at 2b93f52)
Maybe a related attempt and crash?: #44470 (comment)
I'd expect any single-phase init module to potentially have problems with multiple init/fini cycles, since such legacy modules have no direct opportunity to finalize their global state. Regarding datetime specifically, see gh-71587 (and related issues).
Likewise, _testcapi (from #113218 (comment)) is single-phase init. That may be the problem.
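To make the single-phase vs. multi-phase distinction concrete, here is a minimal sketch of a multi-phase init module that releases its per-module state through the m_free slot when the interpreter tears the module down. The names (mymod, mymod_exec, and so on) are made up for illustration and are not from this PR; a legacy single-phase module has no equivalent hook, which is why its globals can outlive interpreter finalization.

#include <Python.h>

typedef struct {
    PyObject *cached;               /* per-module state, released in mymod_free() */
} mymod_state;

static int
mymod_exec(PyObject *module)
{
    mymod_state *st = PyModule_GetState(module);
    st->cached = PyUnicode_FromString("example");
    return st->cached == NULL ? -1 : 0;
}

static void
mymod_free(void *module)
{
    mymod_state *st = PyModule_GetState((PyObject *)module);
    if (st != NULL) {
        Py_CLEAR(st->cached);       /* runs at module teardown, unlike single-phase init */
    }
}

static PyModuleDef_Slot mymod_slots[] = {
    {Py_mod_exec, mymod_exec},
    {0, NULL},
};

static struct PyModuleDef mymod_def = {
    PyModuleDef_HEAD_INIT,
    .m_name = "mymod",
    .m_size = sizeof(mymod_state),
    .m_slots = mymod_slots,
    .m_free = mymod_free,
};

PyMODINIT_FUNC
PyInit_mymod(void)
{
    /* multi-phase init: return the def, not an already-created module object */
    return PyModuleDef_Init(&mymod_def);
}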
LGTM (with a caveat for my lack of familiarity with the radix tree stuff)
The only reason I'm not marking "Approve" is the remaining uncertainty about new crashes for embedded CPython across init/fini cycles.
    for (uint i = 0; i < maxarenas; ++i) {
        // free each obmalloc memory arena
        struct arena_object *ao = &allarenas[i];
        _PyObject_Arena.free(_PyObject_Arena.ctx,
                             (void *)ao->address, ARENA_SIZE);
    }
    // free the array containing pointers to all arenas
    PyMem_RawFree(allarenas);
I've double-checked that this is correct. I didn't see any other allocations for arenas (outside the radix tree stuff).
    // Free the middle and bottom nodes of the radix tree. These are allocated
    // by arena_map_mark_used() but not freed when arenas are freed.
Why aren't they freed when arenas are freed?
Better performance and simpler code. It might not be too expensive to free them, though. We would have to track how many child nodes each node has and then remove the node when the count reaches zero.
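A rough sketch of that bookkeeping, only to illustrate the idea: give each interior node a count of live children and free the node when the count drops to zero. The node layout and the mid_drop_child() helper below are hypothetical (the real arena_map_mid_t carries no such counter); PyMem_RawFree() is used to mirror how the nodes are allocated, and Python.h is assumed to be included.

/* Hypothetical interior node with a child counter (not obmalloc's layout). */
typedef struct hypothetical_mid {
    void *ptrs[64];         /* pointers to bottom nodes; 64 is illustrative */
    int nchildren;          /* number of non-NULL entries in ptrs */
} hypothetical_mid;

/* Called when the last arena covered by child slot i goes away.
 * Returns 1 if the middle node itself became empty and was freed. */
static int
mid_drop_child(hypothetical_mid **midp, int i)
{
    hypothetical_mid *mid = *midp;
    if (mid->ptrs[i] != NULL) {
        PyMem_RawFree(mid->ptrs[i]);    /* free the bottom node */
        mid->ptrs[i] = NULL;
        mid->nchildren--;
    }
    if (mid->nchildren == 0) {
        PyMem_RawFree(mid);             /* no children left: free the mid node */
        *midp = NULL;                   /* clear the parent's (root's) pointer too */
        return 1;
    }
    return 0;
}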
#if WITH_PYMALLOC_RADIX_TREE
#ifdef USE_INTERIOR_NODES
    // Free the middle and bottom nodes of the radix tree. These are allocated
    // by arena_map_mark_used() but not freed when arenas are freed.
    for (int i1 = 0; i1 < MAP_TOP_LENGTH; i1++) {
        arena_map_mid_t *mid = arena_map_root.ptrs[i1];
        if (mid == NULL) {
            continue;
        }
        for (int i2 = 0; i2 < MAP_MID_LENGTH; i2++) {
            arena_map_bot_t *bot = arena_map_root.ptrs[i1]->ptrs[i2];
            if (bot == NULL) {
                continue;
            }
            PyMem_RawFree(bot);
        }
        PyMem_RawFree(mid);
    }
#endif
#endif
I'm not as familiar with the radix tree stuff. I do see the corresponding allocations in arena_map_get() (via arena_map_mark_used()), though. The logic you have here makes sense relative to that.
We certainly don't want regressions there. However, I'd expect the parts causing crashes to already be problematic across init/fini cycles.
It seems to me that we are not ready to move the obmalloc internal state to be per-interpreter. With so many modules allocating memory from obmalloc that outlives the interpreter, we are stuck in a no-win situation: either we leak the obmalloc arenas on init/fini interpreter cycles, or we free them and risk use-after-free crashes. I think my

Is it possible that mimalloc and the nogil changes will help solve this? Maybe we should leave obmalloc alone (i.e. revert to 3.11 behaviour) and then phase it out when mimalloc is fully working.
An idea for an improvement: we could check the allocated blocks per arena and free only the arenas that have zero used blocks. That should be safe and would free at least some memory. Then, if you want to avoid leaks on init/fini cycles, you just have to fix all the extensions so they don't leak.
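A minimal sketch of that idea, assuming the arena_object bookkeeping from Objects/obmalloc.c (nfreepools, ntotalpools, and address == 0 meaning the slot holds no arena); this is not the PR's code, just an illustration of freeing only fully unused arenas:

    // Free only arenas in which every pool is free, i.e. no live blocks.
    for (uint i = 0; i < maxarenas; ++i) {
        struct arena_object *ao = &allarenas[i];
        if (ao->address == 0) {
            continue;                   // slot is unused or already freed
        }
        if (ao->nfreepools == ao->ntotalpools) {
            // no pool in this arena holds allocated blocks, so unmapping it
            // cannot produce a use-after-free
            _PyObject_Arena.free(_PyObject_Arena.ctx,
                                 (void *)ao->address, ARENA_SIZE);
            ao->address = 0;
        }
    }

Arenas that still contain live blocks would stay mapped (a deliberate leak), which keeps pointers held by leaky extensions valid across fini.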
Indeed, even having that as "a goal" makes scant sense to me, no more than, e.g., that different interpreters should need to use their own system

If a sub-interpreter wants to end its life, it should be responsible for freeing the memory it allocated.
mimalloc will help with nogil, but I doubt it will help at all in out-guessing sub-interpreters that won't/can't free the memory they allocated.
My understanding is that mimalloc is thread-safe, so we could have one instance of mimalloc per process, shared between threads and sub-interpreters. However, to avoid leaking, you would need some kind of cross-interpreter or cross-thread reference counting system, so memory gets freed when it's no longer used, e.g. like what nogil's biased reference counting does.
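A minimal, self-contained sketch of that kind of cross-thread ownership, using a plain C11 atomic reference count; this is only to illustrate the idea and is not nogil's biased reference counting, nor anything in CPython or mimalloc:

#include <stdatomic.h>
#include <stdlib.h>

typedef struct {
    atomic_int refcnt;      /* owners across threads/interpreters */
    char data[];            /* the payload being shared */
} shared_block;

static shared_block *
shared_alloc(size_t n)
{
    shared_block *b = malloc(sizeof(shared_block) + n);
    if (b != NULL) {
        atomic_init(&b->refcnt, 1);
    }
    return b;
}

static void
shared_incref(shared_block *b)
{
    atomic_fetch_add_explicit(&b->refcnt, 1, memory_order_relaxed);
}

static void
shared_decref(shared_block *b)
{
    /* whichever thread or interpreter drops the last reference frees it */
    if (atomic_fetch_sub_explicit(&b->refcnt, 1, memory_order_acq_rel) == 1) {
        free(b);
    }
}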
I made the obmalloc state per-interpreter to avoid a global lock under a per-interpreter GIL.
That's correct for isolated subinterpreters (which have their own GIL). It's already a non-issue for legacy subinterpreters, which share the main interpreter's GIL as well as the main interpreter's obmalloc state. The only caveat for isolated subinterpreters is that the only way to identify a single-phase init module is to call its "init" function and check if it returned a module object (rather than a

As to the crashes with this PR, relative to global init/fini, that's still an issue. Here are some possible fixes:
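For reference, the return-value check mentioned above (distinguishing single-phase from multi-phase init) roughly follows the pattern CPython's import machinery uses. A simplified sketch, with error handling and the actual resolution of initfunc omitted:

    PyObject *res = initfunc();     /* the module's PyInit_* function */
    if (res == NULL) {
        /* initialization failed */
    }
    else if (PyObject_TypeCheck(res, &PyModuleDef_Type)) {
        /* multi-phase init: res is a PyModuleDef, later fed to
           PyModule_FromDefAndSpec() */
    }
    else {
        /* single-phase init: res is the module object itself (legacy) */
    }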
Sorry for not knowing much about the internals, but is there any chance isolated subinterpreters are excluded from the suggested approaches, by receiving the module identification from the main interpreter and by using the current
I would prefer this, leaving aside the issue of isolated subinterpreters.
I tried a slightly different approach, as gh-113412. There is a slight overhead for

If you want to use isolated sub-interpreters, which I think requires that the
For Python 3.13, GH-113412 was merged as an alternative to this. This fix could possibly be applied to 3.12, but it needs more polish and testing first.
Closing; this is not going to get applied.