-
-
Notifications
You must be signed in to change notification settings - Fork 32.1k
[3.13] gh-128679: Fix tracemalloc.stop() race conditions #128897
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
tracemalloc_alloc(), tracemalloc_realloc(), PyTraceMalloc_Track(), PyTraceMalloc_Untrack() and _PyTraceMalloc_TraceRef() now check tracemalloc_config.tracing after calling TABLES_LOCK(). _PyTraceMalloc_Stop() now protects more code with TABLES_LOCK(), especially setting tracemalloc_config.tracing to 1. Add a test using PyTraceMalloc_Track() to test tracemalloc.stop() race condition.
a92f15b
to
545053f
Compare
@ZeroIntensity: Here is a simplified change for the 3.12 and 3.13 branch. Would you mind to review it? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a couple other (unmodified) functions where I'm seeing lockless reads:
tracemalloc_get_traceback
_PyMem_DumpTraceback
_PyTraceMalloc_GetTraces
_PyTraceMalloc_GetTracedMemory
_PyTraceMalloc_ResetPeak
@@ -1376,16 +1395,21 @@ _PyTraceMalloc_TraceRef(PyObject *op, PyRefTracerEvent event, void* Py_UNUSED(ig | |||
int res = -1; | |||
|
|||
TABLES_LOCK(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We also read tracing
without a lock before this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Checking tracing
without the lock is a workaround for the Python finalization. TABLES_LOCK() can no longer be called after _PyTraceMalloc_Fini() (the lock is destroyed), whereas _PyTraceMalloc_TraceRef() is still called after _PyTraceMalloc_Fini().
tracing
is checked again once we acquired the lock.
The issue was fixed properly in the main branch by calling (void)PyRefTracer_SetTracer(NULL, NULL);
in _PyTraceMalloc_Stop()
. Since this change is designed to be backported to Python 3.12, I didn't include this fix.
Once this change is merged in 3.13 and 3.12, we can remove the unprotected tracing
read in 3.13 in _PyTraceMalloc_TraceRef()
by adding (void)PyRefTracer_SetTracer(NULL, NULL);
.
Ok, I completed my PR to cover most (almost all?) functions, instead of focusing on alloc/realloc/free and Track/Untrack functions. @ZeroIntensity: Please review the updated PR. |
Fixing properly In the main branch, I redesigned the code to be able to hold the lock for all tables operations. For that, I had to modify the behavior:
I'm open to consider fixing |
It looks like tests somehow were broken. |
So TABLES_LOCK() can be called in _PyMem_DumpTraceback() even if tracemalloc is not tracing.
Fixed. Well... Fixing more functions require to backport more changes. I backported the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. With the exception of a few special cases that you've mentioned, everything is properly locked! Due to the changes to the lifecycle and the size of the change, I think we should run buildbots before merging.
🤖 New build scheduled with the buildbot fleet by @ZeroIntensity for commit 54c8f9e 🤖 If you want to schedule another build, you need to add the 🔨 test-with-buildbots label again. |
|
I mark this PR as a draft because of the FreeBSD crash. Oh, I can also reproduce the |
So far, I cannot reproduce the crash on Linux. |
I can trigger a crash if I insert a delay in diff --git a/Objects/obmalloc.c b/Objects/obmalloc.c
index b103deb01ca..c20eb47e8a8 100644
--- a/Objects/obmalloc.c
+++ b/Objects/obmalloc.c
@@ -859,7 +859,14 @@ set_allocator_unlocked(PyMemAllocatorDomain domain, PyMemAllocatorEx *allocator)
{
switch(domain)
{
- case PYMEM_DOMAIN_RAW: _PyMem_Raw = *allocator; break;
+ case PYMEM_DOMAIN_RAW:
+ _PyMem_Raw.ctx = allocator->ctx;
+ (void)sched_yield();
+ _PyMem_Raw.malloc = allocator->malloc;
+ _PyMem_Raw.calloc = allocator->calloc;
+ _PyMem_Raw.realloc = allocator->realloc;
+ _PyMem_Raw.free = allocator->free;
+ break;
case PYMEM_DOMAIN_MEM: _PyMem = *allocator; break;
case PYMEM_DOMAIN_OBJ: _PyObject = *allocator; break;
/* ignore unknown domain */ The write is protected by a lock (write), but |
Only debug build are affected, not release build. I modified the test in the main branch to skip it on debug build: #128988 |
…ython#128988) There is a race condition between PyMem_SetAllocator() and PyMem_RawMalloc()/PyMem_RawFree(). While PyMem_SetAllocator() write is protected by a lock, PyMem_RawMalloc()/PyMem_RawFree() reads are not protected by a lock. PyMem_RawMalloc()/PyMem_RawFree() can be called with an old context and the new function pointer. On a release build, it's not an issue since the context is not used. On a debug build, the debug hooks use the context and so can crash. (cherry picked from commit 9bc1964)
Thanks @vstinner for the PR 🌮🎉.. I'm working now to backport this PR to: 3.12. |
Sorry, @vstinner, I could not cleanly backport this to
|
GH-129022 is a backport of this pull request to the 3.12 branch. |
…129022) [3.13] gh-128679: Fix tracemalloc.stop() race conditions (#128897) tracemalloc_alloc(), tracemalloc_realloc(), PyTraceMalloc_Track(), PyTraceMalloc_Untrack() and _PyTraceMalloc_TraceRef() now check tracemalloc_config.tracing after calling TABLES_LOCK(). _PyTraceMalloc_Stop() now protects more code with TABLES_LOCK(), especially setting tracemalloc_config.tracing to 1. Add a test using PyTraceMalloc_Track() to test tracemalloc.stop() race condition. Call _PyTraceMalloc_Init() at Python startup. (cherry picked from commit 6b47499)
tracemalloc_alloc(), tracemalloc_realloc(), PyTraceMalloc_Track(), PyTraceMalloc_Untrack() and _PyTraceMalloc_TraceRef() now check tracemalloc_config.tracing after calling TABLES_LOCK().
_PyTraceMalloc_Stop() now protects more code with TABLES_LOCK(), especially setting tracemalloc_config.tracing to 1.
Add a test using PyTraceMalloc_Track() to test tracemalloc.stop() race condition.