-
-
Notifications
You must be signed in to change notification settings - Fork 31.8k
gh-127266: avoid data races when updating type slots #131174
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
We can avoid stop the world pauses in both cases by checking if the type is uniquely ref by the current thread, the type object is newly created here and not exposed to other threads so we can assign without stopping the world. |
This would be better to be part of the pystats info.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm concerned about atomicity here (see comment below).
Objects/typeobject.c
Outdated
PyInterpreterState *interp = _PyInterpreterState_GET(); | ||
_PyEval_StopTheWorld(interp); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
_PyEval_StopTheWorld()
is always defined and is a no-op in the default build.
We might be able to make |
For TYPE_LOCK, replace critical sections with a simple mutex. A number of changes were needed to avoid deadlocking on reentrancy (trying to re-acquire the mutex). Split _PyType_LookupRefAndVersion() into a locked and unlocked version. Remove atomic operations where they are not required. Remove some cases of TYPE_LOCK being held that is not required.
I've converted |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice!
I think there's still too much mixing of the type lock and the stop the world. I think data should either be protected by the type lock OR by stop-the-world, but not both.
From reading through this:
type lock:
- tp_version_tag
- _spec_cache
stop the world:
- tp_flags (modifications)
- bases (modifications)
Additionally, there's a few functions that are named with _lock_held
but are called in a stop-the-world pause (not with the type lock held), which I found a bit confusing.
Objects/typeobject.c
Outdated
types_mutex_unlock(); | ||
new_mro = mro_invoke(type); /* might cause reentrance */ | ||
types_mutex_lock(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't seem right to me. This can be called with both the type lock held and the world stopped. If mro_invoke
can call arbitrary code, it needs to start the world around the call.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that can call a custom mro()
method on the meta class. The only thing in Lib that does that is a test_descr
test but obviously we need to handle it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe this has been resolved. Inside mro_invoke()
I either start-the-world before calling mro()
(if it's custom) or I unlock the mutex. The latter case happens during type_ready()
.
Okay, I've reverted back to using a critical section for |
This avoids the watcher callback potentially using invalid cache entries.
* Since type_modified_unlocked() can be called with the world stopped then we need to re-start if making re-entrant calls (like calling type watcher callbacks). * Re-order code before type_modified_unlocked(). This fixes a potential bug if called functions end up re-assigning the type version tag. If they do, the cache can contain invalid entries. We should do the dictionary update first then set the version to zero. * Replace 'is_new_type' with 'world_stopped' boolean parameter. This seems a bit more clear.
Rather than trying to update tp_base, tp_bases, and tp_mro while the world is stopped, hold TYPE_LOCK when updating them. This is much more similar to how the base version of the code works and requires many fewer changes. Using stop-the-world turned out to be difficult to make work correctly, mostly due to custom mro() methods but also for some other reasons (for example, functions that potentially re-enter the interpreter). This re-work requires that we can call stop-the-world inside a critical section and have the mutex still held after re-starting. This seems to work and I've added asserts to confirm it. Stop-the-world is now only used when updating either tp_flags or other type slots and not the three type members listed above.
Major rework, reverting back to using Here is a summary of what's changed vs the main branch:
|
These are not required as part of this PR and can be reviewed separately.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This LGTM once the issue with test_opcache
is resolved
Now that these slot updates use stop-the-world, these two tests are quite a lot slower. Reduce size of the items list from 1000 to 100.
…thongh-131174) This is triggering deadlocks in test_opcache.
Avoid data races when updating type slots and type flags by "stopping-the-world" first. This is only needed if the type is potentially exposed to multiple threads. There are a couple places this happens:
type_setattro()
when the assigned item is a "dunder" method that corresponds to one or more slot__bases__
_abc.c
calling_PyType_SetFlags()
to set the collection flags (SEQUENCE or MAPPING)Two examples of code that causes this:
dataclasses
will assign dunder methods after the type object has been created, when the decorator runs. That triggers this and since the slots are assigned one-by-one, we stop for each assignment. Not great. Another example is unittestMagicMock
. Again, it assigns dunder methods after the class is defined.In order to assess the performance impact of this change, I've measured the following things. The time it takes to run the unit test suite does not seem significantly worse, even when running tests serially using
-j 1
. Comparing the pyperformance results before and after this change shows no significant change. See link below for the chart.Running with TSAN enabled and running the
--tsan
and--tsan-parallel --parallel-threads=4
tests show no warnings. I also have a modified pyperformance test suite that I can run in parallel and that shows no TSAN warnings related to this as well.Additional changes after feedback from Sam:
TYPE_LOCK
with a simple mutex instead. To avoid dead-locks, the code was analyzed for re-entrancy and adjusted as needed. There are a couple places where the mutex is dropped before calling something that's possibly re-entrant.tp_mro
no longer needs to hold the lock).type_lookup_ex()
as a lower-level version of_PyType_LookupStackRefAndVersion
that optionally acquires theTYPE_LOCK
or requires that it is already held.Benchmark using pyperformance