Skip to content

gh-127266: avoid data races when updating type slots #131174

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 54 commits into from
Apr 28, 2025

Conversation

nascheme
Copy link
Member

@nascheme nascheme commented Mar 13, 2025

Avoid data races when updating type slots and type flags by "stopping-the-world" first. This is only needed if the type is potentially exposed to multiple threads. There are a couple places this happens:

  • type_setattro() when the assigned item is a "dunder" method that corresponds to one or more slot
  • assigning to __bases__
  • _abc.c calling _PyType_SetFlags() to set the collection flags (SEQUENCE or MAPPING)

Two examples of code that causes this: dataclasses will assign dunder methods after the type object has been created, when the decorator runs. That triggers this and since the slots are assigned one-by-one, we stop for each assignment. Not great. Another example is unittest MagicMock. Again, it assigns dunder methods after the class is defined.

In order to assess the performance impact of this change, I've measured the following things. The time it takes to run the unit test suite does not seem significantly worse, even when running tests serially using -j 1. Comparing the pyperformance results before and after this change shows no significant change. See link below for the chart.

Running with TSAN enabled and running the --tsan and --tsan-parallel --parallel-threads=4 tests show no warnings. I also have a modified pyperformance test suite that I can run in parallel and that shows no TSAN warnings related to this as well.

Additional changes after feedback from Sam:

  • Replace the critical sections that use TYPE_LOCK with a simple mutex instead. To avoid dead-locks, the code was analyzed for re-entrancy and adjusted as needed. There are a couple places where the mutex is dropped before calling something that's possibly re-entrant.
  • Remove some locking where it is not needed. Slots can only be changed when the world is stopped (e.g. reading tp_mro no longer needs to hold the lock).
  • Create type_lookup_ex() as a lower-level version of _PyType_LookupStackRefAndVersion that optionally acquires the TYPE_LOCK or requires that it is already held.
  • Add a new type flag that is used for debug enabled builds. It tries to check if a type is potentially exposed to multiple threads (and therefore needs stop-the-world before modifying slots). This flag could perhaps be removed before the final merge of this PR (since type flag bits are kind of in short supply).

Benchmark using pyperformance

@kumaraditya303
Copy link
Contributor

kumaraditya303 commented Mar 14, 2025

Two examples that cause this: dataclasses will assign dunder methods after the type object has been created. That triggers this and since the slots are assigned one-by-one, we stop for each assignment. Not great. Another example is unittest MockMock. Again, it assigns dunder methods after the class is defined.

We can avoid stop the world pauses in both cases by checking if the type is uniquely ref by the current thread, the type object is newly created here and not exposed to other threads so we can assign without stopping the world.

@nascheme nascheme marked this pull request as ready for review March 14, 2025 21:37
@nascheme nascheme requested a review from markshannon as a code owner March 14, 2025 21:37
@nascheme nascheme requested review from mpage and colesbury March 15, 2025 18:11
Copy link
Contributor

@colesbury colesbury left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm concerned about atomicity here (see comment below).

Comment on lines 11314 to 11315
PyInterpreterState *interp = _PyInterpreterState_GET();
_PyEval_StopTheWorld(interp);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_PyEval_StopTheWorld() is always defined and is a no-op in the default build.

@colesbury
Copy link
Contributor

We might be able to make BEGIN_TYPE_LOCK() a regular mutex acquisition instead of a critical section, but we'll have to be careful that nothing between BEGIN_TYPE_LOCK() and END_TYPE_LOCK() leads to reentrancy.

For TYPE_LOCK, replace critical sections with a simple mutex.  A number
of changes were needed to avoid deadlocking on reentrancy (trying to
re-acquire the mutex). Split _PyType_LookupRefAndVersion() into a locked
and unlocked version.  Remove atomic operations where they are not
required. Remove some cases of TYPE_LOCK being held that is not
required.
@nascheme nascheme marked this pull request as draft March 26, 2025 07:49
@nascheme
Copy link
Member Author

I've converted TYPE_LOCK from being used as a critical section to being a regular mutex. There was a number of cases of reentrancy and so things got complex. I think I nearly have that sorted out now. The mro_invoke() function needs work yet. At least, unit tests pass without deadlocking. One somewhat ugly part yet is that _PyType_LookupRefAndVersion() got split into two versions with only slight differences between the functions (taking lock if needed vs caller already holding lock or the lock not being needed). Perhaps I can find a way to refactor it to avoid code duplication.

@nascheme nascheme marked this pull request as ready for review March 27, 2025 15:49
@nascheme nascheme requested a review from colesbury March 27, 2025 15:50
Copy link
Contributor

@colesbury colesbury left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

I think there's still too much mixing of the type lock and the stop the world. I think data should either be protected by the type lock OR by stop-the-world, but not both.

From reading through this:

type lock:

  • tp_version_tag
  • _spec_cache

stop the world:

  • tp_flags (modifications)
  • bases (modifications)

Additionally, there's a few functions that are named with _lock_held but are called in a stop-the-world pause (not with the type lock held), which I found a bit confusing.

Comment on lines 3552 to 3554
types_mutex_unlock();
new_mro = mro_invoke(type); /* might cause reentrance */
types_mutex_lock();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't seem right to me. This can be called with both the type lock held and the world stopped. If mro_invoke can call arbitrary code, it needs to start the world around the call.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that can call a custom mro() method on the meta class. The only thing in Lib that does that is a test_descr test but obviously we need to handle it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this has been resolved. Inside mro_invoke() I either start-the-world before calling mro() (if it's custom) or I unlock the mutex. The latter case happens during type_ready().

@nascheme
Copy link
Member Author

I'm not convinced that the switch of the type lock from a critical section to a plain mutex is profitable here.

Okay, I've reverted back to using a critical section for TYPE_LOCK.

This avoids the watcher callback potentially using invalid cache
entries.
* Since type_modified_unlocked() can be called with the world stopped
  then we need to re-start if making re-entrant calls (like calling
  type watcher callbacks).

* Re-order code before type_modified_unlocked().  This fixes a potential
  bug if called functions end up re-assigning the type version tag.  If
  they do, the cache can contain invalid entries.  We should do the
  dictionary update first then set the version to zero.

* Replace 'is_new_type' with 'world_stopped' boolean parameter.  This
  seems a bit more clear.
Rather than trying to update tp_base, tp_bases, and tp_mro while the
world is stopped, hold TYPE_LOCK when updating them.  This is much more
similar to how the base version of the code works and requires many
fewer changes.  Using stop-the-world turned out to be difficult to make
work correctly, mostly due to custom mro() methods but also for some
other reasons (for example, functions that potentially re-enter the
interpreter).

This re-work requires that we can call stop-the-world inside a critical
section and have the mutex still held after re-starting.  This seems to
work and I've added asserts to confirm it.  Stop-the-world is now only
used when updating either tp_flags or other type slots and not the three
type members listed above.
@nascheme
Copy link
Member Author

nascheme commented Apr 23, 2025

Major rework, reverting back to using TYPE_LOCK and critical sections to protect tp_mro, tp_bases, tp_base. In a previous version of this PR, I was making updates to those only while the world was stopped. Because we are so limited on what APIs you can call while the world is stopped it is hard to make it work without major overhaul to how types work. The mro() method on the metatype is particularly troublesome but there are other similar problems.

Here is a summary of what's changed vs the main branch:

  • Update tp_flags and type slots (other than tp_base, tp_bases, and tp_mro) only while the world is stopped. This allows us to read those type fields without atomics and without locking.
  • Add _Py_TYPE_REVEALED_FLAG as a debug build check to help prevent code that unsafely writes to a type (missing lock or stop-the-world).
  • Fix a data race with the pname/ptrs "little cache" for resolve_slotdups(). That cache is not safe in the free-threaded build so disable it.
  • A few code cleanup things that are not strictly required but leftover from the previous version of this PR. I think they improve the code so I left them in.

Updated pyperformance run.

@nascheme nascheme requested a review from colesbury April 23, 2025 20:18
These are not required as part of this PR and can be reviewed
separately.
Copy link
Contributor

@colesbury colesbury left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This LGTM once the issue with test_opcache is resolved

Now that these slot updates use stop-the-world, these two tests are
quite a lot slower.  Reduce size of the items list from 1000 to 100.
@nascheme nascheme enabled auto-merge (squash) April 28, 2025 20:26
@nascheme nascheme merged commit e414a2d into python:main Apr 28, 2025
48 checks passed
nascheme added a commit to nascheme/cpython that referenced this pull request Apr 29, 2025
nascheme added a commit that referenced this pull request Apr 29, 2025
… (gh-133129)

This is triggering deadlocks in test_opcache.  See GH-133130 for stack trace.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants