gh-127266: avoid data races when updating type slots v2 #133177

nascheme · 2025-04-29T23:45:52Z

This is an updated version of GH-131174, which was reverted. I figured the cleanest thing to do is make a new PR.

This is the same as the previous PR with one additional change. The update_all_slots() and type_setattro() functions are now more careful about what runs while the world is stopped. Instead of doing the MRO lookups while the world is stopped, we do them all first and collect the slot pointers to be updated. Then we stop the world and apply those updates. This makes it much easier to confirm that the code running during the stop-the-world pause is safe, which should avoid the deadlocks.

The test_opcache test has become quite a bit slower. It seems to be due to mutex contention in the __getitem__ and __getattribute__ method assignment tests. I reduced the items count from 1000 to 100 to keep the test from becoming much slower.
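For readers unfamiliar with that kind of test, here is a minimal sketch of a method-assignment race in the same spirit. It is not the code in test_opcache.py: the names `Box`, `replacement`, `reader`, `writer`, and `run` are hypothetical, and only the item count of 100 comes from the description above.

```python
import threading

ITEMS = 100  # the reduced item count mentioned above


class Box:
    def __getitem__(self, index):
        return "original"


def replacement(self, index):
    return "replaced"


def reader(items, results):
    # Subscript every item; each call goes through the type's __getitem__ slot.
    for item in items:
        results.append(item[0])


def writer():
    # Assigning a special method on the class triggers a type slot update.
    Box.__getitem__ = replacement


def run():
    items = [Box() for _ in range(ITEMS)]
    results = []
    threads = [
        threading.Thread(target=reader, args=(items, results)),
        threading.Thread(target=writer),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Each read should observe either the old or the new method, never a crash or a torn state, and every read after both threads finish should see the replacement.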

Issue: Type slots are not thread-safe in free-threaded builds #127266

In the free-threaded build, avoid data races caused by updating type slots or type flags after the type was initially created. For those (typically rare) cases, use the stop-the-world mechanism. Remove the use of atomics when reading or writing type flags. Atomics are not sufficient to avoid races (since flags are sometimes read without a lock and without atomics) and are no longer required.

To avoid deadlocks while the world is stopped, we need to avoid calling APIs like _PyObject_HashFast() and _PyDict_GetItemRef_KnownHash(). Collect the slot updates to be done and then apply them all at once. This reduces the amount of code running in the stop-the-world condition.
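The collect-then-apply idea can be sketched in a few lines of Python. This is only a toy model under stated assumptions, not the C implementation in Objects/typeobject.c: `SlotTable`, `collect_updates`, and `apply_updates` are hypothetical names, each class in the MRO is modelled as a plain dict, and an ordinary `threading.Lock` stands in for the stop-the-world pause.

```python
import threading


class SlotTable:
    """Toy stand-in for a type's slot table (hypothetical layout)."""

    def __init__(self):
        self.slots = {}
        # Assumption: a plain lock models the stop-the-world pause.
        self._world = threading.Lock()

    def collect_updates(self, mro, names):
        # Phase 1: do all lookups while the world is still running.
        updates = []
        for name in names:
            for namespace in mro:
                if name in namespace:
                    impl = namespace[name]
                    if self.slots.get(name) is not impl:
                        updates.append((name, impl))
                    break
        return updates

    def apply_updates(self, updates):
        # Skip the pause entirely when nothing changed.
        if not updates:
            return False
        # Phase 2: only simple stores happen while the "world is stopped".
        with self._world:
            for name, impl in updates:
                self.slots[name] = impl
        return True
```

The point of the split is that the second phase contains nothing but pointer-sized stores, so it is easy to audit for calls that could block or take other locks.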

bedevere-bot · 2025-04-30T01:50:05Z

🤖 New build scheduled with the buildbot fleet by @nascheme for commit d511ca6 🤖

Results will be shown at:

https://buildbot.python.org/all/#/grid?branch=refs%2Fpull%2F133177%2Fmerge

If you want to schedule another build, you need to add the 🔨 test-with-buildbots label again.

bedevere-bot · 2025-04-30T07:45:13Z

🤖 New build scheduled with the buildbot fleet by @nascheme for commit 3cb2256 🤖

Results will be shown at:

https://buildbot.python.org/all/#/grid?branch=refs%2Fpull%2F133177%2Fmerge

If you want to schedule another build, you need to add the 🔨 test-with-buildbots label again.

colesbury · 2025-05-01T16:08:22Z

Lib/test/test_opcache.py

@@ -576,6 +576,7 @@ class TestRacesDoNotCrash(TestBase):
    # Careful with these. Bigger numbers have a higher chance of catching bugs,
    # but you can also burn through a *ton* of type/dict/function versions:
    ITEMS = 1000
+    SMALL_ITEMS = 100


It might be worth investigating this further. I'm surprised there's a large slowdown in this PR, but not in the earlier version. What's the relevant difference?

colesbury · 2025-05-01T16:10:12Z

Tools/tsan/suppressions_free_threading.txt

Hmm, seems like we should fix this in a follow-up PR.

colesbury · 2025-05-01T16:15:46Z

Objects/typeobject.c

    PyObject *old_bases = lookup_tp_bases(type);
    assert(old_bases != NULL);
    PyTypeObject *old_base = type->tp_base;

    set_tp_bases(type, Py_NewRef(new_bases), 0);
-    type->tp_base = (PyTypeObject *)Py_NewRef(new_base);
+    type->tp_base = (PyTypeObject *)Py_NewRef(best_base);


Should we be setting type->tp_base in a stop the world pause?

Doesn't have to be part of this PR if it complicates things.
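For context, the Python-level operation that appears to reach this code is assignment to `__bases__`. A minimal sketch of what that does to slot-backed behaviour, with hypothetical class names (`Short`, `Long`, `Container`) and a hypothetical helper `rebase`:

```python
class Short:
    def __len__(self):
        return 1


class Long:
    def __len__(self):
        return 100


class Container(Short):
    pass


def rebase(cls, new_base):
    # Replacing __bases__ recomputes the MRO and the inherited slots,
    # and also changes the single "best base" exposed as __base__.
    cls.__bases__ = (new_base,)
    return cls.__base__
```

After `rebase(Container, Long)`, existing instances of `Container` pick up `Long.__len__`, which is why other threads reading the type's slots or its base during the update are a concern in the free-threaded build.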

nascheme added 2 commits April 29, 2025 16:44

nascheme added the type-bug and topic-free-threading labels Apr 29, 2025

bedevere-app bot mentioned this pull request Apr 29, 2025

Type slots are not thread-safe in free-threaded builds #127266

Open

Avoid "empty structure" compile error.

d511ca6

nascheme added the 🔨 test-with-buildbots label Apr 30, 2025

bedevere-bot removed the 🔨 test-with-buildbots label Apr 30, 2025

nascheme added 5 commits April 29, 2025 22:40

Use apply_slot_updates() for type_setattro().

5e38497

Merge 'origin/main' into type-slot-ts-v2

e9516c7

Reduce number of items in test for slot updates.

8c74a0c

Now that stop-the-world is used to do the slot update, these tests are a lot slower in the free-threaded build. Test with fewer items, which will still hopefully be enough to find bugs in the specializer.

Add TSAN suppression for _Py_slot_tp_getattr_hook.

6cd7644

Queue update of tp_flags as well.

3cb2256

The clearing of Py_TPFLAGS_HAVE_VECTORCALL must be done when the world is stopped too.

nascheme added the 🔨 test-with-buildbots label Apr 30, 2025

bedevere-bot removed the 🔨 test-with-buildbots label Apr 30, 2025

nascheme marked this pull request as ready for review April 30, 2025 13:29

nascheme requested a review from markshannon as a code owner April 30, 2025 13:29

bedevere-app bot added the awaiting core review label Apr 30, 2025

nascheme requested a review from colesbury April 30, 2025 13:33

nascheme added 5 commits April 30, 2025 06:45

Performance, skip stop-the-world when possible.

47e41c9

Since we stack allocate one chunk, we need to check 'n' to see if there are actually any updates to make. It's pretty common that no updates are actually needed.

Merge 'origin/main' into type-slot-ts-v2

cb848f1

Always clear version after __bases__ update.

9859ebf

Merge 'origin/main' into type-slot-ts-v2

6c74cac

Add test for assigning __bases__.

583c435

colesbury reviewed May 1, 2025
