Skip to content

gh-132641: fix race in lru_cache under free-threading #133787

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
May 13, 2025

Conversation

hawkinsp
Copy link
Contributor

@hawkinsp hawkinsp commented May 9, 2025

The bounded_lru_cache code was using a critical section on the lru cache object to protect dictionary accesses in some code paths, but using the critical section on the dictionary itself to protect accesses in other code paths. This led to races since not all threads agreed on which lock they needed to be holding.

Instead, always use a critical section on the underlying dictionary, rather than the lru cache object itself.

Fixes #132641

I do not have a small Python reproducer for this problem, but a test in the JAX test suite fails under 3.14 with a TSAN warning 27 out of 50 runs without this fix and 0 out of 50 runs with it.

@hawkinsp
Copy link
Contributor Author

hawkinsp commented May 9, 2025

@colesbury

@serhiy-storchaka
Copy link
Member

@tom-pytel

@picnixz picnixz changed the title gh-132641: Fix race in lru_cache due to inconsistent critical section… gh-132641: Fix race in lru_cache May 10, 2025
@picnixz picnixz changed the title gh-132641: Fix race in lru_cache gh-132641: fix race in lru_cache under free-threading May 10, 2025
Copy link
Member

@serhiy-storchaka serhiy-storchaka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@tom-pytel
Copy link
Contributor

tom-pytel commented May 10, 2025

One of the LRU functions is chosen at LRU cache creation, infinite_lru_cache_wrapper, uncached_lru_cache_wrapper or bounded_lru_cache_wrapper. The uncached doesn't use critical section, the infinite uses the dictionary critical section and the bounded uses a critical section on the cache object itself and as far as I can see it is never changed. Point being there shouldn't be confusion about lock as those functions shouldn't mix.

I do notice something though:

if (_PyDict_SetItem_KnownHash(self->cache, key, (PyObject *)link,

Maybe it should be _PyDict_SetItem_KnownHash_LockHeld, otherwise I think it might try to get the lock on the dict, removing the lock on the cache object temporarily and allowing race. Try that change and see what you get (the other one further down as well).

This would also explain why taking the lock on the dictionary object fixes it (which BTW is fine to do anyway, but maybe confirm source of original error).

@serhiy-storchaka
Copy link
Member

If bounded uses any non-_LockHeld PyDict API, there may be an issue.

@tom-pytel
Copy link
Contributor

tom-pytel commented May 10, 2025

If bounded uses any non-_LockHeld PyDict API, there may be an issue.

#131758 (comment)

Holding the lock on the dict makes sense in general and should use the _PyDict_SetItem_KnownHash_LockHeld as well.

@hawkinsp
Copy link
Contributor Author

Yes, that was my point in #132641 (comment)
It's just simpler to always use the critical section on the dict, I think. You can then safely use the _LockHeld functions if you like, but it is not mandatory, since after this change the dictionary critical section will always be held when reading or writing the dictionary.

Copy link
Contributor

@kumaraditya303 kumaraditya303 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't correct, functions like lru_cache_append_link modify the lru_cache_object which isn't protected if critical section is only acquired on the underlying dict.

@bedevere-app
Copy link

bedevere-app bot commented May 11, 2025

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase I have made the requested changes; please review again. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

@hawkinsp
Copy link
Contributor Author

This isn't correct, functions like lru_cache_append_link modify the lru_cache_object which isn't protected if critical section is only acquired on the underlying dict.

I don't understand or agree with this comment.

It does not matter on which object we hold a critical section, so long as we consistently hold the same critical section whenever we are reading or writing fields of the object. After this PR, we are always using the dict's critical section to protect the lru_cache object as well. The critical section on the lru_cache object itself would never be used.

@tom-pytel
Copy link
Contributor

It does not matter on which object we hold a critical section, so long as we consistently hold the same critical section whenever we are reading or writing fields of the object.

Correct, the lru_cache fields wind up being protected by the dictionary lock. The only thing I would ask is to change the _PyDict_SetItem_KnownHash to _PyDict_SetItem_KnownHash_LockHeld for consistency, even if its not needed.

@kumaraditya303
Copy link
Contributor

The following code triggers TSAN warnings on this PR:

from functools import lru_cache

from threading import Thread, Barrier
from random import randint

b = Barrier(100)
@lru_cache(maxsize=128)
def func(arg=0):
    func.cache_clear()
    return object()


def thread_func():
    b.wait()
    for i in range(10000):
        func(randint(0, 10000))

threads = []
for i in range(100):
    t = Thread(target=thread_func)
    threads.append(t)
    t.start()

for t in threads:
    t.join()

TSAN race:

==================
WARNING: ThreadSanitizer: data race (pid=26580)
  Write of size 8 at 0x7f250e4f7be8 by thread T5:
    #0 lru_cache_unlink_list /home/realkumaraditya/cpython/./Modules/_functoolsmodule.c:1589:16 (python+0x5bc97a) (BuildId: 6a926f53d2222342de04cdb51d70c5f1b710c0bd)
    #1 _functools__lru_cache_wrapper_cache_clear_impl /home/realkumaraditya/cpython/./Modules/_functoolsmodule.c:1689:27 (python+0x5bc97a)
    #2 _functools__lru_cache_wrapper_cache_clear /home/realkumaraditya/cpython/./Modules/clinic/_functoolsmodule.c.h:190:20 (python+0x5bc97a)
    #3 method_vectorcall_NOARGS /home/realkumaraditya/cpython/Objects/descrobject.c:448:24 (python+0x20ef99) (BuildId: 6a926f53d2222342de04cdb51d70c5f1b710c0bd)
    #4 _PyObject_VectorcallTstate /home/realkumaraditya/cpython/./Include/internal/pycore_call.h:169:11 (python+0x1f61ba) (BuildId: 6a926f53d2222342de04cdb51d70c5f1b710c0bd)
    #5 PyObject_Vectorcall /home/realkumaraditya/cpython/Objects/call.c:327:12 (python+0x1f61ba)
    #6 _PyEval_EvalFrameDefault /home/realkumaraditya/cpython/Python/generated_cases.c.h:1619:35 (python+0x40429e) (BuildId: 6a926f53d2222342de04cdb51d70c5f1b710c0bd)
    #7 _PyEval_EvalFrame /home/realkumaraditya/cpython/./Include/internal/pycore_ceval.h:119:16 (python+0x3ff170) (BuildId: 6a926f53d2222342de04cdb51d70c5f1b710c0bd)
    #8 _PyEval_Vector /home/realkumaraditya/cpython/Python/ceval.c:1964:12 (python+0x3ff170)
    #9 _PyFunction_Vectorcall /home/realkumaraditya/cpython/Objects/call.c (python+0x1f681f) (BuildId: 6a926f53d2222342de04cdb51d70c5f1b710c0bd)
    #10 _PyVectorcall_Call /home/realkumaraditya/cpython/Objects/call.c:273:16 (python+0x1f64af) (BuildId: 6a926f53d2222342de04cdb51d70c5f1b710c0bd)
    #11 _PyObject_Call /home/realkumaraditya/cpython/Objects/call.c:348:16 (python+0x1f64af)
    #12 PyObject_Call /home/realkumaraditya/cpython/Objects/call.c:373:12 (python+0x1f6515) (BuildId: 6a926f53d2222342de04cdb51d70c5f1b710c0bd)
    #13 bounded_lru_cache_wrapper /home/realkumaraditya/cpython/./Modules/_functoolsmodule.c:1493:14 (python+0x5bd50e) (BuildId: 6a926f53d2222342de04cdb51d70c5f1b710c0bd)
    #14 lru_cache_call /home/realkumaraditya/cpython/./Modules/_functoolsmodule.c:1638:14 (python+0x5bbbad) (BuildId: 6a926f53d2222342de04cdb51d70c5f1b710c0bd)
    #15 _PyObject_MakeTpCall /home/realkumaraditya/cpython/Objects/call.c:242:18 (python+0x1f5658) (BuildId: 6a926f53d2222342de04cdb51d70c5f1b710c0bd)
    #16 _PyObject_VectorcallTstate /home/realkumaraditya/cpython/./Include/internal/pycore_call.h:167:16 (python+0x1f6278) (BuildId: 6a926f53d2222342de04cdb51d70c5f1b710c0bd)
    #17 PyObject_Vectorcall /home/realkumaraditya/cpython/Objects/call.c:327:12 (python+0x1f6278)
    #18 _PyEval_EvalFrameDefault /home/realkumaraditya/cpython/Python/generated_cases.c.h:1619:35 (python+0x40429e) (BuildId: 6a926f53d2222342de04cdb51d70c5f1b710c0bd)
    #19 _PyEval_EvalFrame /home/realkumaraditya/cpython/./Include/internal/pycore_ceval.h:119:16 (python+0x3ff170) (BuildId: 6a926f53d2222342de04cdb51d70c5f1b710c0bd)
    #20 _PyEval_Vector /home/realkumaraditya/cpython/Python/ceval.c:1964:12 (python+0x3ff170)
    #21 _PyFunction_Vectorcall /home/realkumaraditya/cpython/Objects/call.c (python+0x1f681f) (BuildId: 6a926f53d2222342de04cdb51d70c5f1b710c0bd)
    #22 _PyObject_VectorcallTstate /home/realkumaraditya/cpython/./Include/internal/pycore_call.h:169:11 (python+0x1fb13f) (BuildId: 6a926f53d2222342de04cdb51d70c5f1b710c0bd)
    #23 method_vectorcall /home/realkumaraditya/cpython/Objects/classobject.c:72:20 (python+0x1fb13f)
    #24 _PyObject_VectorcallTstate /home/realkumaraditya/cpython/./Include/internal/pycore_call.h:169:11 (python+0x4584e7) (BuildId: 6a926f53d2222342de04cdb51d70c5f1b710c0bd)
    #25 context_run /home/realkumaraditya/cpython/Python/context.c:728:29 (python+0x4584e7)
    #26 method_vectorcall_FASTCALL_KEYWORDS /home/realkumaraditya/cpython/Objects/descrobject.c:421:24 (python+0x20ee99) (BuildId: 6a926f53d2222342de04cdb51d70c5f1b710c0bd)
    #27 _PyObject_VectorcallTstate /home/realkumaraditya/cpython/./Include/internal/pycore_call.h:169:11 (python+0x1f61ba) (BuildId: 6a926f53d2222342de04cdb51d70c5f1b710c0bd)
    #28 PyObject_Vectorcall /home/realkumaraditya/cpython/Objects/call.c:327:12 (python+0x1f61ba)
    #29 _PyEval_EvalFrameDefault /home/realkumaraditya/cpython/Python/generated_cases.c.h:1619:35 (python+0x40429e) (BuildId: 6a926f53d2222342de04cdb51d70c5f1b710c0bd)
    #30 _PyEval_EvalFrame /home/realkumaraditya/cpython/./Include/internal/pycore_ceval.h:119:16 (python+0x3ff170) (BuildId: 6a926f53d2222342de04cdb51d70c5f1b710c0bd)
    #31 _PyEval_Vector /home/realkumaraditya/cpython/Python/ceval.c:1964:12 (python+0x3ff170)
    #32 _PyFunction_Vectorcall /home/realkumaraditya/cpython/Objects/call.c (python+0x1f681f) (BuildId: 6a926f53d2222342de04cdb51d70c5f1b710c0bd)
    #33 _PyObject_VectorcallTstate /home/realkumaraditya/cpython/./Include/internal/pycore_call.h:169:11 (python+0x1fb13f) (BuildId: 6a926f53d2222342de04cdb51d70c5f1b710c0bd)
    #34 method_vectorcall /home/realkumaraditya/cpython/Objects/classobject.c:72:20 (python+0x1fb13f)
    #35 _PyVectorcall_Call /home/realkumaraditya/cpython/Objects/call.c:273:16 (python+0x1f64af) (BuildId: 6a926f53d2222342de04cdb51d70c5f1b710c0bd)
    #36 _PyObject_Call /home/realkumaraditya/cpython/Objects/call.c:348:16 (python+0x1f64af)
    #37 PyObject_Call /home/realkumaraditya/cpython/Objects/call.c:373:12 (python+0x1f6515) (BuildId: 6a926f53d2222342de04cdb51d70c5f1b710c0bd)
    #38 thread_run /home/realkumaraditya/cpython/./Modules/_threadmodule.c:353:21 (python+0x5acdd2) (BuildId: 6a926f53d2222342de04cdb51d70c5f1b710c0bd)
    #39 pythread_wrapper /home/realkumaraditya/cpython/Python/thread_pthread.h:242:5 (python+0x504887) (BuildId: 6a926f53d2222342de04cdb51d70c5f1b710c0bd)

  Previous write of size 8 at 0x7f250e4f7be8 by thread T4:
    #0 lru_cache_append_link /home/realkumaraditya/cpython/./Modules/_functoolsmodule.c:1248:16 (python+0x5bda15) (BuildId: 6a926f53d2222342de04cdb51d70c5f1b710c0bd)
    #1 bounded_lru_cache_update_lock_held /home/realkumaraditya/cpython/./Modules/_functoolsmodule.c:1391:9 (python+0x5bda15)
    #2 bounded_lru_cache_wrapper /home/realkumaraditya/cpython/./Modules/_functoolsmodule.c:1500:14 (python+0x5bda15)
    #3 lru_cache_call /home/realkumaraditya/cpython/./Modules/_functoolsmodule.c:1638:14 (python+0x5bbbad) (BuildId: 6a926f53d2222342de04cdb51d70c5f1b710c0bd)
    #4 _PyObject_MakeTpCall /home/realkumaraditya/cpython/Objects/call.c:242:18 (python+0x1f5658) (BuildId: 6a926f53d2222342de04cdb51d70c5f1b710c0bd)
    #5 _PyObject_VectorcallTstate /home/realkumaraditya/cpython/./Include/internal/pycore_call.h:167:16 (python+0x1f6278) (BuildId: 6a926f53d2222342de04cdb51d70c5f1b710c0bd)
    #6 PyObject_Vectorcall /home/realkumaraditya/cpython/Objects/call.c:327:12 (python+0x1f6278)
    #7 _PyEval_EvalFrameDefault /home/realkumaraditya/cpython/Python/generated_cases.c.h:1619:35 (python+0x40429e) (BuildId: 6a926f53d2222342de04cdb51d70c5f1b710c0bd)
    #8 _PyEval_EvalFrame /home/realkumaraditya/cpython/./Include/internal/pycore_ceval.h:119:16 (python+0x3ff170) (BuildId: 6a926f53d2222342de04cdb51d70c5f1b710c0bd)
    #9 _PyEval_Vector /home/realkumaraditya/cpython/Python/ceval.c:1964:12 (python+0x3ff170)
    #10 _PyFunction_Vectorcall /home/realkumaraditya/cpython/Objects/call.c (python+0x1f681f) (BuildId: 6a926f53d2222342de04cdb51d70c5f1b710c0bd)
    #11 _PyObject_VectorcallTstate /home/realkumaraditya/cpython/./Include/internal/pycore_call.h:169:11 (python+0x1fb13f) (BuildId: 6a926f53d2222342de04cdb51d70c5f1b710c0bd)
    #12 method_vectorcall /home/realkumaraditya/cpython/Objects/classobject.c:72:20 (python+0x1fb13f)
    #13 _PyObject_VectorcallTstate /home/realkumaraditya/cpython/./Include/internal/pycore_call.h:169:11 (python+0x4584e7) (BuildId: 6a926f53d2222342de04cdb51d70c5f1b710c0bd)
    #14 context_run /home/realkumaraditya/cpython/Python/context.c:728:29 (python+0x4584e7)
    #15 method_vectorcall_FASTCALL_KEYWORDS /home/realkumaraditya/cpython/Objects/descrobject.c:421:24 (python+0x20ee99) (BuildId: 6a926f53d2222342de04cdb51d70c5f1b710c0bd)
    #16 _PyObject_VectorcallTstate /home/realkumaraditya/cpython/./Include/internal/pycore_call.h:169:11 (python+0x1f61ba) (BuildId: 6a926f53d2222342de04cdb51d70c5f1b710c0bd)
    #17 PyObject_Vectorcall /home/realkumaraditya/cpython/Objects/call.c:327:12 (python+0x1f61ba)
    #18 _PyEval_EvalFrameDefault /home/realkumaraditya/cpython/Python/generated_cases.c.h:1619:35 (python+0x40429e) (BuildId: 6a926f53d2222342de04cdb51d70c5f1b710c0bd)
    #19 _PyEval_EvalFrame /home/realkumaraditya/cpython/./Include/internal/pycore_ceval.h:119:16 (python+0x3ff170) (BuildId: 6a926f53d2222342de04cdb51d70c5f1b710c0bd)
    #20 _PyEval_Vector /home/realkumaraditya/cpython/Python/ceval.c:1964:12 (python+0x3ff170)
    #21 _PyFunction_Vectorcall /home/realkumaraditya/cpython/Objects/call.c (python+0x1f681f) (BuildId: 6a926f53d2222342de04cdb51d70c5f1b710c0bd)
    #22 _PyObject_VectorcallTstate /home/realkumaraditya/cpython/./Include/internal/pycore_call.h:169:11 (python+0x1fb13f) (BuildId: 6a926f53d2222342de04cdb51d70c5f1b710c0bd)
    #23 method_vectorcall /home/realkumaraditya/cpython/Objects/classobject.c:72:20 (python+0x1fb13f)
    #24 _PyVectorcall_Call /home/realkumaraditya/cpython/Objects/call.c:273:16 (python+0x1f64af) (BuildId: 6a926f53d2222342de04cdb51d70c5f1b710c0bd)
    #25 _PyObject_Call /home/realkumaraditya/cpython/Objects/call.c:348:16 (python+0x1f64af)
    #26 PyObject_Call /home/realkumaraditya/cpython/Objects/call.c:373:12 (python+0x1f6515) (BuildId: 6a926f53d2222342de04cdb51d70c5f1b710c0bd)
    #27 thread_run /home/realkumaraditya/cpython/./Modules/_threadmodule.c:353:21 (python+0x5acdd2) (BuildId: 6a926f53d2222342de04cdb51d70c5f1b710c0bd)
    #28 pythread_wrapper /home/realkumaraditya/cpython/Python/thread_pthread.h:242:5 (python+0x504887) (BuildId: 6a926f53d2222342de04cdb51d70c5f1b710c0bd)

  Thread T5 'Thread-5 (threa' (tid=26589, running) created by main thread at:
    #0 pthread_create <null> (python+0xe21df) (BuildId: 6a926f53d2222342de04cdb51d70c5f1b710c0bd)
    #1 do_start_joinable_thread /home/realkumaraditya/cpython/Python/thread_pthread.h:289:14 (python+0x503718) (BuildId: 6a926f53d2222342de04cdb51d70c5f1b710c0bd)
    #2 PyThread_start_joinable_thread /home/realkumaraditya/cpython/Python/thread_pthread.h:331:9 (python+0x50353a) (BuildId: 6a926f53d2222342de04cdb51d70c5f1b710c0bd)
    #3 ThreadHandle_start /home/realkumaraditya/cpython/./Modules/_threadmodule.c:439:9 (python+0x5ac967) (BuildId: 6a926f53d2222342de04cdb51d70c5f1b710c0bd)
    #4 do_start_new_thread /home/realkumaraditya/cpython/./Modules/_threadmodule.c:1849:9 (python+0x5ac967)
    #5 thread_PyThread_start_joinable_thread /home/realkumaraditya/cpython/./Modules/_threadmodule.c:1964:14 (python+0x5ab731) (BuildId: 6a926f53d2222342de04cdb51d70c5f1b710c0bd)
    #6 cfunction_call /home/realkumaraditya/cpython/Objects/methodobject.c:565:18 (python+0x2a2fa7) (BuildId: 6a926f53d2222342de04cdb51d70c5f1b710c0bd)
    #7 _PyObject_MakeTpCall /home/realkumaraditya/cpython/Objects/call.c:242:18 (python+0x1f5658) (BuildId: 6a926f53d2222342de04cdb51d70c5f1b710c0bd)
    #8 _PyObject_VectorcallTstate /home/realkumaraditya/cpython/./Include/internal/pycore_call.h:167:16 (python+0x1f6278) (BuildId: 6a926f53d2222342de04cdb51d70c5f1b710c0bd)
    #9 PyObject_Vectorcall /home/realkumaraditya/cpython/Objects/call.c:327:12 (python+0x1f6278)
    #10 _PyEval_EvalFrameDefault /home/realkumaraditya/cpython/Python/generated_cases.c.h:3234:35 (python+0x40a577) (BuildId: 6a926f53d2222342de04cdb51d70c5f1b710c0bd)
    #11 _PyEval_EvalFrame /home/realkumaraditya/cpython/./Include/internal/pycore_ceval.h:119:16 (python+0x3fecaf) (BuildId: 6a926f53d2222342de04cdb51d70c5f1b710c0bd)
    #12 _PyEval_Vector /home/realkumaraditya/cpython/Python/ceval.c:1964:12 (python+0x3fecaf)
    #13 PyEval_EvalCode /home/realkumaraditya/cpython/Python/ceval.c:856:21 (python+0x3fecaf)
    #14 run_eval_code_obj /home/realkumaraditya/cpython/Python/pythonrun.c:1365:12 (python+0x4e1041) (BuildId: 6a926f53d2222342de04cdb51d70c5f1b710c0bd)
    #15 run_mod /home/realkumaraditya/cpython/Python/pythonrun.c:1436:19 (python+0x4e1041)
    #16 pyrun_file /home/realkumaraditya/cpython/Python/pythonrun.c:1293:15 (python+0x4dc530) (BuildId: 6a926f53d2222342de04cdb51d70c5f1b710c0bd)
    #17 _PyRun_SimpleFileObject /home/realkumaraditya/cpython/Python/pythonrun.c:521:13 (python+0x4dc530)
    #18 _PyRun_AnyFileObject /home/realkumaraditya/cpython/Python/pythonrun.c:81:15 (python+0x4dbc88) (BuildId: 6a926f53d2222342de04cdb51d70c5f1b710c0bd)
    #19 pymain_run_file_obj /home/realkumaraditya/cpython/Modules/main.c:402:15 (python+0x52199f) (BuildId: 6a926f53d2222342de04cdb51d70c5f1b710c0bd)
    #20 pymain_run_file /home/realkumaraditya/cpython/Modules/main.c:421:15 (python+0x52199f)
    #21 pymain_run_python /home/realkumaraditya/cpython/Modules/main.c:686:21 (python+0x520ccf) (BuildId: 6a926f53d2222342de04cdb51d70c5f1b710c0bd)
    #22 Py_RunMain /home/realkumaraditya/cpython/Modules/main.c:767:5 (python+0x520ccf)
    #23 pymain_main /home/realkumaraditya/cpython/Modules/main.c:797:12 (python+0x521238) (BuildId: 6a926f53d2222342de04cdb51d70c5f1b710c0bd)
    #24 Py_BytesMain /home/realkumaraditya/cpython/Modules/main.c:821:12 (python+0x5212bb) (BuildId: 6a926f53d2222342de04cdb51d70c5f1b710c0bd)
    #25 main /home/realkumaraditya/cpython/./Programs/python.c:15:12 (python+0x1607eb) (BuildId: 6a926f53d2222342de04cdb51d70c5f1b710c0bd)

  Thread T4 'Thread-4 (threa' (tid=26588, running) created by main thread at:
    #0 pthread_create <null> (python+0xe21df) (BuildId: 6a926f53d2222342de04cdb51d70c5f1b710c0bd)
    #1 do_start_joinable_thread /home/realkumaraditya/cpython/Python/thread_pthread.h:289:14 (python+0x503718) (BuildId: 6a926f53d2222342de04cdb51d70c5f1b710c0bd)
    #2 PyThread_start_joinable_thread /home/realkumaraditya/cpython/Python/thread_pthread.h:331:9 (python+0x50353a) (BuildId: 6a926f53d2222342de04cdb51d70c5f1b710c0bd)
    #3 ThreadHandle_start /home/realkumaraditya/cpython/./Modules/_threadmodule.c:439:9 (python+0x5ac967) (BuildId: 6a926f53d2222342de04cdb51d70c5f1b710c0bd)
    #4 do_start_new_thread /home/realkumaraditya/cpython/./Modules/_threadmodule.c:1849:9 (python+0x5ac967)
    #5 thread_PyThread_start_joinable_thread /home/realkumaraditya/cpython/./Modules/_threadmodule.c:1964:14 (python+0x5ab731) (BuildId: 6a926f53d2222342de04cdb51d70c5f1b710c0bd)
    #6 cfunction_call /home/realkumaraditya/cpython/Objects/methodobject.c:565:18 (python+0x2a2fa7) (BuildId: 6a926f53d2222342de04cdb51d70c5f1b710c0bd)
    #7 _PyObject_MakeTpCall /home/realkumaraditya/cpython/Objects/call.c:242:18 (python+0x1f5658) (BuildId: 6a926f53d2222342de04cdb51d70c5f1b710c0bd)
    #8 _PyObject_VectorcallTstate /home/realkumaraditya/cpython/./Include/internal/pycore_call.h:167:16 (python+0x1f6278) (BuildId: 6a926f53d2222342de04cdb51d70c5f1b710c0bd)
    #9 PyObject_Vectorcall /home/realkumaraditya/cpython/Objects/call.c:327:12 (python+0x1f6278)
    #10 _PyEval_EvalFrameDefault /home/realkumaraditya/cpython/Python/generated_cases.c.h:3234:35 (python+0x40a577) (BuildId: 6a926f53d2222342de04cdb51d70c5f1b710c0bd)
    #11 _PyEval_EvalFrame /home/realkumaraditya/cpython/./Include/internal/pycore_ceval.h:119:16 (python+0x3fecaf) (BuildId: 6a926f53d2222342de04cdb51d70c5f1b710c0bd)
    #12 _PyEval_Vector /home/realkumaraditya/cpython/Python/ceval.c:1964:12 (python+0x3fecaf)
    #13 PyEval_EvalCode /home/realkumaraditya/cpython/Python/ceval.c:856:21 (python+0x3fecaf)
    #14 run_eval_code_obj /home/realkumaraditya/cpython/Python/pythonrun.c:1365:12 (python+0x4e1041) (BuildId: 6a926f53d2222342de04cdb51d70c5f1b710c0bd)
    #15 run_mod /home/realkumaraditya/cpython/Python/pythonrun.c:1436:19 (python+0x4e1041)
    #16 pyrun_file /home/realkumaraditya/cpython/Python/pythonrun.c:1293:15 (python+0x4dc530) (BuildId: 6a926f53d2222342de04cdb51d70c5f1b710c0bd)
    #17 _PyRun_SimpleFileObject /home/realkumaraditya/cpython/Python/pythonrun.c:521:13 (python+0x4dc530)
    #18 _PyRun_AnyFileObject /home/realkumaraditya/cpython/Python/pythonrun.c:81:15 (python+0x4dbc88) (BuildId: 6a926f53d2222342de04cdb51d70c5f1b710c0bd)
    #19 pymain_run_file_obj /home/realkumaraditya/cpython/Modules/main.c:402:15 (python+0x52199f) (BuildId: 6a926f53d2222342de04cdb51d70c5f1b710c0bd)
    #20 pymain_run_file /home/realkumaraditya/cpython/Modules/main.c:421:15 (python+0x52199f)
    #21 pymain_run_python /home/realkumaraditya/cpython/Modules/main.c:686:21 (python+0x520ccf) (BuildId: 6a926f53d2222342de04cdb51d70c5f1b710c0bd)
    #22 Py_RunMain /home/realkumaraditya/cpython/Modules/main.c:767:5 (python+0x520ccf)
    #23 pymain_main /home/realkumaraditya/cpython/Modules/main.c:797:12 (python+0x521238) (BuildId: 6a926f53d2222342de04cdb51d70c5f1b710c0bd)
    #24 Py_BytesMain /home/realkumaraditya/cpython/Modules/main.c:821:12 (python+0x5212bb) (BuildId: 6a926f53d2222342de04cdb51d70c5f1b710c0bd)
    #25 main /home/realkumaraditya/cpython/./Programs/python.c:15:12 (python+0x1607eb) (BuildId: 6a926f53d2222342de04cdb51d70c5f1b710c0bd)

SUMMARY: ThreadSanitizer: data race /home/realkumaraditya/cpython/./Modules/_functoolsmodule.c:1589:16 in lru_cache_unlink_list
==================

@tom-pytel
Copy link
Contributor

The following code triggers TSAN warnings on this PR:

The @critical_sections on _functools__lru_cache_wrapper_cache_info_impl and _functools__lru_cache_wrapper_cache_clear_impl would need to be changed to the dict lock, so that's a point for keeping the lock on the lru_cache object.

@kumaraditya303
Copy link
Contributor

It does not matter on which object we hold a critical section, so long as we consistently hold the same critical section whenever we are reading or writing fields of the object. After this PR, we are always using the dict's critical section to protect the lru_cache object as well. The critical section on the lru_cache object itself would never be used.

Acquiring critical section on dict only provides exclusive access to the dict not the lru cache object, as you can see there are new data races introduced by this PR because it doesn't lock the lru cache object.

See critical section section at https://peps.python.org/pep-0703/#python-critical-sections

@hawkinsp
Copy link
Contributor Author

hawkinsp commented May 11, 2025

It does not matter on which object we hold a critical section, so long as we consistently hold the same critical section whenever we are reading or writing fields of the object. After this PR, we are always using the dict's critical section to protect the lru_cache object as well. The critical section on the lru_cache object itself would never be used.

Acquiring critical section on dict only provides exclusive access to the dict not the lru cache object, as you can see there are new data races introduced by this PR because it doesn't lock the lru cache object.

See critical section section at https://peps.python.org/pep-0703/#python-critical-sections

Again, it does not matter which lock we use to provide exclusive access, so long as we are consistent. Just because a mutex is attached to a particular object, it does not mean it has to protect that object alone.

The dict lock will work fine, so long as it is used consistently. Not being a frequent CPython contributor, I did not know that @critical_section was a thing and that's why I didn't update it in this PR. I can fix that, if we like.

You could use the lru cache critical section, but then you have be careful never to mix the unlocked and locked dict APIs. e.g., I think calling PyDict_Clear() here is unsafe

PyDict_Clear(_self->cache);

because it will acquire the dict's critical section, and that can release the critical section on the lru cache, leading to the sort of race I originally reported: some callers will hold the dict lock when updating the dict and not the lru cache lock, whereas others will hold the lru cache lock but not the dict lock. Similarly you'd have the same problem for:
https://github.com/python/cpython/blob/0eb448cae5e9008f815204d8b46bfd7cd641a152/Modules/_functoolsmodule.c#L1416C11-L1416C32

It would be fine if all callers used the locked dictionary APIs, of course. But as far as I can tell there is no _PyDict_Clear_LockHeld, for example, so we'd have to add one. If you are happy for me to go and add _LockHeld variants of every PyDict API that the LRU cache calls, we could use that lock instead.

I admit I am slightly vague on under exactly what condition acquiring an inner critical section will release the outer critical section. However, it certainly can be triggered in practice, see the original bug report.

@tom-pytel
Copy link
Contributor

Similarly you'd have the same problem for:
https://github.com/python/cpython/blob/0eb448cae5e9008f815204d8b46bfd7cd641a152/Modules/_functoolsmodule.c#L1416C11-L1416C32

In this case no because this function assumes lock, should really be called _PyDict_Pop_KnownHash_LockHeld:

ASSERT_DICT_LOCKED(mp);

But you are right about PyDict_Clear(), it would drop the lock on lru_cache as soon as it took its own, so that's kind of a brick wall. You could lock both simultaneously there with a Py_BEGIN_CRITICAL_SECTION2(lru_cache, dict) maybe, not sure if that would drop the lru_cache lock when dict took its own already-locked. But if locking the dict then why bother locking the lru_cache at all? Only to keep the all-objects-use-own-lock paradigm? Is that a something we are trying to enforce?

Copy link
Contributor

@kumaraditya303 kumaraditya303 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add some tests like the one I posted above to test_free_threading

@hawkinsp
Copy link
Contributor Author

Please add some tests like the one I posted above to test_free_threading

Done.

@hawkinsp
Copy link
Contributor Author

Done.

@hawkinsp
Copy link
Contributor Author

Done. Incidentally, with that fix, this test now reliably triggers #129748 with high probability run under tsan. However, that is not a new problem in this PR.

…ection use.

The bounded_lru_cache code was using a critical section on the lru cache
object to protect dictionary accesses in some code paths, but using the
critical section on the dictionary itself to protect accesses in other
code paths. This led to races since not all threads agreed on which lock
they needed to be holding.

Instead, always use a critical section on the underlying dictionary,
rather than the lru cache object itself.

Fixes python#132641
Copy link
Contributor

@kumaraditya303 kumaraditya303 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks!

@kumaraditya303 kumaraditya303 enabled auto-merge (squash) May 13, 2025 17:32
@kumaraditya303 kumaraditya303 merged commit 9ad0c7b into python:main May 13, 2025
46 of 47 checks passed
@miss-islington-app
Copy link

Thanks @hawkinsp for the PR, and @kumaraditya303 for merging it 🌮🎉.. I'm working now to backport this PR to: 3.14.
🐍🍒⛏🤖

miss-islington pushed a commit to miss-islington/cpython that referenced this pull request May 13, 2025
…GH-133787)

Fix race in `lru_cache` by acquiring critical section on the cache object itself and call the lock held variant of dict functions to modify the underlying dict.
(cherry picked from commit 9ad0c7b)

Co-authored-by: Peter Hawkins <phawkins@google.com>
@bedevere-app
Copy link

bedevere-app bot commented May 13, 2025

GH-133979 is a backport of this pull request to the 3.14 branch.

@bedevere-app bedevere-app bot removed the needs backport to 3.14 bugs and security fixes label May 13, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Data race between compare_generic and insert_combined_dict under free-threading
5 participants