Document that cache() and lru_cache() do not have a "call once" guarantee #103475


Closed
okrcma opened this issue Apr 12, 2023 · 8 comments
Labels
docs Documentation in the Doc dir

Comments

@okrcma

okrcma commented Apr 12, 2023

Documentation

The documentation for @functools.cache states that

The cache is threadsafe so the wrapped function can be used in multiple threads.

but it is not threadsafe before the value is cached: two threads calling the wrapped function concurrently can each execute it and receive different results.

In the following example, the wrapped function will return two different list instances to the two threads.

import functools
from threading import Thread
from time import sleep


@functools.cache
def get_cached_object():
    sleep(1)  # stand-in for some logic which takes some time to execute
    return list()


def run():
    print(f"{id(get_cached_object())}\n")


if __name__ == "__main__":
    first_thread = Thread(target=run)
    second_thread = Thread(target=run)

    first_thread.start()
    second_thread.start()

    sleep(2)  # wait for both threads to finish
    run()

This is an issue, for example, when @functools.cache is used to implement singletons (we can leave the debate about singletons being bad practice out of this).

The documentation should either not claim that the cache is threadsafe, or it should include an explicit warning about this situation.


okrcma added the docs (Documentation in the Doc dir) label on Apr 12, 2023
@sunmy2019
Member

From the source code, it seems that only
cached_property() - computed once per instance, cached as an attribute
provides the "call once" guarantee.

cache is about "saving computation" rather than a "call once" guarantee, since it is built on top of lru_cache.
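
For illustration, a minimal sketch of the per-instance attribute caching that cached_property performs (the Config class and settings property are invented for this example); whether concurrent first accesses are also serialized into a single call is a separate, version-dependent question (see the reference to #101890 further down):

import functools


class Config:
    @functools.cached_property
    def settings(self):
        print("loading settings")  # runs once per instance in single-threaded use
        return {"debug": False}


c = Config()
c.settings  # prints "loading settings" and stores the dict on the instance
c.settings  # served from the instance attribute, no recomputation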

@rhettinger
Contributor

The term threadsafe means different things to different people. Here, it is correctly used in the narrow sense to indicate that the underlying data structure updates take place with a lock (or GIL) held. That makes it safe to use in a multithreaded environment without fear that the data structure will become incoherent. This stands in marked contrast to structures like the pure Python version of OrderedDict or random.gauss() where the invariants get broken during multi-threaded updates. We're mostly consistent about this in the docs where we say that random.random, dict.setdefault, and deque.append are threadsafe.

I'll add a clarifying note to the cache() and lru_cache() docs to note that there is no "call once" guarantee. We've never made that claim and have intentionally called the underlying function outside of the lock. In general, it isn't reliable or safe to leave a lock open across a call to arbitrary user code.
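
For callers who do need at-most-one execution today, a minimal workaround sketch (not part of functools; get_settings and _load_settings are invented names) is to serialize calls to the cached function behind an explicit lock:

import functools
import threading

_settings_lock = threading.Lock()


@functools.cache
def _load_settings():
    # expensive work that must run at most once
    return {"debug": False}


def get_settings():
    # The lock serializes concurrent first calls, so only one thread executes
    # _load_settings(); afterwards the cached value is returned and the lock
    # is held only for the cheap cache lookup.
    with _settings_lock:
        return _load_settings()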

rhettinger changed the title from '@functools.cache is NOT thread safe' to 'Document that cache() and lru_cache() do not have a "call once" guarantee' on Apr 13, 2023
rhettinger added a commit to rhettinger/cpython that referenced this issue Apr 21, 2023
miss-islington pushed a commit to miss-islington/cpython that referenced this issue Apr 22, 2023
GH-103475: cache() and lru_cache() do not have a "call once" guarantee (pythonGH-103669)

(cherry picked from commit e5eaac6)

Co-authored-by: Raymond Hettinger <rhettinger@users.noreply.github.com>
AlexWaygood pushed a commit that referenced this issue Apr 22, 2023
GH-103475: cache() and lru_cache() do not have a "call once" guarantee (GH-103669) (#103682)
(cherry picked from commit e5eaac6)

Co-authored-by: Raymond Hettinger <rhettinger@users.noreply.github.com>
@petergaultney

I'll add a clarifying note to the cache() and lru_cache() docs to note that there is no "call once" guarantee. We've never made that claim and have intentionally called the underlying function outside of the lock. In general, it isn't reliable or safe to leave a lock open across a call to arbitrary user code.

I'm curious about this statement. I understand that locks, in general, do not compose. But the current implementation uses a reentrant lock for the caching bits, and I can't come up with a reason we couldn't do an efficient double-checked lock with that reentrant lock even in the case that the user code (perversely) chooses to recurse.

Assuming I'm wrong about the above... would the CPython team be open to a PR that would include a conditional in the lru_cache decorator (perhaps lock_call: bool = False) that would conditionally apply the existing reentrant lock to the call to user_function? It would slightly complicate the code, but my coworkers and I run into the need for something like this (essentially, the computation or underlying 'side effect' is so expensive that we cannot afford to have it called twice in a multithreading context) so frequently that we really wish it could eventually be part of the stdlib.

@sunmy2019
Member

sunmy2019 commented May 12, 2023

would the CPython team be open to a PR

You can open a PR, but it is hard to handle this correctly.

Note that you cannot simply use one lock to guard the whole function: the lru_cache may be re-entered during the call to the user function. Consider this:

from functools import lru_cache
from concurrent.futures import ThreadPoolExecutor


def heavy_things(n: int):
    # Called in a worker thread; calls back into the cached function f()
    # while an outer call to f() is still in progress.
    return f(n - 1) + f(n - 2)


@lru_cache
def f(n):
    if n > 1:
        # Part of the computation happens in another thread, so f() is
        # re-entered before this call has returned.
        with ThreadPoolExecutor() as p:
            result = p.map(heavy_things, [n])
        return sum(result)
    return 1


print(f(10))

This demonstrates that some of the computation may happen in another thread.

Your implementation must not cause a deadlock in this case.

My recommendation would be to write your own custom implementation.

@petergaultney

yes, this makes sense.

for what it's worth, we do have our own implementation. it's just unfortunate that we have to have it.

An updated suggestion for a possible API that would be backward-compatible and put as much power as possible in the hands of the user: allow lru_cache to take a func_lock argument, which defaults to None. If provided, it is a user-supplied lock (ideally nothing more than a context manager wrapping whatever locking implementation they wish to provide), and user_function is called inside that context.

I feel confident this would not break existing tests, and it should not be terribly difficult to write a threaded test (although I am not sure what would constitute sufficient 'negative proof' of race conditions).

@petergaultney

petergaultney commented May 13, 2023

Actually, I'd revise my proposal. The parameter would be key_lock, and its type would be key_lock: Callable[[Hashable], ContextManager] (a callable that takes the cache key and returns a context manager).

Then, if this were not None, _lru_wrapper would call user_function inside that context, as the following pseudocode demonstrates:

key = make_key(...)

def _check_cache(key):
    ...  # same cache-lookup code as currently exists

cache_hit = _check_cache(key)

if not cache_hit:
    # double-checked: re-check under the per-key lock before calling user_function
    with key_lock(key):
        cache_hit = _check_cache(key)
        if not cache_hit:
            cache_hit = add_result_to_cache(key, user_function(*args, **kwds))

return cache_hit.result

This allows the user to opt into granular locking per unique call. There are various ways to implement this; our implementation uses threading.Event to release waiting threads after the winning thread has performed the work.
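
As a rough sketch of that kind of implementation (not the stdlib's; call_once_per_key is an invented name, only hashable positional arguments are supported, and exceptions are cached alongside results), a per-key "call once" wrapper built on threading.Event might look like this:

import threading
from functools import wraps


def call_once_per_key(func):
    outcomes = {}              # key -> (exception or None, result)
    events = {}                # key -> Event, set once the outcome is stored
    guard = threading.Lock()   # protects the dicts; never held around func()

    @wraps(func)
    def wrapper(*args):
        key = args
        with guard:
            event = events.get(key)
            winner = event is None
            if winner:
                event = events[key] = threading.Event()
        if winner:
            try:
                outcomes[key] = (None, func(*args))   # called outside the guard
            except BaseException as exc:
                outcomes[key] = (exc, None)
            finally:
                event.set()                           # release waiting threads
        else:
            event.wait()
        exc, result = outcomes[key]
        if exc is not None:
            raise exc
        return result

    return wrapper

Unlike lru_cache, a thread that requests a key while another thread is still computing that same key will block until the winner finishes, so the re-entrancy scenario shown above has to be kept in mind.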

@rhettinger
Contributor

would the CPython team be open to a PR that would include a conditional in the lru_cache decorator (perhaps lock_call: bool = False) that would conditionally apply the existing reentrant lock to the call to user_function?

I don't think so. This would be feature creep for the cache to handle a niche use case. IIRC other cache implementations I looked at did not have this feature.

Also note that the C version doesn't even have its own lock, so there is nothing to attach this to. And for the Python version, I'm wary of ever leaving an open lock across an arbitrary user function call — that is a recipe for hard to find bugs.

Perhaps make your own package for implementing call-once behavior and post it to the Python Package Index. There, it can get a thorough shake-down and we can see if there is any uptake by the user community. FWIW, OrderedDict makes it easy to implement your own lru_cache variants.
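
As a sketch of that suggestion (simple_lru_cache is an invented name; it supports only hashable positional arguments and keeps no hit/miss statistics or locking), an lru_cache variant built on OrderedDict could look like this:

from collections import OrderedDict
from functools import wraps


def simple_lru_cache(maxsize=128):
    def decorator(func):
        cache = OrderedDict()

        @wraps(func)
        def wrapper(*args):
            if args in cache:
                cache.move_to_end(args)    # mark as most recently used
                return cache[args]
            result = func(*args)
            cache[args] = result
            if len(cache) > maxsize:
                cache.popitem(last=False)  # evict the least recently used entry
            return result

        return wrapper
    return decorator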

@sunmy2019
Member

would the CPython team be open to a PR that would include a conditional in the lru_cache decorator (perhaps lock_call: bool = False) that would conditionally apply the existing reentrant lock to the call to user_function?

I don't think so now. We are moving the other way. See #101890
