
gh-131757: allow lru_cache functions to execute concurrently #131758


Merged: 19 commits merged into python:main on Apr 14, 2025

Conversation

@tom-pytel (Contributor) commented Mar 26, 2025

This PR changes functools.lru_cache to only hold critical sections when it is performing operations on itself and not when it calls the wrapped function being cached.

Example script timing, current code:

$ ./python ../bench_lrucache.py
Time: 31.0s

This PR:

$ ./python ../bench_lrucache.py
Time: 2.4s

Explanation: The script runs 16 threads, each doing a long operation. In the current code they run sequentially because lru_cache serializes them; with this PR they are allowed to run concurrently.

Script:

import threading
from functools import lru_cache
from time import time


@lru_cache
def func(v):
    for i in range(100000000):
        pass
    return v

threads = [threading.Thread(target=func, args=(i,)) for i in range(16)]

t0 = time()

for thread in threads:
    thread.start()

for thread in threads:
    thread.join()

print(f'Time: {time() - t0:0.1f}s')

More detail: There are three caching wrapper functions; currently they all execute with the lock held (and, by extension, so does the function being cached):

  • uncached_lru_cache_wrapper - Used when lru_cache(maxsize=0). It is just a passthrough call, so it doesn't need a lock at all.
  • infinite_lru_cache_wrapper - Used when lru_cache(maxsize=None). It can rely on the locking that the dict operations themselves may do.
  • bounded_lru_cache_wrapper - Used otherwise. It has been split into a pre-call function and a post-call function, each locked individually, with the critical section released during the actual call to the cached function (see the sketch after the note below). This can be done because lru_cache is thread-safe and accommodates the possibility of the cache dictionary changing during execution of the cached function.

NOTE: This could be reduced to a single critical section if the locked function is only ever called from its owner thread, but I'm not sure that is necessary and wanted to keep things simple for this PR.
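
For context, this split mirrors the locking shape of the pure-Python implementation in Lib/functools.py, which already releases its lock around the user call. Below is a simplified Python sketch of that pattern; the name make_bounded_wrapper and the crude eviction are illustrative only, not the actual implementation:

# Simplified sketch (illustrative only): the lock guards the cache data
# structures, but is released around the call to the user function.
from threading import RLock

def make_bounded_wrapper(user_function, maxsize):
    sentinel = object()          # unique marker for "not in cache"
    cache = {}
    lock = RLock()

    def wrapper(*args):
        key = args
        with lock:                          # pre-call: lookup under the lock
            result = cache.get(key, sentinel)
        if result is not sentinel:
            return result
        result = user_function(*args)       # runs with the lock released
        with lock:                          # post-call: insert under the lock
            if key not in cache:            # another thread may have added it
                if len(cache) >= maxsize:
                    cache.pop(next(iter(cache)))   # stand-in for LRU eviction
                cache[key] = result
        return result

    return wrapper

The important property is that the call to user_function happens between the two locked regions, which is the structure bounded_lru_cache_wrapper now has with its pre-call and post-call halves.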

@vstinner (Member)

The difference is that func(arg) can now be called with the same arg multiple times in parallel, no?

@tom-pytel (Contributor, Author)

The difference is that func(arg) can now be called with the same arg multiple times in parallel, no?

Yes, and it behaves the same way as before free-threading: the wrapped function can be entered with the same args multiple times if the calls arrive at roughly the same time.
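
A small hypothetical demo of that behavior (the global counter and the sleep are only there to make the duplicate calls observable; none of this is from the PR):

# Hypothetical demo: several threads asking for the same key at the same
# time may each run the wrapped function once, but the cache still ends
# up with a single stored value for that key.
import threading
import time
from functools import lru_cache

calls = 0

@lru_cache
def slow(v):
    global calls
    calls += 1          # benign race, only used to observe duplicate calls
    time.sleep(0.1)
    return v

threads = [threading.Thread(target=slow, args=(0,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(f'underlying calls for the same key: {calls}')   # may print > 1

The cache still ends up with one entry for the key; only the amount of duplicated work differs, which matches the pre-free-threading behavior.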

@rhettinger (Contributor)

This PR changes functools.lru_cache to only hold critical sections when it is performing operations on itself and not when it calls the wrapped function being cached

+1 in principle. The C version should be at least as good as the pure Python version. In practice, this is tricky to get right.

@colesbury This PR modifies your previous work. Do you want to take a look at it?

@serhiy-storchaka This is mostly your C code. Do you want to look this over?

@tom-pytel I suggest looking at the pure Python version in /Lib/functools.py to verify that the C version handles all of the cases listed in the comments. In particular, look at the one that starts with "Getting here means that this same key was added to the cache while the lock was released." The wrapped function can reenter the cache, clear the cache, add to the cache, age out old keys, or be recursive.
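
One of those cases is easy to see from pure Python alone: a cached recursive function re-enters the cache from inside the wrapped call, so entries can be added (or, with a small maxsize, aged out) between the outer call's pre-call lookup and its post-call insertion. A tiny illustrative example, not taken from the PR:

# The wrapped call itself goes back through the cache, so the cache
# contents can change while the outer call's lock is released.
from functools import lru_cache

@lru_cache(maxsize=4)
def fib(n):
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)   # re-enters the cache mid-call

print(fib(10))   # 55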

@serhiy-storchaka (Member) left a comment:

How safe is it to execute self->hits++ or self->misses++ without a critical section?

@tom-pytel (Contributor, Author)

How safe is it to execute self->hits++ or self->misses++ without a critical section?

In infinite_lru_cache_wrapper, you mean? Good catch, I can change those to atomics.

@serhiy-storchaka (Member)

And in uncached_lru_cache_wrapper.

@tom-pytel (Contributor, Author)

And in uncached_lru_cache_wrapper.

Also _functools__lru_cache_wrapper_cache_info_impl and _functools__lru_cache_wrapper_cache_clear_impl, as they may be called on the uncached or infinite wrappers, which no longer take a critical section. I just realized the atomics in bounded_lru_cache_get_lock_held are unnecessary, so I will remove them.

@serhiy-storchaka (Member) left a comment:

LGTM.

@colesbury (Contributor)

Also, can you share bench_lrucache.py?

@tom-pytel (Contributor, Author)

Also, can you share bench_lrucache.py?

It's in the header of this PR.

@tom-pytel (Contributor, Author)

@tom-pytel I suggest looking at the pure Python version in /Lib/functools.py to verify that the C version handles all of the cases listed in the comments.

I double-checked all these cases as you suggested and it's fine. That makes sense, since all this PR really amounts to is releasing the lock around the call to the wrapped function; no other behavior is changed.

@ZeroIntensity (Member) left a comment:

I'm a little late to the party, but this looks pretty good.

@ZeroIntensity (Member) left a comment:

LGTM as well, thanks for doing this.

@serhiy-storchaka (Member) left a comment:

Do not overdo this. It is a simple macro to make the code in that file clearer. We do not need a return value. Other similar macros do not use inline functions. If we need that macro in other places, we can update the implementation.

I suggested implementing a simple increment. value++ returns the old value, if that is important.

@vstinner (Member) left a comment:

LGTM

@vstinner merged commit 4c12a2d into python:main on Apr 14, 2025. 42 checks passed.
@vstinner (Member)

Merged, thank you for this new optimization.
