-
-
Notifications
You must be signed in to change notification settings - Fork 7.9k
Improve the cache when getting font metrics #28831
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
This tripped me up. "Please look separately at (i) commit 1, (ii) commit 1+2". is the way to go. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Trying to sum up the differences:
- API: option 1 takes a weak ref, option 2 takes the renderer itself. - API-wise 2 is slightly more comfortable. The user does not have to care for weak refs. Also, the weakref-handling is better contained.
- Behavior: The renderer cache in option 2 is unbound. Can that be an issue for long-running processes?
- Both compared to the original implementation: We create separate caches per renderer. Before we had one large cache for renderer+settings. I assume this does not make a relevant difference
lib/matplotlib/text.py
Outdated
@functools.lru_cache(4096) | ||
def _text_metrics(text, fontprop, ismath, dpi): | ||
# dpi is unused, but participates in cache invalidation (via the renderer). | ||
return renderer_ref().get_text_width_height_descent(text, fontprop, ismath) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In option 2: don't we have to raise a RuntimeError (as in option 1) if the weakref does not resolve (renderer_ref()
is None)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought about that, but if we are being called then we have (on behalf of the user) recently called _get_text_metrics_with_cache
with a hard-ref so we are sure it is alive. That said, I'll put it back in.
This helper should probably be made __
private as well?
🤦🏻 yeah, that makes sense, the second diff in going to be weird. I thought putting them in the same PR would be simpler than two PRs and easier to deal with diffs in the comments of the issue. Currently the only place we call the matplotlib/lib/matplotlib/text.py Lines 65 to 70 in 64d45cb
_get_text_metrics_with_cache is matplotlib/lib/matplotlib/text.py Lines 372 to 384 in 64d45cb
option 2 is unbounded, but it is tied to the lifetime of the renderers it should not grow past the number of renderers the user implicitly keeps alive by the number of In option 1 we keep some number of renderers alive (25) which is either going to cause incorrect misses for someone who has more than 25 figures alive (good idea or not, someone might really want this) and keep the cache too long in cases where the user is churning through figures one at a time. I think the pro of option 1 is that it is "simpler" by using layered LRU cache, but option 2 is more complex but technically better. In either case the two-tiered cache is the main "fix". |
e93968c
to
b2f73a6
Compare
That's clever. 👍 I hadn't realized that. In this case I'm for option 2 plus a good comment why we do this and how it works 😃. |
lib/matplotlib/text.py
Outdated
|
||
|
||
# use mutable default to carry hidden global state | ||
def _get_text_metrics_function(inp_renderer, _cache=weakref.WeakKeyDictionary()): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does inp stand for?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"input"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe we could afford the two additional letters 😁
lib/matplotlib/text.py
Outdated
# use mutable default to carry hidden global state | ||
def _get_text_metrics_function(inp_renderer, _cache=weakref.WeakKeyDictionary()): | ||
if (_text_metrics := _cache.get(inp_renderer, None)) is None: | ||
renderer_ref = weakref.ref(inp_renderer) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you actually need to explicitly create a weakref here? Doesn't using a WeakKeyDictionary basically do this for you for free already (i.e. aren't you having two layers of weakref'ing here?)? (not sure...)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, if we do not the closure holds a hard reference mediated in the value side of the weakkey dict and they become immortal.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yes, I missed the fact that you are caching a closure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps the closure can be made more explicit, something like (untested)
cache = WeakKeyDictionary() # Can be hidden as an attribute or a private parameter...
def _get_text_metrics_with_cache(renderer, text, fp, ismath, dpi):
if renderer not in cache:
cache[renderer] = functools.lru_cache(4096)(
functools.partial(_weak_text_metrics, weakref.ref(renderer)))
return cache[renderer](text, fp.copy(), ismath, dpi)
def _weak_text_metrics(renderer_ref, text, fp, ismath, dpi):
return renderer_ref().get_text_width_height_descent(text, fp, ismath)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a some comments. I think that the closure is marginally clearer than the use of partial
.
lib/matplotlib/text.py
Outdated
|
||
|
||
# use mutable default to carry hidden global state | ||
def _get_text_metrics_function(inp_renderer, _cache=weakref.WeakKeyDictionary()): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hiding the cache in a hidden mutable kwarg seems a bit weird to me (but sure, why not); I would rather have defined it as a custom attribute on the function (_get_text_metrics_function._cache = WeakKeyDictionary()
). Just a stylistic choice, though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When put into a keyword, it's easier to access it inside the function. - But needs documentation 😄
lib/matplotlib/text.py
Outdated
@functools.lru_cache(4096) | ||
def _text_metrics(text, fontprop, ismath, dpi): | ||
# dpi is unused, but participates in cache invalidation (via the renderer). | ||
if (lcl_renderer := renderer_ref()) is None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And I assume lcl
here is local
as well? Probably can fit the two letters here too...
db121ec
to
a0e0809
Compare
the renderer is garbage collected. | ||
|
||
The inner layer is provided by an lru_cache with a large maximum size (such | ||
that we do not expect it to be hit in very few actual use cases). As the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wording was a bit confusing.
that we do not expect it to be hit in very few actual use cases). As the | |
that we expect very few cache misses in actual use cases). As the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very nice documentation 👍
# We pass a copy of the fontprop because FontProperties is both mutable and | ||
# has a `__hash__` that depends on that mutable state. This is not ideal | ||
# as it means the hash of an object is not stable over time which leads to | ||
# very confusing behavior when used as keys in dictionaries or hashes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
btw, see also #22495 (immutable FontProperties).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I'm glad there is an issue for that.
@@ -919,6 +919,7 @@ def call(*args, **kwargs): | |||
|
|||
|
|||
def test_metrics_cache2(): | |||
plt.close('all') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All tests should start with no figure open due to the test fixture, so this should be unnecessary?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was a long shot that something was getting not cleared because the assert that is failing is before we do anything in this test.
Please look at each of the first commit (option 1) and then the combination of both commits (net to option 2). Once we pick which one we like better I'll drop the other commit and fix / add tests.We settled on option 2 in the discussion so I squashed option 1 out of existence.
Fixes #28827