Improve the cache when getting font metrics #28831

tacaswell · 2024-09-17T13:34:24Z

~~Please look at each of the first commit (option 1) and then the combination of both commits (net to option 2). Once we pick which one we like better I'll drop the other commit and fix / add tests.~~

We settled on option 2 in the discussion so I squashed option 1 out of existence.

Fixes #28827

timhoffm · 2024-09-17T13:55:58Z

Please look at each of the commits separately.

This tripped me up. "Please look separately at (i) commit 1, (ii) commit 1+2". is the way to go.

timhoffm

Trying to sum up the differences:

API: option 1 takes a weak ref, option 2 takes the renderer itself. - API-wise 2 is slightly more comfortable. The user does not have to care for weak refs. Also, the weakref-handling is better contained.
Behavior: The renderer cache in option 2 is unbound. Can that be an issue for long-running processes?
Both compared to the original implementation: We create separate caches per renderer. Before we had one large cache for renderer+settings. I assume this does not make a relevant difference

timhoffm · 2024-09-17T14:08:48Z

lib/matplotlib/text.py

+        @functools.lru_cache(4096)
+        def _text_metrics(text, fontprop, ismath, dpi):
+            # dpi is unused, but participates in cache invalidation (via the renderer).
+            return renderer_ref().get_text_width_height_descent(text, fontprop, ismath)


In option 2: don't we have to raise a RuntimeError (as in option 1) if the weakref does not resolve (renderer_ref() is None)?

I thought about that, but if we are being called then we have (on behalf of the user) recently called _get_text_metrics_with_cache with a hard-ref so we are sure it is alive. That said, I'll put it back in.

This helper should probably be made __ private as well?

tacaswell · 2024-09-17T14:27:17Z

This tripped me up.

🤦🏻 yeah, that makes sense, the second diff in going to be weird. I thought putting them in the same PR would be simpler than two PRs and easier to deal with diffs in the comments of the issue.

Currently the only place we call the _impl function is

matplotlib/lib/matplotlib/text.py

Lines 65 to 70 in 64d45cb

    
           def _get_text_metrics_with_cache(renderer, text, fontprop, ismath, dpi): 
        
               """Call ``renderer.get_text_width_height_descent``, caching the results.""" 
        
               # Cached based on a copy of fontprop so that later in-place mutations of 
        
               # the passed-in argument do not mess up the cache. 
        
               return _get_text_metrics_with_cache_impl( 
        
                   weakref.ref(renderer), text, fontprop.copy(), ismath, dpi)

and the only place we call _get_text_metrics_with_cache is

matplotlib/lib/matplotlib/text.py

Lines 372 to 384 in 64d45cb

    
           # Full vertical extent of font, including ascenders and descenders: 
        
           _, lp_h, lp_d = _get_text_metrics_with_cache( 
        
               renderer, "lp", self._fontproperties, 
        
               ismath="TeX" if self.get_usetex() else False, 
        
               dpi=self.get_figure(root=True).dpi) 
        
           min_dy = (lp_h - lp_d) * self._linespacing 
        
           for i, line in enumerate(lines): 
        
               clean_line, ismath = self._preprocess_math(line) 
        
               if clean_line: 
        
                   w, h, d = _get_text_metrics_with_cache( 
        
                       renderer, clean_line, self._fontproperties, 
        
                       ismath=ismath, dpi=self.get_figure(root=True).dpi)

so I'm not too worried about API considerations.

option 2 is unbounded, but it is tied to the lifetime of the renderers it should not grow past the number of renderers the user implicitly keeps alive by the number of Figures they keep alive.

In option 1 we keep some number of renderers alive (25) which is either going to cause incorrect misses for someone who has more than 25 figures alive (good idea or not, someone might really want this) and keep the cache too long in cases where the user is churning through figures one at a time.

I think the pro of option 1 is that it is "simpler" by using layered LRU cache, but option 2 is more complex but technically better.

In either case the two-tiered cache is the main "fix".

timhoffm · 2024-09-17T15:03:55Z

option 2 is unbounded, but it is tied to the lifetime of the renderers it should not grow past the number of renderers the user implicitly keeps alive by the number of Figures they keep alive.

That's clever. 👍 I hadn't realized that. In this case I'm for option 2 plus a good comment why we do this and how it works 😃.

anntzer · 2024-09-17T15:39:30Z

lib/matplotlib/text.py

+
+
+# use mutable default to carry hidden global state
+def _get_text_metrics_function(inp_renderer, _cache=weakref.WeakKeyDictionary()):


What does inp stand for?

I believe we could afford the two additional letters 😁

anntzer · 2024-09-17T15:40:27Z

lib/matplotlib/text.py

+# use mutable default to carry hidden global state
+def _get_text_metrics_function(inp_renderer, _cache=weakref.WeakKeyDictionary()):
+    if (_text_metrics := _cache.get(inp_renderer, None)) is None:
+        renderer_ref = weakref.ref(inp_renderer)


Do you actually need to explicitly create a weakref here? Doesn't using a WeakKeyDictionary basically do this for you for free already (i.e. aren't you having two layers of weakref'ing here?)? (not sure...)

yes, if we do not the closure holds a hard reference mediated in the value side of the weakkey dict and they become immortal.

Ah yes, I missed the fact that you are caching a closure.

Perhaps the closure can be made more explicit, something like (untested)

cache = WeakKeyDictionary() # Can be hidden as an attribute or a private parameter... def _get_text_metrics_with_cache(renderer, text, fp, ismath, dpi): if renderer not in cache: cache[renderer] = functools.lru_cache(4096)( functools.partial(_weak_text_metrics, weakref.ref(renderer))) return cache[renderer](text, fp.copy(), ismath, dpi) def _weak_text_metrics(renderer_ref, text, fp, ismath, dpi): return renderer_ref().get_text_width_height_descent(text, fp, ismath)

I added a some comments. I think that the closure is marginally clearer than the use of partial.

anntzer · 2024-09-17T15:44:35Z

lib/matplotlib/text.py

+
+
+# use mutable default to carry hidden global state
+def _get_text_metrics_function(inp_renderer, _cache=weakref.WeakKeyDictionary()):


Hiding the cache in a hidden mutable kwarg seems a bit weird to me (but sure, why not); I would rather have defined it as a custom attribute on the function (_get_text_metrics_function._cache = WeakKeyDictionary()). Just a stylistic choice, though.

When put into a keyword, it's easier to access it inside the function. - But needs documentation 😄

QuLogic · 2024-09-18T00:12:08Z

lib/matplotlib/text.py

+        @functools.lru_cache(4096)
+        def _text_metrics(text, fontprop, ismath, dpi):
+            # dpi is unused, but participates in cache invalidation (via the renderer).
+            if (lcl_renderer := renderer_ref()) is None:


And I assume lcl here is local as well? Probably can fit the two letters here too...

timhoffm · 2024-09-18T14:19:48Z

lib/matplotlib/text.py

+    the renderer is garbage collected.
+
+    The inner layer is provided by an lru_cache with a large maximum size (such
+    that we do not expect it to be hit in very few actual use cases).  As the


wording was a bit confusing.

Suggested change

that we do not expect it to be hit in very few actual use cases). As the

that we expect very few cache misses in actual use cases). As the

timhoffm

Very nice documentation 👍

anntzer · 2024-09-18T17:02:48Z

lib/matplotlib/text.py

+    # We pass a copy of the fontprop because FontProperties is both mutable and
+    # has a `__hash__` that depends on that mutable state.  This is not ideal
+    # as it means the hash of an object is not stable over time which leads to
+    # very confusing behavior when used as keys in dictionaries or hashes.


btw, see also #22495 (immutable FontProperties).

Ah, I'm glad there is an issue for that.

QuLogic · 2024-09-19T19:00:22Z

lib/matplotlib/tests/test_text.py

@@ -919,6 +919,7 @@ def call(*args, **kwargs):


 def test_metrics_cache2():
+    plt.close('all')


All tests should start with no figure open due to the test fixture, so this should be unnecessary?

This was a long shot that something was getting not cleared because the assert that is failing is before we do anything in this test.

tacaswell added this to the v3.10.0 milestone Sep 17, 2024

github-actions bot added the topic: text label Sep 17, 2024

timhoffm reviewed Sep 17, 2024

View reviewed changes

tacaswell force-pushed the fix/better_fm_cache branch from e93968c to b2f73a6 Compare September 17, 2024 14:32

anntzer reviewed Sep 17, 2024

View reviewed changes

QuLogic reviewed Sep 18, 2024

View reviewed changes

MNT: improve how we manage the cache for font metrics

a0e0809

tacaswell force-pushed the fix/better_fm_cache branch from db121ec to a0e0809 Compare September 18, 2024 14:07

tacaswell changed the title ~~Two proposals to improve the cache when getting font metrics~~ Improve the cache when getting font metrics Sep 18, 2024

tacaswell marked this pull request as ready for review September 18, 2024 14:09

timhoffm reviewed Sep 18, 2024

View reviewed changes

timhoffm approved these changes Sep 18, 2024

View reviewed changes

anntzer reviewed Sep 18, 2024

View reviewed changes

TST: make sure all figures are closed

61f4ef3

QuLogic reviewed Sep 19, 2024

View reviewed changes

tacaswell modified the milestones: v3.10.0, v3.11.0 Oct 2, 2024



		# use mutable default to carry hidden global state
		def _get_text_metrics_function(inp_renderer, _cache=weakref.WeakKeyDictionary()):

	that we do not expect it to be hit in very few actual use cases). As the
	that we expect very few cache misses in actual use cases). As the

		@@ -919,6 +919,7 @@ def call(args, *kwargs):


		def test_metrics_cache2():
		plt.close('all')

Uh oh!

Improve the cache when getting font metrics #28831

Are you sure you want to change the base?

Improve the cache when getting font metrics #28831

Uh oh!

Conversation

tacaswell commented Sep 17, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

timhoffm commented Sep 17, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

timhoffm left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

tacaswell commented Sep 17, 2024

Uh oh!

timhoffm commented Sep 17, 2024

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

timhoffm Sep 18, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

timhoffm left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

tacaswell commented Sep 17, 2024 •

edited

Loading

timhoffm commented Sep 17, 2024 •

edited

Loading

timhoffm Sep 18, 2024 •

edited

Loading