Rework/fix Text layout cache. #22271

Merged · 1 commit · Jan 25, 2022

Conversation

@anntzer (Contributor) commented Jan 20, 2022

Instead of caching the text layout based on a bunch of properties, only
cache the computation of the text's metrics, which 1) should be the most
expensive part (everything else in _get_layout is relatively simple
iteration and arithmetic) and 2) depends on fewer implicit parameters.
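
(For illustration only, not the exact code merged here, and with made-up helper names: a minimal sketch of what "cache only the metrics" can look like.)

```python
import functools

# Sketch with hypothetical names: memoize only the expensive metrics query;
# everything else in _get_layout (per-line offsets, alignment) stays uncached.
@functools.lru_cache(maxsize=4096)
def _cached_metrics(renderer, text, fontprop, ismath):
    # The expensive part: ask the backend for width, height and descent.
    return renderer.get_text_width_height_descent(text, fontprop, ismath)

def get_metrics(renderer, text, fontprop, ismath):
    # Pass a *copy* of the mutable FontProperties so that later in-place
    # changes by the caller cannot alias a stale cache entry.  (A real
    # implementation would also avoid keeping a strong reference to the
    # renderer alive inside the cache, e.g. by keying on a weak reference.)
    return _cached_metrics(renderer, text, fontprop.copy(), ismath)
```

The point being that the cache key is built only from the inputs that actually affect the metrics (renderer, string, font, math flag), not from the full set of layout-affecting Text properties.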

In fact, the old cache key was insufficient: it conflated usetex and non-usetex strings, even though they have different metrics; e.g. with the (extremely artificial) example

```python
figtext(.1, .5, "foo\nbar", size=32)  # (0)
figtext(.1, .5, "foo\nbar", usetex=True, size=32, c="r", alpha=.5)  # (1)
figtext(.3, .5, "foo\nbar", usetex=True, size=32, c="r", alpha=.5)  # (2)
```

the linespacing of the first usetex string (1) would be "wrong": it is bigger than that of the second usetex string (2), because (1) instead reuses the layout computed for the non-usetex string (0).

A further motivation is to eventually give the renderer better control over cache invalidation (via a yet-to-be-added renderer method), e.g. so that multiple instances of the same renderer class could share the same cached layout info.

[Before/after screenshots omitted: compare the position of the left "foo" in the old vs. new output.]

(This is also orthogonal to #21958, which fixes another aspect of the text cache.)

PR Summary

PR Checklist

Tests and Styling

  • Has pytest style unit tests (and pytest passes).
  • Is Flake 8 compliant (install flake8-docstrings and run flake8 --docstring-convention=all).

Documentation

  • New features are documented, with examples if plot related.
  • New features have an entry in doc/users/next_whats_new/ (follow instructions in README.rst there).
  • API changes documented in doc/api/next_api_changes/ (follow instructions in README.rst there).
  • Documentation is sphinx and numpydoc compliant (the docs should build without error).

@tacaswell (Member) commented:

👍 in principle

@anntzer (Contributor, Author) commented Jan 20, 2022

@greglucas Actually, switching to lru_cache would be a bit annoying, because the caching on fontproperties must be based on hash(fontproperties) (as fontproperties themselves are mutable) but then you can't get back the fontproperties from the hash...
(Again, it may be possible to add more indirection around that but it may not really be worth it for now...)
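
To make that concrete, a small hedged illustration (hypothetical names) of why a hand-rolled dict cache can key on the hash while an lru_cache-wrapped function cannot easily do the same:

```python
# The value is computed *before* insertion, while the real FontProperties is
# still in hand, so keying on hash(fontproperties) is fine here:
_metrics_cache = {}

def metrics(renderer, text, fontproperties, ismath):
    key = (text, hash(fontproperties), ismath)
    if key not in _metrics_cache:
        _metrics_cache[key] = renderer.get_text_width_height_descent(
            text, fontproperties, ismath)
    return _metrics_cache[key]

# functools.lru_cache, by contrast, builds its key from the arguments it
# receives; passing hash(fontproperties) as the argument would leave the
# wrapped function with no FontProperties to call the renderer with.
```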

@greglucas (Contributor) left a comment

Looks good to me. Might be nice to add your artificial example in for a quick test... Something as simple as asserting that the w, h, d returned are not the same between the first and second calls due to a proper cache miss now?

I agree, removing the maxdict cache should not hold this up. That was an orthogonal comment/question.
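
A rough sketch of what such a test could look like (not necessarily the test that was actually added; it only uses public API, and the usetex half requires a working TeX installation):

```python
import matplotlib.pyplot as plt

def test_usetex_layout_not_shared():
    # In the real test suite this would be guarded by a skip-if-no-TeX marker.
    fig = plt.figure()
    t0 = fig.text(.1, .5, "foo\nbar", size=32)
    t1 = fig.text(.1, .5, "foo\nbar", size=32, usetex=True)
    fig.canvas.draw()
    # The two texts use different layout engines, so their extents (in
    # particular the height, which reflects the line spacing) should differ
    # now that the usetex text no longer reuses the non-usetex layout.
    assert t0.get_window_extent().height != t1.get_window_extent().height
```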

@anntzer (Contributor, Author) commented Jan 21, 2022

Added test.

@tacaswell Actually, wrt. just using the renderer class as the get_text_cache_invalidation_token: I guess that mplcairo may actually return consistent metrics(?) for different output formats (TBC), but I realized that another issue is that matplotlib's own vector renderers are not used directly; they always get wrapped in a MixedModeRenderer, so plainly checking the type is not going to distinguish e.g. a PDF and an SVG renderer.
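
(For context, a tiny illustration of the wrapping problem; this is not code from the PR:)

```python
# During a PDF or SVG save, Artist.draw() receives a MixedModeRenderer that
# wraps the actual RendererPdf/RendererSVG, so a type-based invalidation
# token sees the same class in both cases and cannot tell the formats apart.
def naive_cache_token(renderer):
    return type(renderer)  # MixedModeRenderer for both PDF and SVG output
```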

@dstansby (Member) commented Jan 21, 2022

> @greglucas Actually, switching to lru_cache would be a bit annoying, because the caching on fontproperties must be based on hash(fontproperties) (as fontproperties themselves are mutable) but then you can't get back the fontproperties from the hash... (Again, it may be possible to add more indirection around that but it may not really be worth it for now...)

A while ago I wrote some code to cache a property based on whether an attribute had changed or not:
https://github.com/sunpy/sunpy/blob/bf3a54f6efd104faad4f4bb7347ba542cb2f16a2/sunpy/util/decorators.py#L324
perhaps there is a use for something like that here?
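
(A generic sketch of that idea, not the sunpy implementation linked above: cache a zero-argument method and recompute whenever a watched attribute changes.)

```python
import functools

def cached_on_attr(attr_name):
    """Cache a zero-argument method; recompute when ``attr_name`` changes."""
    def decorator(func):
        cache_attr = f"_{func.__name__}_cache"
        seen_attr = f"_{func.__name__}_seen"

        @functools.wraps(func)
        def wrapper(self):
            current = getattr(self, attr_name)
            # For mutable attributes one would store an immutable snapshot
            # (e.g. a copy or a hash) instead of the object itself.
            if (not hasattr(self, cache_attr)
                    or getattr(self, seen_attr) != current):
                setattr(self, cache_attr, func(self))
                setattr(self, seen_attr, current)
            return getattr(self, cache_attr)
        return wrapper
    return decorator
```

Usage would be something like decorating a metrics helper with cached_on_attr("_fontproperties") (attribute name illustrative), assuming equality on that attribute captures all the state the cached value depends on.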

@anntzer (Contributor, Author) commented Jan 21, 2022

There are a few different ways to skin this cat (thanks for linking your implementation), but we can probably have that discussion in #22278?

5 participants