Skip to content

Support TTC fonts by converting them to TTF in font cache #23293

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 6 commits into
base: main
Choose a base branch
from

Conversation

sauerburger
Copy link
Member

@sauerburger sauerburger commented Jun 17, 2022

PR Summary

This PR adds support for all ttf fonts bundeled in ttc files. In FontManager.addfont(), if a font ends in .ttc, the new functions _read_ttc() and _dump_ttf() are triggered to extract all TTF fonts from the collection. The extracted fonts are stored in the font cache directory with the .ttf extension and added to the list of ttf fonts.

The details of the implantation can be changed (for example where to put the parsing code). Do you think the overall strategy of parsing TTC, dumping TTF to the font cache is a good idea?

Why is it useful

Caveats

  • If users upgrade to a version that includes this change, it will only become effective once the font cache is rebuilt.

PR Checklist

Tests and Styling

  • Has pytest style unit tests (and pytest passes).
  • Is Flake 8 compliant (install flake8-docstrings and run flake8 --docstring-convention=all).

Documentation

  • New features are documented, with examples if plot related.
  • New features have an entry in doc/users/next_whats_new/ (follow instructions in README.rst there).
  • [N/A] API changes documented in doc/api/next_api_changes/ (follow instructions in README.rst there).
  • [N/A] Documentation is sphinx and numpydoc compliant (the docs should build without error).

@oscargus
Copy link
Member

Nice one! I cannot really review the content, but do you know how cff-parts of the ttc behaves, e.g. #20870?

(Also, out of curiosity but maybe also usefulness, it may be nice to have some info-level logging of the conversion process.)

@oscargus
Copy link
Member

Do you think the overall strategy of parsing TTC, dumping TTF to the font cache is a good idea?

I guess that directly handling TTC is "better", but if the option is to not be able to handle TTC, this is for sure a very worthwhile approach!

I think it can be worth a user release note as users now can do things that they couldn't do earlier (and may not realize that otherwise).

@anntzer
Copy link
Contributor

anntzer commented Jun 17, 2022

Haven't checked the technical details of the approach, but if that can be made to work this may be one of the easier ways forward (we can always remove that later once "real" ttc support comes in, similarly to @oscargus' point above).

@sauerburger
Copy link
Member Author

I cannot really review the content, but do you know how cff-parts of the ttc behaves, e.g. #20870?

As far as I can tell, the cff-tables are correctly extracted. The created TTF file, e.g. for Apple SD Gothic Neo, is valid in the font preview and font forge without warnings. However, the example from #20870 still doesn't work, probably failing in the pdf/ps backend.

it may be nice to have some info-level logging of the conversion process.

Good point, I will add that.

@sauerburger sauerburger force-pushed the ttc-font-support branch 3 times, most recently from 51cd34c to 8dc3d5a Compare June 21, 2022 15:29
Fonts in a TrueType collection file (TTC) can now be added and used. Internally,
the embedded TTF fonts are extracted and stored in the matplotlib cache
directory. Users upgrading to this version need to rebuild the font cache for
this feature to become effective.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe write how that is done? (I do not really recall myself...)

(Also one may consider having a public method that is directly importable from matplotlib, but that is another question...)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mean rebuilding the font cache? I think the simplest way to trigger a rebuild is to delete the folder printed by

python -c "import matplotlib; print(matplotlib.get_cachedir())"

Maybe there is a more user-friendly, programmatic way.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding the question about the public API, I'm not sure if we want to expose this. It's not directly useful for the end user. If someone wants to use a custom TTC font, they would call the font manager's addfont() with the path to the TTC file. This will automatically extract the TTF and add the created files to the internal list of fonts.

If a user were to use the _split_ttc() function, they would need to use the returned paths and add them to the font manager themself. I don't think that's what we want users to do.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, rebuilding the font cache. I seem to recall that there is a function to rebuild the cache without removing it, but cannot find it now... Googling gives matplotlib.font_manager._rebuild() (and quite a few questions), but that is not a current approach.

Anyway, some sort of info about how to do this (maybe it is in the docs to link to) I would think is useful since probably quite a few users like to do that.

Fix typos

Co-authored-by: Oscar Gustafsson <oscar.gustafsson@gmail.com>
@tacaswell
Copy link
Member

I was just looking at how to write this functionality! This is very brute force, we do currently support ttc, but only pick out the first font.

If users upgrade to a version that includes this change, it will only become effective once the font cache is rebuilt.

we have a version number on the font cache for exactly this reason (well in case we change the format, but close enough).

@tacaswell
Copy link
Member

tacaswell@b143647 is a stack of changes that will find all of the fonts in a ttc and passes all but 1 test (which I think identifies that there is a bigger set of bugs in the way fonts + mathtext are being used....they are just not raising). This is on top of #20740 so won't cleanly cherry-pick to main. As @anntzer predicted, the hard part was not the c++ work, but in threading through the information that we have to identify fonts by (path, index) rather than just path. I took the "just break things" approach, not sure if we want to make a new internal API that we use everywere and leave the public API only half right, just accept the on-off breaking change, do terrible things and subsclass str (i'm only mostly joking), or make a new set of APIs with new names and deprecate the old ones (that is probably the best option).

The code is very ambiguous about treating FontEntry, FontProperty, str (that is a family name), str (that is a fontspec), str (that is the absolute path to a file) as "font" in the codebase so doing some refactoring to clarify that would also be good.


On a bit more consideration, adding public API to re-generate the font cache and install it in the correct global state would be good, however I am not sure that it is trivial to do reliably because we make a FontManager singleton and then expose its methods as module attributes.

We could replace those with free functions that then call the right methods, but I'm sure at least one of our users is trying to walk back up to the instance from the method, but that is probably an acceptable thing to break with no warning.


@sauerburger Sorry for high-jacking you PR with only marginally related side-tracks.

@anntzer
Copy link
Contributor

anntzer commented Jul 7, 2022

FWIW in mplcairo I specifically recognize when you pass a path that ends with #0/#1/... (hash-and-a-number) and reparse these as font indices into a ttc file (https://github.com/matplotlib/mplcairo#font-formats-and-features). Probably not the greatest API, but one that I came up with to support this in a backend without changing anything to core. Note that I likewise do something for selecting OpenType features (also documented at the link above); it's probably worth taking that use case into account if we're going to define a new cleaner API.

@tacaswell
Copy link
Member

FWIW in mplcairo I specifically recognize when you pass a path that ends with #0/#1/... (hash-and-a-number)

I thought of doing fame:N as the pattern and decided to go with the tuples instead.

@jklymak
Copy link
Member

jklymak commented Jan 26, 2023

@sauerburger is this waiting on us, or on you? I've moved to draft, but feel free to move back if we should review.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants