-
-
Notifications
You must be signed in to change notification settings - Fork 7.9k
Faster character mapping #5299
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Faster character mapping #5299
Conversation
from matplotlib.font_manager import findfont | ||
from matplotlib.ft2font import FT2Font, LOAD_FORCE_AUTOHINT, LOAD_NO_HINTING, \ | ||
from matplotlib.font_manager import findfont, get_font | ||
from matplotlib.ft2font import LOAD_FORCE_AUTOHINT, LOAD_NO_HINTING, \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As long as you are touching this can you get rid of the \
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure.
Did you mean to include the script M commits in this PR? |
Yes -- I meant to include the M script commit here. It turns out that these changes causes that bug to manifest itself in a different way. So to avoid that failure, I just included the fix here. Now that the fix is merged though, I can take it back out after rebasing. |
81ff794
to
4b2306d
Compare
c4af76a
to
4f11fb6
Compare
This should hopefully address the long-reported "Too many open files" error message (Fix matplotlib#3315). To reproduce: On a Mac or Windows box with starvation for file handles (Linux has a much higher file handle limit by default), build the docs, then immediately build again. This will trigger the caching bug. The font cache in the mathtext renderer was broken. It was caching a font file once for every *combination* of font properties, including things like size. Therefore, in a complex math expression containing many different sizes of the same font, the font file was opened once for each of those sizes. Font files are opened and kept open (rather than opened, read, and closed) so that FreeType only needs to load the actual glyphs that are used, rather than the entire font. In an era of cheap memory and fast disk, it probably doesn't matter for our current fonts, but once matplotlib#5214 is merged, we will have larger font files with many more glyphs and this loading time will matter more. The solution here is to do all font file loading in one place and to use `lru_cache` (available since Python 3.2) to do the caching, and to use only the file name and hinting parameters as a cache key. For earlier versions of Python, the functools32 backport package is required. (Or we can discuss whether we want to vendor it).
mathtext creates Python dictionaries for the charmap and inverse charmap for each font. This turns out to be unnecessary: 1) freetype has an API to do a charmap lookup that is faster than a Python dictionary 2) The inverse charmap isn't really necessary if we convert the latex_to_bakoma to use unicode character points rather than glyph indices. This should have a large impact when matplotlib#5241 is merged with larger fonts.
4f11fb6
to
2d56ffe
Compare
This is ready for a final review and merge now. |
I guess the big question here is whether to require or vendor functools32 (required only for Python 2.7). |
I vote for require. |
latex_to_bakoma = { | ||
r'\oint' : ('cmex10', 45), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume these are the same numbers, just written in octal?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, actually.
Here's the gist of this change. TrueType fonts have the concept of charcodes
(which loosely corresponds to a Unicode codepoint if it's a Unicode font which most are these days) and glyph indices
(which is just an array index into the location of the glyph within a file and mostly arbitrary). Font files contain a very fast N-1 mapping from charcode
to gind
, and FreeType has an API for this. However, the reverse mapping is not directly available.
Prior to this change, Python dictionaries were created for the forward and reverse mapping, consuming a lot of memory for large fonts and being entirely redundant in the case of ccode
to gind
.
The latex_to_bakoma
mapping used to map from LaTeX name to gind
(for reasons that are lost in the sands of time, and are different from every other table in this file). Since there's now no gind
to ccode
mapping, there was no longer a way to get both gind
and ccode
from the LaTeX name. This table has now been changed to map LaTeX names to ccode
instead (and this was done automatically by a script, so I'm reasonably confident everything is correct).
Sorry for the long explanation. It's all bizarro.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, your last commit message makes sense now
I support require too. |
ENH: Faster character mapping
ENH: Faster character mapping
backported to v2.0.x as 454c330 |
This is a follow-on to #5295.