Faster character mapping #5299

mdboom · 2015-10-22T00:02:28Z

This is a follow-on to #5295.

Don't cache the charmap and inverse charmap

mathtext creates Python dictionaries for the charmap and inverse charmap
for each font.  This turns out to be unnecessary:

1) FreeType has an API to do a charmap lookup that is faster than a
Python dictionary

2) The inverse charmap isn't really necessary if we convert the
latex_to_bakoma table to use Unicode code points rather than glyph
indices.

This should have a large impact with larger fonts, once #5241 is merged.
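A minimal sketch of the difference, assuming a font object that offers a per-character lookup. `StubFont`, `build_maps_eagerly`, and `glyph_index` are hypothetical names, and the stub is backed by a plain dict only so the example runs on its own; in matplotlib the real object would be an `FT2Font`, whose `get_char_index` method wraps FreeType's charmap lookup.

```python
class StubFont:
    """Hypothetical stand-in for a FreeType-backed font (not matplotlib's API)."""

    def __init__(self, cmap):
        # charcode -> glyph index, standing in for the font file's own cmap table
        self._cmap = dict(cmap)

    def get_char_index(self, charcode):
        # Like FreeType, report glyph index 0 (.notdef) for unmapped charcodes.
        return self._cmap.get(charcode, 0)

    def get_charmap(self):
        # Returns a full copy, as an eager per-font cache would.
        return dict(self._cmap)


def build_maps_eagerly(font):
    """Old approach: copy the whole charmap, plus an inverse, into Python dicts."""
    charmap = font.get_charmap()
    inverse = {gind: ccode for ccode, gind in charmap.items()}
    return charmap, inverse


def glyph_index(font, char):
    """New approach: ask the font for just the glyph we need, when we need it."""
    return font.get_char_index(ord(char))
```

The eager version costs memory proportional to the number of glyphs in every loaded font; the lazy version costs one lookup per character actually rendered.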

tacaswell · 2015-10-25T19:51:13Z

lib/matplotlib/backends/backend_agg.py

-from matplotlib.font_manager import findfont
-from matplotlib.ft2font import FT2Font, LOAD_FORCE_AUTOHINT, LOAD_NO_HINTING, \
+from matplotlib.font_manager import findfont, get_font
+from matplotlib.ft2font import LOAD_FORCE_AUTOHINT, LOAD_NO_HINTING, \


As long as you are touching this, can you get rid of the `\`?

tacaswell · 2015-10-26T02:33:18Z

Did you mean to include the script M commits in this PR?

mdboom · 2015-10-26T14:49:27Z

Yes -- I meant to include the M script commit here. It turns out that these changes cause that bug to manifest itself in a different way, so to avoid that failure, I just included the fix here. Now that the fix is merged, though, I can take it back out after rebasing.

mdboom · 2015-10-27T11:29:36Z

~~Status update: This is ready to go once we turn off Python 2.6 (#5215) and include the file-handle fix (#5295)~~

This should hopefully address the long-reported "Too many open files" error message (Fixes matplotlib#3315).

To reproduce: on a Mac or Windows machine that is starved for file handles (Linux has a much higher file-handle limit by default), build the docs, then immediately build them again. This will trigger the caching bug.

The font cache in the mathtext renderer was broken: it cached a font file once for every *combination* of font properties, including things like size. Therefore, in a complex math expression containing many different sizes of the same font, the font file was opened once for each of those sizes.

Font files are opened and kept open (rather than opened, read, and closed) so that FreeType only needs to load the glyphs that are actually used, rather than the entire font. In an era of cheap memory and fast disks, this probably doesn't matter for our current fonts, but once matplotlib#5214 is merged, we will have larger font files with many more glyphs, and this loading time will matter more.

The solution here is to do all font-file loading in one place, to use `lru_cache` (available since Python 3.2) for the caching, and to use only the file name and hinting parameters as the cache key. For earlier versions of Python, the functools32 backport package is required (or we can discuss whether we want to vendor it).
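The caching difference can be sketched in plain Python. `FontFile` and `get_font_by_props` are hypothetical, and although the sketch reuses the name `get_font` from the diff above, its body and signature here are assumptions, as are the arbitrary `maxsize=64` and `hinting_factor=8` defaults. The open counter stands in for real file handles.

```python
from functools import lru_cache

opens = {"count": 0}  # how many times a font file has been "opened"


class FontFile:
    """Hypothetical stand-in for an open font file handle."""

    def __init__(self, filename, hinting_factor):
        opens["count"] += 1
        self.filename = filename
        self.hinting_factor = hinting_factor
        self.size = None

    def set_size(self, size):
        # Size is per-use state, not part of the file's identity.
        self.size = size


_cache_by_props = {}


def get_font_by_props(filename, size, hinting_factor=8):
    """Buggy pattern: size is part of the key, so each new size reopens the file."""
    key = (filename, size, hinting_factor)
    if key not in _cache_by_props:
        font = FontFile(filename, hinting_factor)
        font.set_size(size)
        _cache_by_props[key] = font
    return _cache_by_props[key]


@lru_cache(maxsize=64)
def get_font(filename, hinting_factor=8):
    """Fixed pattern: cache only on what identifies the file and its hinting."""
    return FontFile(filename, hinting_factor)
```

Rendering one expression at sizes 10, 12 and 14 creates three `FontFile` objects through `get_font_by_props` but only one through `get_font`. The trade-off is that the cached object is shared, so each caller has to call `set_size` before using it.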


mdboom · 2015-10-29T14:26:11Z

This is ready for a final review and merge now.

mdboom · 2015-10-30T12:02:13Z

I guess the big question here is whether to require or vendor functools32 (required only for Python 2.7).
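If functools32 is required rather than vendored, one common shape is a conditional import plus a version-dependent install requirement. This is only a sketch under assumptions: `extra_requirements` is a hypothetical helper, not matplotlib's actual setup code.

```python
import sys

try:
    # Python 3.2+ ships lru_cache in the standard library.
    from functools import lru_cache
except ImportError:
    # Python 2.7: fall back to the functools32 backport (must be installed).
    from functools32 import lru_cache


def extra_requirements(version_info=None):
    """Hypothetical helper: extra packages to require for a given Python version."""
    if version_info is None:
        version_info = sys.version_info
    if tuple(version_info[:2]) < (3, 2):
        return ["functools32"]
    return []
```

Requiring keeps a third-party copy out of matplotlib's own source tree, at the cost of one more install-time dependency on Python 2.7.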

tacaswell · 2015-10-30T12:40:41Z

I vote for require.

tacaswell · 2015-10-30T12:41:23Z

lib/matplotlib/_mathtext_data.py

 latex_to_bakoma = {
-    r'\oint'                     : ('cmex10',  45),


I assume these are the same numbers, just written in octal?

No, actually.

Here's the gist of this change. TrueType fonts have the concept of charcodes, or ccodes (which loosely correspond to Unicode code points if it's a Unicode font, which most are these days), and glyph indices, or ginds (which are just array indices into the locations of the glyphs within a file, and mostly arbitrary). Font files contain a very fast N-to-1 mapping from charcode to gind, and FreeType has an API for this. However, the reverse mapping is not directly available.
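Because the mapping is N-to-1, a reverse lookup from a gind has to return a *list* of charcodes, not a single one. A small sketch over a plain dict standing in for a font's cmap; the glyph numbers are made up, and mapping U+0020 SPACE and U+00A0 NO-BREAK SPACE to the same glyph is just one example of how sharing can happen.

```python
from collections import defaultdict


def charcodes_by_glyph(cmap):
    """Group charcodes by the glyph index they map to (the 'reverse' direction)."""
    groups = defaultdict(list)
    for ccode, gind in cmap.items():
        groups[gind].append(ccode)
    return {gind: sorted(ccodes) for gind, ccodes in groups.items()}


def shared_glyphs(cmap):
    """Glyph indices reachable from more than one charcode: no unique inverse."""
    return {g: cs for g, cs in charcodes_by_glyph(cmap).items() if len(cs) > 1}
```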

Prior to this change, Python dictionaries were created for both the forward and reverse mappings, consuming a lot of memory for large fonts and, in the ccode-to-gind direction, being entirely redundant with FreeType's own lookup.

The latex_to_bakoma mapping used to map from LaTeX names to ginds (for reasons that are lost in the sands of time, and unlike every other table in this file). Since there's now no gind-to-ccode mapping, there was no longer a way to get both the gind and the ccode from the LaTeX name. This table has now been changed to map LaTeX names to ccodes instead (this was done automatically by a script, so I'm reasonably confident everything is correct).
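The conversion script itself isn't shown in this thread; the following is only a sketch of how such a one-off conversion could work, assuming each font's ccode-to-gind table is available as a dict. `convert_table`, `lookup`, and any sample data are hypothetical. The expensive inversion happens once, offline; at runtime only the fast forward lookup is needed.

```python
def convert_table(old_table, charmaps):
    """Rewrite {name: (font, gind)} as {name: (font, ccode)}.

    charmaps maps each font name to its {ccode: gind} table.
    """
    inverses = {}
    new_table = {}
    for name, (fontname, gind) in old_table.items():
        if fontname not in inverses:
            inverse = {}
            # Iterate in charcode order so that, when several charcodes share
            # a glyph, the lowest charcode is the one kept.
            for ccode, g in sorted(charmaps[fontname].items()):
                inverse.setdefault(g, ccode)
            inverses[fontname] = inverse
        new_table[name] = (fontname, inverses[fontname][gind])
    return new_table


def lookup(table, charmaps, name):
    """Runtime lookup: the ccode comes from the table, the gind from the charmap."""
    fontname, ccode = table[name]
    return fontname, ccode, charmaps[fontname].get(ccode, 0)
```

A gind that no charcode maps to raises `KeyError` here, which in a one-off script is a useful signal that the entry needs to be fixed by hand.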

Sorry for the long explanation. It's all bizarro.

Ah, your last commit message makes sense now

jenshnielsen · 2015-10-30T12:43:20Z

I support require too.

ENH: Faster character mapping

tacaswell · 2015-10-30T21:48:01Z

backported to v2.0.x as 454c330

mdboom added the status: needs review label Oct 22, 2015

mdboom added this to the next major release (2.0) milestone Oct 22, 2015

tacaswell reviewed Oct 25, 2015
mdboom force-pushed the faster-character-mapping branch 2 times, most recently from 81ff794 to 4b2306d October 26, 2015 16:48

mdboom force-pushed the faster-character-mapping branch 2 times, most recently from c4af76a to 4f11fb6 October 27, 2015 19:16

mdboom added 4 commits October 28, 2015 11:33

Add INSTALL note about functools32

082a3a5

functools32 has no version

5e93dfc

mdboom force-pushed the faster-character-mapping branch from 4f11fb6 to 2d56ffe October 28, 2015 15:33

tacaswell reviewed Oct 30, 2015
tacaswell added a commit that referenced this pull request Oct 30, 2015

Merge pull request #5299 from mdboom/faster-character-mapping

aaa34e0

ENH: Faster character mapping

tacaswell merged commit aaa34e0 into matplotlib:master Oct 30, 2015

tacaswell removed the status: needs review label Oct 30, 2015

tacaswell added a commit that referenced this pull request Oct 30, 2015

Merge pull request #5299 from mdboom/faster-character-mapping

454c330

ENH: Faster character mapping

jenshnielsen mentioned this pull request Oct 31, 2015

Remove various Python 2.6 related workarounds #5373

Merged

This was referenced Nov 5, 2015

Remove uses of font.get_charmap #5410

Merged

Use DejaVu fonts as default for text and mathtext #5214

Merged

mdboom deleted the faster-character-mapping branch November 10, 2015 02:46

tacaswell mentioned this pull request Oct 14, 2019

Change in OSX Catalina makes matplotlib + multiprocessing crash #15410

Closed
