Description
Bug report
Bug summary
Characters outside Unicode's Basic Multilingual Plane (BMP) are not displayed in PDFs that use Type 42 fonts. Such characters have a code point greater than 65535 (U+FFFF) and cannot be encoded in a fixed-size 2-byte encoding, such as the obsolete UCS-2. My understanding is that the CID maps still use such an encoding and therefore cannot handle these code points.
In the case of the m in STIX Sans, it's implemented via a virtual font embedded in the base font with chars shifted to higher code points, here code point 120366 (U+1D62E).
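To make the 2-byte limit concrete, here is a minimal sketch in plain Python (no matplotlib involved). The helper names `is_beyond_bmp` and `utf16_code_units` are made up for illustration and are not part of matplotlib. It shows that code point 120366 does not fit into a single 16-bit unit and needs a surrogate pair in UTF-16.

```python
def is_beyond_bmp(char: str) -> bool:
    """Return True if the single character lies outside the BMP (> U+FFFF)."""
    return ord(char) > 0xFFFF


def utf16_code_units(char: str) -> list:
    """Return the 16-bit code units UTF-16 needs to encode the character."""
    data = char.encode("utf-16-be")
    # Split the big-endian byte string into 2-byte units.
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


if __name__ == "__main__":
    math_m = chr(120366)  # the shifted "m" mentioned above
    print(is_beyond_bmp("m"), is_beyond_bmp(math_m))
    print([hex(unit) for unit in utf16_code_units(math_m)])
```

A fixed-size 2-byte encoding has only one such unit per character, so anything that needs two units cannot be represented in it.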
Code for reproduction
import matplotlib.pyplot as plt
from matplotlib import rcParams
rcParams["pdf.fonttype"] = 42
rcParams["mathtext.fontset"] = "stixsans"
plt.text(0.5, 0.5, "Mass $m$ \U00010308")
plt.savefig("beyond_bmp.pdf")
Actual outcome
Possible solutions
We could take an approach similar to the one used for Type 3 fonts. There, any char > 255 is embedded via an XObject and not via the font directly. So here the solution would be to use XObjects for any char > 65535, in both text and math mode.
I have a local, modified version of matplotlib based on #20615 which implements this approach. It extends the use of _font_supports_char and restricts the supported range for Type 42 to <= 65535. I can polish this local fix into a PR if people want to go this route.
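As a rough illustration of the range check described above, the following standalone sketch mirrors the idea but is not the actual matplotlib code. The names `font_supports_char_sketch` and `split_for_embedding` are hypothetical, and the real _font_supports_char may have a different signature.

```python
def font_supports_char_sketch(fonttype: int, char: str) -> bool:
    """Hypothetical check: can this char be embedded via the font itself?"""
    if fonttype == 3:
        # Type 3: only chars up to 255 go through the font.
        return ord(char) <= 255
    if fonttype == 42:
        # Proposed restriction for Type 42: stay within the BMP.
        return ord(char) <= 65535
    raise NotImplementedError(f"unsupported fonttype: {fonttype}")


def split_for_embedding(fonttype: int, text: str):
    """Partition text into chars drawn via the font and chars needing XObjects."""
    via_font = [c for c in text if font_supports_char_sketch(fonttype, c)]
    via_xobject = [c for c in text if not font_supports_char_sketch(fonttype, c)]
    return via_font, via_xobject


if __name__ == "__main__":
    font_chars, xobject_chars = split_for_embedding(42, "Mass \U00010308")
    print(font_chars, [hex(ord(c)) for c in xobject_chars])
```

With this split, the string from the reproduction code would keep "Mass " in the font and route only the character beyond the BMP to an XObject.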
Matplotlib version
- Operating system: Debian 11
- Matplotlib version (import matplotlib; print(matplotlib.__version__)): 3.4.2
- Matplotlib backend (print(matplotlib.get_backend())): TkAgg (but I assume pdf)
- Python version: 3.9.2
- Matplotlib installed with pip in a virtual env