Type 42 chars beyond BMP not displayed in PDF #20616

Closed
@sauerburger

Description

Bug report

Bug summary

Characters from beyond the Basic Multilingual Plane (BMP) in Unicode are not displayed in PDFs with Type 42 fonts. Characters beyond the BMP have a code point greater than 65535 and cannot be encoded in a fixed-size 2-byte encoding, such as the obsolete UCS-2. My understanding is that the CID maps still use such an encoding and cannot handle code points beyond that.

In the case of the m in STIX Sans, the glyph is provided by a virtual font embedded in the base font, with chars shifted to higher code points; here it is code point 120366 (U+1D62E).
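To make the 2-byte limit concrete, here is a minimal sketch in plain Python (no matplotlib involved). The helper name `fits_in_two_bytes` is made up for illustration; it only checks whether a code point fits into a fixed 16-bit value, which is the assumption stated above about the CID maps.

```python
BMP_MAX = 0xFFFF  # 65535, the largest value a fixed 2-byte code can hold


def fits_in_two_bytes(char):
    """Return True if the code point of *char* lies within the BMP."""
    return ord(char) <= BMP_MAX


# "m" (plain ASCII), the shifted math "m" (120366), and the char from the
# reproduction script below.
for char in ["m", "\U0001D62E", "\U00010308"]:
    code_point = ord(char)
    utf16 = char.encode("utf-16-be")  # chars beyond the BMP need a surrogate pair
    print(f"U+{code_point:04X} ({code_point}): "
          f"fits in 2 bytes: {fits_in_two_bytes(char)}, "
          f"UTF-16 length: {len(utf16)} bytes")
```

Both chars beyond the BMP need 4 bytes in UTF-16, so a fixed-size 2-byte code cannot represent them.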

Code for reproduction

import matplotlib.pyplot as plt
from matplotlib import rcParams

rcParams["pdf.fonttype"] = 42
rcParams["mathtext.fontset"] = "stixsans"

plt.text(0.5, 0.5, "Mass $m$ \U00010308")
plt.savefig("beyond_bmp.pdf")

Actual outcome

[Image: beyond_bmp_pdf] (attachment: beyond_bmp.pdf)

Expected outcome
[Image: beyond_bmp]

Possible solutions
We could take an approach similar to the one used for Type 3 fonts. There, any char > 255 is embedded via an XObject and not via the font directly. For Type 42, the solution would be to use XObjects for any char > 65535, in both text and math mode.

I have a local, modified version of matplotlib based on #20615 which implements this approach. It extends the use of _font_supports_char and restricts the supported range for Type 42 to <=65535. I can polish this local fix into a PR if people want to go this route.
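A rough sketch of the idea, not the actual patch: the function names `font_supports_char_sketch` and `split_for_embedding` are hypothetical stand-ins, and the real `_font_supports_char` in the PDF backend may have a different signature. The sketch only shows the range check per font type and how a string could be split into runs that go through the font directly and runs that would need an XObject.

```python
def font_supports_char_sketch(fonttype, char):
    """Hypothetical range check: can *char* be emitted via the font directly?"""
    if fonttype == 3:
        return ord(char) <= 255      # Type 3: single-byte chars only
    if fonttype == 42:
        return ord(char) <= 65535    # Type 42: proposed restriction to the BMP
    raise ValueError(f"unsupported font type: {fonttype}")


def split_for_embedding(fonttype, text):
    """Group *text* into (direct, chunk) runs.

    direct=True means the chunk can use the embedded font; direct=False
    means each char in the chunk would be drawn via an XObject instead.
    """
    runs = []
    for char in text:
        direct = font_supports_char_sketch(fonttype, char)
        if runs and runs[-1][0] == direct:
            # Same kind as the previous run: extend it.
            runs[-1] = (direct, runs[-1][1] + char)
        else:
            runs.append((direct, char))
    return runs


print(split_for_embedding(42, "Mass \U00010308"))
```

With this check, the text from the reproduction script splits into a run for the font ("Mass ") and a separate run for the char beyond the BMP.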

Matplotlib version

  • Operating system: Debian 11
  • Matplotlib version (import matplotlib; print(matplotlib.__version__)): 3.4.2
  • Matplotlib backend (print(matplotlib.get_backend())): TkAgg (but I assume pdf)
  • Python version: 3.9.2
  • Matplotlib installed with pip in a virtual env
