TST: Remove redundant font tests #30513

QuLogic · 2025-09-04T06:09:14Z

PR summary

I extracted this out of #30512 because it was causing issues with the pre-loading of test images. I may update this as/when I find more redundant tests.

- `test_backend_ps::test_type3_font` is covered by `test_backend_ps::test_multi_font_type3`
- `test_text::test_pdf_chars_beyond_bmp` is covered by `test_backend_pdf::test_multi_font_type3` and `test_backend_pdf::test_multi_font_type42`
- `test_text::test_pdf_kerning` is covered by `test_backend_pdf::test_kerning`
- `test_text::test_pdf_type42_kerning` is covered by `test_backend_pdf::test_kerning`

PR checklist

[n/a] "closes #0000" is in the body of the PR description to link the related issue
new and changed code is tested
[n/a] Plotting related features are demonstrated in an example
[n/a] New Features and API Changes are noted with a directive and release note
[n/a] Documentation complies with general and docstring guidelines

With libraqm, string layout produces glyph indices, not character codes, and font features may even produce different glyphs for the same character code (e.g., by picking a different Stylistic Set). Thus we cannot rely on character codes as unique items within a font, and must move toward glyph indices everywhere.

Currently, we split text into single-byte chunks and multi-byte glyphs, then iterate over all of the single-byte chunks for output, followed by all of the multi-byte glyphs. Instead, output each single-byte chunk as soon as it is finished, then output the multi-byte glyphs at the end.
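The reordering described above can be illustrated with a simplified sketch. The function name `emit_order` and the `limit` parameter are hypothetical, and the real backend also tracks positions and kerning, which are omitted here:

```python
def emit_order(text, limit=256):
    """Yield output items in the order described above (illustrative only).

    Characters whose code is below `limit` are grouped into runs and
    yielded as ("chunk", str) as soon as each run ends; all remaining
    characters are deferred and yielded as ("glyph", ch) at the end.
    """
    chunk = []
    deferred = []
    for ch in text:
        if ord(ch) < limit:
            chunk.append(ch)
        else:
            # A multi-byte character ends the current single-byte run.
            if chunk:
                yield ("chunk", "".join(chunk))
                chunk = []
            deferred.append(ch)
    if chunk:
        yield ("chunk", "".join(chunk))
    for ch in deferred:
        yield ("glyph", ch)
```

Because chunks are emitted as soon as they are complete, there is no need to build a separate list of chunks and walk it in a second pass.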

For a Type 3 font, its encoding is entirely defined by its `Encoding` dictionary (which we create), so there's no reason to use a specific encoding like `cp1252`. Instead, switch to Latin-1, which corresponds exactly to the first 256 character codes in Unicode, and can be mapped directly with `ord`.
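A small sketch of why Latin-1 is convenient here, using only Python's built-in codecs. The helper name `type3_char_code` is made up for illustration and is not part of Matplotlib:

```python
def type3_char_code(ch):
    """Return the single-byte code for `ch` under Latin-1 (hypothetical helper).

    Latin-1 maps byte value N to Unicode code point N, so the code is
    just ord(ch) whenever the character fits in one byte.
    """
    code = ord(ch)
    if code > 255:
        raise ValueError(f"{ch!r} does not fit in a single byte")
    return code


# Latin-1 round-trips every byte value to the same code point.
all_bytes = bytes(range(256))
latin1_matches_ord = all_bytes.decode("latin-1") == "".join(map(chr, range(256)))

# cp1252 does not: the euro sign is byte 0x80 there, but U+20AC in Unicode.
euro_cp1252 = "\u20ac".encode("cp1252")
euro_code_point = ord("\u20ac")
```

With `cp1252`, a lookup table or an `encode` call would be needed to get from a character to its byte value; with Latin-1, `ord` is sufficient.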

By tracking both character codes and glyph indices, we can handle producing multiple font subsets if needed by a file format.

For character codes outside the embedded font limits (256 for type 3 and 65536 for type 42), we currently output the glyphs as XObjects instead of using text commands. But there is nothing in the PDF spec that requires any specific encoding like this. Since we now support subsetting all fonts before embedding, split each font into groups based on the maximum character code (e.g., 256-entry groups for type 3), then switch text strings to a different font subset and re-map character codes to it when necessary. This means all text is true text (albeit with some strange encoding), and we no longer need any XObjects for glyphs. For users of non-English text, this means it will become selectable and copyable again.

Fixes matplotlib#21797
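As a rough illustration of the subset splitting described above, the sketch below assigns each character to a subset with `divmod`. This is only one possible scheme and is an assumption, not necessarily how the PR allocates subsets; the function name `split_into_subset_runs` is hypothetical:

```python
def split_into_subset_runs(text, subset_size=256):
    """Split `text` into runs of (subset_index, encoded_bytes).

    Assumed scheme: character code C belongs to subset C // subset_size
    and is re-mapped to code C % subset_size within that subset.
    Consecutive characters in the same subset share one run, so the
    font only needs to be switched between runs.
    """
    runs = []
    for ch in text:
        subset, code = divmod(ord(ch), subset_size)
        if runs and runs[-1][0] == subset:
            runs[-1][1].append(code)
        else:
            runs.append((subset, [code]))
    return [(subset, bytes(codes)) for subset, codes in runs]
```

For example, `split_into_subset_runs("a\u00e9\u4e2db")` yields three runs: the first two characters stay in subset 0, U+4E2D moves to subset 0x4E with code 0x2D, and the final `b` switches back to subset 0. Every character is still emitted through a text command, which is what keeps it selectable.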

For Type 3 fonts, add a `ToUnicode` mapping (which was added in PDF 1.2), and for Type 42 fonts, correct the Unicode encoding, which should be UTF-16BE, not UCS-2.
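The difference between UTF-16BE and UCS-2 only shows up outside the BMP, where UTF-16BE uses a surrogate pair and UCS-2 has no representation at all. A minimal sketch, with a hypothetical helper name, of producing the hex string for a `ToUnicode` entry:

```python
def to_unicode_hex(ch):
    """Return the UTF-16BE encoding of `ch` as uppercase hex (illustrative).

    BMP characters produce one 16-bit unit (4 hex digits); characters
    beyond the BMP produce a surrogate pair (8 hex digits).
    """
    return ch.encode("utf-16-be").hex().upper()


bmp_entry = to_unicode_hex("A")             # one 16-bit unit
emoji_entry = to_unicode_hex("\U0001F600")  # surrogate pair
```

Here `bmp_entry` is `"0041"` and `emoji_entry` is `"D83DDE00"`, i.e. the high surrogate `D83D` followed by the low surrogate `DE00`.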

These characters are outside the BMP and should test subset splitting for type 42 output in PDF.


QuLogic added 9 commits September 3, 2025 05:06

pdf/ps: Track full character map in CharacterTracker

dbd689f


pdf: Correct Unicode mapping for out-of-range font chunks

ab8981f


Add emoji to multi-font text

72deb44


Update test images for previous change

3fc92f4

QuLogic added this to the v3.11.0 milestone Sep 4, 2025

QuLogic added the status: waiting for other PR label Sep 4, 2025

github-actions bot added the topic: text, backend: ps, backend: pdf, backend: svg, backend: cairo, topic: text/mathtext, and status: needs rebase labels Sep 4, 2025
