Skip to content

FIX: always decode byte strings from AFM files as utf8 #9198

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Sep 24, 2017

Conversation

tacaswell
Copy link
Member

@tacaswell tacaswell commented Sep 18, 2017

closes #9196

This fixes a (long standing) 'can not import' bug.

PR Summary

PR Checklist

  • Has Pytest style unit tests
  • Code is PEP 8 compliant

@tacaswell tacaswell added the Release critical For bugs that make the library unusable (segfaults, incorrect plots, etc) and major regressions. label Sep 18, 2017
@tacaswell tacaswell added this to the 2.1 (next point release) milestone Sep 18, 2017
Copy link
Contributor

@dopplershift dopplershift left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@anntzer
Copy link
Contributor

anntzer commented Sep 18, 2017

Actually, can someone point to an actual example of a file that fails? http://www.adobe.com/content/dam/Adobe/en/devnet/font/pdfs/5004.AFM_Spec.pdf suggests that everything should be ASCII compatible (page 10). I'm just curious whether it's some afm file that doesn't follow the spec, or if I'm misreading the spec...

@tacaswell
Copy link
Member Author

The OP in #9196 has such a font. From digging into this for the headers it tends to be comments and names that are non-ascii. Not sure what could be in the metrics that would be non-ascii though...

@tacaswell
Copy link
Member Author

It may also be the comment lines which are not explicitly constrained to ascii.

@anntzer
Copy link
Contributor

anntzer commented Sep 18, 2017

Would

diff --git a/lib/matplotlib/afm.py b/lib/matplotlib/afm.py
index 30d335879..3a06c4618 100644
--- a/lib/matplotlib/afm.py
+++ b/lib/matplotlib/afm.py
@@ -189,23 +189,22 @@ def _parse_char_metrics(fh):
     ascii_d = {}
     name_d = {}
     for line in fh:
-        line = line.rstrip().decode('ascii')  # Convert from byte-literal
-        if line.startswith('EndCharMetrics'):
+        line = line.rstrip()
+        if line.startswith(b'EndCharMetrics'):
             return ascii_d, name_d
         # Split the metric line into a dictionary, keyed by metric identifiers
-        vals = dict(s.strip().split(' ', 1) for s in line.split(';') if s)
+        vals = dict(s.strip().split(b' ', 1) for s in line.split(b';') if s)
         # There may be other metrics present, but only these are needed
-        if not {'C', 'WX', 'N', 'B'}.issubset(vals):
+        if not {b'C', b'WX', b'N', b'B'}.issubset(vals):
             raise RuntimeError('Bad char metrics line: %s' % line)
-        num = _to_int(vals['C'])
-        wx = _to_float(vals['WX'])
-        name = vals['N']
-        bbox = _to_list_of_floats(vals['B'])
-        bbox = list(map(int, bbox))
+        num = _to_int(vals[b'C'])
+        wx = _to_float(vals[b'WX'])
+        name = vals[b'N'].decode('ascii')
+        bbox = _to_list_of_ints(vals[b'B'])
         # Workaround: If the character name is 'Euro', give it the
         # corresponding character code, according to WinAnsiEncoding (see PDF
         # Reference).
-        if name == 'Euro':
+        if name == b'Euro':
             num = 128
         if num != -1:
             ascii_d[num] = (wx, name, bbox)

(i.e. use bytes throughout, just decode at the very end) not be better?

@tacaswell
Copy link
Member Author

Until we know what the offending font looks like, I am inclined to be more defensive.

In either case, we should not fail to import on a non-compliant font.

@anntzer
Copy link
Contributor

anntzer commented Sep 19, 2017

Please add a comment to the code explaining why we are seemingly not following the spec, then. Otherwise lgtm.

@tacaswell
Copy link
Member Author

@anntzer done and force-pushed

@tacaswell
Copy link
Member Author

Self-merging this as it has 2 approvals.

@tacaswell tacaswell merged commit cece2d4 into matplotlib:v2.1.x Sep 24, 2017
@tacaswell tacaswell deleted the fix_afm_unicode branch September 24, 2017 16:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Release critical For bugs that make the library unusable (segfaults, incorrect plots, etc) and major regressions.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants