-
-
Notifications
You must be signed in to change notification settings - Fork 7.9k
FIX: always decode byte strings from AFM files as utf8 #9198
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Actually, can someone point to an actual example of a file that fails? http://www.adobe.com/content/dam/Adobe/en/devnet/font/pdfs/5004.AFM_Spec.pdf suggests that everything should be ASCII compatible (page 10). I'm just curious whether it's some afm file that doesn't follow the spec, or if I'm misreading the spec... |
The OP in #9196 has such a font. From digging into this for the headers it tends to be comments and names that are non-ascii. Not sure what could be in the metrics that would be non-ascii though... |
It may also be the comment lines which are not explicitly constrained to ascii. |
Would diff --git a/lib/matplotlib/afm.py b/lib/matplotlib/afm.py
index 30d335879..3a06c4618 100644
--- a/lib/matplotlib/afm.py
+++ b/lib/matplotlib/afm.py
@@ -189,23 +189,22 @@ def _parse_char_metrics(fh):
ascii_d = {}
name_d = {}
for line in fh:
- line = line.rstrip().decode('ascii') # Convert from byte-literal
- if line.startswith('EndCharMetrics'):
+ line = line.rstrip()
+ if line.startswith(b'EndCharMetrics'):
return ascii_d, name_d
# Split the metric line into a dictionary, keyed by metric identifiers
- vals = dict(s.strip().split(' ', 1) for s in line.split(';') if s)
+ vals = dict(s.strip().split(b' ', 1) for s in line.split(b';') if s)
# There may be other metrics present, but only these are needed
- if not {'C', 'WX', 'N', 'B'}.issubset(vals):
+ if not {b'C', b'WX', b'N', b'B'}.issubset(vals):
raise RuntimeError('Bad char metrics line: %s' % line)
- num = _to_int(vals['C'])
- wx = _to_float(vals['WX'])
- name = vals['N']
- bbox = _to_list_of_floats(vals['B'])
- bbox = list(map(int, bbox))
+ num = _to_int(vals[b'C'])
+ wx = _to_float(vals[b'WX'])
+ name = vals[b'N'].decode('ascii')
+ bbox = _to_list_of_ints(vals[b'B'])
# Workaround: If the character name is 'Euro', give it the
# corresponding character code, according to WinAnsiEncoding (see PDF
# Reference).
- if name == 'Euro':
+ if name == b'Euro':
num = 128
if num != -1:
ascii_d[num] = (wx, name, bbox) (i.e. use bytes throughout, just decode at the very end) not be better? |
Until we know what the offending font looks like, I am inclined to be more defensive. In either case, we should not fail to import on a non-compliant font. |
Please add a comment to the code explaining why we are seemingly not following the spec, then. Otherwise lgtm. |
@anntzer done and force-pushed |
7758b08
to
e3e78be
Compare
Self-merging this as it has 2 approvals. |
closes #9196
This fixes a (long standing) 'can not import' bug.
PR Summary
PR Checklist