Commit 2619bda

Our handling of numeric entities, converting them to characters, is overly complex and the comment is misleading.
1 parent b39e8a0 commit 2619bda

1 file changed, +4 -10 lines changed

src/html5lib/tokenizer.py

Lines changed: 4 additions & 10 deletions
@@ -128,17 +128,11 @@ def consumeNumberEntity(self, isHex):
                   "illegal-codepoint-for-numeric-entity",
                   "datavars": {"charAsInt": charAsInt}})
             try:
-                # XXX We should have a separate function that does "int" to
-                # "unicodestring" conversion since this doesn't always work
-                # according to hsivonen. Also, unichr has a limitation of 65535
+                # Try/except needed as UCS-2 Python builds' unichar only works
+                # within the BMP.
                 char = unichr(charAsInt)
-            except:
-                try:
-                    char = eval("u'\\U%08x'" % charAsInt)
-                except:
-                    self.tokenQueue.append({"type": tokenTypes["ParseError"], "data":
-                      "cant-convert-numeric-entity",
-                      "datavars": {"charAsInt": charAsInt}})
+            except ValueError:
+                char = eval("u'\\U%08x'" % charAsInt)
 
         # Discard the ; if present. Otherwise, put it back on the queue and
         # invoke parseError on parser.
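
The fallback in this change relies on `eval` to build a `\U` escape when `unichr` raises `ValueError` on a UCS-2 build. As an illustration only, and not what the commit does, the same result could probably be reached with explicit UTF-16 surrogate arithmetic. The helper names `surrogate_pair` and `code_point_to_text` below are hypothetical and do not exist in html5lib.

```python
try:
    unichr  # Python 2 built-in
except NameError:
    unichr = chr  # Python 3: chr() already covers the full Unicode range


def surrogate_pair(code_point):
    """Return the UTF-16 (high, low) surrogates for a code point above the BMP.

    Hypothetical helper, not part of html5lib.
    """
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("not an astral code point: %r" % (code_point,))
    offset = code_point - 0x10000
    return 0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)


def code_point_to_text(code_point):
    """Convert an integer code point to text.

    Hypothetical helper: falls back to a surrogate pair when unichr()
    rejects the value, as it does above U+FFFF on UCS-2 builds.
    """
    try:
        return unichr(code_point)
    except ValueError:
        high, low = surrogate_pair(code_point)
        return unichr(high) + unichr(low)
```

On a build where `unichr` accepts the full range, the `except` branch is only reached for values above U+10FFFF, and `surrogate_pair` then raises `ValueError` itself, so the caller still has to decide how to report such input.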
