
Commit 6a6b74a

Authored and committed by Leif Arne Storset
Replace invalid characters with U+FFFD (fixes html5lib#96)
1 parent f5fd711

1 file changed (+1, -0 lines)


html5lib/inputstream.py

Lines changed: 1 addition & 0 deletions
```diff
@@ -270,6 +270,7 @@ def readChunk(self, chunkSize=None):
         # Replace invalid characters
         # Note U+0000 is dealt with in the tokenizer
         data = self.replaceCharactersRegexp.sub("\ufffd", data)
+        data = invalid_unicode_re.sub("\ufffd", data)
 
         data = data.replace("\r\n", "\n")
         data = data.replace("\r", "\n")
```
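The added line runs a second substitution over each chunk, so every code point matched by `invalid_unicode_re` becomes U+FFFD before CR/LF normalization. The pattern itself is defined elsewhere in html5lib and is not part of this diff. The following is a hedged approximation. `INVALID_CHARS_RE` and `replace_invalid` are hypothetical names, and the character set is an assumption based on the code points the HTML Standard flags as input-stream parse errors (control characters other than ASCII whitespace and NUL, plus Unicode noncharacters). It is not copied from html5lib.

```python
import re

# Hypothetical pattern (an assumption, NOT html5lib's invalid_unicode_re):
# C0/C1 controls other than NUL and ASCII whitespace (TAB, LF, FF, CR) ...
_CONTROLS = "\u0001-\u0008\u000B\u000E-\u001F\u007F-\u009F"
# ... plus the Unicode noncharacters U+FDD0..U+FDEF and U+nFFFE/U+nFFFF
# on every plane from 0x0 to 0x10.
_NONCHARACTERS = "\uFDD0-\uFDEF" + "".join(
    chr((plane << 16) | 0xFFFE) + chr((plane << 16) | 0xFFFF)
    for plane in range(0x11)
)
INVALID_CHARS_RE = re.compile("[" + _CONTROLS + _NONCHARACTERS + "]")


def replace_invalid(data: str) -> str:
    """Replace invalid code points with U+FFFD, then normalize newlines."""
    # Same order as the diff: substitute first, then fold CRLF and CR into LF.
    data = INVALID_CHARS_RE.sub("\ufffd", data)
    data = data.replace("\r\n", "\n")
    data = data.replace("\r", "\n")
    return data


cleaned = replace_invalid("a\u0001b\r\nc")
assert cleaned == "a\ufffdb\nc"
```

NUL is deliberately left out of the sketch's pattern, mirroring the diff's comment that U+0000 is handled in the tokenizer. CR is also excluded, so the newline normalization that follows still sees it.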
