Commit 61942b6

Fix html5lib#19: lone surrogates should not be replaced by U+FFFD.
Lone surrogates can only reach the input stream when arbitrary Unicode strings are passed into the parser (e.g., from script); no decoder emits them nowadays. This change updates the tests to match the current spec.
1 parent 1c6c2c0 commit 61942b6
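
The distinction the commit message draws, between strings handed over directly and strings produced by a decoder, can be illustrated with a minimal Python sketch. It uses only the built-in `str`/`bytes` codecs; the variable names are illustrative and nothing here is taken from the parser itself.

```python
# A Python str can hold a lone surrogate code point directly,
# which is how one could reach a parser "from script".
lone = "\udfff"
assert len(lone) == 1 and ord(lone) == 0xDFFF

# Strict UTF-8 encoding refuses a lone surrogate.
try:
    lone.encode("utf-8")
    encodable = True
except UnicodeEncodeError:
    encodable = False

# The "surrogatepass" error handler shows the bytes that would be involved.
raw = lone.encode("utf-8", "surrogatepass")

# A UTF-8 decoder treats those bytes as malformed; with errors="replace"
# it emits U+FFFD and never yields the surrogate itself.
decoded = raw.decode("utf-8", errors="replace")
```

Under these assumptions, the only way for U+DFFF to appear in the tokenizer's input is for the caller to supply it as an already-decoded string, which is the case the updated tests cover.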

1 file changed: 4 additions, 4 deletions

tokenizer/unicodeCharsProblematic.test

@@ -2,22 +2,22 @@
 {"description": "Invalid Unicode character U+DFFF",
 "doubleEscaped":true,
 "input": "\\uDFFF",
-"output":["ParseError", ["Character", "\\uFFFD"]]},
+"output":["ParseError", ["Character", "\\uDFFF"]]},
 
 {"description": "Invalid Unicode character U+D800",
 "doubleEscaped":true,
 "input": "\\uD800",
-"output":["ParseError", ["Character", "\\uFFFD"]]},
+"output":["ParseError", ["Character", "\\uD800"]]},
 
 {"description": "Invalid Unicode character U+DFFF with valid preceding character",
 "doubleEscaped":true,
 "input": "a\\uDFFF",
-"output":[["Character", "a"], "ParseError", ["Character", "\\uFFFD"]]},
+"output":[["Character", "a"], "ParseError", ["Character", "\\uDFFF"]]},
 
 {"description": "Invalid Unicode character U+D800 with valid following character",
 "doubleEscaped":true,
 "input": "\\uD800a",
-"output":["ParseError", ["Character", "\\uFFFDa"]]},
+"output":["ParseError", ["Character", "\\uD800a"]]},
 
 {"description":"CR followed by U+0000",
 "input":"\r\u0000",
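
As a rough illustration of how a test harness might read one of the `doubleEscaped` entries above, the following Python sketch resolves the literal `\uXXXX` sequences after JSON parsing. The helpers `unescape` and `decode_test` are hypothetical names introduced for this example, not part of any test runner, and the sketch assumes that `doubleEscaped` means the escapes in `input` and `output` are resolved in a second pass.

```python
import json
import re

# Matches a literal backslash-u escape left in the string after JSON parsing.
_ESCAPE = re.compile(r"\\u([0-9A-Fa-f]{4})")


def unescape(text):
    """Replace each literal \\uXXXX sequence with the code point it names.

    Simplification: adjacent high/low surrogate escapes are not combined
    into a single astral character.
    """
    return _ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)


def decode_test(test):
    """Return (input, output) with escapes resolved when doubleEscaped is set."""
    if not test.get("doubleEscaped"):
        return test["input"], test["output"]
    output = []
    for token in test["output"]:
        if isinstance(token, list):
            # Token such as ["Character", "..."]: unescape its string parts.
            output.append([unescape(p) if isinstance(p, str) else p for p in token])
        else:
            # Bare marker such as "ParseError".
            output.append(token)
    return unescape(test["input"]), output


# One entry from the diff, as it appears in the test file after the change.
raw_entry = r'''{"description": "Invalid Unicode character U+D800 with valid following character",
"doubleEscaped":true,
"input": "\\uD800a",
"output":["ParseError", ["Character", "\\uD800a"]]}'''

text, expected = decode_test(json.loads(raw_entry))
```

With the updated expectation, the decoded `Character` token carries the same lone surrogate as the decoded input, rather than U+FFFD.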
