Skip to content

Link in code comment no longer relevant for HTML unescaping #100210

Closed
@jcamiel

Description

@jcamiel

Documentation

The link in

# see http://www.w3.org/TR/html5/syntax.html#tokenizing-character-references
is not longer relevant and should be replace:

The link should explain the source of the replacements table:

# see http://www.w3.org/TR/html5/syntax.html#tokenizing-character-references

_invalid_charrefs = {
    0x00: '\ufffd',  # REPLACEMENT CHARACTER
    0x0d: '\r',      # CARRIAGE RETURN
    0x80: '\u20ac',  # EURO SIGN
    0x81: '\x81',    # <control>
    0x82: '\u201a',  # SINGLE LOW-9 QUOTATION MARK
    0x83: '\u0192',  # LATIN SMALL LETTER F WITH HOOK
    0x84: '\u201e',  # DOUBLE LOW-9 QUOTATION MARK
    0x85: '\u2026',  # HORIZONTAL ELLIPSIS
    0x86: '\u2020',  # DAGGER
    0x87: '\u2021',  # DOUBLE DAGGER
    0x88: '\u02c6',  # MODIFIER LETTER CIRCUMFLEX ACCENT
    0x89: '\u2030',  # PER MILLE SIGN
    0x8a: '\u0160',  # LATIN CAPITAL LETTER S WITH CARON

Linked PRs

Metadata

Metadata

Assignees

Labels

docsDocumentation in the Doc dir

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions