Support other RAWTEXT and PLAINTEXT elements in HTMLParser #137836

@serhiy-storchaka

Bug report

HTMLParser initially supported only the RAWTEXT elements "style" and "script". Support for the RCDATA elements "title" and "textarea" was then added in #118350. But there are more RAWTEXT elements: "xmp", "iframe", "noembed", and "noframes".

"noscript" also switches to RAWTEXT mode if the scripting flag is enabled.

And the "plaintext" tag switches to the PLAINTEXT state from which there is no exit.

Support for the other RAWTEXT elements can be enabled from user code by adding them to HTMLParser.CDATA_CONTENT_ELEMENTS (this can be done for an individual HTMLParser instance), but it would be better to support them by default. "plaintext" needs special code.
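
The per-instance workaround mentioned above could look roughly like the sketch below. The class name `Collector`, the helper `parse`, and the `EXTRA_RAWTEXT` tuple are made up for illustration; only `HTMLParser.CDATA_CONTENT_ELEMENTS` and the standard handler methods come from `html.parser`. It does not cover "plaintext" or "noscript".

```python
from html.parser import HTMLParser

# RAWTEXT elements listed in this issue that are not handled by default
# (name chosen for this sketch, not part of html.parser).
EXTRA_RAWTEXT = ("xmp", "iframe", "noembed", "noframes")


class Collector(HTMLParser):
    """Hypothetical subclass that records start tags and text data."""

    def __init__(self):
        super().__init__()
        # Shadow the class attribute on this instance only, so other
        # HTMLParser instances keep the default behaviour.
        self.CDATA_CONTENT_ELEMENTS = (
            tuple(HTMLParser.CDATA_CONTENT_ELEMENTS) + EXTRA_RAWTEXT
        )
        self.start_tags = []
        self.data = []

    def handle_starttag(self, tag, attrs):
        self.start_tags.append(tag)

    def handle_data(self, data):
        self.data.append(data)


def parse(markup):
    """Return (list of start tags, concatenated text data) for markup."""
    parser = Collector()
    parser.feed(markup)
    parser.close()
    return parser.start_tags, "".join(parser.data)


if __name__ == "__main__":
    tags, text = parse("<xmp><b>not a tag</b></xmp>")
    print(tags, repr(text))
```

With the extra elements registered, the content of "xmp" should be reported as text data rather than as a nested "b" element.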

Metadata

Labels

stdlib (Python modules in the Lib dir), type-bug (An unexpected behavior, bug, or error), type-security (A security issue)

Projects

Status

In Progress

Milestone

No milestone
