
gh-135661: Fix CDATA section parsing in HTMLParser #135665

Open
wants to merge 11 commits into
base: main
Make the setter method private.
serhiy-storchaka committed Aug 3, 2025
commit 50fd4b35fe640dc5d23f3864fa4c549b5fa852f7
11 changes: 0 additions & 11 deletions Doc/library/html.parser.rst
@@ -121,17 +121,6 @@ The output will then be:
attributes can be preserved, etc.).


-.. method:: HTMLParser.support_cdata(flag)
-
-Sets how the parser will parse CDATA declarations.
-If *flag* is true, then the :meth:`unknown_decl` method will be called
-for the CDATA section ``<![CDATA[...]]>``.
-If *flag* is false, then the :meth:`handle_comment` method will be called
-for ``<![CDATA[...>``.
-
-.. versionadded:: 3.13.6
-
-
The following methods are called when data or markup elements are encountered
and they are meant to be overridden in a subclass. The base class
implementations do nothing (except for :meth:`~HTMLParser.handle_startendtag`):
12 changes: 11 additions & 1 deletion Lib/html/parser.py
@@ -184,7 +184,17 @@ def clear_cdata_mode(self):
self.cdata_elem = None
self._escapable = True

-def support_cdata(self, flag=True):
+def _set_support_cdata(self, flag=True):
+"""Enable or disable support of the CDATA sections.
+If enabled, "<![CDATA[" starts a CDATA section which ends with "]]>".
+If disabled, "<![CDATA[" starts a bogus comment which ends with ">".
+
+This method is not called by default. Its purpose is to be called
+in custom handle_starttag() and handle_endtag() methods, with
+a value that depends on the adjusted current node.
+See https://html.spec.whatwg.org/multipage/parsing.html#markup-declaration-open-state
+for details.
+"""
self._support_cdata = flag

# Internal -- handle data as far as reasonable. May leave state
6 changes: 3 additions & 3 deletions Lib/test/test_htmlparser.py
@@ -16,7 +16,7 @@ def __init__(self, *args, autocdata=False, **kw):
self.append = self.events.append
html.parser.HTMLParser.__init__(self, *args, **kw)
if autocdata:
-self.support_cdata(False)
+self._set_support_cdata(False)

def get_events(self):
# Normalize the list of events so that buffer artefacts don't
@@ -38,15 +38,15 @@ def get_events(self):
def handle_starttag(self, tag, attrs):
self.append(("starttag", tag, attrs))
if self.autocdata and tag == 'svg':
-self.support_cdata(True)
+self._set_support_cdata(True)

def handle_startendtag(self, tag, attrs):
self.append(("startendtag", tag, attrs))

def handle_endtag(self, tag):
self.append(("endtag", tag))
if self.autocdata and tag == 'svg':
-self.support_cdata(False)
+self._set_support_cdata(False)

# all other markup

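For illustration, a minimal sketch of how a subclass might use the now-private setter, modelled on the test helper changed in this commit. The class name `SvgAwareParser` and the helper `_toggle_cdata` are hypothetical and not part of the patch. Because `_set_support_cdata()` is private and only exists where this change is applied, the sketch looks it up with `getattr` and does nothing if it is absent. Tracking only `svg` is a simplification of the "adjusted current node" rule referenced in the docstring.

```python
from html.parser import HTMLParser


class SvgAwareParser(HTMLParser):
    """Hypothetical subclass: enable CDATA sections only inside <svg>."""

    def __init__(self, *args, **kw):
        super().__init__(*args, **kw)
        self.events = []
        # Start in HTML content, where "<![CDATA[" is a bogus comment.
        self._toggle_cdata(False)

    def _toggle_cdata(self, flag):
        # Assumption: the private setter from this commit may be missing
        # on interpreters without the change, so fall back to a no-op.
        setter = getattr(self, "_set_support_cdata", None)
        if setter is not None:
            setter(flag)

    def handle_starttag(self, tag, attrs):
        self.events.append(("starttag", tag))
        if tag == "svg":
            self._toggle_cdata(True)

    def handle_endtag(self, tag):
        self.events.append(("endtag", tag))
        if tag == "svg":
            self._toggle_cdata(False)

    def handle_comment(self, data):
        self.events.append(("comment", data))

    def unknown_decl(self, data):
        self.events.append(("unknown_decl", data))


parser = SvgAwareParser()
parser.feed("<svg><![CDATA[x < y]]></svg>")
parser.close()
print(parser.events)
```

The event recorded between the two `svg` tags depends on the interpreter: with CDATA support enabled it is expected to arrive through `unknown_decl()`, otherwise through `handle_comment()`.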