From ab7341d85f03cf94d3dc4c97801dacc8a4704f17 Mon Sep 17 00:00:00 2001 From: Stefan Behnel Date: Sat, 20 Apr 2019 09:46:28 +0200 Subject: [PATCH 01/22] bpo-36673: Implement comment/PI parsing support for the TreeBuilder in ElementTree. --- Doc/library/xml.etree.elementtree.rst | 56 ++++- Lib/test/test_xml_etree.py | 77 +++++- Lib/xml/etree/ElementTree.py | 60 ++++- .../2019-04-20-09-50-32.bpo-36673.XF4Egb.rst | 3 + Modules/_elementtree.c | 237 +++++++++++++++--- Modules/clinic/_elementtree.c.h | 72 +++++- 6 files changed, 452 insertions(+), 53 deletions(-) create mode 100644 Misc/NEWS.d/next/Library/2019-04-20-09-50-32.bpo-36673.XF4Egb.rst diff --git a/Doc/library/xml.etree.elementtree.rst b/Doc/library/xml.etree.elementtree.rst index 9e2c295867ca3a..5c683c74f24e2a 100644 --- a/Doc/library/xml.etree.elementtree.rst +++ b/Doc/library/xml.etree.elementtree.rst @@ -523,8 +523,9 @@ Functions Parses an XML section into an element tree incrementally, and reports what's going on to the user. *source* is a filename or :term:`file object` containing XML data. *events* is a sequence of events to report back. The - supported events are the strings ``"start"``, ``"end"``, ``"start-ns"`` and - ``"end-ns"`` (the "ns" events are used to get detailed namespace + supported events are the strings ``"start"``, ``"end"``, ``"comment"``, + ``"pi"``, ``"start-ns"`` and ``"end-ns"`` + (the "ns" events are used to get detailed namespace information). If *events* is omitted, only ``"end"`` events are reported. *parser* is an optional parser instance. If not given, the standard :class:`XMLParser` parser is used. *parser* must be a subclass of @@ -549,6 +550,10 @@ Functions .. deprecated:: 3.4 The *parser* argument. + .. versionchanged:: 3.8 + The ``comment`` and ``pi`` events were added. + + .. function:: parse(source, parser=None) Parses an XML section into an element tree. *source* is a filename or file @@ -1021,14 +1026,24 @@ TreeBuilder Objects ^^^^^^^^^^^^^^^^^^^ -.. 
class:: TreeBuilder(element_factory=None) +.. class:: TreeBuilder(element_factory=None, comment_factory=None, \ + pi_factory=None) Generic element structure builder. This builder converts a sequence of start, data, and end method calls to a well-formed element structure. You can use this class to build an element structure using a custom XML parser, - or a parser for some other XML-like format. *element_factory*, when given, - must be a callable accepting two positional arguments: a tag and - a dict of attributes. It is expected to return a new element instance. + or a parser for some other XML-like format. + + *element_factory*, when given, must be a callable accepting two positional + arguments: a tag and a dict of attributes. It is expected to return a new + element instance. + + The *comment_factory* and *pi_factory* functions, when given, should behave + like the :func:`Comment` and :func:`ProcessingInstruction` functions to + create comments and processing instructions. When not given, no comments + or processing instructions will be created. Note that these objects will + not currently be appended to the tree when they appear outside of the root + element. .. method:: close() @@ -1053,6 +1068,21 @@ TreeBuilder Objects Opens a new element. *tag* is the element name. *attrs* is a dictionary containing element attributes. Returns the opened element. + .. method:: comment(text) + + Adds a comment with the given *text*. If *comment_factory* is + :const:`None`, this will just return the text. + + .. versionadded:: 3.8 + + .. method:: pi(target, text) + + Adds a comment with the given *target* name and *text*. If + *pi_factory* is :const:`None`, this will return a ``(target, text)`` + tuple. + + .. versionadded:: 3.8 + In addition, a custom :class:`TreeBuilder` object can provide the following method: @@ -1150,9 +1180,9 @@ XMLPullParser Objects callback target, :class:`XMLPullParser` collects an internal list of parsing events and lets the user read from it. 
*events* is a sequence of events to report back. The supported events are the strings ``"start"``, ``"end"``, - ``"start-ns"`` and ``"end-ns"`` (the "ns" events are used to get detailed - namespace information). If *events* is omitted, only ``"end"`` events are - reported. + ``"comment"``, ``"pi"``, ``"start-ns"`` and ``"end-ns"`` (the "ns" events + are used to get detailed namespace information). If *events* is omitted, + only ``"end"`` events are reported. .. method:: feed(data) @@ -1172,6 +1202,10 @@ XMLPullParser Objects parser. The iterator yields ``(event, elem)`` pairs, where *event* is a string representing the type of event (e.g. ``"end"``) and *elem* is the encountered :class:`Element` object. + For ``start-ns`` events, the ``elem`` is a tuple ``(prefix, uri)`` naming + the declared namespace mapping. For ``end-ns`` events, the ``elem`` is + :const:`None`. For ``comment`` events, the second value is the comment + text and for ``pi`` events a tuple ``(target, text)``. Events provided in a previous call to :meth:`read_events` will not be yielded again. Events are consumed from the internal queue only when @@ -1191,6 +1225,10 @@ XMLPullParser Objects .. versionadded:: 3.4 + .. versionchanged:: 3.8 + The ``comment`` and ``pi`` events were added. 
+ + Exceptions ^^^^^^^^^^ diff --git a/Lib/test/test_xml_etree.py b/Lib/test/test_xml_etree.py index 14ce32af802624..c022906bd938bf 100644 --- a/Lib/test/test_xml_etree.py +++ b/Lib/test/test_xml_etree.py @@ -1193,6 +1193,9 @@ def _feed(self, parser, data, chunk_size=None): for i in range(0, len(data), chunk_size): parser.feed(data[i:i+chunk_size]) + def assert_events(self, parser, expected): + self.assertEqual(list(parser.read_events()), expected) + def assert_event_tags(self, parser, expected): events = parser.read_events() self.assertEqual([(action, elem.tag) for action, elem in events], @@ -1275,8 +1278,10 @@ def test_events(self): self.assert_event_tags(parser, []) parser = ET.XMLPullParser(events=('start', 'end')) - self._feed(parser, "\n") - self.assert_event_tags(parser, []) + self._feed(parser, "\n") + self.assert_events(parser, []) + + parser = ET.XMLPullParser(events=('start', 'end')) self._feed(parser, "\n") self.assert_event_tags(parser, [('start', 'root')]) self._feed(parser, "text") self.assertIsNone(parser.close()) + def test_events_comment(self): + parser = ET.XMLPullParser(events=('start', 'comment', 'end')) + self._feed(parser, "\n") + self.assert_events(parser, [('comment', ' text here ')]) + self._feed(parser, "\n") + self.assert_events(parser, [('comment', ' more text here ')]) + self._feed(parser, "text") + self.assert_event_tags(parser, [('start', 'root-tag')]) + self._feed(parser, "\n") + self.assert_events(parser, [('comment', ' inner comment')]) + self._feed(parser, "\n") + self.assert_event_tags(parser, [('end', 'root-tag')]) + self._feed(parser, "\n") + self.assert_events(parser, [('comment', ' outer comment ')]) + + parser = ET.XMLPullParser(events=('comment',)) + self._feed(parser, "\n") + self.assert_events(parser, [('comment', ' text here ')]) + + def test_events_pi(self): + parser = ET.XMLPullParser(events=('start', 'pi', 'end')) + self._feed(parser, "\n") + self.assert_events(parser, [('pi', ('pitarget', ''))]) + parser = 
ET.XMLPullParser(events=('pi',)) + self._feed(parser, "\n") + self.assert_events(parser, [('pi', ('pitarget', 'some text '))]) + + def test_events_sequence(self): # Test that events can be some sequence that's not just a tuple or list eventset = {'end', 'start'} @@ -2658,6 +2691,31 @@ class DummyBuilder(BaseDummyBuilder): parser.feed(self.sample1) self.assertIsNone(parser.close()) + def test_treebuilder_comment(self): + b = ET.TreeBuilder() + self.assertEqual(b.comment('ctext'), 'ctext') + + b = ET.TreeBuilder(comment_factory=ET.Comment) + self.assertEqual(b.comment('ctext').tag, ET.Comment) + self.assertEqual(b.comment('ctext').text, 'ctext') + + b = ET.TreeBuilder(comment_factory=len) + self.assertEqual(b.comment('ctext'), len('ctext')) + + def test_treebuilder_pi(self): + b = ET.TreeBuilder() + self.assertEqual(b.pi('target', None), ('target', None)) + + b = ET.TreeBuilder(pi_factory=ET.PI) + self.assertEqual(b.pi('target').tag, ET.PI) + self.assertEqual(b.pi('target').text, "target") + self.assertEqual(b.pi('pitarget', ' text ').tag, ET.PI) + self.assertEqual(b.pi('pitarget', ' text ').text, "pitarget text ") + + b = ET.TreeBuilder(pi_factory=lambda target, text: (len(target), text)) + self.assertEqual(b.pi('target'), (len('target'), None)) + self.assertEqual(b.pi('pitarget', ' text '), (len('pitarget'), ' text ')) + def test_treebuilder_elementfactory_none(self): parser = ET.XMLParser(target=ET.TreeBuilder(element_factory=None)) parser.feed(self.sample1) @@ -2678,6 +2736,21 @@ def foobar(self, x): e = parser.close() self._check_sample1_element(e) + def test_subclass_comment_pi(self): + class MyTreeBuilder(ET.TreeBuilder): + def foobar(self, x): + return x * 2 + + tb = MyTreeBuilder(comment_factory=ET.Comment, pi_factory=ET.PI) + self.assertEqual(tb.foobar(10), 20) + + parser = ET.XMLParser(target=tb) + parser.feed(self.sample1) + parser.feed('') + + e = parser.close() + self._check_sample1_element(e) + def test_element_factory(self): lst = [] def 
myfactory(tag, attrib): diff --git a/Lib/xml/etree/ElementTree.py b/Lib/xml/etree/ElementTree.py index c9e2f36835021e..c2fab3798d87ab 100644 --- a/Lib/xml/etree/ElementTree.py +++ b/Lib/xml/etree/ElementTree.py @@ -1374,12 +1374,22 @@ class TreeBuilder: *element_factory* is an optional element factory which is called to create new Element instances, as necessary. + *comment_factory* is a factory to create comments. If not provided, + comments will not be inserted into the tree and "comment" pull parser + events will only return the plain text. + + *pi_factory* is a factory to create processing instructions. If not + provided, PIs will not be inserted into the tree and "pi" pull parser + events will only return a (target, text) tuple. """ - def __init__(self, element_factory=None): + def __init__(self, element_factory=None, comment_factory=None, pi_factory=None): self._data = [] # data collector self._elem = [] # element stack self._last = None # last element + self._root = None # root element self._tail = None # true if we're after an end tag + self._comment_factory = comment_factory + self._pi_factory = pi_factory if element_factory is None: element_factory = Element self._factory = element_factory @@ -1387,8 +1397,8 @@ def __init__(self, element_factory=None): def close(self): """Flush builder buffers and return toplevel document Element.""" assert len(self._elem) == 0, "missing end tags" - assert self._last is not None, "missing toplevel element" - return self._last + assert self._root is not None, "missing toplevel element" + return self._root def _flush(self): if self._data: @@ -1417,6 +1427,8 @@ def start(self, tag, attrs): self._last = elem = self._factory(tag, attrs) if self._elem: self._elem[-1].append(elem) + elif self._root is None: + self._root = elem self._elem.append(elem) self._tail = 0 return elem @@ -1435,6 +1447,39 @@ def end(self, tag): self._tail = 1 return self._last + def comment(self, text): + """Create a comment using the comment_factory. 
+ + If no factory is provided, comments are ignored + and the text returned as is. + + *text* is the text of the comment. + """ + if self._comment_factory is None: + return text + return self._handle_single(self._comment_factory, text) + + def pi(self, target, text=None): + """Create a processing instruction using the pi_factory. + + If no factory is provided, PIs are ignored and a (target, text) + tuple is returned. + + *target* is the target name of the processing instruction. + *text* is the data of the processing instruction, or ''. + """ + if self._pi_factory is None: + return (target, text) + return self._handle_single(self._pi_factory, target, text) + + def _handle_single(self, factory, *args): + self._flush() + self._last = elem = factory(*args) + if self._elem: + self._elem[-1].append(elem) + self._tail = 1 + return elem + # also see ElementTree and TreeBuilder class XMLParser: @@ -1519,6 +1564,15 @@ def handler(prefix, uri, event=event_name, append=append): def handler(prefix, event=event_name, append=append): append((event, None)) parser.EndNamespaceDeclHandler = handler + elif event_name == 'comment': + def handler(text, event=event_name, append=append, self=self): + append((event, self.target.comment(text))) + parser.CommentHandler = handler + elif event_name == 'pi': + def handler(pi_target, data, event=event_name, append=append, + self=self): + append((event, self.target.pi(pi_target, data))) + parser.ProcessingInstructionHandler = handler else: raise ValueError("unknown event %r" % event_name) diff --git a/Misc/NEWS.d/next/Library/2019-04-20-09-50-32.bpo-36673.XF4Egb.rst b/Misc/NEWS.d/next/Library/2019-04-20-09-50-32.bpo-36673.XF4Egb.rst new file mode 100644 index 00000000000000..76bf914e22b196 --- /dev/null +++ b/Misc/NEWS.d/next/Library/2019-04-20-09-50-32.bpo-36673.XF4Egb.rst @@ -0,0 +1,3 @@ +The TreeBuilder and XMLPullParser in xml.etree.ElementTree gained support +for parsing comments and processing instructions. +Patch by Stefan Behnel. 
diff --git a/Modules/_elementtree.c b/Modules/_elementtree.c index 1e58cd05b51237..663337d42dc768 100644 --- a/Modules/_elementtree.c +++ b/Modules/_elementtree.c @@ -2385,6 +2385,8 @@ typedef struct { Py_ssize_t index; /* current stack size (0 means empty) */ PyObject *element_factory; + PyObject *comment_factory; + PyObject *pi_factory; /* element tracing */ PyObject *events_append; /* the append method of the list of events, or NULL */ @@ -2392,6 +2394,8 @@ typedef struct { PyObject *end_event_obj; PyObject *start_ns_event_obj; PyObject *end_ns_event_obj; + PyObject *comment_event_obj; + PyObject *pi_event_obj; } TreeBuilderObject; #define TreeBuilder_CheckExact(op) (Py_TYPE(op) == &TreeBuilder_Type) @@ -2413,6 +2417,8 @@ treebuilder_new(PyTypeObject *type, PyObject *args, PyObject *kwds) t->data = NULL; t->element_factory = NULL; + t->comment_factory = NULL; + t->pi_factory = NULL; t->stack = PyList_New(20); if (!t->stack) { Py_DECREF(t->this); @@ -2425,6 +2431,7 @@ treebuilder_new(PyTypeObject *type, PyObject *args, PyObject *kwds) t->events_append = NULL; t->start_event_obj = t->end_event_obj = NULL; t->start_ns_event_obj = t->end_ns_event_obj = NULL; + t->comment_event_obj = t->pi_event_obj = NULL; } return (PyObject *)t; } @@ -2433,17 +2440,35 @@ treebuilder_new(PyTypeObject *type, PyObject *args, PyObject *kwds) _elementtree.TreeBuilder.__init__ element_factory: object = NULL + comment_factory: object = NULL + pi_factory: object = NULL [clinic start generated code]*/ static int _elementtree_TreeBuilder___init___impl(TreeBuilderObject *self, - PyObject *element_factory) -/*[clinic end generated code: output=91cfa7558970ee96 input=1b424eeefc35249c]*/ + PyObject *element_factory, + PyObject *comment_factory, + PyObject *pi_factory) +/*[clinic end generated code: output=da49f5ab76aee6d6 input=9b7d938a273ab7ad]*/ { - if (element_factory) { + if (element_factory && element_factory != Py_None) { Py_INCREF(element_factory); Py_XSETREF(self->element_factory, 
element_factory); + } else { + Py_CLEAR(self->element_factory); + } + if (comment_factory && comment_factory != Py_None) { + Py_INCREF(comment_factory); + Py_XSETREF(self->comment_factory, comment_factory); + } else { + Py_CLEAR(self->comment_factory); + } + if (pi_factory && pi_factory != Py_None) { + Py_INCREF(pi_factory); + Py_XSETREF(self->pi_factory, pi_factory); + } else { + Py_CLEAR(self->pi_factory); } return 0; @@ -2452,6 +2477,8 @@ _elementtree_TreeBuilder___init___impl(TreeBuilderObject *self, static int treebuilder_gc_traverse(TreeBuilderObject *self, visitproc visit, void *arg) { + Py_VISIT(self->pi_event_obj); + Py_VISIT(self->comment_event_obj); Py_VISIT(self->end_ns_event_obj); Py_VISIT(self->start_ns_event_obj); Py_VISIT(self->end_event_obj); @@ -2462,6 +2489,8 @@ treebuilder_gc_traverse(TreeBuilderObject *self, visitproc visit, void *arg) Py_VISIT(self->last); Py_VISIT(self->data); Py_VISIT(self->stack); + Py_VISIT(self->pi_factory); + Py_VISIT(self->comment_factory); Py_VISIT(self->element_factory); return 0; } @@ -2469,6 +2498,8 @@ treebuilder_gc_traverse(TreeBuilderObject *self, visitproc visit, void *arg) static int treebuilder_gc_clear(TreeBuilderObject *self) { + Py_CLEAR(self->pi_event_obj); + Py_CLEAR(self->comment_event_obj); Py_CLEAR(self->end_ns_event_obj); Py_CLEAR(self->start_ns_event_obj); Py_CLEAR(self->end_event_obj); @@ -2478,6 +2509,8 @@ treebuilder_gc_clear(TreeBuilderObject *self) Py_CLEAR(self->data); Py_CLEAR(self->last); Py_CLEAR(self->this); + Py_CLEAR(self->pi_factory); + Py_CLEAR(self->comment_factory); Py_CLEAR(self->element_factory); Py_CLEAR(self->root); return 0; @@ -2569,7 +2602,7 @@ treebuilder_append_event(TreeBuilderObject *self, PyObject *action, PyObject *event = PyTuple_Pack(2, action, node); if (event == NULL) return -1; - res = PyObject_CallFunctionObjArgs(self->events_append, event, NULL); + res = _PyObject_FastCall(self->events_append, &event, 1); Py_DECREF(event); if (res == NULL) return -1; @@ -2593,7 
+2626,7 @@ treebuilder_handle_start(TreeBuilderObject* self, PyObject* tag, return NULL; } - if (!self->element_factory || self->element_factory == Py_None) { + if (!self->element_factory) { node = create_new_element(tag, attrib); } else if (attrib == Py_None) { attrib = PyDict_New(); @@ -2721,6 +2754,84 @@ treebuilder_handle_end(TreeBuilderObject* self, PyObject* tag) return (PyObject*) self->last; } +LOCAL(PyObject*) +treebuilder_handle_comment(TreeBuilderObject* self, PyObject* text) +{ + PyObject* comment = NULL; + PyObject* this; + + if (treebuilder_flush_data(self) < 0) { + return NULL; + } + + if (self->comment_factory) { + comment = _PyObject_FastCall(self->comment_factory, &text, 1); + if (!comment) + return NULL; + + this = self->this; + if (this != Py_None) { + if (treebuilder_add_subelement(this, comment) < 0) + goto error; + } + } else { + Py_INCREF(text); + comment = text; + } + + if (self->events_append && self->comment_event_obj) { + if (treebuilder_append_event(self, self->comment_event_obj, comment) < 0) + goto error; + } + + return comment; + + error: + Py_DECREF(comment); + return NULL; +} + +LOCAL(PyObject*) +treebuilder_handle_pi(TreeBuilderObject* self, PyObject* target, PyObject* text) +{ + PyObject* pi = NULL; + PyObject* this; + PyObject* stack[2] = {target, text}; + + if (treebuilder_flush_data(self) < 0) { + return NULL; + } + + if (self->pi_factory) { + pi = _PyObject_FastCall(self->pi_factory, stack, 2); + if (!pi) { + return NULL; + } + + this = self->this; + if (this != Py_None) { + if (treebuilder_add_subelement(this, pi) < 0) + goto error; + } + } else { + pi = PyTuple_Pack(2, target, text); + if (!pi) { + return NULL; + } + } + + if (self->events_append && self->pi_event_obj) { + if (treebuilder_append_event(self, self->pi_event_obj, pi) < 0) + goto error; + } + + return pi; + + error: + Py_DECREF(pi); + return NULL; +} + /* -------------------------------------------------------------------- */ /* methods (in alphabetical order) 
*/ @@ -2754,6 +2865,38 @@ _elementtree_TreeBuilder_end(TreeBuilderObject *self, PyObject *tag) return treebuilder_handle_end(self, tag); } +/*[clinic input] +_elementtree.TreeBuilder.comment + + text: object + / + +[clinic start generated code]*/ + +static PyObject * +_elementtree_TreeBuilder_comment(TreeBuilderObject *self, PyObject *text) +/*[clinic end generated code: output=22835be41deeaa27 input=47e7ebc48ed01dfa]*/ +{ + return treebuilder_handle_comment(self, text); +} + +/*[clinic input] +_elementtree.TreeBuilder.pi + + target: object + text: object = None + / + +[clinic start generated code]*/ + +static PyObject * +_elementtree_TreeBuilder_pi_impl(TreeBuilderObject *self, PyObject *target, + PyObject *text) +/*[clinic end generated code: output=21eb95ec9d04d1d9 input=349342bd79c35570]*/ +{ + return treebuilder_handle_pi(self, target, text); +} + LOCAL(PyObject*) treebuilder_done(TreeBuilderObject* self) { @@ -2925,7 +3068,7 @@ expat_set_error(enum XML_Error error_code, Py_ssize_t line, Py_ssize_t column, if (errmsg == NULL) return; - error = PyObject_CallFunctionObjArgs(st->parseerror_obj, errmsg, NULL); + error = _PyObject_FastCall(st->parseerror_obj, &errmsg, 1); Py_DECREF(errmsg); if (!error) return; @@ -2988,7 +3131,7 @@ expat_default_handler(XMLParserObject* self, const XML_Char* data_in, (TreeBuilderObject*) self->target, value ); else if (self->handle_data) - res = PyObject_CallFunctionObjArgs(self->handle_data, value, NULL); + res = _PyObject_FastCall(self->handle_data, &value, 1); else res = NULL; Py_XDECREF(res); @@ -3099,7 +3242,7 @@ expat_data_handler(XMLParserObject* self, const XML_Char* data_in, /* shortcut */ res = treebuilder_handle_data((TreeBuilderObject*) self->target, data); else if (self->handle_data) - res = PyObject_CallFunctionObjArgs(self->handle_data, data, NULL); + res = _PyObject_FastCall(self->handle_data, &data, 1); else res = NULL; @@ -3126,7 +3269,7 @@ expat_end_handler(XMLParserObject* self, const XML_Char* tag_in) else if 
(self->handle_end) { tag = makeuniversal(self, tag_in); if (tag) { - res = PyObject_CallFunctionObjArgs(self->handle_end, tag, NULL); + res = _PyObject_FastCall(self->handle_end, &tag, 1); Py_DECREF(tag); } } @@ -3176,21 +3319,31 @@ expat_end_ns_handler(XMLParserObject* self, const XML_Char* prefix_in) static void expat_comment_handler(XMLParserObject* self, const XML_Char* comment_in) { - PyObject* comment; - PyObject* res; + PyObject* comment = NULL; + PyObject* res = NULL; if (PyErr_Occurred()) return; - if (self->handle_comment) { + if (TreeBuilder_CheckExact(self->target)) { + /* shortcut */ + TreeBuilderObject *target = (TreeBuilderObject*) self->target; + comment = PyUnicode_DecodeUTF8(comment_in, strlen(comment_in), "strict"); - if (comment) { - res = PyObject_CallFunctionObjArgs(self->handle_comment, - comment, NULL); - Py_XDECREF(res); - Py_DECREF(comment); - } + if (!comment) + return; /* parser will look for errors */ + + res = treebuilder_handle_comment(target, comment); + } else if (self->handle_comment) { + comment = PyUnicode_DecodeUTF8(comment_in, strlen(comment_in), "strict"); + if (!comment) + return; + + res = _PyObject_FastCall(self->handle_comment, &comment, 1); } + + Py_XDECREF(res); + Py_DECREF(comment); } static void @@ -3258,26 +3411,30 @@ static void expat_pi_handler(XMLParserObject* self, const XML_Char* target_in, const XML_Char* data_in) { - PyObject* target; - PyObject* data; + PyObject* parcel; PyObject* res; if (PyErr_Occurred()) return; - if (self->handle_pi) { - target = PyUnicode_DecodeUTF8(target_in, strlen(target_in), "strict"); - data = PyUnicode_DecodeUTF8(data_in, strlen(data_in), "strict"); - if (target && data) { - res = PyObject_CallFunctionObjArgs(self->handle_pi, - target, data, NULL); - Py_XDECREF(res); - Py_DECREF(data); - Py_DECREF(target); - } else { - Py_XDECREF(data); - Py_XDECREF(target); + if (TreeBuilder_CheckExact(self->target)) { + /* shortcut: TreeBuilder does not handle PIs */ + TreeBuilderObject *target = 
(TreeBuilderObject*) self->target; + + if (target->events_append && target->pi_event_obj) { + parcel = Py_BuildValue("ss", target_in, data_in); + if (!parcel) + return; + treebuilder_append_event(target, target->pi_event_obj, parcel); + Py_DECREF(parcel); } + } else if (self->handle_pi) { + parcel = Py_BuildValue("ss", target_in, data_in); + if (!parcel) + return; + res = PyObject_Call(self->handle_pi, parcel, NULL); + Py_XDECREF(res); + Py_DECREF(parcel); } } @@ -3695,6 +3852,8 @@ _elementtree_XMLParser__setevents_impl(XMLParserObject *self, Py_CLEAR(target->end_event_obj); Py_CLEAR(target->start_ns_event_obj); Py_CLEAR(target->end_ns_event_obj); + Py_CLEAR(target->comment_event_obj); + Py_CLEAR(target->pi_event_obj); if (events_to_report == Py_None) { /* default is "end" only */ @@ -3740,6 +3899,18 @@ _elementtree_XMLParser__setevents_impl(XMLParserObject *self, (XML_StartNamespaceDeclHandler) expat_start_ns_handler, (XML_EndNamespaceDeclHandler) expat_end_ns_handler ); + } else if (strcmp(event_name, "comment") == 0) { + Py_XSETREF(target->comment_event_obj, event_name_obj); + EXPAT(SetCommentHandler)( + self->parser, + (XML_CommentHandler) expat_comment_handler + ); + } else if (strcmp(event_name, "pi") == 0) { + Py_XSETREF(target->pi_event_obj, event_name_obj); + EXPAT(SetProcessingInstructionHandler)( + self->parser, + (XML_ProcessingInstructionHandler) expat_pi_handler + ); } else { Py_DECREF(event_name_obj); Py_DECREF(events_seq); @@ -3882,6 +4053,8 @@ static PyMethodDef treebuilder_methods[] = { _ELEMENTTREE_TREEBUILDER_DATA_METHODDEF _ELEMENTTREE_TREEBUILDER_START_METHODDEF _ELEMENTTREE_TREEBUILDER_END_METHODDEF + _ELEMENTTREE_TREEBUILDER_COMMENT_METHODDEF + _ELEMENTTREE_TREEBUILDER_PI_METHODDEF _ELEMENTTREE_TREEBUILDER_CLOSE_METHODDEF {NULL, NULL} }; diff --git a/Modules/clinic/_elementtree.c.h b/Modules/clinic/_elementtree.c.h index d239c802583c6c..b1c5f8e25d205f 100644 --- a/Modules/clinic/_elementtree.c.h +++ b/Modules/clinic/_elementtree.c.h @@ 
-635,30 +635,46 @@ _elementtree_Element_set(ElementObject *self, PyObject *const *args, Py_ssize_t static int _elementtree_TreeBuilder___init___impl(TreeBuilderObject *self, - PyObject *element_factory); + PyObject *element_factory, + PyObject *comment_factory, + PyObject *pi_factory); static int _elementtree_TreeBuilder___init__(PyObject *self, PyObject *args, PyObject *kwargs) { int return_value = -1; - static const char * const _keywords[] = {"element_factory", NULL}; + static const char * const _keywords[] = {"element_factory", "comment_factory", "pi_factory", NULL}; static _PyArg_Parser _parser = {NULL, _keywords, "TreeBuilder", 0}; - PyObject *argsbuf[1]; + PyObject *argsbuf[3]; PyObject * const *fastargs; Py_ssize_t nargs = PyTuple_GET_SIZE(args); Py_ssize_t noptargs = nargs + (kwargs ? PyDict_GET_SIZE(kwargs) : 0) - 0; PyObject *element_factory = NULL; + PyObject *comment_factory = NULL; + PyObject *pi_factory = NULL; - fastargs = _PyArg_UnpackKeywords(_PyTuple_CAST(args)->ob_item, nargs, kwargs, NULL, &_parser, 0, 1, 0, argsbuf); + fastargs = _PyArg_UnpackKeywords(_PyTuple_CAST(args)->ob_item, nargs, kwargs, NULL, &_parser, 0, 3, 0, argsbuf); if (!fastargs) { goto exit; } if (!noptargs) { goto skip_optional_pos; } - element_factory = fastargs[0]; + if (fastargs[0]) { + element_factory = fastargs[0]; + if (!--noptargs) { + goto skip_optional_pos; + } + } + if (fastargs[1]) { + comment_factory = fastargs[1]; + if (!--noptargs) { + goto skip_optional_pos; + } + } + pi_factory = fastargs[2]; skip_optional_pos: - return_value = _elementtree_TreeBuilder___init___impl((TreeBuilderObject *)self, element_factory); + return_value = _elementtree_TreeBuilder___init___impl((TreeBuilderObject *)self, element_factory, comment_factory, pi_factory); exit: return return_value; @@ -680,6 +696,48 @@ PyDoc_STRVAR(_elementtree_TreeBuilder_end__doc__, #define _ELEMENTTREE_TREEBUILDER_END_METHODDEF \ {"end", (PyCFunction)_elementtree_TreeBuilder_end, METH_O, 
_elementtree_TreeBuilder_end__doc__}, +PyDoc_STRVAR(_elementtree_TreeBuilder_comment__doc__, +"comment($self, text, /)\n" +"--\n" +"\n"); + +#define _ELEMENTTREE_TREEBUILDER_COMMENT_METHODDEF \ + {"comment", (PyCFunction)_elementtree_TreeBuilder_comment, METH_O, _elementtree_TreeBuilder_comment__doc__}, + +PyDoc_STRVAR(_elementtree_TreeBuilder_pi__doc__, +"pi($self, target, text=None, /)\n" +"--\n" +"\n"); + +#define _ELEMENTTREE_TREEBUILDER_PI_METHODDEF \ + {"pi", (PyCFunction)(void(*)(void))_elementtree_TreeBuilder_pi, METH_FASTCALL, _elementtree_TreeBuilder_pi__doc__}, + +static PyObject * +_elementtree_TreeBuilder_pi_impl(TreeBuilderObject *self, PyObject *target, + PyObject *text); + +static PyObject * +_elementtree_TreeBuilder_pi(TreeBuilderObject *self, PyObject *const *args, Py_ssize_t nargs) +{ + PyObject *return_value = NULL; + PyObject *target; + PyObject *text = Py_None; + + if (!_PyArg_CheckPositional("pi", nargs, 1, 2)) { + goto exit; + } + target = args[0]; + if (nargs < 2) { + goto skip_optional; + } + text = args[1]; +skip_optional: + return_value = _elementtree_TreeBuilder_pi_impl(self, target, text); + +exit: + return return_value; +} + PyDoc_STRVAR(_elementtree_TreeBuilder_close__doc__, "close($self, /)\n" "--\n" @@ -853,4 +911,4 @@ _elementtree_XMLParser__setevents(XMLParserObject *self, PyObject *const *args, exit: return return_value; } -/*[clinic end generated code: output=440b5d90a4b86590 input=a9049054013a1b77]*/ +/*[clinic end generated code: output=94ec504fdbcea1d3 input=a9049054013a1b77]*/ From 2d2df114bb0b29d988ee45105951cd9b91c2d43c Mon Sep 17 00:00:00 2001 From: Stefan Behnel Date: Sat, 20 Apr 2019 22:36:37 +0200 Subject: [PATCH 02/22] bpo-36673: Rewrite the comment/PI factory handling for the TreeBuilder in "_elementtree" to make it use the same factories as the ElementTree module, and to make it explicit when the comments/PIs are inserted into the tree and when they are not (which is the default). 
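Reviewer note: a minimal sketch of the behaviour described above, assuming the keyword-only `insert_comments` / `insert_pis` arguments that this patch adds to `TreeBuilder`. The helper `parse_with_builder` and the `SAMPLE` document are illustrative names only and are not part of the patch.

```python
import xml.etree.ElementTree as ET

# Illustrative input: one comment and one PI inside the root element.
SAMPLE = "<root><!-- note --><?target data?><child/></root>"


def parse_with_builder(xml_text, **builder_options):
    """Parse xml_text with a TreeBuilder configured by builder_options."""
    builder = ET.TreeBuilder(**builder_options)
    parser = ET.XMLParser(target=builder)
    parser.feed(xml_text)
    # XMLParser.close() returns whatever the target's close() returns,
    # which for TreeBuilder is the root element.
    return parser.close()


# Default: comments and PIs are parsed but not inserted into the tree.
default_root = parse_with_builder(SAMPLE)
default_tags = [child.tag for child in default_root]

# Opt in explicitly to keep both inside the root element.
full_root = parse_with_builder(SAMPLE, insert_comments=True, insert_pis=True)
full_tags = [child.tag for child in full_root]

print(default_tags)
print(full_tags)
```

Comment and PI nodes carry the `ET.Comment` and `ET.PI` factory functions as their `tag`, so they can be told apart from ordinary elements when iterating over the children.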
--- Doc/library/xml.etree.elementtree.rst | 32 +++--- Lib/test/test_xml_etree.py | 35 ++++--- Lib/xml/etree/ElementTree.py | 59 ++++++----- Modules/_elementtree.c | 136 ++++++++++++++++++++++---- Modules/clinic/_elementtree.c.h | 76 ++++++++++++-- 5 files changed, 258 insertions(+), 80 deletions(-) diff --git a/Doc/library/xml.etree.elementtree.rst b/Doc/library/xml.etree.elementtree.rst index 5c683c74f24e2a..1e4134aa1e4ad3 100644 --- a/Doc/library/xml.etree.elementtree.rst +++ b/Doc/library/xml.etree.elementtree.rst @@ -1026,13 +1026,13 @@ TreeBuilder Objects ^^^^^^^^^^^^^^^^^^^ -.. class:: TreeBuilder(element_factory=None, comment_factory=None, \ - pi_factory=None) +.. class:: TreeBuilder(element_factory=None, *, comment_factory=None, \ + pi_factory=None, insert_comments=False, insert_pis=False) Generic element structure builder. This builder converts a sequence of - start, data, and end method calls to a well-formed element structure. You - can use this class to build an element structure using a custom XML parser, - or a parser for some other XML-like format. + start, data, end, comment and pi method calls to a well-formed element + structure. You can use this class to build an element structure using + a custom XML parser, or a parser for some other XML-like format. *element_factory*, when given, must be a callable accepting two positional arguments: a tag and a dict of attributes. It is expected to return a new @@ -1040,10 +1040,10 @@ TreeBuilder Objects The *comment_factory* and *pi_factory* functions, when given, should behave like the :func:`Comment` and :func:`ProcessingInstruction` functions to - create comments and processing instructions. When not given, no comments - or processing instructions will be created. Note that these objects will - not currently be appended to the tree when they appear outside of the root - element. + create comments and processing instructions. When not given, the default + factories will be used. 
When *insert_comments* and/or *insert_pis* is true, + comments/pis will be inserted into the tree if they appear within the root + element (but not outside of it). .. method:: close() @@ -1068,6 +1068,7 @@ TreeBuilder Objects Opens a new element. *tag* is the element name. *attrs* is a dictionary containing element attributes. Returns the opened element. + .. method:: comment(text) Adds a comment with the given *text*. If *comment_factory* is @@ -1075,6 +1076,7 @@ TreeBuilder Objects .. versionadded:: 3.8 + .. method:: pi(target, text) Adds a comment with the given *target* name and *text*. If @@ -1201,11 +1203,13 @@ XMLPullParser Objects data fed to the parser. The iterator yields ``(event, elem)`` pairs, where *event* is a string representing the type of event (e.g. ``"end"``) and *elem* is the - encountered :class:`Element` object. - For ``start-ns`` events, the ``elem`` is a tuple ``(prefix, uri)`` naming - the declared namespace mapping. For ``end-ns`` events, the ``elem`` is - :const:`None`. For ``comment`` events, the second value is the comment - text and for ``pi`` events a tuple ``(target, text)``. + encountered :class:`Element` object, or other context value as follows. + + * ``start``, ``end``: the current Element. + * ``comment``, ``pi``: the current comment / processing instruction + * ``start-ns``: a tuple ``(prefix, uri)`` naming the declared namespace + mapping. + * ``end-ns``: :const:`None` (this may change in a future version) Events provided in a previous call to :meth:`read_events` will not be yielded again. 
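Reviewer note: the event payloads listed in the `read_events` documentation above can be sketched as follows. This assumes the behaviour after this commit, where `comment` and `pi` events carry the created comment / processing instruction objects. `describe_events` and the sample document are hypothetical names used only for illustration.

```python
import xml.etree.ElementTree as ET


def describe_events(xml_text, events):
    """Feed xml_text to an XMLPullParser and summarise each event payload."""
    parser = ET.XMLPullParser(events=events)
    parser.feed(xml_text)
    described = []
    for event, payload in parser.read_events():
        if event in ("start", "end"):
            # Payload is the current Element.
            described.append((event, payload.tag))
        elif event in ("comment", "pi"):
            # Payload is the comment / PI object; its text holds the content.
            described.append((event, payload.text))
        else:
            # "start-ns" carries a (prefix, uri) tuple, "end-ns" carries None.
            described.append((event, payload))
    parser.close()
    return described


seen = describe_events(
    '<root xmlns:x="urn:example"><!-- hi --><?target data?></root>',
    ("start", "end", "comment", "pi", "start-ns", "end-ns"),
)
for item in seen:
    print(item)
```

Because the default `TreeBuilder` does not insert comments or PIs into the tree, the pull parser events are the only place these objects surface unless `insert_comments` / `insert_pis` are enabled on a custom target.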
Events are consumed from the internal queue only when diff --git a/Lib/test/test_xml_etree.py b/Lib/test/test_xml_etree.py index c022906bd938bf..94a22882cb8343 100644 --- a/Lib/test/test_xml_etree.py +++ b/Lib/test/test_xml_etree.py @@ -1194,7 +1194,10 @@ def _feed(self, parser, data, chunk_size=None): parser.feed(data[i:i+chunk_size]) def assert_events(self, parser, expected): - self.assertEqual(list(parser.read_events()), expected) + self.assertEqual( + [(event, (elem.tag, elem.text)) + for event, elem in parser.read_events()], + expected) def assert_event_tags(self, parser, expected): events = parser.read_events() @@ -1321,30 +1324,29 @@ def test_events(self): def test_events_comment(self): parser = ET.XMLPullParser(events=('start', 'comment', 'end')) self._feed(parser, "\n") - self.assert_events(parser, [('comment', ' text here ')]) + self.assert_events(parser, [('comment', (ET.Comment, ' text here '))]) self._feed(parser, "\n") - self.assert_events(parser, [('comment', ' more text here ')]) + self.assert_events(parser, [('comment', (ET.Comment, ' more text here '))]) self._feed(parser, "text") self.assert_event_tags(parser, [('start', 'root-tag')]) self._feed(parser, "\n") - self.assert_events(parser, [('comment', ' inner comment')]) + self.assert_events(parser, [('comment', (ET.Comment, ' inner comment'))]) self._feed(parser, "\n") self.assert_event_tags(parser, [('end', 'root-tag')]) self._feed(parser, "\n") - self.assert_events(parser, [('comment', ' outer comment ')]) + self.assert_events(parser, [('comment', (ET.Comment, ' outer comment '))]) parser = ET.XMLPullParser(events=('comment',)) self._feed(parser, "\n") - self.assert_events(parser, [('comment', ' text here ')]) + self.assert_events(parser, [('comment', (ET.Comment, ' text here '))]) def test_events_pi(self): parser = ET.XMLPullParser(events=('start', 'pi', 'end')) self._feed(parser, "\n") - self.assert_events(parser, [('pi', ('pitarget', ''))]) + self.assert_events(parser, [('pi', (ET.PI, 
'pitarget'))]) parser = ET.XMLPullParser(events=('pi',)) self._feed(parser, "\n") - self.assert_events(parser, [('pi', ('pitarget', 'some text '))]) - + self.assert_events(parser, [('pi', (ET.PI, 'pitarget some text '))]) def test_events_sequence(self): # Test that events can be some sequence that's not just a tuple or list @@ -1365,7 +1367,6 @@ def __next__(self): self._feed(parser, "bar") self.assert_event_tags(parser, [('start', 'foo'), ('end', 'foo')]) - def test_unknown_event(self): with self.assertRaises(ValueError): ET.XMLPullParser(events=('start', 'end', 'bogus')) @@ -2693,7 +2694,8 @@ class DummyBuilder(BaseDummyBuilder): def test_treebuilder_comment(self): b = ET.TreeBuilder() - self.assertEqual(b.comment('ctext'), 'ctext') + self.assertEqual(b.comment('ctext').tag, ET.Comment) + self.assertEqual(b.comment('ctext').text, 'ctext') b = ET.TreeBuilder(comment_factory=ET.Comment) self.assertEqual(b.comment('ctext').tag, ET.Comment) @@ -2704,7 +2706,8 @@ def test_treebuilder_comment(self): def test_treebuilder_pi(self): b = ET.TreeBuilder() - self.assertEqual(b.pi('target', None), ('target', None)) + self.assertEqual(b.pi('target', None).tag, ET.PI) + self.assertEqual(b.pi('target', None).text, 'target') b = ET.TreeBuilder(pi_factory=ET.PI) self.assertEqual(b.pi('target').tag, ET.PI) @@ -3408,6 +3411,12 @@ def test_main(module=None): # Copy the path cache (should be empty) path_cache = ElementPath._cache ElementPath._cache = path_cache.copy() + # Align the Comment/PI factories. 
+ if hasattr(ET, '_set_factories'): + old_factories = ET._set_factories(ET.Comment, ET.PI) + else: + old_factories = None + try: support.run_unittest(*test_classes) finally: @@ -3416,6 +3425,8 @@ def test_main(module=None): nsmap.clear() nsmap.update(nsmap_copy) ElementPath._cache = path_cache + if old_factories is not None: + ET._set_factories(*old_factories) # don't interfere with subsequent tests ET = pyET = None diff --git a/Lib/xml/etree/ElementTree.py b/Lib/xml/etree/ElementTree.py index c2fab3798d87ab..c6400480f5b4b4 100644 --- a/Lib/xml/etree/ElementTree.py +++ b/Lib/xml/etree/ElementTree.py @@ -1374,22 +1374,30 @@ class TreeBuilder: *element_factory* is an optional element factory which is called to create new Element instances, as necessary. - *comment_factory* is a factory to create comments. If not provided, - comments will not be inserted into the tree and "comment" pull parser - events will only return the plain text. + *comment_factory* is a factory to create comments to be used instead of + the standard factory. If *insert_comments* is false (the default), + comments will not be inserted into the tree. - *pi_factory* is a factory to create processing instructions. If not - provided, PIs will not be inserted into the tree and "pi" pull parser - events will only return a (target, text) tuple. + *pi_factory* is a factory to create processing instructions to be used + instead of the standard factory. If *insert_pis* is false (the default), + processing instructions will not be inserted into the tree. 
""" - def __init__(self, element_factory=None, comment_factory=None, pi_factory=None): + def __init__(self, element_factory=None, *, + comment_factory=None, pi_factory=None, + insert_comments=False, insert_pis=False): self._data = [] # data collector self._elem = [] # element stack self._last = None # last element self._root = None # root element self._tail = None # true if we're after an end tag + if comment_factory is None: + comment_factory = Comment self._comment_factory = comment_factory + self.insert_comments = insert_comments + if pi_factory is None: + pi_factory = ProcessingInstruction self._pi_factory = pi_factory + self.insert_pis = insert_pis if element_factory is None: element_factory = Element self._factory = element_factory @@ -1450,34 +1458,28 @@ def end(self, tag): def comment(self, text): """Create a comment using the comment_factory. - If no factory is provided, comments are ignored - and the text returned as is. - *text* is the text of the comment. """ - if self._comment_factory is None: - return text - return self._handle_single(self._comment_factory, text) + return self._handle_single( + self._comment_factory, self.insert_comments, text) def pi(self, target, text=None): """Create a processing instruction using the pi_factory. - If no factory is provided, PIs are ignored and a (target, text) - tuple is returned. - *target* is the target name of the processing instruction. *text* is the data of the processing instruction, or ''. 
""" - if self._pi_factory is None: - return (target, text) - return self._handle_single(self._pi_factory, target, text) - - def _handle_single(self, factory, *args): - self._flush() - self._last = elem = factory(*args) - if self._elem: - self._elem[-1].append(elem) - self._tail = 1 + return self._handle_single( + self._pi_factory, self.insert_pis, target, text) + + def _handle_single(self, factory, insert, *args): + elem = factory(*args) + if insert: + self._flush() + self._last = elem + if self._elem: + self._elem[-1].append(elem) + self._tail = 1 return elem @@ -1694,7 +1696,10 @@ def close(self): # (see tests) _Element_Py = Element - # Element, SubElement, ParseError, TreeBuilder, XMLParser + # Element, SubElement, ParseError, TreeBuilder, XMLParser, _set_factories from _elementtree import * + from _elementtree import _set_factories except ImportError: pass +else: + _set_factories(Comment, ProcessingInstruction) diff --git a/Modules/_elementtree.c b/Modules/_elementtree.c index 663337d42dc768..5481c61678712b 100644 --- a/Modules/_elementtree.c +++ b/Modules/_elementtree.c @@ -92,6 +92,8 @@ typedef struct { PyObject *parseerror_obj; PyObject *deepcopy_obj; PyObject *elementpath_obj; + PyObject *comment_factory; + PyObject *pi_factory; } elementtreestate; static struct PyModuleDef elementtreemodule; @@ -114,6 +116,8 @@ elementtree_clear(PyObject *m) Py_CLEAR(st->parseerror_obj); Py_CLEAR(st->deepcopy_obj); Py_CLEAR(st->elementpath_obj); + Py_CLEAR(st->comment_factory); + Py_CLEAR(st->pi_factory); return 0; } @@ -124,6 +128,8 @@ elementtree_traverse(PyObject *m, visitproc visit, void *arg) Py_VISIT(st->parseerror_obj); Py_VISIT(st->deepcopy_obj); Py_VISIT(st->elementpath_obj); + Py_VISIT(st->comment_factory); + Py_VISIT(st->pi_factory); return 0; } @@ -2396,6 +2402,9 @@ typedef struct { PyObject *end_ns_event_obj; PyObject *comment_event_obj; PyObject *pi_event_obj; + + char insert_comments; + char insert_pis; } TreeBuilderObject; #define TreeBuilder_CheckExact(op) 
(Py_TYPE(op) == &TreeBuilder_Type) @@ -2432,6 +2441,7 @@ treebuilder_new(PyTypeObject *type, PyObject *args, PyObject *kwds) t->start_event_obj = t->end_event_obj = NULL; t->start_ns_event_obj = t->end_ns_event_obj = NULL; t->comment_event_obj = t->pi_event_obj = NULL; + t->insert_comments = t->insert_pis = 0; } return (PyObject *)t; } @@ -2440,8 +2450,11 @@ treebuilder_new(PyTypeObject *type, PyObject *args, PyObject *kwds) _elementtree.TreeBuilder.__init__ element_factory: object = NULL + * comment_factory: object = NULL pi_factory: object = NULL + insert_comments: bool = False + insert_pis: bool = False [clinic start generated code]*/ @@ -2449,8 +2462,9 @@ static int _elementtree_TreeBuilder___init___impl(TreeBuilderObject *self, PyObject *element_factory, PyObject *comment_factory, - PyObject *pi_factory) -/*[clinic end generated code: output=da49f5ab76aee6d6 input=9b7d938a273ab7ad]*/ + PyObject *pi_factory, + int insert_comments, int insert_pis) +/*[clinic end generated code: output=8571d4dcadfdf952 input=1f967b5c245e0a71]*/ { if (element_factory && element_factory != Py_None) { Py_INCREF(element_factory); @@ -2458,17 +2472,31 @@ _elementtree_TreeBuilder___init___impl(TreeBuilderObject *self, } else { Py_CLEAR(self->element_factory); } - if (comment_factory && comment_factory != Py_None) { + + if (!comment_factory || comment_factory == Py_None) { + elementtreestate *st = ET_STATE_GLOBAL; + comment_factory = st->comment_factory; + } + if (comment_factory) { Py_INCREF(comment_factory); Py_XSETREF(self->comment_factory, comment_factory); + self->insert_comments = insert_comments; } else { Py_CLEAR(self->comment_factory); + self->insert_comments = 0; } - if (pi_factory && pi_factory != Py_None) { + + if (!pi_factory || pi_factory == Py_None) { + elementtreestate *st = ET_STATE_GLOBAL; + pi_factory = st->pi_factory; + } + if (pi_factory) { Py_INCREF(pi_factory); Py_XSETREF(self->pi_factory, pi_factory); + self->insert_pis = insert_pis; } else { 
Py_CLEAR(self->pi_factory); + self->insert_pis = 0; } return 0; @@ -2527,6 +2555,57 @@ treebuilder_dealloc(TreeBuilderObject *self) /* -------------------------------------------------------------------- */ /* helpers for handling of arbitrary element-like objects */ +/*[clinic input] +_elementtree._set_factories + + comment_factory: object + pi_factory: object + / + +Change the factories used to create comments and processing instructions. + +For internal use only. +[clinic start generated code]*/ + +static PyObject * +_elementtree__set_factories_impl(PyObject *module, PyObject *comment_factory, + PyObject *pi_factory) +/*[clinic end generated code: output=813b408adee26535 input=99d17627aea7fb3b]*/ +{ + elementtreestate *st = ET_STATE_GLOBAL; + PyObject *old; + + if (!PyCallable_Check(comment_factory) && comment_factory != Py_None) { + PyErr_Format(PyExc_TypeError, "Comment factory must be callable, not %.100s", + Py_TYPE(comment_factory)->tp_name); + return NULL; + } + if (!PyCallable_Check(pi_factory) && pi_factory != Py_None) { + PyErr_Format(PyExc_TypeError, "PI factory must be callable, not %.100s", + Py_TYPE(pi_factory)->tp_name); + return NULL; + } + + old = PyTuple_Pack(2, + st->comment_factory ? st->comment_factory : Py_None, + st->pi_factory ? 
st->pi_factory : Py_None); + + if (comment_factory == Py_None) { + Py_CLEAR(st->comment_factory); + } else { + Py_INCREF(comment_factory); + Py_XSETREF(st->comment_factory, comment_factory); + } + if (pi_factory == Py_None) { + Py_CLEAR(st->pi_factory); + } else { + Py_INCREF(pi_factory); + Py_XSETREF(st->pi_factory, pi_factory); + } + + return old; +} + static int treebuilder_set_element_text_or_tail(PyObject *element, PyObject **data, PyObject **dest, _Py_Identifier *name) @@ -2770,7 +2849,7 @@ treebuilder_handle_comment(TreeBuilderObject* self, PyObject* text) return NULL; this = self->this; - if (this != Py_None) { + if (self->insert_comments && this != Py_None) { if (treebuilder_add_subelement(this, comment) < 0) goto error; } @@ -2809,7 +2888,7 @@ treebuilder_handle_pi(TreeBuilderObject* self, PyObject* target, PyObject* text) } this = self->this; - if (this != Py_None) { + if (self->insert_pis && this != Py_None) { if (treebuilder_add_subelement(this, pi) < 0) goto error; } @@ -3411,31 +3490,51 @@ static void expat_pi_handler(XMLParserObject* self, const XML_Char* target_in, const XML_Char* data_in) { - PyObject* parcel; + PyObject* pi_target = NULL; + PyObject* data; PyObject* res; + PyObject* stack[2]; if (PyErr_Occurred()) return; if (TreeBuilder_CheckExact(self->target)) { - /* shortcut: TreeBuilder does not handle PIs */ + /* shortcut */ TreeBuilderObject *target = (TreeBuilderObject*) self->target; if (target->events_append && target->pi_event_obj) { - parcel = Py_BuildValue("ss", target_in, data_in); - if (!parcel) - return; - treebuilder_append_event(target, target->pi_event_obj, parcel); - Py_DECREF(parcel); + pi_target = PyUnicode_DecodeUTF8(target_in, strlen(target_in), "strict"); + if (!pi_target) + goto error; + data = PyUnicode_DecodeUTF8(data_in, strlen(data_in), "strict"); + if (!data) + goto error; + res = treebuilder_handle_pi(target, pi_target, data); + Py_XDECREF(res); + Py_DECREF(data); + Py_DECREF(pi_target); } } else if 
(self->handle_pi) { - parcel = Py_BuildValue("ss", target_in, data_in); - if (!parcel) - return; - res = PyObject_Call(self->handle_pi, parcel, NULL); + pi_target = PyUnicode_DecodeUTF8(target_in, strlen(target_in), "strict"); + if (!pi_target) + goto error; + data = PyUnicode_DecodeUTF8(data_in, strlen(data_in), "strict"); + if (!data) + goto error; + + stack[0] = pi_target; + stack[1] = data; + res = _PyObject_FastCall(self->handle_pi, stack, 2); Py_XDECREF(res); - Py_DECREF(parcel); + Py_DECREF(data); + Py_DECREF(pi_target); } + + return; + + error: + Py_XDECREF(pi_target); + return; } /* -------------------------------------------------------------------- */ @@ -4156,6 +4255,7 @@ static PyTypeObject XMLParser_Type = { static PyMethodDef _functions[] = { {"SubElement", (PyCFunction)(void(*)(void)) subelement, METH_VARARGS | METH_KEYWORDS}, + _ELEMENTTREE__SET_FACTORIES_METHODDEF {NULL, NULL} }; diff --git a/Modules/clinic/_elementtree.c.h b/Modules/clinic/_elementtree.c.h index b1c5f8e25d205f..0f55480140b315 100644 --- a/Modules/clinic/_elementtree.c.h +++ b/Modules/clinic/_elementtree.c.h @@ -637,23 +637,26 @@ static int _elementtree_TreeBuilder___init___impl(TreeBuilderObject *self, PyObject *element_factory, PyObject *comment_factory, - PyObject *pi_factory); + PyObject *pi_factory, + int insert_comments, int insert_pis); static int _elementtree_TreeBuilder___init__(PyObject *self, PyObject *args, PyObject *kwargs) { int return_value = -1; - static const char * const _keywords[] = {"element_factory", "comment_factory", "pi_factory", NULL}; + static const char * const _keywords[] = {"element_factory", "comment_factory", "pi_factory", "insert_comments", "insert_pis", NULL}; static _PyArg_Parser _parser = {NULL, _keywords, "TreeBuilder", 0}; - PyObject *argsbuf[3]; + PyObject *argsbuf[5]; PyObject * const *fastargs; Py_ssize_t nargs = PyTuple_GET_SIZE(args); Py_ssize_t noptargs = nargs + (kwargs ? 
PyDict_GET_SIZE(kwargs) : 0) - 0; PyObject *element_factory = NULL; PyObject *comment_factory = NULL; PyObject *pi_factory = NULL; + int insert_comments = 0; + int insert_pis = 0; - fastargs = _PyArg_UnpackKeywords(_PyTuple_CAST(args)->ob_item, nargs, kwargs, NULL, &_parser, 0, 3, 0, argsbuf); + fastargs = _PyArg_UnpackKeywords(_PyTuple_CAST(args)->ob_item, nargs, kwargs, NULL, &_parser, 0, 1, 0, argsbuf); if (!fastargs) { goto exit; } @@ -666,15 +669,70 @@ _elementtree_TreeBuilder___init__(PyObject *self, PyObject *args, PyObject *kwar goto skip_optional_pos; } } +skip_optional_pos: + if (!noptargs) { + goto skip_optional_kwonly; + } if (fastargs[1]) { comment_factory = fastargs[1]; if (!--noptargs) { - goto skip_optional_pos; + goto skip_optional_kwonly; } } - pi_factory = fastargs[2]; -skip_optional_pos: - return_value = _elementtree_TreeBuilder___init___impl((TreeBuilderObject *)self, element_factory, comment_factory, pi_factory); + if (fastargs[2]) { + pi_factory = fastargs[2]; + if (!--noptargs) { + goto skip_optional_kwonly; + } + } + if (fastargs[3]) { + insert_comments = PyObject_IsTrue(fastargs[3]); + if (insert_comments < 0) { + goto exit; + } + if (!--noptargs) { + goto skip_optional_kwonly; + } + } + insert_pis = PyObject_IsTrue(fastargs[4]); + if (insert_pis < 0) { + goto exit; + } +skip_optional_kwonly: + return_value = _elementtree_TreeBuilder___init___impl((TreeBuilderObject *)self, element_factory, comment_factory, pi_factory, insert_comments, insert_pis); + +exit: + return return_value; +} + +PyDoc_STRVAR(_elementtree__set_factories__doc__, +"_set_factories($module, comment_factory, pi_factory, /)\n" +"--\n" +"\n" +"Change the factories used to create comments and processing instructions.\n" +"\n" +"For internal use only."); + +#define _ELEMENTTREE__SET_FACTORIES_METHODDEF \ + {"_set_factories", (PyCFunction)(void(*)(void))_elementtree__set_factories, METH_FASTCALL, _elementtree__set_factories__doc__}, + +static PyObject * 
+_elementtree__set_factories_impl(PyObject *module, PyObject *comment_factory, + PyObject *pi_factory); + +static PyObject * +_elementtree__set_factories(PyObject *module, PyObject *const *args, Py_ssize_t nargs) +{ + PyObject *return_value = NULL; + PyObject *comment_factory; + PyObject *pi_factory; + + if (!_PyArg_CheckPositional("_set_factories", nargs, 2, 2)) { + goto exit; + } + comment_factory = args[0]; + pi_factory = args[1]; + return_value = _elementtree__set_factories_impl(module, comment_factory, pi_factory); exit: return return_value; @@ -911,4 +969,4 @@ _elementtree_XMLParser__setevents(XMLParserObject *self, PyObject *const *args, exit: return return_value; } -/*[clinic end generated code: output=94ec504fdbcea1d3 input=a9049054013a1b77]*/ +/*[clinic end generated code: output=386a68425d072b5c input=a9049054013a1b77]*/ From aa52e04255765c47cce2abd4c3b9845861a24eb5 Mon Sep 17 00:00:00 2001 From: Stefan Behnel Date: Sat, 20 Apr 2019 13:18:29 +0200 Subject: [PATCH 03/22] bpo-36676: Implement namespace prefix aware parsing support for the XMLParser target in ElementTree. --- Doc/library/xml.etree.elementtree.rst | 12 ++ Lib/test/test_xml_etree.py | 71 +++++++++- Lib/xml/etree/ElementTree.py | 30 +++- .../2019-04-20-13-10-34.bpo-36676.XF4Egb.rst | 3 + Modules/_elementtree.c | 133 +++++++++++++++--- 5 files changed, 221 insertions(+), 28 deletions(-) create mode 100644 Misc/NEWS.d/next/Library/2019-04-20-13-10-34.bpo-36676.XF4Egb.rst diff --git a/Doc/library/xml.etree.elementtree.rst b/Doc/library/xml.etree.elementtree.rst index 1e4134aa1e4ad3..413fe7485cfc7e 100644 --- a/Doc/library/xml.etree.elementtree.rst +++ b/Doc/library/xml.etree.elementtree.rst @@ -1169,6 +1169,18 @@ XMLParser Objects >>> parser.close() 4 + Additionally, if the target object provides one or both of the methods + ``start_ns(self, prefix, uri)`` and ``end_ns(self, prefix)``, then they + are called whenever the parser encounters a new namespace declaration. 
+ The ``prefix`` is ``''`` for the default namespace and the declared + namespace prefix otherwise. The ``start_ns()`` method is called before + the ``start()`` callback of the opening tag that defines the namespace, + and the ``end_ns()`` method is called after the corresponding ``end()`` + callback. + + .. versionchanged:: 3.8 + The ``start_ns()`` and ``end_ns()`` callbacks were added. + .. _elementtree-xmlpullparser-objects: diff --git a/Lib/test/test_xml_etree.py b/Lib/test/test_xml_etree.py index 94a22882cb8343..29aee69ed47757 100644 --- a/Lib/test/test_xml_etree.py +++ b/Lib/test/test_xml_etree.py @@ -18,7 +18,7 @@ import warnings import weakref -from itertools import product +from itertools import product, islice from test import support from test.support import TESTFN, findfile, import_fresh_module, gc_collect, swap_attr @@ -693,12 +693,17 @@ def pi(self, target, data): self.append(("pi", target, data)) def comment(self, data): self.append(("comment", data)) + def start_ns(self, prefix, uri): + self.append(("start-ns", prefix, uri)) + def end_ns(self, prefix): + self.append(("end-ns", prefix)) builder = Builder() parser = ET.XMLParser(target=builder) parser.feed(data) self.assertEqual(builder, [ ('pi', 'pi', 'data'), ('comment', ' comment '), + ('start-ns', '', 'namespace'), ('start', '{namespace}root'), ('start', '{namespace}element'), ('end', '{namespace}element'), @@ -707,6 +712,7 @@ def comment(self, data): ('start', '{namespace}empty-element'), ('end', '{namespace}empty-element'), ('end', '{namespace}root'), + ('end-ns', ''), ]) @@ -1193,14 +1199,19 @@ def _feed(self, parser, data, chunk_size=None): for i in range(0, len(data), chunk_size): parser.feed(data[i:i+chunk_size]) - def assert_events(self, parser, expected): + def assert_events(self, parser, expected, max_events=None): self.assertEqual( [(event, (elem.tag, elem.text)) - for event, elem in parser.read_events()], + for event, elem in islice(parser.read_events(), max_events)], expected) - def 
assert_event_tags(self, parser, expected): - events = parser.read_events() + def assert_event_tuples(self, parser, expected, max_events=None): + self.assertEqual( + list(islice(parser.read_events(), max_events)), + expected) + + def assert_event_tags(self, parser, expected, max_events=None): + events = islice(parser.read_events(), max_events) self.assertEqual([(action, elem.tag) for action, elem in events], expected) @@ -1275,6 +1286,56 @@ def test_ns_events(self): self.assertEqual(list(parser.read_events()), [('end-ns', None)]) self.assertIsNone(parser.close()) + def test_ns_events_start(self): + parser = ET.XMLPullParser(events=('start-ns', 'start', 'end')) + self._feed(parser, "\n") + self.assert_event_tuples(parser, [ + ('start-ns', ('', 'abc')), + ('start-ns', ('p', 'xyz')), + ], max_events=2) + self.assert_event_tags(parser, [ + ('start', '{abc}tag'), + ], max_events=1) + + self._feed(parser, "\n") + self.assert_event_tags(parser, [ + ('start', '{abc}child'), + ('end', '{abc}child'), + ]) + + self._feed(parser, "\n") + parser.close() + self.assert_event_tags(parser, [ + ('end', '{abc}tag'), + ]) + + def test_ns_events_start_end(self): + parser = ET.XMLPullParser(events=('start-ns', 'start', 'end', 'end-ns')) + self._feed(parser, "\n") + self.assert_event_tuples(parser, [ + ('start-ns', ('', 'abc')), + ('start-ns', ('p', 'xyz')), + ], max_events=2) + self.assert_event_tags(parser, [ + ('start', '{abc}tag'), + ], max_events=1) + + self._feed(parser, "\n") + self.assert_event_tags(parser, [ + ('start', '{abc}child'), + ('end', '{abc}child'), + ]) + + self._feed(parser, "\n") + parser.close() + self.assert_event_tags(parser, [ + ('end', '{abc}tag'), + ], max_events=1) + self.assert_event_tuples(parser, [ + ('end-ns', None), + ('end-ns', None), + ]) + def test_events(self): parser = ET.XMLPullParser(events=()) self._feed(parser, "\n") diff --git a/Lib/xml/etree/ElementTree.py b/Lib/xml/etree/ElementTree.py index c6400480f5b4b4..5b26ac72fd1aae 100644 --- 
a/Lib/xml/etree/ElementTree.py +++ b/Lib/xml/etree/ElementTree.py @@ -1518,6 +1518,10 @@ def __init__(self, *, target=None, encoding=None): parser.StartElementHandler = self._start if hasattr(target, 'end'): parser.EndElementHandler = self._end + if hasattr(target, 'start_ns'): + parser.StartNamespaceDeclHandler = self._start_ns + if hasattr(target, 'end_ns'): + parser.EndNamespaceDeclHandler = self._end_ns if hasattr(target, 'data'): parser.CharacterDataHandler = target.data # miscellaneous callbacks @@ -1559,12 +1563,24 @@ def handler(tag, event=event_name, append=append, append((event, end(tag))) parser.EndElementHandler = handler elif event_name == "start-ns": - def handler(prefix, uri, event=event_name, append=append): - append((event, (prefix or "", uri or ""))) + # TreeBuilder does not implement .start_ns() + if hasattr(self.target, "start_ns"): + def handler(prefix, uri, event=event_name, append=append, + start_ns=self._start_ns): + append((event, start_ns(prefix, uri))) + else: + def handler(prefix, uri, event=event_name, append=append): + append((event, (prefix or '', uri or ''))) parser.StartNamespaceDeclHandler = handler elif event_name == "end-ns": - def handler(prefix, event=event_name, append=append): - append((event, None)) + # TreeBuilder does not implement .end_ns() + if hasattr(self.target, "end_ns"): + def handler(prefix, event=event_name, append=append, + end_ns=self._end_ns): + append((event, end_ns(prefix))) + else: + def handler(prefix, event=event_name, append=append): + append((event, None)) parser.EndNamespaceDeclHandler = handler elif event_name == 'comment': def handler(text, event=event_name, append=append, self=self): @@ -1595,6 +1611,12 @@ def _fixname(self, key): self._names[key] = name return name + def _start_ns(self, prefix, uri): + return self.target.start_ns(prefix or '', uri or '') + + def _end_ns(self, prefix): + return self.target.end_ns(prefix or '') + def _start(self, tag, attr_list): # Handler for expat's 
StartElementHandler. Since ordered_attributes # is set, the attributes are reported as a list of alternating diff --git a/Misc/NEWS.d/next/Library/2019-04-20-13-10-34.bpo-36676.XF4Egb.rst b/Misc/NEWS.d/next/Library/2019-04-20-13-10-34.bpo-36676.XF4Egb.rst new file mode 100644 index 00000000000000..e0bede81eec108 --- /dev/null +++ b/Misc/NEWS.d/next/Library/2019-04-20-13-10-34.bpo-36676.XF4Egb.rst @@ -0,0 +1,3 @@ +The XMLParser() in xml.etree.ElementTree provides namespace prefix context to the +parser target if it defines the callback methods "start_ns()" and/or "end_ns()". +Patch by Stefan Behnel. diff --git a/Modules/_elementtree.c b/Modules/_elementtree.c index 5481c61678712b..50d0f20571bcea 100644 --- a/Modules/_elementtree.c +++ b/Modules/_elementtree.c @@ -2911,6 +2911,39 @@ treebuilder_handle_pi(TreeBuilderObject* self, PyObject* target, PyObject* text) return NULL; } +LOCAL(PyObject*) +treebuilder_handle_start_ns(TreeBuilderObject* self, PyObject* prefix, PyObject* uri) +{ + PyObject* parcel; + + if (self->events_append && self->start_ns_event_obj) { + parcel = PyTuple_Pack(2, prefix, uri); + if (!parcel) { + return NULL; + } + + if (treebuilder_append_event(self, self->start_ns_event_obj, parcel) < 0) { + Py_DECREF(parcel); + return NULL; + } + Py_DECREF(parcel); + } + + Py_RETURN_NONE; +} + +LOCAL(PyObject*) +treebuilder_handle_end_ns(TreeBuilderObject* self, PyObject* prefix) +{ + if (self->events_append && self->end_ns_event_obj) { + if (treebuilder_append_event(self, self->end_ns_event_obj, prefix) < 0) { + return NULL; + } + } + + Py_RETURN_NONE; +} + /* -------------------------------------------------------------------- */ /* methods (in alphabetical order) */ @@ -3046,6 +3079,8 @@ typedef struct { PyObject *names; + PyObject *handle_start_ns; + PyObject *handle_end_ns; PyObject *handle_start; PyObject *handle_data; PyObject *handle_end; @@ -3357,42 +3392,85 @@ expat_end_handler(XMLParserObject* self, const XML_Char* tag_in) } static void 
-expat_start_ns_handler(XMLParserObject* self, const XML_Char* prefix, - const XML_Char *uri) +expat_start_ns_handler(XMLParserObject* self, const XML_Char* prefix_in, + const XML_Char *uri_in) { - TreeBuilderObject *target = (TreeBuilderObject*) self->target; - PyObject *parcel; + PyObject* res = NULL; + PyObject* uri; + PyObject* prefix; + PyObject* stack[2]; if (PyErr_Occurred()) return; - if (!target->events_append || !target->start_ns_event_obj) - return; + if (!uri_in) + uri_in = ""; + if (!prefix_in) + prefix_in = ""; + + if (TreeBuilder_CheckExact(self->target)) { + /* shortcut - TreeBuilder does not actually implement .start_ns() */ + TreeBuilderObject *target = (TreeBuilderObject*) self->target; - if (!uri) - uri = ""; - if (!prefix) - prefix = ""; + if (target->events_append && target->start_ns_event_obj) { + prefix = PyUnicode_DecodeUTF8(prefix_in, strlen(prefix_in), "strict"); + if (!prefix) + return; + uri = PyUnicode_DecodeUTF8(uri_in, strlen(uri_in), "strict"); + if (!uri) + return; - parcel = Py_BuildValue("ss", prefix, uri); - if (!parcel) - return; - treebuilder_append_event(target, target->start_ns_event_obj, parcel); - Py_DECREF(parcel); + res = treebuilder_handle_start_ns(target, prefix, uri); + Py_DECREF(uri); + Py_DECREF(prefix); + } + } else if (self->handle_start_ns) { + prefix = PyUnicode_DecodeUTF8(prefix_in, strlen(prefix_in), "strict"); + if (!prefix) + return; + uri = PyUnicode_DecodeUTF8(uri_in, strlen(uri_in), "strict"); + if (!uri) + return; + + stack[0] = prefix; + stack[1] = uri; + res = _PyObject_FastCall(self->handle_start_ns, stack, 2); + Py_DECREF(uri); + Py_DECREF(prefix); + } + + Py_XDECREF(res); } static void expat_end_ns_handler(XMLParserObject* self, const XML_Char* prefix_in) { - TreeBuilderObject *target = (TreeBuilderObject*) self->target; + PyObject *res = NULL; + PyObject* prefix; if (PyErr_Occurred()) return; - if (!target->events_append) - return; + if (!prefix_in) + prefix_in = ""; - 
treebuilder_append_event(target, target->end_ns_event_obj, Py_None); + if (TreeBuilder_CheckExact(self->target)) { + /* shortcut - TreeBuilder does not actually implement .end_ns() */ + TreeBuilderObject *target = (TreeBuilderObject*) self->target; + + if (target->events_append && target->end_ns_event_obj) { + res = treebuilder_handle_end_ns(target, Py_None); + } + } else if (self->handle_end_ns) { + prefix = PyUnicode_DecodeUTF8(prefix_in, strlen(prefix_in), "strict"); + if (!prefix) + return; + + res = _PyObject_FastCall(self->handle_end_ns, &prefix, 1); + Py_DECREF(prefix); + } + + Py_XDECREF(res); } static void @@ -3546,6 +3624,7 @@ xmlparser_new(PyTypeObject *type, PyObject *args, PyObject *kwds) if (self) { self->parser = NULL; self->target = self->entity = self->names = NULL; + self->handle_start_ns = self->handle_end_ns = NULL; self->handle_start = self->handle_data = self->handle_end = NULL; self->handle_comment = self->handle_pi = self->handle_close = NULL; self->handle_doctype = NULL; @@ -3614,6 +3693,14 @@ _elementtree_XMLParser___init___impl(XMLParserObject *self, PyObject *target, } self->target = target; + self->handle_start_ns = PyObject_GetAttrString(target, "start_ns"); + if (ignore_attribute_error(self->handle_start_ns)) { + return -1; + } + self->handle_end_ns = PyObject_GetAttrString(target, "end_ns"); + if (ignore_attribute_error(self->handle_end_ns)) { + return -1; + } self->handle_start = PyObject_GetAttrString(target, "start"); if (ignore_attribute_error(self->handle_start)) { return -1; @@ -3645,6 +3732,12 @@ _elementtree_XMLParser___init___impl(XMLParserObject *self, PyObject *target, /* configure parser */ EXPAT(SetUserData)(self->parser, self); + if (self->handle_start_ns || self->handle_end_ns) + EXPAT(SetNamespaceDeclHandler)( + self->parser, + (XML_StartNamespaceDeclHandler) expat_start_ns_handler, + (XML_EndNamespaceDeclHandler) expat_end_ns_handler + ); EXPAT(SetElementHandler)( self->parser, (XML_StartElementHandler) 
expat_start_handler, @@ -3689,6 +3782,7 @@ xmlparser_gc_traverse(XMLParserObject *self, visitproc visit, void *arg) Py_VISIT(self->handle_end); Py_VISIT(self->handle_data); Py_VISIT(self->handle_start); + Py_VISIT(self->handle_start_ns); Py_VISIT(self->target); Py_VISIT(self->entity); @@ -3712,6 +3806,7 @@ xmlparser_gc_clear(XMLParserObject *self) Py_CLEAR(self->handle_end); Py_CLEAR(self->handle_data); Py_CLEAR(self->handle_start); + Py_CLEAR(self->handle_start_ns); Py_CLEAR(self->handle_doctype); Py_CLEAR(self->target); From 3b332644b22152865018fc91971fa3bc17373602 Mon Sep 17 00:00:00 2001 From: Stefan Behnel Date: Mon, 22 Apr 2019 08:27:03 +0200 Subject: [PATCH 04/22] bpo-36676: Add test to see if a target only with an "end_ns()" callback receives the calls in the right order. --- Lib/test/test_xml_etree.py | 22 ++++++++++++++++++++++ 1 file changed, 22 insertions(+) diff --git a/Lib/test/test_xml_etree.py b/Lib/test/test_xml_etree.py index 29aee69ed47757..0b03d077448838 100644 --- a/Lib/test/test_xml_etree.py +++ b/Lib/test/test_xml_etree.py @@ -13,6 +13,7 @@ import operator import pickle import sys +import textwrap import types import unittest import warnings @@ -715,6 +716,27 @@ def end_ns(self, prefix): ('end-ns', ''), ]) + def test_custom_builder_only_end_ns(self): + class Builder(list): + def end_ns(self, prefix): + self.append(("end-ns", prefix)) + + builder = Builder() + parser = ET.XMLParser(target=builder) + parser.feed(textwrap.dedent("""\ + + + + text + texttail + + + """)) + self.assertEqual(builder, [ + ('end-ns', 'a'), + ('end-ns', 'p'), + ('end-ns', ''), + ]) # Element.getchildren() and ElementTree.getiterator() are deprecated. @checkwarnings(("This method will be removed in future versions. " From 7f0ed4841be3352ab02a80ba2ad5235248ac04a5 Mon Sep 17 00:00:00 2001 From: Stefan Behnel Date: Fri, 26 Apr 2019 10:33:05 +0200 Subject: [PATCH 05/22] Implement C14N 2.0 as a new canonicalize() function in ElementTree. 
Missing features: - prefix renaming in XPath expressions (tag and attribute text is supported) - preservation of original prefixes given redundant namespace declarations --- Lib/test/test_xml_etree.py | 180 +++++++++-- Lib/test/xmltestdata/c14n-20/c14nComment.xml | 4 + Lib/test/xmltestdata/c14n-20/c14nDefault.xml | 3 + Lib/test/xmltestdata/c14n-20/c14nPrefix.xml | 4 + .../xmltestdata/c14n-20/c14nPrefixQname.xml | 7 + .../c14n-20/c14nPrefixQnameXpathElem.xml | 8 + Lib/test/xmltestdata/c14n-20/c14nQname.xml | 6 + .../xmltestdata/c14n-20/c14nQnameElem.xml | 6 + .../c14n-20/c14nQnameXpathElem.xml | 7 + Lib/test/xmltestdata/c14n-20/c14nTrim.xml | 4 + Lib/test/xmltestdata/c14n-20/doc.dtd | 6 + Lib/test/xmltestdata/c14n-20/doc.xsl | 5 + Lib/test/xmltestdata/c14n-20/inC14N1.xml | 14 + Lib/test/xmltestdata/c14n-20/inC14N2.xml | 11 + Lib/test/xmltestdata/c14n-20/inC14N3.xml | 18 ++ Lib/test/xmltestdata/c14n-20/inC14N4.xml | 13 + Lib/test/xmltestdata/c14n-20/inC14N5.xml | 12 + Lib/test/xmltestdata/c14n-20/inC14N6.xml | 2 + Lib/test/xmltestdata/c14n-20/inNsContent.xml | 4 + Lib/test/xmltestdata/c14n-20/inNsDefault.xml | 3 + Lib/test/xmltestdata/c14n-20/inNsPushdown.xml | 6 + Lib/test/xmltestdata/c14n-20/inNsRedecl.xml | 3 + Lib/test/xmltestdata/c14n-20/inNsSort.xml | 4 + .../xmltestdata/c14n-20/inNsSuperfluous.xml | 4 + Lib/test/xmltestdata/c14n-20/inNsXml.xml | 3 + .../c14n-20/out_inC14N1_c14nComment.xml | 6 + .../c14n-20/out_inC14N1_c14nDefault.xml | 4 + .../c14n-20/out_inC14N2_c14nDefault.xml | 11 + .../c14n-20/out_inC14N2_c14nTrim.xml | 1 + .../c14n-20/out_inC14N3_c14nDefault.xml | 14 + .../c14n-20/out_inC14N3_c14nPrefix.xml | 14 + .../c14n-20/out_inC14N3_c14nTrim.xml | 1 + .../c14n-20/out_inC14N4_c14nDefault.xml | 10 + .../c14n-20/out_inC14N4_c14nTrim.xml | 2 + .../c14n-20/out_inC14N5_c14nDefault.xml | 3 + .../c14n-20/out_inC14N5_c14nTrim.xml | 1 + .../c14n-20/out_inC14N6_c14nDefault.xml | 1 + .../c14n-20/out_inNsContent_c14nDefault.xml | 4 + 
...t_inNsContent_c14nPrefixQnameXpathElem.xml | 4 + .../c14n-20/out_inNsContent_c14nQnameElem.xml | 4 + .../out_inNsContent_c14nQnameXpathElem.xml | 4 + .../c14n-20/out_inNsDefault_c14nDefault.xml | 3 + .../c14n-20/out_inNsDefault_c14nPrefix.xml | 3 + .../c14n-20/out_inNsPushdown_c14nDefault.xml | 6 + .../c14n-20/out_inNsPushdown_c14nPrefix.xml | 6 + .../c14n-20/out_inNsRedecl_c14nDefault.xml | 3 + .../c14n-20/out_inNsRedecl_c14nPrefix.xml | 3 + .../c14n-20/out_inNsSort_c14nDefault.xml | 4 + .../c14n-20/out_inNsSort_c14nPrefix.xml | 4 + .../out_inNsSuperfluous_c14nDefault.xml | 4 + .../out_inNsSuperfluous_c14nPrefix.xml | 4 + .../c14n-20/out_inNsXml_c14nDefault.xml | 3 + .../c14n-20/out_inNsXml_c14nPrefix.xml | 3 + .../c14n-20/out_inNsXml_c14nPrefixQname.xml | 3 + .../c14n-20/out_inNsXml_c14nQname.xml | 3 + Lib/test/xmltestdata/c14n-20/world.txt | 1 + Lib/xml/etree/ElementTree.py | 292 ++++++++++++++++++ 57 files changed, 744 insertions(+), 22 deletions(-) create mode 100644 Lib/test/xmltestdata/c14n-20/c14nComment.xml create mode 100644 Lib/test/xmltestdata/c14n-20/c14nDefault.xml create mode 100644 Lib/test/xmltestdata/c14n-20/c14nPrefix.xml create mode 100644 Lib/test/xmltestdata/c14n-20/c14nPrefixQname.xml create mode 100644 Lib/test/xmltestdata/c14n-20/c14nPrefixQnameXpathElem.xml create mode 100644 Lib/test/xmltestdata/c14n-20/c14nQname.xml create mode 100644 Lib/test/xmltestdata/c14n-20/c14nQnameElem.xml create mode 100644 Lib/test/xmltestdata/c14n-20/c14nQnameXpathElem.xml create mode 100644 Lib/test/xmltestdata/c14n-20/c14nTrim.xml create mode 100644 Lib/test/xmltestdata/c14n-20/doc.dtd create mode 100644 Lib/test/xmltestdata/c14n-20/doc.xsl create mode 100644 Lib/test/xmltestdata/c14n-20/inC14N1.xml create mode 100644 Lib/test/xmltestdata/c14n-20/inC14N2.xml create mode 100644 Lib/test/xmltestdata/c14n-20/inC14N3.xml create mode 100644 Lib/test/xmltestdata/c14n-20/inC14N4.xml create mode 100644 Lib/test/xmltestdata/c14n-20/inC14N5.xml create mode 100644 
Lib/test/xmltestdata/c14n-20/inC14N6.xml create mode 100644 Lib/test/xmltestdata/c14n-20/inNsContent.xml create mode 100644 Lib/test/xmltestdata/c14n-20/inNsDefault.xml create mode 100644 Lib/test/xmltestdata/c14n-20/inNsPushdown.xml create mode 100644 Lib/test/xmltestdata/c14n-20/inNsRedecl.xml create mode 100644 Lib/test/xmltestdata/c14n-20/inNsSort.xml create mode 100644 Lib/test/xmltestdata/c14n-20/inNsSuperfluous.xml create mode 100644 Lib/test/xmltestdata/c14n-20/inNsXml.xml create mode 100644 Lib/test/xmltestdata/c14n-20/out_inC14N1_c14nComment.xml create mode 100644 Lib/test/xmltestdata/c14n-20/out_inC14N1_c14nDefault.xml create mode 100644 Lib/test/xmltestdata/c14n-20/out_inC14N2_c14nDefault.xml create mode 100644 Lib/test/xmltestdata/c14n-20/out_inC14N2_c14nTrim.xml create mode 100644 Lib/test/xmltestdata/c14n-20/out_inC14N3_c14nDefault.xml create mode 100644 Lib/test/xmltestdata/c14n-20/out_inC14N3_c14nPrefix.xml create mode 100644 Lib/test/xmltestdata/c14n-20/out_inC14N3_c14nTrim.xml create mode 100644 Lib/test/xmltestdata/c14n-20/out_inC14N4_c14nDefault.xml create mode 100644 Lib/test/xmltestdata/c14n-20/out_inC14N4_c14nTrim.xml create mode 100644 Lib/test/xmltestdata/c14n-20/out_inC14N5_c14nDefault.xml create mode 100644 Lib/test/xmltestdata/c14n-20/out_inC14N5_c14nTrim.xml create mode 100644 Lib/test/xmltestdata/c14n-20/out_inC14N6_c14nDefault.xml create mode 100644 Lib/test/xmltestdata/c14n-20/out_inNsContent_c14nDefault.xml create mode 100644 Lib/test/xmltestdata/c14n-20/out_inNsContent_c14nPrefixQnameXpathElem.xml create mode 100644 Lib/test/xmltestdata/c14n-20/out_inNsContent_c14nQnameElem.xml create mode 100644 Lib/test/xmltestdata/c14n-20/out_inNsContent_c14nQnameXpathElem.xml create mode 100644 Lib/test/xmltestdata/c14n-20/out_inNsDefault_c14nDefault.xml create mode 100644 Lib/test/xmltestdata/c14n-20/out_inNsDefault_c14nPrefix.xml create mode 100644 Lib/test/xmltestdata/c14n-20/out_inNsPushdown_c14nDefault.xml create mode 100644 
Lib/test/xmltestdata/c14n-20/out_inNsPushdown_c14nPrefix.xml create mode 100644 Lib/test/xmltestdata/c14n-20/out_inNsRedecl_c14nDefault.xml create mode 100644 Lib/test/xmltestdata/c14n-20/out_inNsRedecl_c14nPrefix.xml create mode 100644 Lib/test/xmltestdata/c14n-20/out_inNsSort_c14nDefault.xml create mode 100644 Lib/test/xmltestdata/c14n-20/out_inNsSort_c14nPrefix.xml create mode 100644 Lib/test/xmltestdata/c14n-20/out_inNsSuperfluous_c14nDefault.xml create mode 100644 Lib/test/xmltestdata/c14n-20/out_inNsSuperfluous_c14nPrefix.xml create mode 100644 Lib/test/xmltestdata/c14n-20/out_inNsXml_c14nDefault.xml create mode 100644 Lib/test/xmltestdata/c14n-20/out_inNsXml_c14nPrefix.xml create mode 100644 Lib/test/xmltestdata/c14n-20/out_inNsXml_c14nPrefixQname.xml create mode 100644 Lib/test/xmltestdata/c14n-20/out_inNsXml_c14nQname.xml create mode 100644 Lib/test/xmltestdata/c14n-20/world.txt diff --git a/Lib/test/test_xml_etree.py b/Lib/test/test_xml_etree.py index 0b03d077448838..b20c3269ad3259 100644 --- a/Lib/test/test_xml_etree.py +++ b/Lib/test/test_xml_etree.py @@ -11,14 +11,15 @@ import io import locale import operator +import os import pickle import sys -import textwrap import types import unittest import warnings import weakref +from functools import partial from itertools import product, islice from test import support from test.support import TESTFN, findfile, import_fresh_module, gc_collect, swap_attr @@ -716,27 +717,6 @@ def end_ns(self, prefix): ('end-ns', ''), ]) - def test_custom_builder_only_end_ns(self): - class Builder(list): - def end_ns(self, prefix): - self.append(("end-ns", prefix)) - - builder = Builder() - parser = ET.XMLParser(target=builder) - parser.feed(textwrap.dedent("""\ - - - - text - texttail - - - """)) - self.assertEqual(builder, [ - ('end-ns', 'a'), - ('end-ns', 'p'), - ('end-ns', ''), - ]) # Element.getchildren() and ElementTree.getiterator() are deprecated. @checkwarnings(("This method will be removed in future versions. 
" @@ -3444,6 +3424,160 @@ def test_correct_import_pyET(self): self.assertIsInstance(pyET.Element.__init__, types.FunctionType) self.assertIsInstance(pyET.XMLParser.__init__, types.FunctionType) + +# -------------------------------------------------------------------- + +def c14n_roundtrip(xml, **options): + f = io.StringIO() + pyET.canonicalize(f.write, xml, **options) + return f.getvalue() + + +class C14NTest(unittest.TestCase): + maxDiff = None + + # + # simple roundtrip tests (from c14n.py) + + def test_simple_roundtrip(self): + # Basics + self.assertEqual(c14n_roundtrip(""), '') + self.assertEqual(c14n_roundtrip(""), # FIXME + '') + self.assertEqual(c14n_roundtrip(""), + '') + self.assertEqual(c14n_roundtrip(""), + '') + self.assertEqual(c14n_roundtrip(""), + '') + + # C14N spec + self.assertEqual(c14n_roundtrip("Hello, world!"), + 'Hello, world!') + self.assertEqual(c14n_roundtrip("2"), + '2') + self.assertEqual(c14n_roundtrip('"0" && value<"10" ?"valid":"error"]]>'), + 'value>"0" && value<"10" ?"valid":"error"') + self.assertEqual(c14n_roundtrip('''valid'''), + 'valid') + self.assertEqual(c14n_roundtrip(""), + '') + self.assertEqual(c14n_roundtrip(""), + '') + self.assertEqual(c14n_roundtrip(""), + '') + + # fragments from PJ's tests + #self.assertEqual(c14n_roundtrip(""), + #'') + + # + # basic method=c14n tests from the c14n 2.0 specification. uses + # test files under xmltestdata/c14n-20. + + # note that this uses generated C14N versions of the standard ET.write + # output, not roundtripped C14N (see above). 
+ + def test_xml_c14n2(self): + datadir = findfile("c14n-20", subdir="xmltestdata") + full_path = partial(os.path.join, datadir) + + files = [filename[:-4] for filename in sorted(os.listdir(datadir)) + if filename.endswith('.xml')] + input_files = [ + filename for filename in files + if filename.startswith('in') + ] + configs = { + filename: { + # sequential + option.tag.split('}')[-1]: ((option.text or '').strip(), option) + for option in ET.parse(full_path(filename) + ".xml").getroot() + } + for filename in files + if filename.startswith('c14n') + } + + tests = { + input_file: [ + (filename, configs[filename.rsplit('_', 1)[-1]]) + for filename in files + if filename.startswith(f'out_{input_file}_') + and filename.rsplit('_', 1)[-1] in configs + ] + for input_file in input_files + } + + # Make sure we found all test cases. + self.assertEqual(30, len([ + output_file for output_files in tests.values() + for output_file in output_files])) + + def get_option(config, option_name, default=None): + return config.get(option_name, (default, ()))[0] + + for input_file, output_files in tests.items(): + for output_file, config in output_files: + keep_comments = get_option( + config, 'IgnoreComments') == 'true' # no, it's right :) + strip_text = get_option( + config, 'TrimTextNodes') == 'true' + rewrite_prefixes = get_option( + config, 'PrefixRewrite') == 'sequential' + if 'QNameAware' in config: + qattrs = [ + f"{{{el.get('NS')}}}{el.get('Name')}" + for el in config['QNameAware'][1].findall( + '{http://www.w3.org/2010/xml-c14n2}QualifiedAttr') + ] + qtags = [ + f"{{{el.get('NS')}}}{el.get('Name')}" + for el in config['QNameAware'][1].findall( + '{http://www.w3.org/2010/xml-c14n2}Element') + ] + else: + qtags = qattrs = None + + # Build subtest description from config. 
+ config_descr = ','.join( + f"{name}={value or ','.join(c.tag.split('}')[-1] for c in children)}" + for name, (value, children) in sorted(config.items()) + ) + + with self.subTest(f"{output_file}({config_descr})"): + if input_file == 'inNsRedecl' and not rewrite_prefixes: + self.skipTest( + f"Redeclared namespace handling is not supported in {output_file}") + if input_file == 'inNsSuperfluous' and not rewrite_prefixes: + self.skipTest( + f"Redeclared namespace handling is not supported in {output_file}") + if 'QNameAware' in config and config['QNameAware'][1].find( + '{http://www.w3.org/2010/xml-c14n2}XPathElement') is not None: + self.skipTest( + f"QName rewriting in XPath text is not supported in {output_file}") + + out = io.StringIO() + with open(full_path(input_file + ".xml"), 'r', encoding='utf8') as f: + if input_file == 'inC14N5': + # Hack: avoid setting up external entity resolution in the parser. + with open(full_path('world.txt'), 'r', encoding='utf8') as entity_file: + f = io.StringIO(f.read().replace('&ent2;', entity_file.read())) + + ET.canonicalize( + out.write, file=f, + comments=keep_comments, + strip_text=strip_text, + rewrite_prefixes=rewrite_prefixes, + qname_aware_tags=qtags, qname_aware_attrs=qattrs) + text = out.getvalue() + with open(full_path(output_file + ".xml"), 'r', encoding='utf8') as f: + expected = f.read() + if input_file == 'inC14N3': + # FIXME: cET resolves default attributes but ET does not! 
+ expected = expected.replace(' attr="default"', '') + text = text.replace(' attr="default"', '') + self.assertEqual(expected, text) + # -------------------------------------------------------------------- @@ -3476,6 +3610,8 @@ def test_main(module=None): XMLParserTest, XMLPullParserTest, BugsTest, + KeywordArgsTest, + C14NTest, ] # These tests will only run for the pure-Python version that doesn't import diff --git a/Lib/test/xmltestdata/c14n-20/c14nComment.xml b/Lib/test/xmltestdata/c14n-20/c14nComment.xml new file mode 100644 index 00000000000000..e95aa302d04fdb --- /dev/null +++ b/Lib/test/xmltestdata/c14n-20/c14nComment.xml @@ -0,0 +1,4 @@ + + true + + diff --git a/Lib/test/xmltestdata/c14n-20/c14nDefault.xml b/Lib/test/xmltestdata/c14n-20/c14nDefault.xml new file mode 100644 index 00000000000000..c1364142cc59bf --- /dev/null +++ b/Lib/test/xmltestdata/c14n-20/c14nDefault.xml @@ -0,0 +1,3 @@ + + + diff --git a/Lib/test/xmltestdata/c14n-20/c14nPrefix.xml b/Lib/test/xmltestdata/c14n-20/c14nPrefix.xml new file mode 100644 index 00000000000000..fb233b42b1334f --- /dev/null +++ b/Lib/test/xmltestdata/c14n-20/c14nPrefix.xml @@ -0,0 +1,4 @@ + + sequential + + diff --git a/Lib/test/xmltestdata/c14n-20/c14nPrefixQname.xml b/Lib/test/xmltestdata/c14n-20/c14nPrefixQname.xml new file mode 100644 index 00000000000000..23188eedbc2451 --- /dev/null +++ b/Lib/test/xmltestdata/c14n-20/c14nPrefixQname.xml @@ -0,0 +1,7 @@ + + sequential + + + + + diff --git a/Lib/test/xmltestdata/c14n-20/c14nPrefixQnameXpathElem.xml b/Lib/test/xmltestdata/c14n-20/c14nPrefixQnameXpathElem.xml new file mode 100644 index 00000000000000..626fc48f410fa0 --- /dev/null +++ b/Lib/test/xmltestdata/c14n-20/c14nPrefixQnameXpathElem.xml @@ -0,0 +1,8 @@ + + sequential + + + + + + diff --git a/Lib/test/xmltestdata/c14n-20/c14nQname.xml b/Lib/test/xmltestdata/c14n-20/c14nQname.xml new file mode 100644 index 00000000000000..919e5903f5ce6e --- /dev/null +++ b/Lib/test/xmltestdata/c14n-20/c14nQname.xml @@ -0,0 
+1,6 @@ + + + + + + diff --git a/Lib/test/xmltestdata/c14n-20/c14nQnameElem.xml b/Lib/test/xmltestdata/c14n-20/c14nQnameElem.xml new file mode 100644 index 00000000000000..0321f8061952e6 --- /dev/null +++ b/Lib/test/xmltestdata/c14n-20/c14nQnameElem.xml @@ -0,0 +1,6 @@ + + + + + + diff --git a/Lib/test/xmltestdata/c14n-20/c14nQnameXpathElem.xml b/Lib/test/xmltestdata/c14n-20/c14nQnameXpathElem.xml new file mode 100644 index 00000000000000..c4890bc8b01d5e --- /dev/null +++ b/Lib/test/xmltestdata/c14n-20/c14nQnameXpathElem.xml @@ -0,0 +1,7 @@ + + + + + + + diff --git a/Lib/test/xmltestdata/c14n-20/c14nTrim.xml b/Lib/test/xmltestdata/c14n-20/c14nTrim.xml new file mode 100644 index 00000000000000..ccb9cf65db7235 --- /dev/null +++ b/Lib/test/xmltestdata/c14n-20/c14nTrim.xml @@ -0,0 +1,4 @@ + + true + + diff --git a/Lib/test/xmltestdata/c14n-20/doc.dtd b/Lib/test/xmltestdata/c14n-20/doc.dtd new file mode 100644 index 00000000000000..5c5d544a0df845 --- /dev/null +++ b/Lib/test/xmltestdata/c14n-20/doc.dtd @@ -0,0 +1,6 @@ + + + + + + diff --git a/Lib/test/xmltestdata/c14n-20/doc.xsl b/Lib/test/xmltestdata/c14n-20/doc.xsl new file mode 100644 index 00000000000000..a3f2348cc2f2b3 --- /dev/null +++ b/Lib/test/xmltestdata/c14n-20/doc.xsl @@ -0,0 +1,5 @@ + + + diff --git a/Lib/test/xmltestdata/c14n-20/inC14N1.xml b/Lib/test/xmltestdata/c14n-20/inC14N1.xml new file mode 100644 index 00000000000000..ed450c7341d382 --- /dev/null +++ b/Lib/test/xmltestdata/c14n-20/inC14N1.xml @@ -0,0 +1,14 @@ + + + + + + +Hello, world! 
+ + + + + + diff --git a/Lib/test/xmltestdata/c14n-20/inC14N2.xml b/Lib/test/xmltestdata/c14n-20/inC14N2.xml new file mode 100644 index 00000000000000..74eeea147c3791 --- /dev/null +++ b/Lib/test/xmltestdata/c14n-20/inC14N2.xml @@ -0,0 +1,11 @@ + + + A B + + A + + B + A B + C + + diff --git a/Lib/test/xmltestdata/c14n-20/inC14N3.xml b/Lib/test/xmltestdata/c14n-20/inC14N3.xml new file mode 100644 index 00000000000000..fea78213f1ae69 --- /dev/null +++ b/Lib/test/xmltestdata/c14n-20/inC14N3.xml @@ -0,0 +1,18 @@ +]> + + + + + + + + + + + + + + diff --git a/Lib/test/xmltestdata/c14n-20/inC14N4.xml b/Lib/test/xmltestdata/c14n-20/inC14N4.xml new file mode 100644 index 00000000000000..909a847435b86c --- /dev/null +++ b/Lib/test/xmltestdata/c14n-20/inC14N4.xml @@ -0,0 +1,13 @@ + + +]> + + First line Second line + 2 + "0" && value<"10" ?"valid":"error"]]> + valid + + + + diff --git a/Lib/test/xmltestdata/c14n-20/inC14N5.xml b/Lib/test/xmltestdata/c14n-20/inC14N5.xml new file mode 100644 index 00000000000000..501161bad5187f --- /dev/null +++ b/Lib/test/xmltestdata/c14n-20/inC14N5.xml @@ -0,0 +1,12 @@ + + + + + +]> + + &ent1;, &ent2;! 
+ + + diff --git a/Lib/test/xmltestdata/c14n-20/inC14N6.xml b/Lib/test/xmltestdata/c14n-20/inC14N6.xml new file mode 100644 index 00000000000000..31e2071867257c --- /dev/null +++ b/Lib/test/xmltestdata/c14n-20/inC14N6.xml @@ -0,0 +1,2 @@ + +© diff --git a/Lib/test/xmltestdata/c14n-20/inNsContent.xml b/Lib/test/xmltestdata/c14n-20/inNsContent.xml new file mode 100644 index 00000000000000..b9924660ba6da3 --- /dev/null +++ b/Lib/test/xmltestdata/c14n-20/inNsContent.xml @@ -0,0 +1,4 @@ + + xsd:string + /soap-env:body/child::b:foo[@att1 != "c:val" and @att2 != 'xsd:string'] + diff --git a/Lib/test/xmltestdata/c14n-20/inNsDefault.xml b/Lib/test/xmltestdata/c14n-20/inNsDefault.xml new file mode 100644 index 00000000000000..3e0d323bad27c2 --- /dev/null +++ b/Lib/test/xmltestdata/c14n-20/inNsDefault.xml @@ -0,0 +1,3 @@ + + + diff --git a/Lib/test/xmltestdata/c14n-20/inNsPushdown.xml b/Lib/test/xmltestdata/c14n-20/inNsPushdown.xml new file mode 100644 index 00000000000000..daa67d83f15914 --- /dev/null +++ b/Lib/test/xmltestdata/c14n-20/inNsPushdown.xml @@ -0,0 +1,6 @@ + + + + + + diff --git a/Lib/test/xmltestdata/c14n-20/inNsRedecl.xml b/Lib/test/xmltestdata/c14n-20/inNsRedecl.xml new file mode 100644 index 00000000000000..10bd97beda3baa --- /dev/null +++ b/Lib/test/xmltestdata/c14n-20/inNsRedecl.xml @@ -0,0 +1,3 @@ + + + diff --git a/Lib/test/xmltestdata/c14n-20/inNsSort.xml b/Lib/test/xmltestdata/c14n-20/inNsSort.xml new file mode 100644 index 00000000000000..8e9fc01c647b24 --- /dev/null +++ b/Lib/test/xmltestdata/c14n-20/inNsSort.xml @@ -0,0 +1,4 @@ + + + + diff --git a/Lib/test/xmltestdata/c14n-20/inNsSuperfluous.xml b/Lib/test/xmltestdata/c14n-20/inNsSuperfluous.xml new file mode 100644 index 00000000000000..f77720f7b0b09d --- /dev/null +++ b/Lib/test/xmltestdata/c14n-20/inNsSuperfluous.xml @@ -0,0 +1,4 @@ + + + + diff --git a/Lib/test/xmltestdata/c14n-20/inNsXml.xml b/Lib/test/xmltestdata/c14n-20/inNsXml.xml new file mode 100644 index 00000000000000..7520cf3fb9eb28 --- 
/dev/null +++ b/Lib/test/xmltestdata/c14n-20/inNsXml.xml @@ -0,0 +1,3 @@ + + data + diff --git a/Lib/test/xmltestdata/c14n-20/out_inC14N1_c14nComment.xml b/Lib/test/xmltestdata/c14n-20/out_inC14N1_c14nComment.xml new file mode 100644 index 00000000000000..d98d16840c6bcc --- /dev/null +++ b/Lib/test/xmltestdata/c14n-20/out_inC14N1_c14nComment.xml @@ -0,0 +1,6 @@ + +Hello, world! + + + \ No newline at end of file diff --git a/Lib/test/xmltestdata/c14n-20/out_inC14N1_c14nDefault.xml b/Lib/test/xmltestdata/c14n-20/out_inC14N1_c14nDefault.xml new file mode 100644 index 00000000000000..af9a9770578e9d --- /dev/null +++ b/Lib/test/xmltestdata/c14n-20/out_inC14N1_c14nDefault.xml @@ -0,0 +1,4 @@ + +Hello, world! + \ No newline at end of file diff --git a/Lib/test/xmltestdata/c14n-20/out_inC14N2_c14nDefault.xml b/Lib/test/xmltestdata/c14n-20/out_inC14N2_c14nDefault.xml new file mode 100644 index 00000000000000..2afa15ccb36382 --- /dev/null +++ b/Lib/test/xmltestdata/c14n-20/out_inC14N2_c14nDefault.xml @@ -0,0 +1,11 @@ + + + A B + + A + + B + A B + C + + \ No newline at end of file diff --git a/Lib/test/xmltestdata/c14n-20/out_inC14N2_c14nTrim.xml b/Lib/test/xmltestdata/c14n-20/out_inC14N2_c14nTrim.xml new file mode 100644 index 00000000000000..7a1dc32946bce3 --- /dev/null +++ b/Lib/test/xmltestdata/c14n-20/out_inC14N2_c14nTrim.xml @@ -0,0 +1 @@ +A BABA BC \ No newline at end of file diff --git a/Lib/test/xmltestdata/c14n-20/out_inC14N3_c14nDefault.xml b/Lib/test/xmltestdata/c14n-20/out_inC14N3_c14nDefault.xml new file mode 100644 index 00000000000000..662e108aa8a1e4 --- /dev/null +++ b/Lib/test/xmltestdata/c14n-20/out_inC14N3_c14nDefault.xml @@ -0,0 +1,14 @@ + + + + + + + + + + + + + + \ No newline at end of file diff --git a/Lib/test/xmltestdata/c14n-20/out_inC14N3_c14nPrefix.xml b/Lib/test/xmltestdata/c14n-20/out_inC14N3_c14nPrefix.xml new file mode 100644 index 00000000000000..041e1ec8ebe59a --- /dev/null +++ b/Lib/test/xmltestdata/c14n-20/out_inC14N3_c14nPrefix.xml @@ 
-0,0 +1,14 @@ + + + + + + + + + + + + + + \ No newline at end of file diff --git a/Lib/test/xmltestdata/c14n-20/out_inC14N3_c14nTrim.xml b/Lib/test/xmltestdata/c14n-20/out_inC14N3_c14nTrim.xml new file mode 100644 index 00000000000000..4f35ad9662df3b --- /dev/null +++ b/Lib/test/xmltestdata/c14n-20/out_inC14N3_c14nTrim.xml @@ -0,0 +1 @@ + \ No newline at end of file diff --git a/Lib/test/xmltestdata/c14n-20/out_inC14N4_c14nDefault.xml b/Lib/test/xmltestdata/c14n-20/out_inC14N4_c14nDefault.xml new file mode 100644 index 00000000000000..243d0e61f2e94f --- /dev/null +++ b/Lib/test/xmltestdata/c14n-20/out_inC14N4_c14nDefault.xml @@ -0,0 +1,10 @@ + + First line +Second line + 2 + value>"0" && value<"10" ?"valid":"error" + valid + + + + \ No newline at end of file diff --git a/Lib/test/xmltestdata/c14n-20/out_inC14N4_c14nTrim.xml b/Lib/test/xmltestdata/c14n-20/out_inC14N4_c14nTrim.xml new file mode 100644 index 00000000000000..24d83ba8ab0012 --- /dev/null +++ b/Lib/test/xmltestdata/c14n-20/out_inC14N4_c14nTrim.xml @@ -0,0 +1,2 @@ +First line +Second line2value>"0" && value<"10" ?"valid":"error"valid \ No newline at end of file diff --git a/Lib/test/xmltestdata/c14n-20/out_inC14N5_c14nDefault.xml b/Lib/test/xmltestdata/c14n-20/out_inC14N5_c14nDefault.xml new file mode 100644 index 00000000000000..c232e740aee4a7 --- /dev/null +++ b/Lib/test/xmltestdata/c14n-20/out_inC14N5_c14nDefault.xml @@ -0,0 +1,3 @@ + + Hello, world! + \ No newline at end of file diff --git a/Lib/test/xmltestdata/c14n-20/out_inC14N5_c14nTrim.xml b/Lib/test/xmltestdata/c14n-20/out_inC14N5_c14nTrim.xml new file mode 100644 index 00000000000000..3fa84b1e986014 --- /dev/null +++ b/Lib/test/xmltestdata/c14n-20/out_inC14N5_c14nTrim.xml @@ -0,0 +1 @@ +Hello, world! 
\ No newline at end of file diff --git a/Lib/test/xmltestdata/c14n-20/out_inC14N6_c14nDefault.xml b/Lib/test/xmltestdata/c14n-20/out_inC14N6_c14nDefault.xml new file mode 100644 index 00000000000000..0be38f98cb1398 --- /dev/null +++ b/Lib/test/xmltestdata/c14n-20/out_inC14N6_c14nDefault.xml @@ -0,0 +1 @@ +© \ No newline at end of file diff --git a/Lib/test/xmltestdata/c14n-20/out_inNsContent_c14nDefault.xml b/Lib/test/xmltestdata/c14n-20/out_inNsContent_c14nDefault.xml new file mode 100644 index 00000000000000..62d7e004a44034 --- /dev/null +++ b/Lib/test/xmltestdata/c14n-20/out_inNsContent_c14nDefault.xml @@ -0,0 +1,4 @@ + + xsd:string + /soap-env:body/child::b:foo[@att1 != "c:val" and @att2 != 'xsd:string'] + \ No newline at end of file diff --git a/Lib/test/xmltestdata/c14n-20/out_inNsContent_c14nPrefixQnameXpathElem.xml b/Lib/test/xmltestdata/c14n-20/out_inNsContent_c14nPrefixQnameXpathElem.xml new file mode 100644 index 00000000000000..20e1c2e9d6dfb4 --- /dev/null +++ b/Lib/test/xmltestdata/c14n-20/out_inNsContent_c14nPrefixQnameXpathElem.xml @@ -0,0 +1,4 @@ + + n1:string + /n3:body/child::n2:foo[@att1 != "c:val" and @att2 != 'xsd:string'] + \ No newline at end of file diff --git a/Lib/test/xmltestdata/c14n-20/out_inNsContent_c14nQnameElem.xml b/Lib/test/xmltestdata/c14n-20/out_inNsContent_c14nQnameElem.xml new file mode 100644 index 00000000000000..db8680daa033d7 --- /dev/null +++ b/Lib/test/xmltestdata/c14n-20/out_inNsContent_c14nQnameElem.xml @@ -0,0 +1,4 @@ + + xsd:string + /soap-env:body/child::b:foo[@att1 != "c:val" and @att2 != 'xsd:string'] + \ No newline at end of file diff --git a/Lib/test/xmltestdata/c14n-20/out_inNsContent_c14nQnameXpathElem.xml b/Lib/test/xmltestdata/c14n-20/out_inNsContent_c14nQnameXpathElem.xml new file mode 100644 index 00000000000000..df3b21579fac5e --- /dev/null +++ b/Lib/test/xmltestdata/c14n-20/out_inNsContent_c14nQnameXpathElem.xml @@ -0,0 +1,4 @@ + + xsd:string + /soap-env:body/child::b:foo[@att1 != "c:val" and @att2 != 
'xsd:string'] + \ No newline at end of file diff --git a/Lib/test/xmltestdata/c14n-20/out_inNsDefault_c14nDefault.xml b/Lib/test/xmltestdata/c14n-20/out_inNsDefault_c14nDefault.xml new file mode 100644 index 00000000000000..674b076dd6d9a6 --- /dev/null +++ b/Lib/test/xmltestdata/c14n-20/out_inNsDefault_c14nDefault.xml @@ -0,0 +1,3 @@ + + + \ No newline at end of file diff --git a/Lib/test/xmltestdata/c14n-20/out_inNsDefault_c14nPrefix.xml b/Lib/test/xmltestdata/c14n-20/out_inNsDefault_c14nPrefix.xml new file mode 100644 index 00000000000000..83edaae91e7423 --- /dev/null +++ b/Lib/test/xmltestdata/c14n-20/out_inNsDefault_c14nPrefix.xml @@ -0,0 +1,3 @@ + + + \ No newline at end of file diff --git a/Lib/test/xmltestdata/c14n-20/out_inNsPushdown_c14nDefault.xml b/Lib/test/xmltestdata/c14n-20/out_inNsPushdown_c14nDefault.xml new file mode 100644 index 00000000000000..fa4f21b5d0af55 --- /dev/null +++ b/Lib/test/xmltestdata/c14n-20/out_inNsPushdown_c14nDefault.xml @@ -0,0 +1,6 @@ + + + + + + \ No newline at end of file diff --git a/Lib/test/xmltestdata/c14n-20/out_inNsPushdown_c14nPrefix.xml b/Lib/test/xmltestdata/c14n-20/out_inNsPushdown_c14nPrefix.xml new file mode 100644 index 00000000000000..6d579200c9dc8c --- /dev/null +++ b/Lib/test/xmltestdata/c14n-20/out_inNsPushdown_c14nPrefix.xml @@ -0,0 +1,6 @@ + + + + + + \ No newline at end of file diff --git a/Lib/test/xmltestdata/c14n-20/out_inNsRedecl_c14nDefault.xml b/Lib/test/xmltestdata/c14n-20/out_inNsRedecl_c14nDefault.xml new file mode 100644 index 00000000000000..ba37f925103c70 --- /dev/null +++ b/Lib/test/xmltestdata/c14n-20/out_inNsRedecl_c14nDefault.xml @@ -0,0 +1,3 @@ + + + \ No newline at end of file diff --git a/Lib/test/xmltestdata/c14n-20/out_inNsRedecl_c14nPrefix.xml b/Lib/test/xmltestdata/c14n-20/out_inNsRedecl_c14nPrefix.xml new file mode 100644 index 00000000000000..af3bb2d6f062cd --- /dev/null +++ b/Lib/test/xmltestdata/c14n-20/out_inNsRedecl_c14nPrefix.xml @@ -0,0 +1,3 @@ + + + \ No newline at end of 
file diff --git a/Lib/test/xmltestdata/c14n-20/out_inNsSort_c14nDefault.xml b/Lib/test/xmltestdata/c14n-20/out_inNsSort_c14nDefault.xml new file mode 100644 index 00000000000000..8a92c5c61c2c2c --- /dev/null +++ b/Lib/test/xmltestdata/c14n-20/out_inNsSort_c14nDefault.xml @@ -0,0 +1,4 @@ + + + + \ No newline at end of file diff --git a/Lib/test/xmltestdata/c14n-20/out_inNsSort_c14nPrefix.xml b/Lib/test/xmltestdata/c14n-20/out_inNsSort_c14nPrefix.xml new file mode 100644 index 00000000000000..8d44c84fe5d307 --- /dev/null +++ b/Lib/test/xmltestdata/c14n-20/out_inNsSort_c14nPrefix.xml @@ -0,0 +1,4 @@ + + + + \ No newline at end of file diff --git a/Lib/test/xmltestdata/c14n-20/out_inNsSuperfluous_c14nDefault.xml b/Lib/test/xmltestdata/c14n-20/out_inNsSuperfluous_c14nDefault.xml new file mode 100644 index 00000000000000..6bb862d763d737 --- /dev/null +++ b/Lib/test/xmltestdata/c14n-20/out_inNsSuperfluous_c14nDefault.xml @@ -0,0 +1,4 @@ + + + + \ No newline at end of file diff --git a/Lib/test/xmltestdata/c14n-20/out_inNsSuperfluous_c14nPrefix.xml b/Lib/test/xmltestdata/c14n-20/out_inNsSuperfluous_c14nPrefix.xml new file mode 100644 index 00000000000000..700a16d42a7746 --- /dev/null +++ b/Lib/test/xmltestdata/c14n-20/out_inNsSuperfluous_c14nPrefix.xml @@ -0,0 +1,4 @@ + + + + \ No newline at end of file diff --git a/Lib/test/xmltestdata/c14n-20/out_inNsXml_c14nDefault.xml b/Lib/test/xmltestdata/c14n-20/out_inNsXml_c14nDefault.xml new file mode 100644 index 00000000000000..1689f3bf423dc5 --- /dev/null +++ b/Lib/test/xmltestdata/c14n-20/out_inNsXml_c14nDefault.xml @@ -0,0 +1,3 @@ + + data + \ No newline at end of file diff --git a/Lib/test/xmltestdata/c14n-20/out_inNsXml_c14nPrefix.xml b/Lib/test/xmltestdata/c14n-20/out_inNsXml_c14nPrefix.xml new file mode 100644 index 00000000000000..38508a47f6b904 --- /dev/null +++ b/Lib/test/xmltestdata/c14n-20/out_inNsXml_c14nPrefix.xml @@ -0,0 +1,3 @@ + + data + \ No newline at end of file diff --git 
a/Lib/test/xmltestdata/c14n-20/out_inNsXml_c14nPrefixQname.xml b/Lib/test/xmltestdata/c14n-20/out_inNsXml_c14nPrefixQname.xml new file mode 100644 index 00000000000000..867980f82bfa59 --- /dev/null +++ b/Lib/test/xmltestdata/c14n-20/out_inNsXml_c14nPrefixQname.xml @@ -0,0 +1,3 @@ + + data + \ No newline at end of file diff --git a/Lib/test/xmltestdata/c14n-20/out_inNsXml_c14nQname.xml b/Lib/test/xmltestdata/c14n-20/out_inNsXml_c14nQname.xml new file mode 100644 index 00000000000000..0300f9d562db30 --- /dev/null +++ b/Lib/test/xmltestdata/c14n-20/out_inNsXml_c14nQname.xml @@ -0,0 +1,3 @@ + + data + \ No newline at end of file diff --git a/Lib/test/xmltestdata/c14n-20/world.txt b/Lib/test/xmltestdata/c14n-20/world.txt new file mode 100644 index 00000000000000..04fea06420ca60 --- /dev/null +++ b/Lib/test/xmltestdata/c14n-20/world.txt @@ -0,0 +1 @@ +world \ No newline at end of file diff --git a/Lib/xml/etree/ElementTree.py b/Lib/xml/etree/ElementTree.py index 5b26ac72fd1aae..cf2c7f91ff1564 100644 --- a/Lib/xml/etree/ElementTree.py +++ b/Lib/xml/etree/ElementTree.py @@ -87,6 +87,7 @@ "XML", "XMLID", "XMLParser", "XMLPullParser", "register_namespace", + "canonicalize", "C14NWriterTarget", ] VERSION = "1.3.0" @@ -1711,6 +1712,297 @@ def close(self): del self.target, self._target +# -------------------------------------------------------------------- +# C14N 2.0 + +def canonicalize(write, xml_data=None, *, file=None, **options): + """Convert XML to its C14N 2.0 serialised form. + + The C14N serialised output is written using the *write* function. + To write to a file, open it in text mode with encoding "utf-8" and pass + its ``.write`` method. + + Either *xml_data* (an XML string) or *file* (a file-like object) must be + provided as input. + + The configuration options are the same as for the ``C14NWriterTarget``. 
+ """ + parser = XMLParser(target=C14NWriterTarget(write, **options)) + + try: + if xml_data is not None: + parser.feed(xml_data) + elif file is not None: + while (d := file.read(64*1024)): + parser.feed(d) + finally: + parser.close() + + +_looks_like_prefix_name = re.compile(r'^\w+:\w+$', re.UNICODE).match + + +class C14NWriterTarget: + """ + Canonicalization writer target for the XMLParser. + + Serialises parse events to XML C14N 2.0. + + Configuration options: + + - *comments*: set to true to include comments + - *strip_text*: set to true to strip whitespace before and after text content + - *rewrite_prefixes*: set to true to replace namespace prefixes by "n{number}" + - *qname_aware_tags*: a set of qname aware tag names in which prefixes + should be replaced in text content + - *qname_aware_attrs*: a set of qname aware attribute names in which prefixes + should be replaced in text content + """ + def __init__(self, write, *, + comments=False, strip_text=False, rewrite_prefixes=False, + qname_aware_tags=None, qname_aware_attrs=None): + self._write = write + self._data = [] + self._comments = comments + self._strip_text = strip_text + + self._rewrite_prefixes = rewrite_prefixes + if qname_aware_tags: + self._qname_aware_tags = set(qname_aware_tags) + else: + self._qname_aware_tags = None + if qname_aware_attrs: + self._find_qname_aware_attrs = set(qname_aware_attrs).intersection + else: + self._find_qname_aware_attrs = None + + # Stack with globally and newly declared namespaces as (uri, prefix) pairs. + self._declared_ns_stack = [[ + ("http://www.w3.org/XML/1998/namespace", "xml"), + ]] + # Stack with user declared namespace prefixes as (uri, prefix) pairs.
+ self._ns_stack = [] + if not rewrite_prefixes: + self._ns_stack.append(list(_namespace_map.items())) + self._ns_stack.append([]) + self._prefix_map = {} + self._preserve_space = [False] + self._pending_start = None + self._root_seen = False + self._root_done = False + + def _iter_namespaces(self, ns_stack, _reversed=reversed): + for namespaces in _reversed(ns_stack): + if namespaces: # almost no element declares new namespaces + yield from namespaces + + def _resolve_prefix_name(self, prefixed_name): + prefix, name = prefixed_name.split(':', 1) + for uri, p in self._iter_namespaces(self._ns_stack): + if p == prefix: + return f'{{{uri}}}{name}' + raise ValueError(f'Prefix {prefix} of QName "{prefixed_name}" is not declared in scope') + + def _qname(self, qname, uri=None): + if uri is None: + uri, tag = qname[1:].rsplit('}', 1) if qname[:1] == '{' else ('', qname) + else: + tag = qname + + prefixes_seen = set() + for u, prefix in self._iter_namespaces(self._declared_ns_stack): + if u == uri and prefix not in prefixes_seen: + return f'{prefix}:{tag}' if prefix else tag, tag, uri + prefixes_seen.add(prefix) + + # Not declared yet => add new declaration. + if self._rewrite_prefixes: + if uri in self._prefix_map: + prefix = self._prefix_map[uri] + else: + prefix = self._prefix_map[uri] = f'n{len(self._prefix_map)}' + self._declared_ns_stack[-1].append((uri, prefix)) + return f'{prefix}:{tag}', tag, uri + + if not uri and '' not in prefixes_seen: + # No default namespace declared => no prefix needed. 
+ return tag, tag, uri + + for u, prefix in self._iter_namespaces(self._ns_stack): + if u == uri: + self._declared_ns_stack[-1].append((uri, prefix)) + return f'{prefix}:{tag}' if prefix else tag, tag, uri + + raise ValueError(f'Namespace "{uri}" is not declared in scope') + + def data(self, data): + self._data.append(data) + + def _flush(self, _join_text=''.join): + data = _join_text(self._data) + del self._data[:] + if self._strip_text and not self._preserve_space[-1]: + data = data.strip() + if self._pending_start is not None: + args, self._pending_start = self._pending_start, None + qname_text = data if data and _looks_like_prefix_name(data) else None + self._start(*args, qname_text) + if qname_text is not None: + return + if data and self._root_seen: + self._write(_escape_cdata_c14n(data)) + + def start_ns(self, prefix, uri): + # we may have to resolve qnames in text content + if self._data: + self._flush() + self._ns_stack[-1].append((uri, prefix)) + + def start(self, tag, attrs): + if self._data: + self._flush() + + new_namespaces = [] + self._declared_ns_stack.append(new_namespaces) + + if self._qname_aware_tags is not None and tag in self._qname_aware_tags: + # Need to parse text first to see if it requires a prefix declaration. + self._pending_start = (tag, attrs, new_namespaces) + return + self._start(tag, attrs, new_namespaces) + + def _start(self, tag, attrs, new_namespaces, qname_text=None): + qnames = {tag, *attrs} + resolved_names = {} + + # Resolve prefixes in attribute and tag text. 
+ if qname_text is not None: + qname = resolved_names[qname_text] = self._resolve_prefix_name(qname_text) + qnames.add(qname) + if self._find_qname_aware_attrs is not None and attrs: + qattrs = self._find_qname_aware_attrs(attrs) + if qattrs: + for attr_name in qattrs: + value = attrs[attr_name] + if _looks_like_prefix_name(value): + qname = resolved_names[value] = self._resolve_prefix_name(value) + qnames.add(qname) + else: + qattrs = None + else: + qattrs = None + + # Assign prefixes in lexicographical order of used URIs. + parse_qname = self._qname + parsed_qnames = {n: parse_qname(n) for n in sorted( + qnames, key=lambda n: n.split('}', 1))} + + # Write namespace declarations in prefix order ... + attr_list = sorted( + ('xmlns:' + prefix if prefix else 'xmlns', uri) + for uri, prefix in new_namespaces + ) if new_namespaces else [] # almost always empty + + # ... followed by attributes in URI+name order + for k, v in sorted(attrs.items()): + if qattrs is not None and k in qattrs and v in resolved_names: + v = parsed_qnames[resolved_names[v]][0] + attr_qname, attr_name, uri = parsed_qnames[k] + # No prefix for attributes in default ('') namespace. + attr_list.append((attr_qname if uri else attr_name, v)) + + # Honour xml:space attributes. + space_behaviour = attrs.get('{http://www.w3.org/XML/1998/namespace}space') + self._preserve_space.append( + space_behaviour == 'preserve' if space_behaviour + else self._preserve_space[-1]) + + # Write the tag. + write = self._write + write('<' + parsed_qnames[tag][0]) + for k, v in attr_list: + write(f' {k}="{_escape_attrib_c14n(v)}"') + write('>') + + # Write the resolved qname text content. 
+ if qname_text is not None: + write(_escape_cdata_c14n(parsed_qnames[resolved_names[qname_text]][0])) + + self._root_seen = True + self._ns_stack.append([]) + + def end(self, tag): + if self._data: + self._flush() + self._write(f'</{self._qname(tag)[0]}>') + self._preserve_space.pop() + self._root_done = len(self._preserve_space) == 1 + self._declared_ns_stack.pop() + self._ns_stack.pop() + + def comment(self, text): + if not self._comments: + return + if self._root_done: + self._write('\n') + elif self._root_seen and self._data: + self._flush() + self._write(f'<!--{_escape_cdata_c14n(text)}-->') + if not self._root_seen: + self._write('\n') + + def pi(self, target, data): + if self._root_done: + self._write('\n') + elif self._root_seen and self._data: + self._flush() + self._write( + f'<?{target} {_escape_cdata_c14n(data)}?>' if data else f'<?{target}?>') + if not self._root_seen: + self._write('\n') + + +def _escape_cdata_c14n(text): + # escape character data + try: + # it's worth avoiding do-nothing calls for strings that are + # shorter than 500 character, or so. assume that's, by far, + # the most common case in most applications. + if '&' in text: + text = text.replace('&', '&amp;') + if '<' in text: + text = text.replace('<', '&lt;') + if '>' in text: + text = text.replace('>', '&gt;') + if '\r' in text: + text = text.replace('\r', '&#xD;') + return text + except (TypeError, AttributeError): + _raise_serialization_error(text) + + +def _escape_attrib_c14n(text): + # escape attribute value + try: + if '&' in text: + text = text.replace('&', '&amp;') + if '<' in text: + text = text.replace('<', '&lt;') + if '"' in text: + text = text.replace('"', '&quot;') + if '\t' in text: + text = text.replace('\t', '&#x9;') + if '\n' in text: + text = text.replace('\n', '&#xA;') + if '\r' in text: + text = text.replace('\r', '&#xD;') + return text + except (TypeError, AttributeError): + _raise_serialization_error(text) + + +# -------------------------------------------------------------------- + # Import the C accelerators try: # Element is going to be shadowed by the C implementation.
We need to keep From c00dd43725844462c062dbf2bcd97a7e06816ce5 Mon Sep 17 00:00:00 2001 From: Stefan Behnel Date: Fri, 26 Apr 2019 11:07:43 +0200 Subject: [PATCH 06/22] Add news entry --- .../next/Library/2019-04-26-10-10-34.bpo-13611.XEF4bg.rst | 2 ++ 1 file changed, 2 insertions(+) create mode 100644 Misc/NEWS.d/next/Library/2019-04-26-10-10-34.bpo-13611.XEF4bg.rst diff --git a/Misc/NEWS.d/next/Library/2019-04-26-10-10-34.bpo-13611.XEF4bg.rst b/Misc/NEWS.d/next/Library/2019-04-26-10-10-34.bpo-13611.XEF4bg.rst new file mode 100644 index 00000000000000..d01decb9617ab5 --- /dev/null +++ b/Misc/NEWS.d/next/Library/2019-04-26-10-10-34.bpo-13611.XEF4bg.rst @@ -0,0 +1,2 @@ +The xml.etree.ElementTree packages gained support for C14N 2.0 serialisation. +Patch by Stefan Behnel. From 08f11370e013aacd4123a0666132625297513d01 Mon Sep 17 00:00:00 2001 From: Stefan Behnel Date: Fri, 26 Apr 2019 17:06:59 +0200 Subject: [PATCH 07/22] Correct input file handling in test: must not decode it on the way in, especially since we don't really know the correct encoding (some files use Latin-1, others UTF-8). --- Lib/test/test_xml_etree.py | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/Lib/test/test_xml_etree.py b/Lib/test/test_xml_etree.py index b20c3269ad3259..a17b57bbd493b4 100644 --- a/Lib/test/test_xml_etree.py +++ b/Lib/test/test_xml_etree.py @@ -3557,11 +3557,11 @@ def get_option(config, option_name, default=None): f"QName rewriting in XPath text is not supported in {output_file}") out = io.StringIO() - with open(full_path(input_file + ".xml"), 'r', encoding='utf8') as f: + with open(full_path(input_file + ".xml"), 'rb') as f: if input_file == 'inC14N5': # Hack: avoid setting up external entity resolution in the parser. 
- with open(full_path('world.txt'), 'r', encoding='utf8') as entity_file: - f = io.StringIO(f.read().replace('&ent2;', entity_file.read())) + with open(full_path('world.txt'), 'rb') as entity_file: + f = io.BytesIO(f.read().replace(b'&ent2;', entity_file.read())) ET.canonicalize( out.write, file=f, From 5d96e2f14b9beb0dfa5061cc6810a9eddef6ad5d Mon Sep 17 00:00:00 2001 From: Stefan Behnel Date: Fri, 26 Apr 2019 17:25:40 +0200 Subject: [PATCH 08/22] Slightly faster attribute serialisation. --- Lib/xml/etree/ElementTree.py | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/Lib/xml/etree/ElementTree.py b/Lib/xml/etree/ElementTree.py index cf2c7f91ff1564..66575c2a9325db 100644 --- a/Lib/xml/etree/ElementTree.py +++ b/Lib/xml/etree/ElementTree.py @@ -1920,8 +1920,8 @@ def _start(self, tag, attrs, new_namespaces, qname_text=None): # Write the tag. write = self._write write('<' + parsed_qnames[tag][0]) - for k, v in attr_list: - write(f' {k}="{_escape_attrib_c14n(v)}"') + if attr_list: + write(''.join([f' {k}="{_escape_attrib_c14n(v)}"' for k, v in attr_list])) write('>') # Write the resolved qname text content. From 35f2e81fcfec256eb511154b656980367066f734 Mon Sep 17 00:00:00 2001 From: Stefan Behnel Date: Fri, 26 Apr 2019 19:07:53 +0200 Subject: [PATCH 09/22] Reduce overhead for the common cases of no new namespace declarations and/or no attributes. --- Lib/xml/etree/ElementTree.py | 26 ++++++++++++++++---------- 1 file changed, 16 insertions(+), 10 deletions(-) diff --git a/Lib/xml/etree/ElementTree.py b/Lib/xml/etree/ElementTree.py index 66575c2a9325db..90634c580dc82d 100644 --- a/Lib/xml/etree/ElementTree.py +++ b/Lib/xml/etree/ElementTree.py @@ -1898,18 +1898,24 @@ def _start(self, tag, attrs, new_namespaces, qname_text=None): qnames, key=lambda n: n.split('}', 1))} # Write namespace declarations in prefix order ... 
- attr_list = sorted( - ('xmlns:' + prefix if prefix else 'xmlns', uri) - for uri, prefix in new_namespaces - ) if new_namespaces else [] # almost always empty + if new_namespaces: + attr_list = [ + ('xmlns:' + prefix if prefix else 'xmlns', uri) + for uri, prefix in new_namespaces + ] + attr_list.sort() + else: + # almost always empty + attr_list = [] # ... followed by attributes in URI+name order - for k, v in sorted(attrs.items()): - if qattrs is not None and k in qattrs and v in resolved_names: - v = parsed_qnames[resolved_names[v]][0] - attr_qname, attr_name, uri = parsed_qnames[k] - # No prefix for attributes in default ('') namespace. - attr_list.append((attr_qname if uri else attr_name, v)) + if attrs: + for k, v in sorted(attrs.items()): + if qattrs is not None and k in qattrs and v in resolved_names: + v = parsed_qnames[resolved_names[v]][0] + attr_qname, attr_name, uri = parsed_qnames[k] + # No prefix for attributes in default ('') namespace. + attr_list.append((attr_qname if uri else attr_name, v)) # Honour xml:space attributes. space_behaviour = attrs.get('{http://www.w3.org/XML/1998/namespace}space') From 36ea63941ceba26a931fa0a2985d2a6f8364e236 Mon Sep 17 00:00:00 2001 From: Stefan Behnel Date: Sat, 27 Apr 2019 07:53:34 +0200 Subject: [PATCH 10/22] Rename C14N 'comments' option to 'with_comments' to clarify its purpose (and use what lxml uses for C14N 1.0). 
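A minimal sketch of what the renamed option controls, assuming the ``canonicalize(xml_data, **options)`` signature as released in Python 3.8 (later patches in this series change the signature used at this point). The sample document is hypothetical.

```python
from xml.etree.ElementTree import canonicalize  # available in Python 3.8+

# Hypothetical sample input, not taken from the test suite.
xml = "<root><!-- note --><a/></root>"

# Comments are dropped unless with_comments is set to true.
without_comments = canonicalize(xml)
with_comments = canonicalize(xml, with_comments=True)

print(without_comments)
print(with_comments)
```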
--- Lib/test/test_xml_etree.py | 2 +- Lib/xml/etree/ElementTree.py | 8 ++++---- 2 files changed, 5 insertions(+), 5 deletions(-) diff --git a/Lib/test/test_xml_etree.py b/Lib/test/test_xml_etree.py index a17b57bbd493b4..8bb98902d6d34f 100644 --- a/Lib/test/test_xml_etree.py +++ b/Lib/test/test_xml_etree.py @@ -3565,7 +3565,7 @@ def get_option(config, option_name, default=None): ET.canonicalize( out.write, file=f, - comments=keep_comments, + with_comments=keep_comments, strip_text=strip_text, rewrite_prefixes=rewrite_prefixes, qname_aware_tags=qtags, qname_aware_attrs=qattrs) diff --git a/Lib/xml/etree/ElementTree.py b/Lib/xml/etree/ElementTree.py index 90634c580dc82d..f2884c74179ae0 100644 --- a/Lib/xml/etree/ElementTree.py +++ b/Lib/xml/etree/ElementTree.py @@ -1750,7 +1750,7 @@ class C14NWriterTarget: Configuration options: - - *comments*: set to true to include comments + - *with_comments*: set to true to include comments - *strip_text*: set to true to strip whitespace before and after text content - *rewrite_prefixes*: set to true to replace namespace prefixes by "n{number}" - *qname_aware_tags*: a set of qname aware tag names in which prefixes @@ -1759,11 +1759,11 @@ class C14NWriterTarget: should be replaced in text content """ def __init__(self, write, *, - comments=False, strip_text=False, rewrite_prefixes=False, + with_comments=False, strip_text=False, rewrite_prefixes=False, qname_aware_tags=None, qname_aware_attrs=None): self._write = write self._data = [] - self._comments = comments + self._with_comments = with_comments self._strip_text = strip_text self._rewrite_prefixes = rewrite_prefixes @@ -1947,7 +1947,7 @@ def end(self, tag): self._ns_stack.pop() def comment(self, text): - if not self._comments: + if not self._with_comments: return if self._root_done: self._write('\n') From 8bb48f16a3cc65aa3424b1e91f032ab215eff16b Mon Sep 17 00:00:00 2001 From: Stefan Behnel Date: Sun, 28 Apr 2019 22:48:37 +0200 Subject: [PATCH 11/22] Implement C14N exclusion of 
specific elements and attributes. --- Lib/test/test_xml_etree.py | 54 ++++++++++++++++++++++++++++++++++++ Lib/xml/etree/ElementTree.py | 27 ++++++++++++++++-- 2 files changed, 79 insertions(+), 2 deletions(-) diff --git a/Lib/test/test_xml_etree.py b/Lib/test/test_xml_etree.py index 8bb98902d6d34f..c0b91856682898 100644 --- a/Lib/test/test_xml_etree.py +++ b/Lib/test/test_xml_etree.py @@ -14,6 +14,7 @@ import os import pickle import sys +import textwrap import types import unittest import warnings @@ -3471,6 +3472,59 @@ def test_simple_roundtrip(self): #self.assertEqual(c14n_roundtrip(""), #'') + def test_c14n_exclusion(self): + xml = textwrap.dedent("""\ + + + abtext + + btext + + dtext + + + """) + self.assertEqual( + c14n_roundtrip(xml, strip_text=True), + '' + 'abtext' + 'btext' + 'dtext' + '') + self.assertEqual( + c14n_roundtrip(xml, strip_text=True, exclude_attrs=['{http://example.com/x}attr']), + '' + 'abtext' + 'btext' + 'dtext' + '') + self.assertEqual( + c14n_roundtrip(xml, strip_text=True, exclude_tags=['{http://example.com/x}d']), + '' + 'abtext' + 'btext' + '' + '') + self.assertEqual( + c14n_roundtrip(xml, strip_text=True, exclude_attrs=['{http://example.com/x}attr'], + exclude_tags=['{http://example.com/x}d']), + '' + 'abtext' + 'btext' + '' + '') + self.assertEqual( + c14n_roundtrip(xml, strip_text=True, exclude_tags=['a', 'b']), + '' + 'dtext' + '') + self.assertEqual( + c14n_roundtrip(xml, strip_text=True, exclude_tags=['{http://example.com/x}d', 'b']), + '' + '' + '' + '') + # # basic method=c14n tests from the c14n 2.0 specification. uses # test files under xmltestdata/c14n-20. 
diff --git a/Lib/xml/etree/ElementTree.py b/Lib/xml/etree/ElementTree.py index f2884c74179ae0..687f8d1a9c19f6 100644 --- a/Lib/xml/etree/ElementTree.py +++ b/Lib/xml/etree/ElementTree.py @@ -1757,14 +1757,19 @@ class C14NWriterTarget: should be replaced in text content - *qname_aware_attrs*: a set of qname aware attribute names in which prefixes should be replaced in text content + - *exclude_attrs*: a set of attribute names that should not be serialised + - *exclude_tags*: a set of tag names that should not be serialised """ def __init__(self, write, *, with_comments=False, strip_text=False, rewrite_prefixes=False, - qname_aware_tags=None, qname_aware_attrs=None): + qname_aware_tags=None, qname_aware_attrs=None, + exclude_attrs=None, exclude_tags=None): self._write = write self._data = [] self._with_comments = with_comments self._strip_text = strip_text + self._exclude_attrs = set(exclude_attrs) if exclude_attrs else None + self._exclude_tags = set(exclude_tags) if exclude_tags else None self._rewrite_prefixes = rewrite_prefixes if qname_aware_tags: @@ -1790,6 +1795,7 @@ def __init__(self, write, *, self._pending_start = None self._root_seen = False self._root_done = False + self._ignored_depth = 0 def _iter_namespaces(self, ns_stack, _reversed=reversed): for namespaces in _reversed(ns_stack): @@ -1836,7 +1842,8 @@ def _qname(self, qname, uri=None): raise ValueError(f'Namespace "{uri}" is not declared in scope') def data(self, data): - self._data.append(data) + if not self._ignored_depth: + self._data.append(data) def _flush(self, _join_text=''.join): data = _join_text(self._data) @@ -1853,12 +1860,18 @@ def _flush(self, _join_text=''.join): self._write(_escape_cdata_c14n(data)) def start_ns(self, prefix, uri): + if self._ignored_depth: + return # we may have to resolve qnames in text content if self._data: self._flush() self._ns_stack[-1].append((uri, prefix)) def start(self, tag, attrs): + if self._exclude_tags is not None and ( + self._ignored_depth or tag in 
self._exclude_tags): + self._ignored_depth += 1 + return if self._data: self._flush() @@ -1872,6 +1885,9 @@ def start(self, tag, attrs): self._start(tag, attrs, new_namespaces) def _start(self, tag, attrs, new_namespaces, qname_text=None): + if self._exclude_attrs is not None and attrs: + attrs = {k: v for k, v in attrs.items() if k not in self._exclude_attrs} + qnames = {tag, *attrs} resolved_names = {} @@ -1938,6 +1954,9 @@ def _start(self, tag, attrs, new_namespaces, qname_text=None): self._ns_stack.append([]) def end(self, tag): + if self._ignored_depth: + self._ignored_depth -= 1 + return if self._data: self._flush() self._write(f'</{self._qname(tag)[0]}>') @@ -1949,6 +1968,8 @@ def end(self, tag): def comment(self, text): if not self._with_comments: return + if self._ignored_depth: + return if self._root_done: self._write('\n') elif self._root_seen and self._data: @@ -1958,6 +1979,8 @@ def comment(self, text): self._write('\n') def pi(self, target, data): + if self._ignored_depth: + return if self._root_done: self._write('\n') elif self._root_seen and self._data: From dad95e8dd4a6df9af6686bc664ba01121e14189d Mon Sep 17 00:00:00 2001 From: Stefan Behnel Date: Mon, 29 Apr 2019 07:26:24 +0200 Subject: [PATCH 12/22] Extend exclusion tests to cover the whitespace left-overs of excluded tags.
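The whitespace left-overs mentioned above can be illustrated with a small sketch. It assumes the ``canonicalize(xml_data, **options)`` signature as released in Python 3.8, not the ``write``-first signature used at this point in the series, and the sample document is hypothetical.

```python
from xml.etree.ElementTree import canonicalize  # available in Python 3.8+

# Hypothetical sample input, not taken from the test suite.
xml = "<root>\n  <a>x</a>\n  <b>y</b>\n</root>"

# Excluding <b> drops the element and its content, but the whitespace
# text that surrounded it is still serialised.
loose = canonicalize(xml, exclude_tags=["b"])

# With strip_text=True the whitespace-only text runs are stripped too.
tight = canonicalize(xml, exclude_tags=["b"], strip_text=True)

print(repr(loose))
print(repr(tight))
```

Without ``strip_text``, the indentation before the excluded element and the newline after it are joined into one text run, which is why the tests compare both variants.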
--- Lib/test/test_xml_etree.py | 20 ++++++++++++++++++++ 1 file changed, 20 insertions(+) diff --git a/Lib/test/test_xml_etree.py b/Lib/test/test_xml_etree.py index c0b91856682898..3688099c2a7fd8 100644 --- a/Lib/test/test_xml_etree.py +++ b/Lib/test/test_xml_etree.py @@ -3518,12 +3518,32 @@ def test_c14n_exclusion(self): '' 'dtext' '') + self.assertEqual( + c14n_roundtrip(xml, exclude_tags=['a', 'b']), + '\n' + ' \n' + ' \n' + ' \n' + ' dtext\n' + ' \n' + '') self.assertEqual( c14n_roundtrip(xml, strip_text=True, exclude_tags=['{http://example.com/x}d', 'b']), '' '' '' '') + self.assertEqual( + c14n_roundtrip(xml, exclude_tags=['{http://example.com/x}d', 'b']), + '\n' + ' \n' + ' \n' + ' \n' + ' \n' + ' \n' + ' \n' + ' \n' + '') # # basic method=c14n tests from the c14n 2.0 specification. uses From 45f742bc0fe7fc079be8731c019f2a143a5c0d06 Mon Sep 17 00:00:00 2001 From: Stefan Behnel Date: Mon, 29 Apr 2019 07:58:31 +0200 Subject: [PATCH 13/22] Add documentation. --- Doc/library/xml.etree.elementtree.rst | 50 +++++++++++++++++++++++++++ 1 file changed, 50 insertions(+) diff --git a/Doc/library/xml.etree.elementtree.rst b/Doc/library/xml.etree.elementtree.rst index 413fe7485cfc7e..dda7ae617311e4 100644 --- a/Doc/library/xml.etree.elementtree.rst +++ b/Doc/library/xml.etree.elementtree.rst @@ -465,6 +465,45 @@ Reference Functions ^^^^^^^^^ +.. function:: canonicalize(write, xml_data=None, *, file=None, **options) + + `C14N 2.0 `_ transformation function. + + Canonicalization is a way to normalise XML output in a way that allows + byte-by-byte comparisons and digital signatures. It reduces the freedom + that XML serializers have and instead generates a more constrained XML + representation. The main restrictions regard the placement of namespace + declarations, the ordering of attributes, and ignorable whitespace.
+ + This function takes an XML data string (*xml_data*) or a file-like object + (*file*) as input, converts it to the canonical form, and writes it out + using the provided *write* function, e.g. the ``.write`` method of an + open file object. The write-function receives text, not bytes. Output + files should therefore be opened in text mode with ``utf-8`` encoding. + Typical use:: + + with open("c14n_output.xml", mode='w', encoding='utf-8') as out: + canonicalize(out.write, xml_data) + + The configuration *options* are as follows: + + - *with_comments*: set to true to include comments (default: false) + - *strip_text*: set to true to strip whitespace before and after text content + (default: false) + - *rewrite_prefixes*: set to true to replace namespace prefixes by "n{number}" + (default: false) + - *qname_aware_tags*: a set of qname aware tag names in which prefixes + should be replaced in text content (default: empty) + - *qname_aware_attrs*: a set of qname aware attribute names in which prefixes + should be replaced in text content (default: empty) + - *exclude_attrs*: a set of attribute names that should not be serialised + - *exclude_tags*: a set of tag names that should not be serialised + + In the option list above, "a set" refers to any collection or iterable of + strings, no ordering is expected. + + .. versionadded:: 3.8 + .. function:: Comment(text=None) @@ -1098,6 +1137,17 @@ TreeBuilder Objects .. versionadded:: 3.2 +.. class:: C14NWriterTarget(write, *, \ + with_comments=False, strip_text=False, rewrite_prefixes=False, \ + qname_aware_tags=None, qname_aware_attrs=None, \ + exclude_attrs=None, exclude_tags=None) + + A `C14N 2.0 `_ writer. Arguments are the + same as for the :func:`canonicalize` function. This class does not build a + tree but translates the callback events directly into a serialised form + using the *write* function. + + .. 
_elementtree-xmlparser-objects: XMLParser Objects From 037b64414aba26a490a7d5504170333fd31ff328 Mon Sep 17 00:00:00 2001 From: Stefan Behnel Date: Mon, 29 Apr 2019 08:45:13 +0200 Subject: [PATCH 14/22] Make the canonicalize() function more versatile by letting it return its result as a text string if no output file is provided. --- Doc/library/xml.etree.elementtree.rst | 22 +++++++++++-------- Lib/test/test_xml_etree.py | 11 ++++------ Lib/xml/etree/ElementTree.py | 31 +++++++++++++++++++-------- 3 files changed, 39 insertions(+), 25 deletions(-) diff --git a/Doc/library/xml.etree.elementtree.rst b/Doc/library/xml.etree.elementtree.rst index dda7ae617311e4..823fcde27ebc3c 100644 --- a/Doc/library/xml.etree.elementtree.rst +++ b/Doc/library/xml.etree.elementtree.rst @@ -465,7 +465,7 @@ Reference Functions ^^^^^^^^^ -.. function:: canonicalize(write, xml_data=None, *, file=None, **options) +.. function:: canonicalize(xml_data=None, *, out=None, from_file=None, **options) `C14N 2.0 `_ transformation function. @@ -476,14 +476,18 @@ Functions declarations, the ordering of attributes, and ignorable whitespace. This function takes an XML data string (*xml_data*) or a file-like object - (*file*) as input, converts it to the canonical form, and writes it out - using the provided *write* function, e.g. the ``.write`` method of an - open file object. The write-function receives text, not bytes. Output - files should therefore be opened in text mode with ``utf-8`` encoding. - Typical use:: - - with open("c14n_output.xml", mode='w', encoding='utf-8') as out: - canonicalize(out.write, xml_data) + (*from_file*) as input, converts it to the canonical form, and writes it + out using the *out* file(-like) object, if provided, or returns it as a + text string if not. The output file receives text, not bytes. It should + therefore be opened in text mode with ``utf-8`` encoding. + + Typical uses:: + + xml_data = "..."
+ print(canonicalize(xml_data)) + + with open("c14n_output.xml", mode='w', encoding='utf-8') as out_file: + canonicalize(xml_data, out=out_file) The configuration *options* are as follows: diff --git a/Lib/test/test_xml_etree.py b/Lib/test/test_xml_etree.py index 3688099c2a7fd8..1947bae7197289 100644 --- a/Lib/test/test_xml_etree.py +++ b/Lib/test/test_xml_etree.py @@ -3429,9 +3429,7 @@ def test_correct_import_pyET(self): # -------------------------------------------------------------------- def c14n_roundtrip(xml, **options): - f = io.StringIO() - pyET.canonicalize(f.write, xml, **options) - return f.getvalue() + return pyET.canonicalize(xml, **options) class C14NTest(unittest.TestCase): @@ -3630,20 +3628,19 @@ def get_option(config, option_name, default=None): self.skipTest( f"QName rewriting in XPath text is not supported in {output_file}") - out = io.StringIO() with open(full_path(input_file + ".xml"), 'rb') as f: if input_file == 'inC14N5': # Hack: avoid setting up external entity resolution in the parser. 
with open(full_path('world.txt'), 'rb') as entity_file: f = io.BytesIO(f.read().replace(b'&ent2;', entity_file.read())) - ET.canonicalize( - out.write, file=f, + text = ET.canonicalize( + from_file=f, with_comments=keep_comments, strip_text=strip_text, rewrite_prefixes=rewrite_prefixes, qname_aware_tags=qtags, qname_aware_attrs=qattrs) - text = out.getvalue() + with open(full_path(output_file + ".xml"), 'r', encoding='utf8') as f: expected = f.read() if input_file == 'inC14N3': diff --git a/Lib/xml/etree/ElementTree.py b/Lib/xml/etree/ElementTree.py index 687f8d1a9c19f6..af45fc68aaf6ec 100644 --- a/Lib/xml/etree/ElementTree.py +++ b/Lib/xml/etree/ElementTree.py @@ -1715,29 +1715,38 @@ def close(self): # -------------------------------------------------------------------- # C14N 2.0 -def canonicalize(write, xml_data=None, *, file=None, **options): +def canonicalize(xml_data=None, *, out=None, from_file=None, **options): """Convert XML to its C14N 2.0 serialised form. - The C14N serialised output is written using the *write* function. - To write to a file, open it in text mode with encoding "utf-8" and pass - its ``.write`` method. + If *out* is provided, it must be a file or file-like object that receives + the serialised canonical XML output (text, not bytes) through its ``.write()`` + method. To write to a file, open it in text mode with encoding "utf-8". + If *out* is not provided, this function returns the output as text string. - Either *xml_data* (an XML string) or *file* (a file-like object) must be - provided as input. + Either *xml_data* (an XML string, tree or Element) or *from_file* + (a file-like object) must be provided as input. The configuration options are the same as for the ``C14NWriterTarget``. 
""" - parser = XMLParser(target=C14NWriterTarget(write, **options)) + if xml_data is None and from_file is None: + raise ValueError("Either 'xml_data' or 'from_file' must be provided as input") + sio = None + if out is None: + sio = out = io.StringIO() + + parser = XMLParser(target=C14NWriterTarget(out.write, **options)) try: if xml_data is not None: parser.feed(xml_data) - elif file is not None: - while (d := file.read(64*1024)): + elif from_file is not None: + while (d := from_file.read(64*1024)): parser.feed(d) finally: parser.close() + return sio.getvalue() if sio is not None else None + _looks_like_prefix_name = re.compile('^\w+:\w+$', re.UNICODE).match @@ -1748,6 +1757,10 @@ class C14NWriterTarget: Serialises parse events to XML C14N 2.0. + The *write* function is used for writing out the resulting data stream + as text (not bytes). To write to a file, open it in text mode with encoding + "utf-8" and pass its ``.write`` method. + Configuration options: - *with_comments*: set to true to include comments From 3acd0101a27fd66c8d1fa1b49ec0390393cb2eff Mon Sep 17 00:00:00 2001 From: Stefan Behnel Date: Mon, 29 Apr 2019 11:26:37 +0200 Subject: [PATCH 15/22] Fix docstring. --- Lib/xml/etree/ElementTree.py | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/Lib/xml/etree/ElementTree.py b/Lib/xml/etree/ElementTree.py index af45fc68aaf6ec..d843748dbe9c36 100644 --- a/Lib/xml/etree/ElementTree.py +++ b/Lib/xml/etree/ElementTree.py @@ -1723,8 +1723,8 @@ def canonicalize(xml_data=None, *, out=None, from_file=None, **options): method. To write to a file, open it in text mode with encoding "utf-8". If *out* is not provided, this function returns the output as text string. - Either *xml_data* (an XML string, tree or Element) or *from_file* - (a file-like object) must be provided as input. + Either *xml_data* (an XML string) or *from_file* (a file-like object) + must be provided as input. 
The configuration options are the same as for the ``C14NWriterTarget``. """ From e0500e81d49ab772e271082d363c1ea8652c617e Mon Sep 17 00:00:00 2001 From: Stefan Behnel Date: Mon, 29 Apr 2019 11:44:02 +0200 Subject: [PATCH 16/22] Support (and test) canonicalizing from a file path in addition to only file-like objects. --- Lib/test/test_xml_etree.py | 21 +++++++++++---------- Lib/xml/etree/ElementTree.py | 11 ++++------- 2 files changed, 15 insertions(+), 17 deletions(-) diff --git a/Lib/test/test_xml_etree.py b/Lib/test/test_xml_etree.py index 1947bae7197289..bc06ee3c60e1bb 100644 --- a/Lib/test/test_xml_etree.py +++ b/Lib/test/test_xml_etree.py @@ -3628,18 +3628,19 @@ def get_option(config, option_name, default=None): self.skipTest( f"QName rewriting in XPath text is not supported in {output_file}") - with open(full_path(input_file + ".xml"), 'rb') as f: - if input_file == 'inC14N5': - # Hack: avoid setting up external entity resolution in the parser. - with open(full_path('world.txt'), 'rb') as entity_file: + f = full_path(input_file + ".xml") + if input_file == 'inC14N5': + # Hack: avoid setting up external entity resolution in the parser. 
+ with open(full_path('world.txt'), 'rb') as entity_file: + with open(f, 'rb') as f: f = io.BytesIO(f.read().replace(b'&ent2;', entity_file.read())) - text = ET.canonicalize( - from_file=f, - with_comments=keep_comments, - strip_text=strip_text, - rewrite_prefixes=rewrite_prefixes, - qname_aware_tags=qtags, qname_aware_attrs=qattrs) + text = ET.canonicalize( + from_file=f, + with_comments=keep_comments, + strip_text=strip_text, + rewrite_prefixes=rewrite_prefixes, + qname_aware_tags=qtags, qname_aware_attrs=qattrs) with open(full_path(output_file + ".xml"), 'r', encoding='utf8') as f: expected = f.read() diff --git a/Lib/xml/etree/ElementTree.py b/Lib/xml/etree/ElementTree.py index d843748dbe9c36..34240672e95b5e 100644 --- a/Lib/xml/etree/ElementTree.py +++ b/Lib/xml/etree/ElementTree.py @@ -1736,14 +1736,11 @@ def canonicalize(xml_data=None, *, out=None, from_file=None, **options): parser = XMLParser(target=C14NWriterTarget(out.write, **options)) - try: - if xml_data is not None: - parser.feed(xml_data) - elif from_file is not None: - while (d := from_file.read(64*1024)): - parser.feed(d) - finally: + if xml_data is not None: + parser.feed(xml_data) parser.close() + elif from_file is not None: + parse(from_file, parser=parser) return sio.getvalue() if sio is not None else None From a94b07d7d13a3591c5482451533c28ee4672cb95 Mon Sep 17 00:00:00 2001 From: Stefan Behnel Date: Mon, 29 Apr 2019 11:58:59 +0200 Subject: [PATCH 17/22] Update documentation now that canonicalize() supports file paths as input. --- Doc/library/xml.etree.elementtree.rst | 14 +++++++++----- Lib/xml/etree/ElementTree.py | 4 ++-- 2 files changed, 11 insertions(+), 7 deletions(-) diff --git a/Doc/library/xml.etree.elementtree.rst b/Doc/library/xml.etree.elementtree.rst index 823fcde27ebc3c..0b63cfe4efad2b 100644 --- a/Doc/library/xml.etree.elementtree.rst +++ b/Doc/library/xml.etree.elementtree.rst @@ -475,11 +475,12 @@ Functions representation. 
The main restrictions regard the placement of namespace declarations, the ordering of attributes, and ignorable whitespace. - This function takes an XML data string (*xml_data*) or a file-like object - (*from_file*) as input, converts it to the canonical form, and writes it - out using the *out* file(-like) object, if provided, or returns it as a - text string if not. The output file receives text, not bytes. It should - therefore be opened in text mode with ``utf-8`` encoding. + This function takes an XML data string (*xml_data*) or a file path or + file-like object (*from_file*) as input, converts it to the canonical + form, and writes it out using the *out* file(-like) object, if provided, + or returns it as a text string if not. The output file receives text, + not bytes. It should therefore be opened in text mode with ``utf-8`` + encoding. Typical uses:: @@ -489,6 +490,9 @@ Functions with open("c14n_output.xml", mode='w', encoding='utf-8') as out_file: canonicalize(xml_data, out=out_file) + with open("c14n_output.xml", mode='w', encoding='utf-8') as out_file: + canonicalize(from_file="inputfile.xml", out=out_file) + The configuration *options* are as follows: - *with_comments*: set to true to include comments (default: false) diff --git a/Lib/xml/etree/ElementTree.py b/Lib/xml/etree/ElementTree.py index 34240672e95b5e..2c3d228f96b67a 100644 --- a/Lib/xml/etree/ElementTree.py +++ b/Lib/xml/etree/ElementTree.py @@ -1723,8 +1723,8 @@ def canonicalize(xml_data=None, *, out=None, from_file=None, **options): method. To write to a file, open it in text mode with encoding "utf-8". If *out* is not provided, this function returns the output as text string. - Either *xml_data* (an XML string) or *from_file* (a file-like object) - must be provided as input. + Either *xml_data* (an XML string) or *from_file* (a file path or + file-like object) must be provided as input. The configuration options are the same as for the ``C14NWriterTarget``. 
""" From 93e2c2072b367b92a77ee906f4f5e6b2e774a4bb Mon Sep 17 00:00:00 2001 From: Stefan Behnel Date: Wed, 1 May 2019 07:42:02 +0200 Subject: [PATCH 18/22] Add "What's New" entry. --- Doc/whatsnew/3.8.rst | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/Doc/whatsnew/3.8.rst b/Doc/whatsnew/3.8.rst index f866f9ccb8c16f..855e88c7d9c12c 100644 --- a/Doc/whatsnew/3.8.rst +++ b/Doc/whatsnew/3.8.rst @@ -438,6 +438,10 @@ xml external entities by default. (Contributed by Christian Heimes in :issue:`17239`.) +* The :mod:`xml.etree.ElementTree` module provides a new function + :func:`~xml.etree.ElementTree.canonicalize()` that implements C14N 2.0. + (Contributed by Stefan Behnel in :issue:`13611`.) + Optimizations ============= From c37c3db1c839ecfc532fb5eaaef38198083bb8a1 Mon Sep 17 00:00:00 2001 From: Stefan Behnel Date: Wed, 1 May 2019 08:43:30 +0200 Subject: [PATCH 19/22] Fix syntax warning due to invalid string escapes. --- Lib/xml/etree/ElementTree.py | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Lib/xml/etree/ElementTree.py b/Lib/xml/etree/ElementTree.py index 2c3d228f96b67a..645e999a0be6ca 100644 --- a/Lib/xml/etree/ElementTree.py +++ b/Lib/xml/etree/ElementTree.py @@ -1745,7 +1745,7 @@ def canonicalize(xml_data=None, *, out=None, from_file=None, **options): return sio.getvalue() if sio is not None else None -_looks_like_prefix_name = re.compile('^\w+:\w+$', re.UNICODE).match +_looks_like_prefix_name = re.compile(r'^\w+:\w+$', re.UNICODE).match class C14NWriterTarget: From 6c903a39f8245295edacfc6e133abb1d8009e571 Mon Sep 17 00:00:00 2001 From: Stefan Behnel Date: Wed, 1 May 2019 19:58:33 +0200 Subject: [PATCH 20/22] Fix reference leaks.
--- Modules/_elementtree.c | 11 +++++++++-- 1 file changed, 9 insertions(+), 2 deletions(-) diff --git a/Modules/_elementtree.c b/Modules/_elementtree.c index 50d0f20571bcea..b69e3a45fe308f 100644 --- a/Modules/_elementtree.c +++ b/Modules/_elementtree.c @@ -3417,8 +3417,10 @@ expat_start_ns_handler(XMLParserObject* self, const XML_Char* prefix_in, if (!prefix) return; uri = PyUnicode_DecodeUTF8(uri_in, strlen(uri_in), "strict"); - if (!uri) + if (!uri) { + Py_DECREF(prefix); return; + } res = treebuilder_handle_start_ns(target, prefix, uri); Py_DECREF(uri); @@ -3429,8 +3431,10 @@ expat_start_ns_handler(XMLParserObject* self, const XML_Char* prefix_in, if (!prefix) return; uri = PyUnicode_DecodeUTF8(uri_in, strlen(uri_in), "strict"); - if (!uri) + if (!uri) { + Py_DECREF(prefix); return; + } stack[0] = prefix; stack[1] = uri; @@ -3783,6 +3787,8 @@ xmlparser_gc_traverse(XMLParserObject *self, visitproc visit, void *arg) Py_VISIT(self->handle_data); Py_VISIT(self->handle_start); Py_VISIT(self->handle_start_ns); + Py_VISIT(self->handle_end_ns); + Py_VISIT(self->handle_doctype); Py_VISIT(self->target); Py_VISIT(self->entity); @@ -3807,6 +3813,7 @@ xmlparser_gc_clear(XMLParserObject *self) Py_CLEAR(self->handle_data); Py_CLEAR(self->handle_start); Py_CLEAR(self->handle_start_ns); + Py_CLEAR(self->handle_end_ns); Py_CLEAR(self->handle_doctype); Py_CLEAR(self->target); From 555593bd453e42589d8aa031b007c2ae69396b0c Mon Sep 17 00:00:00 2001 From: Stefan Behnel Date: Wed, 1 May 2019 20:09:33 +0200 Subject: [PATCH 21/22] Move the documentation of the start_ns() and end_ns() methods to a more appropriate place. 
--- Doc/library/xml.etree.elementtree.rst | 34 ++++++++++++++++----------- 1 file changed, 20 insertions(+), 14 deletions(-) diff --git a/Doc/library/xml.etree.elementtree.rst b/Doc/library/xml.etree.elementtree.rst index 413fe7485cfc7e..70ec6ff01a30a0 100644 --- a/Doc/library/xml.etree.elementtree.rst +++ b/Doc/library/xml.etree.elementtree.rst @@ -1087,7 +1087,7 @@ TreeBuilder Objects In addition, a custom :class:`TreeBuilder` object can provide the - following method: + following methods: .. method:: doctype(name, pubid, system) @@ -1097,6 +1097,23 @@ TreeBuilder Objects .. versionadded:: 3.2 + .. method:: start_ns(prefix, uri) + + Is called whenever the parser encounters a new namespace declaration, + before the ``start()`` callback for the opening element that defines it. + *prefix* is ``''`` for the default namespace and the declared + namespace prefix name otherwise. *uri* is the namespace URI. + + .. versionadded:: 3.8 + + .. method:: end_ns(prefix) + + Is called after the ``end()`` callback of an element that declared + a namespace prefix mapping, with the name of the *prefix* that went + out of scope. + + .. versionadded:: 3.8 + .. _elementtree-xmlparser-objects: @@ -1132,7 +1149,8 @@ XMLParser Objects :meth:`XMLParser.feed` calls *target*\'s ``start(tag, attrs_dict)`` method for each opening tag, its ``end(tag)`` method for each closing tag, and data - is processed by method ``data(data)``. :meth:`XMLParser.close` calls + is processed by method ``data(data)``. For further supported callback + methods, see the :class:`TreeBuilder` class. :meth:`XMLParser.close` calls *target*\'s method ``close()``. :class:`XMLParser` can be used not only for building a tree structure. 
This is an example of counting the maximum depth of an XML file:: @@ -1169,18 +1187,6 @@ XMLParser Objects >>> parser.close() 4 - Additionally, if the target object provides one or both of the methods - ``start_ns(self, prefix, uri)`` and ``end_ns(self, prefix)``, then they - are called whenever the parser encounters a new namespace declaration. - The ``prefix`` is ``''`` for the default namespace and the declared - namespace prefix otherwise. The ``start_ns()`` method is called before - the ``start()`` callback of the opening tag that defines the namespace, - and the ``end_ns()`` method is called after the corresponding ``end()`` - callback. - - .. versionchanged:: 3.8 - The ``start_ns()`` and ``end_ns()`` callbacks were added. - .. _elementtree-xmlpullparser-objects: From 56b6428b1a28b8cb063b3e0c6e17155b1b5d87fe Mon Sep 17 00:00:00 2001 From: Stefan Behnel Date: Wed, 1 May 2019 22:12:26 +0200 Subject: [PATCH 22/22] Add missing "versionadded" tag in docs. --- Doc/library/xml.etree.elementtree.rst | 2 ++ 1 file changed, 2 insertions(+) diff --git a/Doc/library/xml.etree.elementtree.rst b/Doc/library/xml.etree.elementtree.rst index ff734350789c56..ef74d0c852cd75 100644 --- a/Doc/library/xml.etree.elementtree.rst +++ b/Doc/library/xml.etree.elementtree.rst @@ -1171,6 +1171,8 @@ TreeBuilder Objects tree but translates the callback events directly into a serialised form using the *write* function. + .. versionadded:: 3.8 + .. _elementtree-xmlparser-objects:
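
Reviewer note (not part of the patch): the two behaviours documented in this series, the ``start_ns()``/``end_ns()`` target callbacks and ``canonicalize()`` accepting a file path via *from_file*, can be exercised with a short script. This is a minimal sketch assuming Python 3.8 or later. ``NamespaceRecorder``, ``record_namespace_events`` and ``canonicalize_from_path`` are hypothetical names chosen for illustration and do not appear in the patch; only ``XMLParser`` and ``canonicalize`` come from ``xml.etree.ElementTree``.

```python
import os
import tempfile
import xml.etree.ElementTree as ET


class NamespaceRecorder:
    """Hypothetical parser target that records the callbacks it receives."""

    def __init__(self):
        self.events = []

    def start_ns(self, prefix, uri):
        # Called before start() of the element that declares the namespace.
        self.events.append(("start-ns", prefix, uri))

    def start(self, tag, attrs):
        self.events.append(("start", tag))

    def end(self, tag):
        self.events.append(("end", tag))

    def end_ns(self, prefix):
        # Called after end() of the element that declared the prefix.
        self.events.append(("end-ns", prefix))

    def close(self):
        # XMLParser.close() returns whatever the target's close() returns.
        return self.events


def record_namespace_events(xml_text):
    """Feed xml_text to an XMLParser and return the recorded events."""
    parser = ET.XMLParser(target=NamespaceRecorder())
    parser.feed(xml_text)
    return parser.close()


def canonicalize_from_path(xml_text):
    """Write xml_text to a temporary file and canonicalize it by path."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        path = os.path.join(tmp_dir, "input.xml")
        with open(path, "w", encoding="utf-8") as xml_file:
            xml_file.write(xml_text)
        # from_file accepts a file path as well as a file-like object.
        return ET.canonicalize(from_file=path)


if __name__ == "__main__":
    for event in record_namespace_events(
            '<root xmlns:p="urn:p"><p:child/></root>'):
        print(event)
    print(canonicalize_from_path('<root b="2" a="1"><child/></root>'))
```

Passing a path rather than an open file leaves opening and closing the input to ``canonicalize()`` itself, which is what the rewritten test in PATCH 16/22 relies on for every input except the ``inC14N5`` special case.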