bpo-13611: C14N 2.0 implementation for ElementTree #12966

Merged
merged 26 commits into from
May 1, 2019
ab7341d
bpo-36673: Implement comment/PI parsing support for the TreeBuilder i…
scoder Apr 20, 2019
2d2df11
bpo-36673: Rewrite the comment/PI factory handling for the TreeBuilde…
scoder Apr 20, 2019
aa52e04
bpo-36676: Implement namespace prefix aware parsing support for the X…
scoder Apr 20, 2019
3b33264
bpo-36676: Add test to see if a target only with an "end_ns()" callba…
scoder Apr 22, 2019
7f0ed48
Implement C14N 2.0 as a new canonicalize() function in ElementTree.
scoder Apr 26, 2019
c00dd43
Add news entry
scoder Apr 26, 2019
08f1137
Correct input file handling in test: must not decode it on the way in…
scoder Apr 26, 2019
5d96e2f
Slightly faster attribute serialisation.
scoder Apr 26, 2019
35f2e81
Reduce overhead for the common cases of no new namespace declarations…
scoder Apr 26, 2019
36ea639
Rename C14N 'comments' option to 'with_comments' to clarify its purpo…
scoder Apr 27, 2019
8bb48f1
Implement C14N exclusion of specific elements and attributes.
scoder Apr 28, 2019
dad95e8
Extend exclusion tests to cover the whitespace left-overs of excluded…
scoder Apr 29, 2019
45f742b
Add documentation.
scoder Apr 29, 2019
037b644
Make the canonicalize() function more versatile by letting it return …
scoder Apr 29, 2019
3acd010
Fix docstring.
scoder Apr 29, 2019
e0500e8
Support (and test) canonicalizing from a file path in addition to onl…
scoder Apr 29, 2019
a94b07d
Update documentation now that canonicalize() supports file paths as i…
scoder Apr 29, 2019
93e2c20
Add "What's New" entry.
scoder May 1, 2019
c37c3db
Fix syntax warning due to invalid string escapes.
scoder May 1, 2019
6c903a3
Fix reference leaks.
scoder May 1, 2019
555593b
Move the documentation of the start_ns() and end_ns() methods to a mo…
scoder May 1, 2019
b1aadb8
Merge branch 'bpo-36676_etree_startns' into bpo-13611_et_c14n2
scoder May 1, 2019
03bd37e
Merge branch 'master' into bpo-36676_etree_startns
scoder May 1, 2019
19b3f5a
Merge branch 'bpo-36676_etree_startns' into bpo-13611_et_c14n2
scoder May 1, 2019
647fd77
Merge branch master into bpo-13611_et_c14n2
scoder May 1, 2019
56b6428
Add missing "versionadded" tag in docs.
scoder May 1, 2019
60 changes: 60 additions & 0 deletions Doc/library/xml.etree.elementtree.rst
@@ -465,6 +465,53 @@ Reference
Functions
^^^^^^^^^

.. function:: canonicalize(xml_data=None, *, out=None, from_file=None, **options)

`C14N 2.0 <https://www.w3.org/TR/xml-c14n2/>`_ transformation function.

Canonicalization normalises XML output so that byte-by-byte comparisons
and digital signatures become possible. It reduces the freedom
that XML serializers have and instead generates a more constrained XML
representation. The main restrictions regard the placement of namespace
declarations, the ordering of attributes, and ignorable whitespace.

This function takes an XML data string (*xml_data*) or a file path or
file-like object (*from_file*) as input, converts it to the canonical
form, and writes it out using the *out* file(-like) object, if provided,
or returns it as a text string if not. The output file receives text,
not bytes. It should therefore be opened in text mode with ``utf-8``
encoding.

Typical uses::

   xml_data = "<root>...</root>"
   print(canonicalize(xml_data))

   with open("c14n_output.xml", mode='w', encoding='utf-8') as out_file:
       canonicalize(xml_data, out=out_file)

   with open("c14n_output.xml", mode='w', encoding='utf-8') as out_file:
       canonicalize(from_file="inputfile.xml", out=out_file)

The configuration *options* are as follows:

- *with_comments*: set to true to include comments (default: false)
- *strip_text*: set to true to strip whitespace before and after text content
  (default: false)
- *rewrite_prefixes*: set to true to replace namespace prefixes by "n{number}"
  (default: false)
- *qname_aware_tags*: a set of qname aware tag names in which prefixes
  should be replaced in text content (default: empty)
- *qname_aware_attrs*: a set of qname aware attribute names in which prefixes
  should be replaced in text content (default: empty)
- *exclude_attrs*: a set of attribute names that should not be serialised
- *exclude_tags*: a set of tag names that should not be serialised

In the option list above, "a set" refers to any collection or iterable of
strings; no ordering is expected.

.. versionadded:: 3.8
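
The options above can be illustrated with a short sketch. The inputs and the first three expected behaviours are taken from the tests added in this pull request; the variable names are only illustrative, and the ``rewrite_prefixes`` call is an extra example that assumes the sequential "n{number}" naming described in the option list.

```python
from xml.etree.ElementTree import canonicalize

# Namespace declarations move to the first element that uses the prefix,
# and empty elements are written with an explicit end tag.
moved_ns = canonicalize("<doc xmlns:prefix='uri'><prefix:bar/></doc>")
print(moved_ns)

# Comments are dropped by default and kept with with_comments=True.
source = "<doc>Hello, world!<!-- Comment 1 --></doc>"
no_comments = canonicalize(source)
kept_comments = canonicalize(source, with_comments=True)

# strip_text removes whitespace around text content;
# exclude_tags drops the named elements together with their subtrees.
xml_data = """\
<root xmlns:x="http://example.com/x">
  <a x:attr="attrx"><b>abtext</b></a>
  <b>btext</b>
  <c><x:d>dtext</x:d></c>
</root>
"""
stripped = canonicalize(xml_data, strip_text=True, exclude_tags=['a', 'b'])
print(stripped)

# rewrite_prefixes replaces the document's own prefixes by generated ones.
rewritten = canonicalize("<a:doc xmlns:a='uri'/>", rewrite_prefixes=True)
print(rewritten)
```

Names passed in ``exclude_tags`` and ``exclude_attrs`` use the ``{namespace}localname`` form for namespaced names, as in the exclusion tests further down.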


.. function:: Comment(text=None)

@@ -1114,6 +1161,19 @@ TreeBuilder Objects
.. versionadded:: 3.8


.. class:: C14NWriterTarget(write, *, \
with_comments=False, strip_text=False, rewrite_prefixes=False, \
qname_aware_tags=None, qname_aware_attrs=None, \
exclude_attrs=None, exclude_tags=None)

A `C14N 2.0 <https://www.w3.org/TR/xml-c14n2/>`_ writer. Arguments are the
same as for the :func:`canonicalize` function. This class does not build a
tree but translates the callback events directly into a serialised form
using the *write* function.

.. versionadded:: 3.8
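
A minimal sketch of how the writer target might be used directly, assuming it is passed as the *target* of an :class:`XMLParser` and that ``io.StringIO.write`` serves as the *write* function (both choices are illustrative, not prescribed by the text above):

```python
import io
from xml.etree.ElementTree import XMLParser, C14NWriterTarget

# Collect the canonical output in an in-memory text buffer.
buffer = io.StringIO()

# The target turns parser callbacks straight into serialised text;
# no element tree is built along the way.
parser = XMLParser(target=C14NWriterTarget(buffer.write, strip_text=True))
parser.feed("<root>  <a b='1'/>  </root>")
parser.close()

result = buffer.getvalue()
print(result)
```

Because the input can be passed to ``feed()`` in chunks, this form suits incremental input; :func:`canonicalize` remains the simpler entry point when the whole document is available at once.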


.. _elementtree-xmlparser-objects:

XMLParser Objects
4 changes: 4 additions & 0 deletions Doc/whatsnew/3.8.rst
@@ -525,6 +525,10 @@ xml
external entities by default.
(Contributed by Christian Heimes in :issue:`17239`.)

* The :mod:`xml.etree.ElementTree` module provides a new function
  :func:`~xml.etree.ElementTree.canonicalize` that implements C14N 2.0.
  (Contributed by Stefan Behnel in :issue:`13611`.)


Optimizations
=============
229 changes: 229 additions & 0 deletions Lib/test/test_xml_etree.py
@@ -12,6 +12,7 @@
import itertools
import locale
import operator
import os
import pickle
import sys
import textwrap
@@ -20,6 +21,7 @@
import warnings
import weakref

from functools import partial
from itertools import product, islice
from test import support
from test.support import TESTFN, findfile, import_fresh_module, gc_collect, swap_attr
@@ -3527,6 +3529,231 @@ def test_correct_import_pyET(self):
self.assertIsInstance(pyET.Element.__init__, types.FunctionType)
self.assertIsInstance(pyET.XMLParser.__init__, types.FunctionType)


# --------------------------------------------------------------------

def c14n_roundtrip(xml, **options):
    return pyET.canonicalize(xml, **options)


class C14NTest(unittest.TestCase):
    maxDiff = None

    #
    # simple roundtrip tests (from c14n.py)

    def test_simple_roundtrip(self):
        # Basics
        self.assertEqual(c14n_roundtrip("<doc/>"), '<doc></doc>')
        self.assertEqual(c14n_roundtrip("<doc xmlns='uri'/>"), # FIXME
                '<doc xmlns="uri"></doc>')
        self.assertEqual(c14n_roundtrip("<prefix:doc xmlns:prefix='uri'/>"),
            '<prefix:doc xmlns:prefix="uri"></prefix:doc>')
        self.assertEqual(c14n_roundtrip("<doc xmlns:prefix='uri'><prefix:bar/></doc>"),
            '<doc><prefix:bar xmlns:prefix="uri"></prefix:bar></doc>')
        self.assertEqual(c14n_roundtrip("<elem xmlns:wsu='http://docs.oasis-open.org/wss/2004/01/oasis-200401-wss-wssecurity-utility-1.0.xsd' xmlns:SOAP-ENV='http://schemas.xmlsoap.org/soap/envelope/' />"),
            '<elem></elem>')

        # C14N spec
        self.assertEqual(c14n_roundtrip("<doc>Hello, world!<!-- Comment 1 --></doc>"),
            '<doc>Hello, world!</doc>')
        self.assertEqual(c14n_roundtrip("<value>&#x32;</value>"),
            '<value>2</value>')
        self.assertEqual(c14n_roundtrip('<compute><![CDATA[value>"0" && value<"10" ?"valid":"error"]]></compute>'),
            '<compute>value&gt;"0" &amp;&amp; value&lt;"10" ?"valid":"error"</compute>')
        self.assertEqual(c14n_roundtrip('''<compute expr='value>"0" &amp;&amp; value&lt;"10" ?"valid":"error"'>valid</compute>'''),
            '<compute expr="value>&quot;0&quot; &amp;&amp; value&lt;&quot;10&quot; ?&quot;valid&quot;:&quot;error&quot;">valid</compute>')
        self.assertEqual(c14n_roundtrip("<norm attr=' &apos;   &#x20;&#13;&#xa;&#9;   &apos; '/>"),
            '<norm attr=" \'    &#xD;&#xA;&#x9;   \' "></norm>')
        self.assertEqual(c14n_roundtrip("<normNames attr='   A   &#x20;&#13;&#xa;&#9;   B   '/>"),
            '<normNames attr="   A    &#xD;&#xA;&#x9;   B   "></normNames>')
        self.assertEqual(c14n_roundtrip("<normId id=' &apos;   &#x20;&#13;&#xa;&#9;   &apos; '/>"),
            '<normId id=" \'    &#xD;&#xA;&#x9;   \' "></normId>')

        # fragments from PJ's tests
        #self.assertEqual(c14n_roundtrip("<doc xmlns:x='http://example.com/x' xmlns='http://example.com/default'><b y:a1='1' xmlns='http://example.com/default' a3='3' xmlns:y='http://example.com/y' y:a2='2'/></doc>"),
        #'<doc xmlns:x="http://example.com/x"><b xmlns:y="http://example.com/y" a3="3" y:a1="1" y:a2="2"></b></doc>')

    def test_c14n_exclusion(self):
        xml = textwrap.dedent("""\
            <root xmlns:x="http://example.com/x">
                <a x:attr="attrx">
                    <b>abtext</b>
                </a>
                <b>btext</b>
                <c>
                    <x:d>dtext</x:d>
                </c>
            </root>
        """)
        self.assertEqual(
            c14n_roundtrip(xml, strip_text=True),
            '<root>'
            '<a xmlns:x="http://example.com/x" x:attr="attrx"><b>abtext</b></a>'
            '<b>btext</b>'
            '<c><x:d xmlns:x="http://example.com/x">dtext</x:d></c>'
            '</root>')
        self.assertEqual(
            c14n_roundtrip(xml, strip_text=True, exclude_attrs=['{http://example.com/x}attr']),
            '<root>'
            '<a><b>abtext</b></a>'
            '<b>btext</b>'
            '<c><x:d xmlns:x="http://example.com/x">dtext</x:d></c>'
            '</root>')
        self.assertEqual(
            c14n_roundtrip(xml, strip_text=True, exclude_tags=['{http://example.com/x}d']),
            '<root>'
            '<a xmlns:x="http://example.com/x" x:attr="attrx"><b>abtext</b></a>'
            '<b>btext</b>'
            '<c></c>'
            '</root>')
        self.assertEqual(
            c14n_roundtrip(xml, strip_text=True, exclude_attrs=['{http://example.com/x}attr'],
                           exclude_tags=['{http://example.com/x}d']),
            '<root>'
            '<a><b>abtext</b></a>'
            '<b>btext</b>'
            '<c></c>'
            '</root>')
        self.assertEqual(
            c14n_roundtrip(xml, strip_text=True, exclude_tags=['a', 'b']),
            '<root>'
            '<c><x:d xmlns:x="http://example.com/x">dtext</x:d></c>'
            '</root>')
        self.assertEqual(
            c14n_roundtrip(xml, exclude_tags=['a', 'b']),
            '<root>\n'
            '    \n'
            '    \n'
            '    <c>\n'
            '        <x:d xmlns:x="http://example.com/x">dtext</x:d>\n'
            '    </c>\n'
            '</root>')
        self.assertEqual(
            c14n_roundtrip(xml, strip_text=True, exclude_tags=['{http://example.com/x}d', 'b']),
            '<root>'
            '<a xmlns:x="http://example.com/x" x:attr="attrx"></a>'
            '<c></c>'
            '</root>')
        self.assertEqual(
            c14n_roundtrip(xml, exclude_tags=['{http://example.com/x}d', 'b']),
            '<root>\n'
            '    <a xmlns:x="http://example.com/x" x:attr="attrx">\n'
            '        \n'
            '    </a>\n'
            '    \n'
            '    <c>\n'
            '        \n'
            '    </c>\n'
            '</root>')

    #
    # basic method=c14n tests from the c14n 2.0 specification. uses
    # test files under xmltestdata/c14n-20.

    # note that this uses generated C14N versions of the standard ET.write
    # output, not roundtripped C14N (see above).

    def test_xml_c14n2(self):
        datadir = findfile("c14n-20", subdir="xmltestdata")
        full_path = partial(os.path.join, datadir)

        files = [filename[:-4] for filename in sorted(os.listdir(datadir))
                 if filename.endswith('.xml')]
        input_files = [
            filename for filename in files
            if filename.startswith('in')
        ]
        configs = {
            filename: {
                # <c14n2:PrefixRewrite>sequential</c14n2:PrefixRewrite>
                option.tag.split('}')[-1]: ((option.text or '').strip(), option)
                for option in ET.parse(full_path(filename) + ".xml").getroot()
            }
            for filename in files
            if filename.startswith('c14n')
        }

        tests = {
            input_file: [
                (filename, configs[filename.rsplit('_', 1)[-1]])
                for filename in files
                if filename.startswith(f'out_{input_file}_')
                and filename.rsplit('_', 1)[-1] in configs
            ]
            for input_file in input_files
        }

        # Make sure we found all test cases.
        self.assertEqual(30, len([
            output_file for output_files in tests.values()
            for output_file in output_files]))

        def get_option(config, option_name, default=None):
            return config.get(option_name, (default, ()))[0]

        for input_file, output_files in tests.items():
            for output_file, config in output_files:
                keep_comments = get_option(
                    config, 'IgnoreComments') == 'true'  # no, it's right :)
                strip_text = get_option(
                    config, 'TrimTextNodes') == 'true'
                rewrite_prefixes = get_option(
                    config, 'PrefixRewrite') == 'sequential'
                if 'QNameAware' in config:
                    qattrs = [
                        f"{{{el.get('NS')}}}{el.get('Name')}"
                        for el in config['QNameAware'][1].findall(
                            '{http://www.w3.org/2010/xml-c14n2}QualifiedAttr')
                    ]
                    qtags = [
                        f"{{{el.get('NS')}}}{el.get('Name')}"
                        for el in config['QNameAware'][1].findall(
                            '{http://www.w3.org/2010/xml-c14n2}Element')
                    ]
                else:
                    qtags = qattrs = None

                # Build subtest description from config.
                config_descr = ','.join(
                    f"{name}={value or ','.join(c.tag.split('}')[-1] for c in children)}"
                    for name, (value, children) in sorted(config.items())
                )

                with self.subTest(f"{output_file}({config_descr})"):
                    if input_file == 'inNsRedecl' and not rewrite_prefixes:
                        self.skipTest(
                            f"Redeclared namespace handling is not supported in {output_file}")
                    if input_file == 'inNsSuperfluous' and not rewrite_prefixes:
                        self.skipTest(
                            f"Redeclared namespace handling is not supported in {output_file}")
                    if 'QNameAware' in config and config['QNameAware'][1].find(
                            '{http://www.w3.org/2010/xml-c14n2}XPathElement') is not None:
                        self.skipTest(
                            f"QName rewriting in XPath text is not supported in {output_file}")

                    f = full_path(input_file + ".xml")
                    if input_file == 'inC14N5':
                        # Hack: avoid setting up external entity resolution in the parser.
                        with open(full_path('world.txt'), 'rb') as entity_file:
                            with open(f, 'rb') as f:
                                f = io.BytesIO(f.read().replace(b'&ent2;', entity_file.read()))

                    text = ET.canonicalize(
                        from_file=f,
                        with_comments=keep_comments,
                        strip_text=strip_text,
                        rewrite_prefixes=rewrite_prefixes,
                        qname_aware_tags=qtags, qname_aware_attrs=qattrs)

                    with open(full_path(output_file + ".xml"), 'r', encoding='utf8') as f:
                        expected = f.read()
                    if input_file == 'inC14N3':
                        # FIXME: cET resolves default attributes but ET does not!
                        expected = expected.replace(' attr="default"', '')
                        text = text.replace(' attr="default"', '')
                    self.assertEqual(expected, text)

# --------------------------------------------------------------------


@@ -3559,6 +3786,8 @@ def test_main(module=None):
XMLParserTest,
XMLPullParserTest,
BugsTest,
KeywordArgsTest,
C14NTest,
]

# These tests will only run for the pure-Python version that doesn't import
4 changes: 4 additions & 0 deletions Lib/test/xmltestdata/c14n-20/c14nComment.xml
@@ -0,0 +1,4 @@
<dsig:CanonicalizationMethod xmlns:dsig="http://www.w3.org/2000/09/xmldsig#" xmlns:c14n2="http://www.w3.org/2010/xml-c14n2" Algorithm="http://www.w3.org/2010/xml-c14n2">
<c14n2:IgnoreComments>true</c14n2:IgnoreComments>
</dsig:CanonicalizationMethod>

3 changes: 3 additions & 0 deletions Lib/test/xmltestdata/c14n-20/c14nDefault.xml
@@ -0,0 +1,3 @@
<dsig:CanonicalizationMethod xmlns:dsig="http://www.w3.org/2000/09/xmldsig#" Algorithm="http://www.w3.org/2010/xml-c14n2">
</dsig:CanonicalizationMethod>

4 changes: 4 additions & 0 deletions Lib/test/xmltestdata/c14n-20/c14nPrefix.xml
@@ -0,0 +1,4 @@
<dsig:CanonicalizationMethod xmlns:dsig="http://www.w3.org/2000/09/xmldsig#" xmlns:c14n2="http://www.w3.org/2010/xml-c14n2" Algorithm="http://www.w3.org/2010/xml-c14n2">
<c14n2:PrefixRewrite>sequential</c14n2:PrefixRewrite>
</dsig:CanonicalizationMethod>

7 changes: 7 additions & 0 deletions Lib/test/xmltestdata/c14n-20/c14nPrefixQname.xml
@@ -0,0 +1,7 @@
<dsig:CanonicalizationMethod xmlns:dsig="http://www.w3.org/2000/09/xmldsig#" xmlns:c14n2="http://www.w3.org/2010/xml-c14n2" Algorithm="http://www.w3.org/2010/xml-c14n2">
<c14n2:PrefixRewrite>sequential</c14n2:PrefixRewrite>
<c14n2:QNameAware>
<c14n2:QualifiedAttr Name="type" NS="http://www.w3.org/2001/XMLSchema-instance"/>
</c14n2:QNameAware>
</dsig:CanonicalizationMethod>

8 changes: 8 additions & 0 deletions Lib/test/xmltestdata/c14n-20/c14nPrefixQnameXpathElem.xml
@@ -0,0 +1,8 @@
<dsig:CanonicalizationMethod xmlns:dsig="http://www.w3.org/2000/09/xmldsig#" xmlns:c14n2="http://www.w3.org/2010/xml-c14n2" Algorithm="http://www.w3.org/2010/xml-c14n2">
<c14n2:PrefixRewrite>sequential</c14n2:PrefixRewrite>
<c14n2:QNameAware>
<c14n2:Element Name="bar" NS="http://a"/>
<c14n2:XPathElement Name="IncludedXPath" NS="http://www.w3.org/2010/xmldsig2#"/>
</c14n2:QNameAware>
</dsig:CanonicalizationMethod>

6 changes: 6 additions & 0 deletions Lib/test/xmltestdata/c14n-20/c14nQname.xml
@@ -0,0 +1,6 @@
<dsig:CanonicalizationMethod xmlns:dsig="http://www.w3.org/2000/09/xmldsig#" xmlns:c14n2="http://www.w3.org/2010/xml-c14n2" Algorithm="http://www.w3.org/2010/xml-c14n2">
<c14n2:QNameAware>
<c14n2:QualifiedAttr Name="type" NS="http://www.w3.org/2001/XMLSchema-instance"/>
</c14n2:QNameAware>
</dsig:CanonicalizationMethod>

6 changes: 6 additions & 0 deletions Lib/test/xmltestdata/c14n-20/c14nQnameElem.xml
@@ -0,0 +1,6 @@
<dsig:CanonicalizationMethod xmlns:dsig="http://www.w3.org/2000/09/xmldsig#" xmlns:c14n2="http://www.w3.org/2010/xml-c14n2" Algorithm="http://www.w3.org/2010/xml-c14n2">
<c14n2:QNameAware>
<c14n2:Element Name="bar" NS="http://a"/>
</c14n2:QNameAware>
</dsig:CanonicalizationMethod>

7 changes: 7 additions & 0 deletions Lib/test/xmltestdata/c14n-20/c14nQnameXpathElem.xml
@@ -0,0 +1,7 @@
<dsig:CanonicalizationMethod xmlns:dsig="http://www.w3.org/2000/09/xmldsig#" xmlns:c14n2="http://www.w3.org/2010/xml-c14n2" Algorithm="http://www.w3.org/2010/xml-c14n2">
<c14n2:QNameAware>
<c14n2:Element Name="bar" NS="http://a"/>
<c14n2:XPathElement Name="IncludedXPath" NS="http://www.w3.org/2010/xmldsig2#"/>
</c14n2:QNameAware>
</dsig:CanonicalizationMethod>

4 changes: 4 additions & 0 deletions Lib/test/xmltestdata/c14n-20/c14nTrim.xml
@@ -0,0 +1,4 @@
<dsig:CanonicalizationMethod xmlns:dsig="http://www.w3.org/2000/09/xmldsig#" xmlns:c14n2="http://www.w3.org/2010/xml-c14n2" Algorithm="http://www.w3.org/2010/xml-c14n2">
<c14n2:TrimTextNodes>true</c14n2:TrimTextNodes>
</dsig:CanonicalizationMethod>
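
The configuration files in this directory are read by ``test_xml_c14n2`` and mapped to keyword options of ``canonicalize()``. The sketch below restates that mapping as a standalone helper. ``options_from_config`` is a hypothetical name introduced here for illustration, not part of the module or of the test suite, and the comment handling deliberately mirrors the test's own mapping of ``IgnoreComments``.

```python
from xml.etree import ElementTree as ET

C14N2_NS = "http://www.w3.org/2010/xml-c14n2"


def options_from_config(config_xml):
    """Hypothetical helper: derive canonicalize() options from a
    CanonicalizationMethod document, following the test's mapping."""
    root = ET.fromstring(config_xml)

    def text_of(name):
        # Stripped text of a direct c14n2:* child, or None if it is absent.
        el = root.find(f"{{{C14N2_NS}}}{name}")
        return (el.text or "").strip() if el is not None else None

    qname_aware = root.find(f"{{{C14N2_NS}}}QNameAware")

    def names(child_tag):
        # Build "{namespace}localname" strings from the NS/Name attributes.
        if qname_aware is None:
            return None
        return [
            f"{{{el.get('NS')}}}{el.get('Name')}"
            for el in qname_aware.findall(f"{{{C14N2_NS}}}{child_tag}")
        ]

    return {
        # Same mapping as in test_xml_c14n2 ("no, it's right :)").
        "with_comments": text_of("IgnoreComments") == "true",
        "strip_text": text_of("TrimTextNodes") == "true",
        "rewrite_prefixes": text_of("PrefixRewrite") == "sequential",
        "qname_aware_tags": names("Element"),
        "qname_aware_attrs": names("QualifiedAttr"),
    }


# Content of c14nPrefixQname.xml as added in this pull request.
config_xml = """\
<dsig:CanonicalizationMethod xmlns:dsig="http://www.w3.org/2000/09/xmldsig#" xmlns:c14n2="http://www.w3.org/2010/xml-c14n2" Algorithm="http://www.w3.org/2010/xml-c14n2">
  <c14n2:PrefixRewrite>sequential</c14n2:PrefixRewrite>
  <c14n2:QNameAware>
    <c14n2:QualifiedAttr Name="type" NS="http://www.w3.org/2001/XMLSchema-instance"/>
  </c14n2:QNameAware>
</dsig:CanonicalizationMethod>
"""

options = options_from_config(config_xml)
print(options)

# Apply the derived options to a small, made-up input document.
result = ET.canonicalize("<a:doc xmlns:a='urn:a'> text </a:doc>", **options)
print(result)
```

The test itself additionally skips configurations that contain an ``XPathElement`` entry, since QName rewriting in XPath text is not supported; the helper above simply ignores such entries.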
