bpo-17088: Fix handling of XML attributes when serializing with default namespace #11050

mthuurne · 2018-12-09T13:34:06Z

I split the changes over several commits for easier reviewing. Feel free to squash some or all of them later.

I took the approach that Stefan Behnel suggested in the bug discussion: change the qnames cache values from a single serialized value shared by tag and attribute to a pair that contains a separate serialized value for tag and attribute. Often the two will be identical, but unqualified names interact differently with the default namespace depending on whether it's a tag name or an attribute name.
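The shape of the reworked cache can be pictured with a minimal sketch. The names `SerializedName` and the example URIs are made up for illustration and are not the identifiers used in the patch. How the patch represents a form that cannot be serialized is an assumption here (`None`).

```python
from typing import Dict, NamedTuple, Optional


class SerializedName(NamedTuple):
    """Hypothetical cache value: one serialized form per role."""
    tag: Optional[str]   # form used when the qname is an element name
    attr: Optional[str]  # form used when the qname is an attribute name


# Hypothetical cache contents for a document serialized with
# default_namespace="http://example.com/ns".
qnames: Dict[str, SerializedName] = {
    # Name in the default namespace: a tag needs no prefix,
    # an attribute must carry an explicit one.
    "{http://example.com/ns}item": SerializedName(tag="item", attr="ns0:item"),
    # Name in another namespace: both forms are identical.
    "{http://example.com/other}id": SerializedName(tag="ns1:id", attr="ns1:id"),
    # Unqualified name: fine as an attribute; as a tag it has no valid
    # form while a default namespace is in effect (assumed None here).
    "id": SerializedName(tag=None, attr="id"),
}

for qname, forms in qnames.items():
    print(qname, "->", forms.tag, "/", forms.attr)
```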

Many thanks to "wiml" for the detailed test case.

https://bugs.python.org/issue17088

The 'qname' argument is not only used for element names, so calling the non-namespace part 'tag' is confusing. The XML namespaces spec calls it 'local part', so I picked 'local' as the new name.

All other callers of add_qname() already check the argument type prior to calling, so by also checking before adding an attribute name, add_qname() doesn't need to deal with TypeError anymore. The diff is actually tiny when whitespace changes are ignored.

This is in preparation of a future change that requires the default namespace to not be applied to attributes. The cache population code will become non-trivial then.

This is in preparation of a future change that requires the default namespace to not be applied to attributes.

Currently there is no real need to do this, since the serialization used for tags and attributes is identical. But in the future, they will use different serializations and we want to avoid defining namespace prefixes for namespaces that are not going to appear in the serialization.

This is in preparation of a future change that requires the default namespace to not be applied to attributes. Because the default namespace is no longer in the mapping, the first generated synthetic namespace prefix is now always "ns0", while previously it was "ns0" or "ns1" depending on whether a default namespace was provided. I updated the expected serialization in one test case to match this new behavior.
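The numbering shift can be modeled in a few lines, assuming synthetic prefixes are numbered by the current size of the namespace mapping. The function name `next_synthetic_prefix` and the URI are hypothetical; this is a simplified model, not the serializer's code.

```python
def next_synthetic_prefix(namespaces):
    """Simplified model: number the prefix by how many entries the mapping holds."""
    return "ns%d" % len(namespaces)


# Old behavior: the default namespace occupies a slot under the empty prefix.
old_mapping = {"http://example.com/default": ""}
# New behavior: the default namespace is kept out of the mapping.
new_mapping = {}

print(next_synthetic_prefix(old_mapping))  # ns1
print(next_synthetic_prefix(new_mapping))  # ns0
```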

Previously, it returned a mapping from namespace to prefix. But later the serialization code sorted that mapping by prefix, so inverting the mapping simplifies things. Another reason to invert the mapping is that currently there is a 1:1 correspondence between prefixes and namespaces, but in the future it will be possible for the default namespace to exist both with an empty prefix and with an actual prefix.
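A small sketch of the two shapes, using made-up URIs and prefixes, shows why keying by prefix makes the sort trivial and why it allows one namespace to appear under two prefixes:

```python
# Hypothetical namespace -> prefix mapping, as collected in a first pass.
namespaces = {
    "http://example.com/b": "ns1",
    "http://example.com/a": "ns0",
    "http://www.w3.org/1999/xhtml": "html",
}

# Old shape: the serializer has to sort by the value (the prefix).
by_prefix_old = sorted(namespaces.items(), key=lambda item: item[1])

# New shape: invert once; sorting the items then orders by prefix directly.
prefix_map = {prefix: ns for ns, prefix in namespaces.items()}
by_prefix_new = sorted(prefix_map.items())

# Keyed by prefix, the default namespace can be listed twice:
# once under the empty prefix and once under a generated one.
prefix_map_with_default = {
    "": "http://example.com/a",
    "ns0": "http://example.com/a",
}

print([prefix for prefix, _ in by_prefix_new])
```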

This is not my work: I extracted this test case from the second iteration of wiml's patch for issue 17088 (https://bugs.python.org/file33125/bug17088_2.patch). I did modify the line wrapping, as the original patch went very wide (over column 100). I also updated the prefix numbering in the expected serialized output, since numbering always starts at 0 now.

Unprefixed attributes are considered to not be in any namespace, according to the XML namespaces spec. This also fixes issue 17088, since the exception that rejects non-qualified names when using the default_namespace option is no longer raised when serializing an attribute name.
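The parser side of ElementTree already follows this rule, which can be checked with a short example (the URI and names are arbitrary): elements inherit the default namespace, while an unprefixed attribute keeps a plain name.

```python
import xml.etree.ElementTree as ET

doc = '<root xmlns="http://example.com/ns" id="1"><child/></root>'
root = ET.fromstring(doc)

# Element names pick up the default namespace.
print(root.tag)      # {http://example.com/ns}root
print(root[0].tag)   # {http://example.com/ns}child

# The unprefixed attribute is in no namespace at all.
print(list(root.attrib))  # ['id']
```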

When no default_namespace is passed, serialize_qname() returns the same value regardless of whether is_attr is True or False. So we can save some time by storing the serialization for both the tag and attribute in the cache when one of them is computed.
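A rough sketch of this shortcut follows. Only the names `serialize_qname` and `is_attr` come from the description above; the signature, the helper `cached_attr_name`, the cache layout, and the prefix generation are assumptions made for illustration.

```python
def serialize_qname(qname, is_attr, default_namespace, prefixes):
    """Hypothetical helper: serialize a '{uri}local' or plain name."""
    if qname[:1] != "{":
        if default_namespace and not is_attr:
            # An unqualified tag cannot be written under a default namespace.
            raise ValueError(
                "cannot use non-qualified name %r with default_namespace" % qname
            )
        return qname
    uri, local = qname[1:].rsplit("}", 1)
    if uri == default_namespace and not is_attr:
        # Tags in the default namespace need no prefix.
        return local
    prefix = prefixes.setdefault(uri, "ns%d" % len(prefixes))
    return "%s:%s" % (prefix, local)


def cached_attr_name(qname, cache, default_namespace, prefixes):
    """Return the attribute form, filling in the tag form too when it is free."""
    ser_tag, ser_attr = cache.get(qname, (None, None))
    if ser_attr is None:
        ser_attr = serialize_qname(qname, True, default_namespace, prefixes)
        if not default_namespace:
            # Without a default namespace both forms are identical,
            # so store the tag form now and skip a later computation.
            ser_tag = ser_attr
        cache[qname] = (ser_tag, ser_attr)
    return ser_attr


cache, prefixes = {}, {}
print(cached_attr_name("{http://example.com/ns}a", cache, None, prefixes))
print(cache)
```

With a default namespace set, only the attribute slot is filled, because the tag form may differ (no prefix) or may not exist at all (unqualified name).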

the-knights-who-say-ni · 2018-12-09T13:34:09Z

Hello, and thanks for your contribution!

I'm a bot set up to make sure that the project can legally accept your contribution by verifying you have signed the PSF contributor agreement (CLA).

Our records indicate we have not received your CLA. For legal reasons we need you to sign this before we can look at your contribution. Please follow the steps outlined in the CPython devguide to rectify this issue.

If you have recently signed the CLA, please wait at least one business day before our records are updated.

You can check yourself to see if the CLA has been received.

Thanks again for your contribution, we look forward to reviewing it!


scoder · 2020-04-04T19:24:33Z

Lib/xml/etree/ElementTree.py

-        except TypeError:
-            _raise_serialization_error(qname)


Where's this part gone? Wouldn't this leak a TypeError to the user side?

scoder · 2020-04-04T19:26:08Z

Lib/xml/etree/ElementTree.py

        elif isinstance(tag, str):
-            if tag not in qnames:


These conditions might have been there for performance reasons. Not sure if it matters, but that's up to some benchmarking I think.

scoder · 2020-04-04T19:28:36Z

Lib/xml/etree/ElementTree.py

            add_qname(text.text)
-    return qnames, namespaces
+
+    prefix_map = {prefix: ns for ns, prefix in namespaces.items()}


Prefixes don't have to be globally unique, so this might introduce bugs. (Suggests to me that there might be missing tests.)
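The general hazard behind this comment can be shown in isolation: inverting a dict drops entries whenever two keys share a value. The mapping below is hypothetical; whether the serializer can actually produce such a mapping is not established here.

```python
# Hypothetical mapping in which two namespaces ended up with the same prefix.
namespaces = {
    "http://example.com/a": "p",
    "http://example.com/b": "p",
}

prefix_map = {prefix: ns for ns, prefix in namespaces.items()}

print(len(namespaces))  # 2
print(len(prefix_map))  # 1: the later entry silently replaced the earlier one
```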

scoder · 2020-04-04T19:33:48Z

Lib/xml/etree/ElementTree.py

+                if not default_namespace:
+                    ser_tag = ser_attr


This seems worth a comment in the code (likewise below).

csabella · 2020-05-25T14:43:28Z

@mthuurne, please take a look at the code review. Thanks!

github-actions · 2025-04-13T06:03:41Z

This PR is stale because it has been open for 30 days with no activity.

mthuurne added 11 commits December 9, 2018 14:18

Include rejected unqualified name in exception from ElementTree.write()

a6c78ce

Rename internal variable for clarity

e0d6f6a

The 'qname' argument is not only used for element names, so calling the non-namespace part 'tag' is confusing. The XML namespaces spec calls it 'local part', so I picked 'local' as the new name.

Separate serialized name computation from cache population

34e6b95

This is in preparation of a future change that requires the default namespace to not be applied to attributes. The cache population code will become non-trivial then.

Store separate local:prefix string in qnames cache for attributes

71119f5

This is in preparation of a future change that requires the default namespace to not be applied to attributes.

the-knights-who-say-ni added the CLA not signed label Dec 9, 2018

bedevere-bot added the awaiting review label Dec 9, 2018

Add NEWS.d item for ElementTree attribute serialization with default …

52c4b8e

…namespace

the-knights-who-say-ni added CLA signed and removed CLA not signed labels Dec 10, 2018

scoder reviewed Apr 4, 2020


csabella added awaiting changes and removed awaiting review labels May 25, 2020

ezio-melotti removed the CLA signed label Jul 13, 2022

SilverbackNet mannequin mentioned this pull request Jan 5, 2023

ElementTree incorrectly refuses to write attributes without namespaces when default_namespace is used #61290

Open

github-actions bot added the stale Stale PR or inactive for long period of time. label Apr 13, 2025
