Skip to content

bpo-17088: Fix handling of XML attributes when serializing with default namespace #11050

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 12 commits into
base: main
Choose a base branch
from

Conversation

mthuurne
Copy link

@mthuurne mthuurne commented Dec 9, 2018

I split the changes over several commits for easier reviewing. Feel free to squash some or all of them later.

I took the approach that Stefan Behnel suggested in the bug discussion: change the qnames cache values from a single serialized value shared by tag and attribute to a pair that contains a separate serialized value for tag and attribute. Often the two will be identical, but unqualified names interact differently with the default namespace depending on whether it's a tag name or an attribute name.

Many thanks to "wiml" for the detailed test case.

https://bugs.python.org/issue17088

The 'qname' argument is not only used for element names, so calling
the non-namespace part 'tag' is confusing. The XML namespaces spec
calls it 'local part', so I picked 'local' as the new name.
All other callers of add_qname() already check the argument type
prior to calling, so by also checking before adding an attribute
name, add_qname() doesn't need to deal with TypeError anymore.

The diff is actually tiny when whitespace changes are ignored.
This is in preparation of a future change that requires the default
namespace to not be applied to attributes. The cache population code
will become non-trivial then.
This is in preparation of a future change that requires the default
namespace to not be applied to attributes.
Currently there is no real need to do this, since the serialization
used for tags and attributes is identical. But in the future, they
will use different serializations and we want to avoid defining
namespace prefixes for namespaces that are not going to appear
in the serialization.
This is in preparation of a future change that requires the default
namespace to not be applied to attributes.

Because the default namespace is no longer in the mapping, the first
generated synthetic namespace prefix is now always "ns0", while
previously it was "ns0" or "ns1" depending on whether a default
namespace was provided. I updated the expected serialization in
one test case to match this new behavior.
Previously, it returned a mapping from namespace to prefix. But later
the serialization code sorted that mapping by prefix, so inverting the
mapping simplifies things.

Another reason to invert the mapping is that currently there is a 1:1
correspondence between prefixes and namespaces, but in the future it
will be possible for the default namespace to exist both with an empty
prefix and with a actual prefix.
This is not my work: I extracted this test case from the second
iteration of wiml's patch for issue 17088.

https://bugs.python.org/file33125/bug17088_2.patch

I did modify the line wrapping, as the original patch went very wide
(over column 100). I also updated the prefix numbering in the expected
serialized output, since numbering always starts at 0 now.
Unprefixed attributes are considered to not be in any namespace,
according to the XML namespaces spec.

This also fixes issue 17088, since the exception that rejects
non-qualified names when using the default_namespace option is
no longer raised when serializing an attribute name.
When no default_namespace is passed, serialize_qname() returns the
same value regardless of whether is_attr is True or False. So we can
save some time by storing the serialization for both the tag and
attribute in the cache when one of them is computed.
@the-knights-who-say-ni
Copy link

Hello, and thanks for your contribution!

I'm a bot set up to make sure that the project can legally accept your contribution by verifying you have signed the PSF contributor agreement (CLA).

Our records indicate we have not received your CLA. For legal reasons we need you to sign this before we can look at your contribution. Please follow the steps outlined in the CPython devguide to rectify this issue.

If you have recently signed the CLA, please wait at least one business day
before our records are updated.

You can check yourself to see if the CLA has been received.

Thanks again for your contribution, we look forward to reviewing it!

Comment on lines -871 to -872
except TypeError:
_raise_serialization_error(qname)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where's this part gone? Wouldn't this leak a TypeError to the user side?

elif isinstance(tag, str):
if tag not in qnames:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These conditions might have been there for performance reasons. Not sure if it matters, but that's up to some benchmarking I think.

add_qname(text.text)
return qnames, namespaces

prefix_map = {prefix: ns for ns, prefix in namespaces.items()}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Prefixes don't have to be globally unique, so this might introduce bugs. (Suggests to me that there might be missing tests.)

Comment on lines +879 to +880
if not default_namespace:
ser_tag = ser_attr
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems worth a comment in the code (likewise below).

@csabella
Copy link
Contributor

@mthuurne, please take a look at the code review. Thanks!

Copy link

This PR is stale because it has been open for 30 days with no activity.

@github-actions github-actions bot added the stale Stale PR or inactive for long period of time. label Apr 13, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
awaiting changes stale Stale PR or inactive for long period of time.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants