Skip to content

bpo-37399: Correctly attach tail text to the last element/comment/pi #14856

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Jul 24, 2019

Conversation

scoder
Copy link
Contributor

@scoder scoder commented Jul 19, 2019

Correctly attach tail text to the last element/comment/pi, even when comments or pis are discarded.
Also fixes the insertion of PIs when "insert_pis=True" is configured for a TreeBuilder.

https://bugs.python.org/issue37399

… even when comments or pis are discarded.

Also fixes the insertion of PIs when "insert_pis=True" is configured for a TreeBuilder.
@dimaqq
Copy link
Contributor

dimaqq commented Jul 23, 2019

I think it would be nice to add a test case for Whitespace Processing, something like:

<a>text
  <!-- comment -->
  tail</a>

I can't tell off the top of my head what the text value should be :)
Apparently Py3.7 just yanks the comment and leaves a.text == "text\n \n tail" (GitHub renders this incorrectly, it's -newline-space-space-newline-space-space-)

Ref: e.g. https://docstore.mik.ua/orelly/xml/schema/ch04_02.htm

@dimaqq
Copy link
Contributor

dimaqq commented Jul 23, 2019

My compiler complains of logical ops not being explicit enough. Personally I think it's silly, but the rest of CPython code base uses explicit parentheses in such cases, if ((a && b) || c).

/Users/user.surnam/cpython/Modules/_elementtree.c:3639:35: warning: '&&' within '||' [-Wlogical-op-parentheses]
        if (target->events_append && target->pi_event_obj || target->insert_pis) {
            ~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~ ~~
/Users/user.surnam/cpython/Modules/_elementtree.c:3639:35: note: place parentheses around the '&&' expression to silence this warning
        if (target->events_append && target->pi_event_obj || target->insert_pis) {
                                  ^

@scoder
Copy link
Contributor Author

scoder commented Jul 23, 2019

Thanks @dimaqq, I followed both of your suggestions.

@dimaqq
Copy link
Contributor

dimaqq commented Jul 24, 2019

Hi Stefan, I'm afraid there's a memory leak, consider this program:

from xml.etree import ElementTree
while True: ElementTree.fromstring("<a>text<!--c1-->a<!--c2-->b<!--c3-->foo</a>").text

d395209, common ancestor with master branch: ~5MB, stable
21cd0bd bpo-37399_missing_tail branch: grows to ~800MB in about a minute, probably unbounded

@ZackerySpytz
Copy link
Contributor

I tested this PR locally, and I see the following when ./python -m test -R 3:3 test_xml_etree_c is run:

Run tests sequentially
0:00:00 load avg: 1.40 [1/1] test_xml_etree_c
beginning 6 repetitions
123456
......
test_xml_etree_c leaked [20, 8, 20] references, sum=48
test_xml_etree_c leaked [8, 6, 8] memory blocks, sum=22
test_xml_etree_c failed

== Tests result: FAILURE ==

1 test failed:
    test_xml_etree_c

Total duration: 3 sec 50 ms
Tests result: FAILURE

@scoder
Copy link
Contributor Author

scoder commented Jul 24, 2019

Thanks for testing and reporting the leak. I resolved it, and also cleaned up a couple of other places to make them clearer.

@scoder scoder merged commit c6cb4cd into python:master Jul 24, 2019
@miss-islington
Copy link
Contributor

Thanks @scoder for the PR 🌮🎉.. I'm working now to backport this PR to: 3.8.
🐍🍒⛏🤖

@miss-islington
Copy link
Contributor

Sorry, @scoder, I could not cleanly backport this to 3.8 due to a conflict.
Please backport using cherry_picker on command line.
cherry_picker c6cb4cdd21c0c3a09b0617dbfaa7053d3bfa6def 3.8

scoder added a commit to scoder/cpython that referenced this pull request Jul 24, 2019
…nt/pi (pythonGH-14856)

* bpo-37399: Correctly attach tail text to the last element/comment/pi, even when comments or pis are discarded.
Also fixes the insertion of PIs when "insert_pis=True" is configured for a TreeBuilder.
@scoder scoder deleted the bpo-37399_missing_tail branch July 24, 2019 18:25
@bedevere-bot
Copy link

GH-14936 is a backport of this pull request to the 3.8 branch.

scoder added a commit that referenced this pull request Jul 24, 2019
…nt/pi (GH-14856) (GH-14936)

* bpo-37399: Correctly attach tail text to the last element/comment/pi, even when comments or pis are discarded.
Also fixes the insertion of PIs when "insert_pis=True" is configured for a TreeBuilder.
lisroach pushed a commit to lisroach/cpython that referenced this pull request Sep 10, 2019
…ythonGH-14856)

* bpo-37399: Correctly attach tail text to the last element/comment/pi, even when comments or pis are discarded.
Also fixes the insertion of PIs when "insert_pis=True" is configured for a TreeBuilder.
DinoV pushed a commit to DinoV/cpython that referenced this pull request Jan 14, 2020
…ythonGH-14856)

* bpo-37399: Correctly attach tail text to the last element/comment/pi, even when comments or pis are discarded.
Also fixes the insertion of PIs when "insert_pis=True" is configured for a TreeBuilder.
websurfer5 pushed a commit to websurfer5/cpython that referenced this pull request Jul 20, 2020
…ythonGH-14856)

* bpo-37399: Correctly attach tail text to the last element/comment/pi, even when comments or pis are discarded.
Also fixes the insertion of PIs when "insert_pis=True" is configured for a TreeBuilder.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
skip news type-bug An unexpected behavior, bug, or error
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants