gh-101438: Avoid reference cycle in ElementTree.iterparse. #114269

Merged
merged 5 commits into from
Jan 23, 2024
27 changes: 17 additions & 10 deletions Lib/xml/etree/ElementTree.py
@@ -99,6 +99,7 @@
 import collections
 import collections.abc
 import contextlib
+import weakref

 from . import ElementPath

@@ -1223,13 +1224,14 @@ def iterparse(source, events=None, parser=None):
     # parser argument of iterparse is removed, this can be killed.
     pullparser = XMLPullParser(events=events, _parser=parser)

-    def iterator(source):
+    if not hasattr(source, "read"):
+        source = open(source, "rb")
+        close_source = True
+    else:
         close_source = False
+
+    def iterator(source):
         try:
-            if not hasattr(source, "read"):
-                source = open(source, "rb")
-                close_source = True
-            yield None
             while True:
                 yield from pullparser.read_events()
                 # load event buffer
@@ -1239,18 +1241,23 @@ def iterator(source):
                 pullparser.feed(data)
             root = pullparser._close_and_return_root()
             yield from pullparser.read_events()
-            it.root = root
+            it = wr()
+            if it is not None:
+                it.root = root
         finally:
             if close_source:
                 source.close()

     class IterParseIterator(collections.abc.Iterator):
         __next__ = iterator(source).__next__
-    it = IterParseIterator()
-    it.root = None
-    del iterator, IterParseIterator

-    next(it)
+        def __del__(self):
+            if close_source:
+                source.close()
+
+    it = IterParseIterator()
+    wr = weakref.ref(it)
+    del IterParseIterator
Comment on lines +1258 to +1260
Contributor

I am curious: why were both the `iterator` and `IterParseIterator` names deleted previously, but only `IterParseIterator` now? And what is the purpose of this statement in the first place? I assumed that `iterator.__closure__` stores references to these objects, so unnecessary references should be deleted. However, as far as my checks show, the closure stores only the variables referenced inside the function: `pullparser`, `close_source` and `wr` in this case.
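
The closure observation can be checked with a small standalone sketch. The names `outer`, `inner`, `used`, `unused` and `Helper` are hypothetical and only illustrate the mechanism; this is not the `ElementTree` code.

```python
def outer():
    used = object()      # referenced by inner, so it becomes a closure cell
    unused = object()    # never referenced by inner

    class Helper:        # local class, also not referenced by inner
        pass

    def inner():
        return used

    # Unbinding local names does not change what inner's closure holds.
    del unused, Helper
    return inner


fn = outer()
free_names = fn.__code__.co_freevars   # names captured from the enclosing scope
cell_count = len(fn.__closure__)       # one cell per captured name
print(free_names, cell_count)
```

Only `used` shows up among the free variables, which matches the point that a closure captures just the names the inner function actually references.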

Contributor

I also noticed that `it.root = None` was deleted. This behavior is not documented, but removing it may still cause unexpected errors for users who read `root`.

Contributor Author

I think you are right about `it.root = None`. I did not intend a behavioral change here, so it seems like a good idea to add it back.

I don't think the `del` statements matter one way or the other. They look like they break a cycle but do not really; they are also harmless.
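
A short sketch of the user-visible behavior discussed in this thread, assuming an interpreter where the initial `it.root = None` assignment is present. The XML payload and variable names are made up for illustration; `getattr` with a default is used so the snippet would not raise if the attribute were missing.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical in-memory document; any file-like object with read() works.
source = io.BytesIO(b"<doc><item/></doc>")
it = ET.iterparse(source)

# Before the iterator is exhausted, root is expected to be None.
root_before = getattr(it, "root", "missing")

# Default events are "end" only, reported in document order of closing tags.
events = [(event, elem.tag) for event, elem in it]

# After exhaustion, root refers to the root element of the parsed tree.
root_after = it.root
print(root_before, events, root_after.tag)
```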

     return it


@@ -0,0 +1,4 @@
+Avoid reference cycle in ElementTree.iterparse. The iterator returned by
+``ElementTree.iterparse`` may hold on to a file descriptor. The reference
+cycle prevented prompt clean-up of the file descriptor if the returned
+iterator was not exhausted.
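
The effect described in this entry can be reproduced with a simplified stand-in. This is a sketch of the general pattern, not the `ElementTree` implementation: `Resource`, `make_iterator`, `Wrapper` and `closed_promptly` are hypothetical names, and the prompt clean-up relies on CPython reference counting. The cyclic garbage collector is disabled during the check so that only reference counting can free the wrapper.

```python
import collections.abc
import gc
import weakref


class Resource:
    """Stand-in for an open file; records whether close() was called."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


def make_iterator(resource, use_weakref):
    def generate():
        try:
            yield 1
            yield 2
            target = get_target()       # wrapper, or None if it is gone
            if target is not None:
                target.root = "finished"
        finally:
            resource.close()

    class Wrapper(collections.abc.Iterator):
        __next__ = generate().__next__

        def __del__(self):
            resource.close()

    wrapper = Wrapper()
    if use_weakref:
        # The generator only reaches the wrapper weakly: no cycle through it.
        get_target = weakref.ref(wrapper)
    else:
        # Strong reference: wrapper -> class -> generator -> wrapper.
        get_target = lambda: wrapper
    return wrapper


def closed_promptly(use_weakref):
    """Drop a partially consumed iterator and report whether the
    resource was closed without help from the cyclic collector."""
    gc.disable()
    try:
        resource = Resource()
        it = make_iterator(resource, use_weakref)
        next(it)          # consume one item, leave the iterator unexhausted
        del it            # last external reference goes away
        return resource.closed
    finally:
        gc.enable()
        gc.collect()


print(closed_promptly(True), closed_promptly(False))
```

With the weak reference the wrapper's `__del__` runs as soon as the last external reference is dropped; with the strong reference the wrapper stays alive until a garbage collection pass breaks the cycle.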