ElementTree.iterparse "leaks" file descriptor when not exhausted #101438
Comments
I think this would require something like PEP 533 (“Deterministic cleanup for iterators”) to solve.
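Until something like PEP 533 exists, one caller-side way to get deterministic cleanup is to open the file yourself and pass the file object to `iterparse()`. In the CPython implementation, `iterparse()` only closes a source it opened itself (that is, when it was given a filename), so a `with` block decides when the descriptor is closed even if the iterator is abandoned early. This is a minimal sketch that assumes that behavior; `read_first_tag()` and the sample file are hypothetical.

```python
import os
import tempfile
import xml.etree.ElementTree as ET


def read_first_tag(path):
    """Hypothetical helper: return the first start tag without parsing the whole file.

    The caller opens the file, so the `with` block (not the garbage collector)
    decides when the descriptor is closed. Assumes iterparse() does not close
    file objects it was handed, only files it opened from a filename.
    """
    with open(path, "rb") as f:
        for _event, elem in ET.iterparse(f, events=("start",)):
            tag = elem.tag
            break  # deliberately leave the iterator un-exhausted
        else:
            tag = None  # no start events were produced
    # The with-block has closed the file, whatever happens to the iterator.
    return tag, f.closed


with tempfile.TemporaryDirectory() as tmp:
    sample = os.path.join(tmp, "sample.xml")
    with open(sample, "w", encoding="utf-8") as out:
        out.write("<root><child/><child/></root>")
    print(read_first_tag(sample))  # ('root', True)
```

The trade-off is that the caller, not `iterparse()`, owns the file's lifetime, which is exactly what makes the cleanup point predictable.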
In this case, I think you can fix it just by avoiding reference cycles that involve the generator. Below is an example refactoring of `iterparse()`:

```python
import collections.abc
# Inside Lib/xml/etree/ElementTree.py these names are already in scope;
# the imports are only needed to run this function on its own.
from xml.etree.ElementTree import XMLPullParser


def iterparse(source, events=None, parser=None):
    # Use the internal, undocumented _parser argument for now; When the
    # parser argument of iterparse is removed, this can be killed.
    pullparser = XMLPullParser(events=events, _parser=parser)
    _root = None

    def iterator(source):
        nonlocal _root
        close_source = False
        try:
            if not hasattr(source, "read"):
                source = open(source, "rb")
                close_source = True
            yield None
            while True:
                yield from pullparser.read_events()
                # load event buffer
                data = source.read(16 * 1024)
                if not data:
                    break
                pullparser.feed(data)
            root = pullparser._close_and_return_root()
            yield from pullparser.read_events()
            _root = root
        finally:
            if close_source:
                source.close()

    class IterParseIterator(collections.abc.Iterator):
        def __init__(self, source):
            # The instance owns the generator; nothing the generator
            # references points back to the instance.
            self.it = iterator(source)

        def __next__(self):
            return next(self.it)

        @property
        def root(self):
            nonlocal _root
            return _root

    it = IterParseIterator(source)
    del iterator, IterParseIterator
    next(it)  # prime the generator so the source is opened eagerly
    return it
```

This avoids two reference cycles that captured the generator created by
I'm not sure about a few things including:
EDIT: the problem is not that the file descriptor is captured in a cycle, but that the generator is.
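A minimal sketch of why it matters that the generator is caught in a cycle. It is CPython-specific, since it relies on reference counting, and `make_cyclic()`, `make_acyclic()` and `demo()` are hypothetical names. `make_cyclic()` loosely mimics the pre-fix pattern, where the class attribute `__next__` is bound to the generator and the generator refers back to the wrapper through a closure; `make_acyclic()` mimics the refactoring above, where the instance owns the generator and nothing points back. With the cyclic collector disabled, only the acyclic generator's `finally` block runs when the last reference is dropped; the other one waits for `gc.collect()`.

```python
import collections.abc
import gc


def make_cyclic(log):
    # Hypothetical: __next__ on the class is bound to the generator, and the
    # generator refers back to the wrapper instance through the closure cell `it`.
    def iterator():
        try:
            yield 1
            yield 2
            it.root = "done"
        finally:
            log.append("cyclic: finally ran")

    class Wrapper(collections.abc.Iterator):
        __next__ = iterator().__next__

    it = Wrapper()
    it.root = None
    return it


def make_acyclic(log):
    # Hypothetical: the instance owns the generator; the generator does not
    # reference the instance, so reference counting alone can free it.
    def iterator():
        try:
            yield 1
            yield 2
        finally:
            log.append("acyclic: finally ran")

    class Wrapper(collections.abc.Iterator):
        def __init__(self):
            self.it = iterator()

        def __next__(self):
            return next(self.it)

    return Wrapper()


def demo():
    log = []
    gc.disable()  # only reference counting may free objects from here on
    try:
        for factory in (make_cyclic, make_acyclic):
            w = factory(log)
            next(w)  # start the generator so its try/finally is active
            del w    # drop the only external reference
        before_gc = list(log)
        gc.collect()  # the cyclic collector finalizes the remaining generator
        after_gc = list(log)
    finally:
        gc.enable()
    return before_gc, after_gc


print(demo())
# On CPython: (['acyclic: finally ran'], ['acyclic: finally ran', 'cyclic: finally ran'])
```

In the real `iterparse()`, the `finally` block is where an internally opened file gets closed, so "runs at the next `gc.collect()`" translates directly into "the descriptor stays open until then".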
@colesbury Does the issue persist if you add
@rhettinger Everything is cleaned up properly once the garbage collector runs. So adding a
Refactor IterParseIterator to avoid a reference cycle between the iterator() function and the IterParseIterator() instance. This leads to more prompt clean-up of the "source" file if the returned iterator is not exhausted and not otherwise part of a reference cycle. This also avoids a test failure in the GC implementation for the free-threaded build: if the "source" file is finalized before the "iterator()" generator, a ResourceWarning is issued leading to a failure in test_iterparse(). In theory, this warning can occur in the default build as well, but is much less likely because it would require an unlucky scheduling of the GC between creation of the generator and the file object in order to change the order of finalization.
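The `ResourceWarning` mentioned in the commit message is what CPython emits when a file object is finalized while it is still open. A minimal sketch for observing it, assuming CPython; the helper `resource_warnings_for()` is hypothetical. `ResourceWarning` is ignored by default, so the sketch records it explicitly with `warnings.catch_warnings()`.

```python
import gc
import os
import tempfile
import warnings


def resource_warnings_for(close_explicitly):
    """Hypothetical helper: count ResourceWarnings while one file object is discarded."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with warnings.catch_warnings(record=True) as caught:
            # ResourceWarning is ignored by default; record every occurrence.
            warnings.simplefilter("always", ResourceWarning)
            f = open(path, "rb")
            if close_explicitly:
                f.close()
            del f         # CPython finalizes the file object here via refcounting
            gc.collect()  # would also finalize it if a cycle kept it alive
        return sum(issubclass(w.category, ResourceWarning) for w in caught)
    finally:
        os.remove(path)


print(resource_warnings_for(close_explicitly=False))  # 1 on CPython
print(resource_warnings_for(close_explicitly=True))   # 0
```

In `iterparse()`, the warning appears only when the file object is finalized before the generator whose `finally` block would have closed it, which is why the finalization order inside a collected cycle matters.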
The iterator returned by ElementTree.iterparse() may hold on to a file descriptor. The reference cycle prevented prompt clean-up of the file descriptor if the returned iterator was not exhausted.
…honGH-114269) The iterator returned by ElementTree.iterparse() may hold on to a file descriptor. The reference cycle prevented prompt clean-up of the file descriptor if the returned iterator was not exhausted. (cherry picked from commit ce01ab5) Co-authored-by: Sam Gross <colesbury@gmail.com>
…-114269) (GH-114499) The iterator returned by ElementTree.iterparse() may hold on to a file descriptor. The reference cycle prevented prompt clean-up of the file descriptor if the returned iterator was not exhausted. (cherry picked from commit ce01ab5) Co-authored-by: Sam Gross <colesbury@gmail.com>
…-114269) (GH-114500) The iterator returned by ElementTree.iterparse() may hold on to a file descriptor. The reference cycle prevented prompt clean-up of the file descriptor if the returned iterator was not exhausted. (cherry picked from commit ce01ab5) Co-authored-by: Sam Gross <colesbury@gmail.com>
…honGH-114269) The iterator returned by ElementTree.iterparse() may hold on to a file descriptor. The reference cycle prevented prompt clean-up of the file descriptor if the returned iterator was not exhausted.
PR #31696 attempts to fix the "leak" of file descriptors when the iterator is not exhausted. That PR fixes the warning, but not the underlying issue: the files aren't closed until the next tracing garbage collection cycle.
Note that there isn't truly a leak of file descriptors. The file descriptors are eventually closed when the file object is finalized (at cyclic garbage collection). The point of the `ResourceWarning` (in my understanding) is that waiting until the next garbage collection cycle means that you may temporarily have a lot of unwanted open file descriptors, which could exhaust the open-file limit or prevent successful writes to those files on Windows.

On my system, after lowering the file descriptor limit to 1000 (via `ulimit -Sn 1000`), I get: