FIX: do not try to help CPython with garbage collection #23712
Matplotlib has a large number of circular references (between figure and manager, between axes and figure, axes and artist, figure and canvas, and ...) so when the user drops their last reference to a `Figure` (and clears it from pyplot's state), the objects will not be deleted immediately.

To account for this we have long (going back to e34a333, the "reorganize code" commit in 2004, which is the end of history for much of the code) had a `gc.collect()` in the close logic in order to promptly clean up after ourselves.

However, unconditionally calling `gc.collect` can be a major performance issue (see matplotlib#3044 and matplotlib#3045) because if there are a large number of long-lived user objects, Python will spend a lot of time checking objects that are never going away.

Instead of doing a full collection we switched to clearing out the lowest two generations. However, this both failed to do what we want (as most of our objects will actually survive) and, by clearing out the first generation, opened us up to unbounded memory usage.

In cases with a very tight loop between creating the figure and destroying it (e.g. `plt.figure(); plt.close()`) the first generation will never grow large enough for Python to consider running the collection on the higher generations. This leads to unbounded memory usage because the long-lived objects are never re-examined for reference cycles and hence are never deleted, as their reference counts will never go to zero.

closes matplotlib#23701
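To make the reference-cycle point concrete, here is a minimal sketch using only the standard library. `Node` is a hypothetical stand-in for a pair of objects that point at each other (such as a figure and its canvas), not Matplotlib code. It shows that dropping the last external reference does not free a cycle, and that only a cycle-detecting collection does.

```python
import gc
import weakref


class Node:
    """Hypothetical stand-in for an object that holds a back-reference."""

    def __init__(self):
        self.other = None


def make_cycle():
    """Create two objects that reference each other; return a weakref probe."""
    parent, child = Node(), Node()
    parent.other = child
    child.other = parent
    # Both local names disappear on return, but the cycle keeps both alive.
    return weakref.ref(parent)


def cycle_survives_until_collected():
    """Return (alive_before, alive_after) around an explicit full collection."""
    was_enabled = gc.isenabled()
    gc.disable()  # keep the automatic collector from running mid-demo
    try:
        probe = make_cycle()
        alive_before = probe() is not None  # reference counts never hit zero
        gc.collect()  # a full collection finds and frees the cycle
        alive_after = probe() is not None
    finally:
        if was_enabled:
            gc.enable()
    return alive_before, alive_after


print(cycle_survives_until_collected())
```

The automatic collector is disabled only to keep the demonstration deterministic; in normal operation the cycle is freed whenever the generation holding it is next collected.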
This might also fix #22448.
Not to completely nerd-snipe this issue, but I had wondered some time ago (basically when thinking about this gc.collect call, although I certainly didn't go through all the analysis you did!) whether CPython could have an API like
I'm also not sure. I do not think there is currently enough information to do that but am not clear on how much more you would have to track to get there. My instinct is that this would add more overhead in cases where it is not used than you would ever get back in the cases when it is used. I think a Python exposed function for "run the full collection, but only if you would have otherwise" is what we really want here.

The other thing that just connected in my brain is that dictionaries (which I assume includes instance dictionaries) are only swept in the oldest generation (which is why our objects all survived the

I had a thought about being aggressive about calling

```python
fig, ax = plt.subplots()
ax.plot(...)
plt.show(block=True)
fig.savefig(...)
```

Similarly, any way I can think of putting in weak references is going to break someone terribly.
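As a rough illustration of the "run the full collection, but only if you would have otherwise" idea, the sketch below compares `gc.get_count()` against `gc.get_threshold()`. Both helper names are hypothetical, and this is only an approximation: CPython applies extra internal heuristics before collecting the oldest generation, and the meaning of the counters differs between interpreter versions, so it should not be read as what the interpreter actually does.

```python
import gc


def generations_over_threshold():
    """Return the generations whose counters currently exceed their thresholds."""
    counts = gc.get_count()
    thresholds = gc.get_threshold()
    return [
        gen
        for gen, (count, limit) in enumerate(zip(counts, thresholds))
        # A threshold of 0 is treated here as "never due" (an assumption).
        if limit > 0 and count > limit
    ]


def collect_if_due():
    """Hypothetical helper: collect the oldest generation that looks overdue.

    Returns the generation collected, or None if nothing was over threshold.
    """
    due = generations_over_threshold()
    if not due:
        return None
    oldest = max(due)
    gc.collect(oldest)
    return oldest
```

Because the check reads only the public counters, it costs almost nothing when no generation is due, which is the property the comment above is asking for.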
Perhaps we could call
That is reasonable. It will be a bunch of work, but something like
might be a deprecation path to that? That said, it leaves a weird inconsistency between doing
Doesn't the inline backend auto-close figures? I'm pretty sure many people use
Probably target 3.6 on this and not 3.5.4.
…712-on-v3.6.x Backport PR #23712 on branch v3.6.x (FIX: do not try to help CPython with garbage collection)
I'm not sure how to test this; I do not want to add a test that might exhaust memory. There might be something that can be done using gc's logging / debugging / callback hooks.
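One possible starting point for the callback idea is the standard `gc.callbacks` list, which lets a test observe which generations were collected without having to exhaust memory. This is a sketch only: `record_collections` and `make_garbage_and_collect` are hypothetical names, and it is not the test that was added to Matplotlib.

```python
import gc


def record_collections(action):
    """Run ``action`` and return the info dicts of gc 'stop' events seen."""
    events = []

    def on_gc(phase, info):
        # info carries "generation", "collected" and "uncollectable".
        if phase == "stop":
            events.append(dict(info))

    gc.callbacks.append(on_gc)
    try:
        action()
    finally:
        gc.callbacks.remove(on_gc)
    return events


def make_garbage_and_collect():
    """Create one self-referencing list, then force a full collection."""
    cycle = []
    cycle.append(cycle)
    del cycle
    gc.collect()  # explicit full collection (generation 2)


events = record_collections(make_garbage_and_collect)
full_collections = [e for e in events if e["generation"] == 2]
```

A real test could run the tight `plt.figure(); plt.close()` loop as the `action` and assert that the oldest generation is collected at least once, rather than measuring memory directly.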