Use less aggressive garbage collection #3045
Does this actually garbage collect the mpl objects? The artists, axes, and figures tend to end up with circular references.
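A minimal sketch of why circular references matter here, assuming CPython. The `Figure` and `Canvas` classes below are hypothetical stand-ins (not matplotlib's real classes) that point at each other, so reference counting alone never frees them and only the cyclic garbage collector can. A `weakref.ref` probe observes whether the object is still alive without keeping it alive.

```python
import gc
import weakref


class Canvas:
    """Hypothetical stand-in; holds a back-reference to its figure."""

    def __init__(self, figure):
        self.figure = figure


class Figure:
    """Hypothetical stand-in; owns a canvas, which points back at it."""

    def __init__(self):
        self.canvas = Canvas(self)  # figure -> canvas -> figure: a cycle


def cycle_lifetime():
    """Return (alive_after_del, alive_after_collect) for one figure cycle."""
    gc.disable()  # stop automatic collections from running mid-experiment
    try:
        fig = Figure()
        probe = weakref.ref(fig)  # observes the object without keeping it alive
        del fig  # drop the only external reference
        alive_after_del = probe() is not None  # refcounts never reach zero
        gc.collect()  # a full collection finds and frees the unreachable cycle
        alive_after_collect = probe() is not None
    finally:
        gc.enable()
    return alive_after_del, alive_after_collect


print(cycle_lifetime())  # (True, False) on CPython
```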
This is what happens in current matplotlib: http://nbviewer.ipython.org/urls/gist.githubusercontent.com/kmike/93102e3fdf75dfc29631/raw/matplotlib-gc3.ipynb This is the same example with no This is an example with So yes, I think
Numbers increase after each "hist" call followed by full There are a lot of moving parts (IPython, matplotlib, gc, various plotting methods, etc.), so I may have missed something, and the analysis may be incorrect. But running a full garbage collection that the user can't control is a bad decision IMHO. If leftovers from matplotlib are an issue, it is always possible to run
@kmike: In your examples, can you clear the figure? An internal reference is still held to it in these examples... That should, at least by design, result in no leaks from the matplotlib side (though IPython may still hold references to the results of each cell). You might also try testing this outside of IPython to remove that as a factor.
@mdboom I can try to do it, but are you suggesting to keep
My understanding is that the speed cost is due to large numbers of user objects making
Yeah, that is my understanding too. I think the OP makes a good point. On Wed, May 7, 2014 at 1:27 PM, Thomas A Caswell
The docstring for gc.set_threshold() "explains" the generations and the collection scheme, but that still leaves me with a less-than-complete understanding. My interpretation is that collect(0) will look at objects that have not been checked previously; collect(1) will look at objects that have been checked exactly once; collect(2) at objects that have been checked twice or more; and collect() will look at all objects, by going through each of the three generation lists. It is not clear whether collect(1) actually operates on the generation 0 list and the generation 1 list, or only the latter. If only the latter, it would not seem to be very useful when used in isolation.
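On the question of whether collect(1) also looks at generation 0: in CPython's classic generational collector, collecting generation n also merges in and collects every younger generation, and survivors are promoted one generation up. The sketch below checks the first part empirically. Treat that behavior as an assumption tied to the CPython version you run (newer releases have reworked the collector); the helper `young_cycle_freed_by` and the `Node` class are hypothetical names used only for illustration.

```python
import gc
import weakref


class Node:
    """A tiny object that can take part in a reference cycle."""

    def __init__(self):
        self.partner = None


def young_cycle_freed_by(generation):
    """Build a fresh two-node cycle; report whether gc.collect(generation) frees it."""
    gc.disable()  # no automatic collection, so the new objects stay young
    try:
        a, b = Node(), Node()
        a.partner, b.partner = b, a  # a <-> b cycle
        probe = weakref.ref(a)
        del a, b  # the cycle is now unreachable
        gc.collect(generation)
        return probe() is None
    finally:
        gc.enable()


print(gc.get_threshold())  # per-generation trigger thresholds
print(gc.get_count())      # current per-generation counters
print(young_cycle_freed_by(0), young_cycle_freed_by(1), young_cycle_freed_by(2))
```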
I'd even remove It is true that an unbounded memory leak is worse than a speed penalty, but now we have an O(N) speed penalty, where N is the number of live user objects, not the number of matplotlib objects, and the leak is not really a leak because the objects will eventually be collected; in the worst case users can call A "leak" amounts to kilobytes of temporarily allocated memory per chart (maybe megabytes in pathological cases); the speed penalty can be seconds of wait time for each executed IPython cell (see ipython/ipython#5795).
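A rough sketch of the O(N) cost described above, assuming CPython: a full `gc.collect()` traverses every container object the collector tracks, so its wall time grows with the number of live user objects even when none of them is garbage. The helper `time_full_collect` and the sizes are illustrative only; absolute timings depend on the machine and Python version.

```python
import gc
import time


def time_full_collect(n_live_objects):
    """Keep n small lists alive, then time one full collection."""
    live = [[] for _ in range(n_live_objects)]  # even empty lists are tracked by gc
    gc.collect()  # settle pending garbage so only the timed pass is measured
    start = time.perf_counter()
    gc.collect()  # nothing here is garbage, but every tracked object is visited
    elapsed = time.perf_counter() - start
    del live
    return elapsed


for n in (10_000, 100_000, 500_000):
    print(f"{n:>9,} live objects: {time_full_collect(n) * 1e3:.1f} ms")
```

This is the cost a user pays on every call when the library runs a full collection unconditionally; calling `gc.collect()` explicitly, when and if the user wants it, moves that cost under the user's control.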
I am in favor of merging this. I think this explains some issues I was having with my code at the end of grad school but, due to the need to graduate, didn't take the time to track down.
Responding to @mdboom's last comment: anything we can do to ensure prompt release of memory is a good move in general, but it doesn't address the OP's problem, which is that The problem addressed by this PR can be quite bad; I am in favor of giving it a try. It would certainly be good to have a clearer understanding of when, if ever in practice, it would lead to troublesome increases in memory consumption. My impression is that this should be very rare, so the tradeoff is worthwhile.
Use less aggressive garbage collection
Matplotlib has a large number of circular references (between figure and manager, between axes and figure, axes and artist, figure and canvas, and ...), so when the user drops their last reference to a `Figure` (and clears it from pyplot's state), the objects will not be deleted immediately. To account for this we have long had a `gc.collect()` in the close logic in order to promptly clean up after ourselves (it goes back to e34a333, the "reorganize code" commit from 2004, which is the end of history for much of the code). However, unconditionally calling `gc.collect` can be a major performance issue (see matplotlib#3044 and matplotlib#3045): if there are a large number of long-lived user objects, Python will spend a lot of time checking objects that are never going away.

Instead of doing a full collection, we switched to clearing out the lowest two generations. However, this both does not do what we want (most of our objects will actually survive) and, because it clears out the first generation, opens us up to unbounded memory usage. In cases with a very tight loop between creating the figure and destroying it (e.g. `plt.figure(); plt.close()`), the first generation will never grow large enough for Python to consider running the collection on the higher generations. This leads to unbounded memory usage: the long-lived objects are never reconsidered to look for reference cycles, and hence are never deleted, because their reference counts will never go to zero.

closes matplotlib#23701
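The tight-loop failure mode in the commit message can be sketched in pure Python, assuming CPython's classic three-generation collector (builds with a reworked or non-generational collector may behave differently). `Figure` and `Canvas` are hypothetical stand-ins for matplotlib's cyclic objects. Each figure is first promoted to the oldest generation, as a real figure would be if it survived collections while in use, and is then "closed" with only `gc.collect(1)`, mirroring the lowest-two-generations approach.

```python
import gc
import weakref


class Canvas:
    """Hypothetical stand-in that points back at its figure."""

    def __init__(self, figure):
        self.figure = figure


class Figure:
    """Hypothetical stand-in; figure <-> canvas forms a reference cycle."""

    def __init__(self):
        self.canvas = Canvas(self)


def tight_loop(n_figures):
    """Create and close figures, collecting only the young generations.

    Returns (alive_after_loop, alive_after_full_collect).
    """
    probes = []
    gc.disable()  # only the explicit collections below run
    try:
        for _ in range(n_figures):
            fig = Figure()
            probes.append(weakref.ref(fig))
            # While the figure is in use it survives two young collections,
            # which (classic collector) promotes it to the oldest generation.
            gc.collect(0)
            gc.collect(1)
            del fig  # "close": the last external reference goes away
            gc.collect(1)  # young generations only; the oldest is not examined
        alive_after_loop = sum(p() is not None for p in probes)
        gc.collect()  # a full collection finally reclaims every cycle
        alive_after_full = sum(p() is not None for p in probes)
    finally:
        gc.enable()
    return alive_after_loop, alive_after_full


print(tight_loop(100))
```

On the classic collector the first value should equal `n_figures`: every closed figure is still alive after the young-only collections, and only the final full collection frees them.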
For me it fixes #3044.