[Bug]: plt.savefig leaks the stack #25406
Thank you for the clear minimal example @CorentinJ. I just tried this with the |
I don't know that something which is cleaned up by If you are in a memory constrained environment (such as GP-GPU compute), calling Note that in both of the prior issues you link to, they actively call If it is easy to find/fix, I'd be all for explicitly breaking the object cycle/garbage collecting it to avoid the problem. (e.g. if I have been unable to reproduce on my machine (which is Linux, so does differ) on py 3.9, 3.10, or 3.11. This makes investigating where cycles exist basically impossible on my machine. The first pass test I was going to do was to see what type of objects are referring to
```python
import gc

import matplotlib
matplotlib.use("TkAgg")
import matplotlib.pyplot as plt


class SomeObject:
    pass


def save_plots():
    plt.plot([1, 2, 3])
    plt.savefig("test.png")


def leaky():
    some_object = SomeObject()  # should be freed as soon as leaky() returns
    save_plots()


leaky()
# Anything of this type still tracked by the GC at this point has leaked.
for obj in gc.get_objects():
    try:
        if isinstance(obj, SomeObject):
            print("Object is leaking")
            # Show what kinds of objects still refer to the leaked instance.
            print([type(i) for i in gc.get_referrers(obj)])
    except Exception:  # isinstance() can raise for some objects, e.g. dead weakref proxies
        pass
```
Please see #23712, the discussion in the linked issues, and https://matplotlib.org/stable/users/prev_whats_new/whats_new_3.6.0.html#garbage-collection-is-no-longer-run-on-figure-close. In this example you are creating and holding open a The other thing that may be happening to make this seem non-deterministic is that the algorithm that Python uses to decide when to do the full cyclic sweep depends on how many objects are around and various water marks being crossed. It is possible that in some cases a cyclic sweep is triggered and in others it is not. If you throw in something like
after you call
That is not at all what |
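Because it is hard to predict whether a threshold-triggered sweep runs before the check, one way to make this kind of check deterministic is to switch automatic collection off while probing. The following is a minimal CPython-specific sketch, not Matplotlib code; classify, Node, and the three factory functions are hypothetical names used only for illustration. It separates objects freed by plain reference counting, objects freed only by gc.collect() (reference cycles), and objects that stay reachable regardless.

```python
import gc
import weakref


class Node:
    pass


def classify(make):
    """Report how the object returned by make() is reclaimed once dropped.

    "refcount": freed as soon as its last strong reference disappears.
    "cycle":    freed only by the cyclic collector (gc.collect()).
    "alive":    still reachable after a full collection.
    """
    was_enabled = gc.isenabled()
    gc.disable()  # stop threshold-triggered sweeps from running mid-check
    try:
        ref = weakref.ref(make())  # the temporary strong reference is dropped here
        if ref() is None:
            return "refcount"
        gc.collect()  # an explicit collection still runs while automatic GC is off
        return "cycle" if ref() is None else "alive"
    finally:
        if was_enabled:
            gc.enable()


_kept = []


def plain():
    return Node()


def self_referencing():
    node = Node()
    node.me = node  # node -> node.__dict__ -> node: a reference cycle
    return node


def kept_globally():
    node = Node()
    _kept.append(node)  # a module-level list keeps it reachable
    return node


print(classify(plain), classify(self_referencing), classify(kept_globally))
# refcount cycle alive
```

An object that only gc.collect() frees (the "cycle" case) is exactly the kind of leak whose visibility depends on when the collector happens to run.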
I did reproduce on Linux (Ubuntu), but I just realised my 3.10 environment had been hanging around for a few months. So I created a fresh one and could not reproduce with that. I’m looking at 3.10.9 vs 3.10.6. @CorentinJ are you able to try a fresh python install? |
I can repro on Py3.11+macOS+{agg,qtagg,tkagg,mplcairo.{base,qt,tk}}. |
@anntzer does ensuring the figure is actually closed or burning a bunch of bytecode before checking affect anything? That you can get it with just My other guesses:
|
This will reliably leak for me:
```python
import gc
import matplotlib.pyplot as plt
import inspect
import sys


class SomeObject:
    pass


def save_plots():
    plt.plot([1, 2, 3])
    plt.savefig("test.png")


def leaky():
    # Note: keeping the current frame in the local cf is by itself a
    # frame -> locals -> frame reference cycle.
    cf = inspect.currentframe()
    # Reference count of this function's frame object at each step.
    print(f'pre: {sys.getrefcount(cf)}')
    some_object = SomeObject()
    print(f'pre: {sys.getrefcount(cf)}')
    save_plots()
    print(f'post: {sys.getrefcount(cf)}')


leaky()
for obj in gc.get_objects():
    if isinstance(obj, SomeObject):
        print("Object is leaking")
```
and adding
Does the following diff help?
```diff
diff --git a/lib/matplotlib/_api/__init__.py b/lib/matplotlib/_api/__init__.py
index f4c390a198..5e0a7c1701 100644
--- a/lib/matplotlib/_api/__init__.py
+++ b/lib/matplotlib/_api/__init__.py
@@ -385,4 +385,5 @@ def warn_external(message, category=None):
                         frame.f_globals.get("__name__", "")):
             break
         frame = frame.f_back
+    del frame
     warnings.warn(message, category, stacklevel)
```
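The del frame diff above targets code where a local variable ends up referring to a frame object. A frame that refers to itself through one of its locals forms a cycle, and because frames also keep their callers alive through f_back, the caller's locals then survive until the cyclic collector runs. The Python inspect documentation recommends breaking such cycles explicitly, for example in a finally clause. Below is a minimal sketch of that hazard and that fix, assuming CPython; caller_name, user, and alive_without_collect are hypothetical helpers, and whether the Matplotlib loops actually form such a cycle is what the diff was meant to test.

```python
import gc
import sys
import weakref


class SomeObject:
    pass


def caller_name(cleanup):
    frame = sys._getframe()  # the local 'frame' now refers to this very frame
    try:
        return frame.f_back.f_code.co_name
    finally:
        if cleanup:
            del frame  # break the frame -> locals -> frame cycle before returning


def user(cleanup):
    obj = SomeObject()
    caller_name(cleanup)
    return weakref.ref(obj)


def alive_without_collect(cleanup):
    gc.disable()  # only reference counting may free objects during the check
    try:
        return user(cleanup)() is not None
    finally:
        gc.collect()  # reclaim any cycle the check created
        gc.enable()


print(alive_without_collect(False), alive_without_collect(True))  # True False
```

Without the cleanup, obj in user() stays alive after user() returns, because the self-referencing frame of caller_name() still holds user's frame through f_back.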
Explicitly plt.close()ing the figure doesn't fix the problem. Edit: Good idea looking at warn_external, but that didn't help either. |
I don't really understand all this and was just trying it to see if I could better understand. However, not importing Matplotlib and doing
also reports "Object is leaking" in the test #25406 (comment) |
@jklymak I was about to say something very similar (I just commented out the call rather than making it return). That said, I took that as an example of how the behavior may have arisen internally (e.g. via |
Okay, behavior that involves stack frames and only matplotlib/lib/matplotlib/cbook.py Lines 62 to 69 in fb356ed
|
```diff
diff --git a/lib/matplotlib/cbook.py b/lib/matplotlib/cbook.py
index c9699b2e21..46a17d4d8f 100644
--- a/lib/matplotlib/cbook.py
+++ b/lib/matplotlib/cbook.py
@@ -67,6 +67,7 @@ def _get_running_interactive_framework():
             if frame.f_code in codes:
                 return "tk"
             frame = frame.f_back
+        del frame
     macosx = sys.modules.get("matplotlib.backends._macosx")
     if macosx and macosx.event_loop_is_running():
         return "macosx"
```
While I'm still unable to reproduce the original bug, I suspect this diff may resolve it (same idea as
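I can't reproduce this on TkAgg or QtAgg, but I can in Gtk[34]. So I skipped the
```python
# Assumes gc, SomeObject, and leaky() from one of the reproductions above
# are defined and leaky() has already been called.
for obj in gc.get_objects():
    if isinstance(obj, SomeObject):
        print("Object is leaking")
        # Interactively walk up the chain of referrers, ignoring dicts and lists.
        while (val := input('continue?').strip().lower()) not in ('n', 'no', 'q', 'quit'):
            if val == 'break':
                breakpoint()
            referrers = [r for r in gc.get_referrers(obj) if not isinstance(r, (dict, list))]
            if not referrers:
                break
            print(referrers)
            obj = referrers[0]
```
which gives me: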
which is a traceback from an exception in Pillow. And it loops from that point on, so it appears to be a reference cycle. I'm not really sure why we think this is a Matplotlib problem. |
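A traceback that loops back on itself is the classic cycle created by storing a caught exception: the exception's __traceback__ references the frame that caught it, that frame's locals reference the exception, and the frame's f_back keeps every caller's locals (such as some_object) reachable too. Python deletes the name bound by except ... as name at the end of the clause to avoid exactly this, but assigning the exception to another variable recreates the cycle. A minimal sketch, assuming CPython, with a hypothetical parse function (not Pillow's actual code):

```python
import gc
import weakref


class SomeObject:
    pass


def parse(text, keep_exception):
    # Hypothetical parser that remembers its last error in a local variable.
    error = None
    try:
        return int(text)
    except ValueError as exc:
        # Keeping the exception object creates the cycle
        # error -> exc.__traceback__ -> this frame -> error;
        # keeping only the message does not.
        error = exc if keep_exception else str(exc)
    return error


def caller(keep_exception):
    obj = SomeObject()
    parse("not a number", keep_exception)
    return weakref.ref(obj)


def alive_until_collect(keep_exception):
    gc.disable()  # only reference counting may free objects during the check
    try:
        return caller(keep_exception)() is not None
    finally:
        gc.collect()  # reclaim any cycle the check created
        gc.enable()


print(alive_until_collect(True), alive_until_collect(False))  # True False
```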
Actually I may have figured out why the leak is tricky to reproduce: this may depend on mathtext, because it only appears for me when using my default matplotlibrc, which has
```python
import gc

import matplotlib.pyplot as plt

plt.rcdefaults()


class SomeObject:
    pass


def leaky():
    some_object = SomeObject()
    fig = plt.figure()
    fig.text(.5, .5, "$0.1$")  # no leak if using "0.1" i.e. no mathtext.
    fig.canvas.draw()


leaky()
for obj in gc.get_objects():
    try:
        if isinstance(obj, SomeObject):
            print("Object is leaking")
    except Exception:  # isinstance() can raise for some objects, e.g. dead weakref proxies
        pass
```
This may be related to @anntzer's example. First of all, I can reproduce the same behavior that they posted above in Python 3.10.8 and MPL 3.6.2. Except if I prepend The comment led me to find the cause of a leak on a personal, non-minimal example. In my scenario, explicitly setting the font family to Palatino (provided by my local TeX package) seems to prevent |
Combining the investigation from @QuLogic and @anntzer shows that the exception that is causing the cycle is still in an upstream package for that more reliable test case (that I can reproduce). This time it is an exception in
|
@anntzer How can one enable/disable mathtext programmatically? |
I found a similar leak with It does not occur on my Windows machine this time; it's an Ubuntu one.
```python
import gc

import matplotlib.pyplot as plt

plt.rc("font", family="consolas")


class SomeObject:
    pass


def make_subplots():
    plt.subplots()


def leaky_fn():
    x = SomeObject()
    make_subplots()


leaky_fn()


def detect_leak():
    for obj in gc.get_objects():
        try:
            if isinstance(obj, SomeObject):
                print("Object is leaking")
        except Exception:  # isinstance() can raise for some objects
            pass


# First without gc.collect()
detect_leak()
# Still leaks after gc.collect()!
gc.collect()
detect_leak()
```
In this instance, keeping the underlying exception alive is actually intentional. It comes from the memoisation of font lookup results: "consolas" is not actually found, so the memoised result is the exception itself. Ref: matplotlib/lib/matplotlib/font_manager.py Lines 1288 to 1293 in 370547e
matplotlib/lib/matplotlib/font_manager.py Lines 1389 to 1392 in 370547e
matplotlib/lib/matplotlib/font_manager.py Lines 1442 to 1446 in 370547e
Now... perhaps in this instance it is possible to cut out the full stack reference from the returned exception? It would make some sense to me that a memoised exception should have a limited traceback... It's hard to say, though: where it is first raised, the traceback is accurate and we don't necessarily want to rob users of that context, but when it is re-raised from the cache, the traceback could be outright misleading (plus it leaks the stack, which is not ideal). |
I'm working on a fix for this (I introduced this particular bug while attempting to fix a performance regression). |
This leads to caching the tracebacks, which can keep users' objects in local namespaces alive indefinitely. This can lead to very surprising memory issues for users and will result in incorrect tracebacks. Responsive to matplotlib#25406
@CorentinJ Are you able to check if the current main branch works in your actual application? |
I can confirm that the stack leak caused by the exception in setting the font is resolved on the current main ( |
So at this point we think there are frame loops in:
and we have fixed 1 definite problem (memoizing the error in finding fonts) and 3 maybe-problems (manually deleting frames just to be safe) in the mpl code. Has anyone created issues upstream with Pillow and pyparsing (and if so, can we have xrefs)? @CorentinJ, do you have mathtext turned on by default? From reading back through this thread, I am not confident we understand why not everyone can reproduce the issue with the OP code. |
From #25569, I noticed that Pillow 9.2 presents the problem with the test included in #25470, but at least that specific instance was resolved in Pillow 9.4. (That said, I was one of the ones having trouble reproducing originally, so I'm slightly confused as to how I had the failing version of Pillow on my system; I may have installed a different version at some point while looking into other things.) @CorentinJ, can you check which version of Pillow you have installed and see whether that form of this bug remains after updating to the latest Pillow? If it is fixed in the latest release, I don't think there is any action to take on the Pillow front (I don't see a need to, e.g., block versions of Pillow over this, as they still broadly work, just missing some optimization). Pyparsing is another story; we should perhaps try to come up with a minimal (non-Matplotlib) example of that form and report it upstream if that has not been done already. |
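For a minimal non-Matplotlib report upstream, a small harness along these lines could help. It calls a function from a frame that holds a sentinel local and reports whether the sentinel is freed immediately, only by the cyclic collector, or not at all. This is a sketch assuming CPython: retains_caller_locals and the three toy functions are hypothetical, and the function under test would be whichever upstream call is suspected (for example, a parse that raises and catches internally).

```python
import gc
import sys
import weakref


class Sentinel:
    pass


def retains_caller_locals(func, *args, **kwargs):
    """Call func from a frame holding a Sentinel local and report what keeps it.

    Returns "no" (freed by reference counting), "until-collect" (freed only
    by the cyclic GC) or "yes" (still alive after gc.collect()).
    """
    def frame_with_local():
        sentinel = Sentinel()
        try:
            func(*args, **kwargs)
        except Exception:
            pass  # only what survives matters here, not whether func raised
        return weakref.ref(sentinel)

    was_enabled = gc.isenabled()
    gc.disable()  # keep automatic sweeps from hiding a cycle mid-check
    try:
        ref = frame_with_local()
        if ref() is None:
            return "no"
        gc.collect()
        return "until-collect" if ref() is None else "yes"
    finally:
        if was_enabled:
            gc.enable()


_stash = []


def harmless():
    return sum(range(10))


def keeps_own_frame():
    frame = sys._getframe()  # self-referencing frame; reaches the caller via f_back


def stashes_caller_frame():
    _stash.append(sys._getframe(1))  # a global list keeps the caller's frame alive


print(retains_caller_locals(harmless),
      retains_caller_locals(keeps_own_frame),
      retains_caller_locals(stashes_caller_frame))
# no until-collect yes
```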
Bug summary
Calling plt.savefig leaks local variables from parent calling functions. These can be found in gc.get_objects() and are cleared by gc.collect().
Code for reproduction
Actual outcome
Object is leaking
Expected outcome
Nothing should be printed.
Additional information
Figure.savefig was called
ExitStack within savefig is meant to hack around this issue but is somehow not working in my case?
agg backend
gc.collect(). Most users won't be aware of the issue. Given the prevalence of deep learning these days, it's likely objects leaking into GPU memory will be a common problem due to this bug.
Operating system
Win10 x64
Matplotlib Version
3.7.1
Matplotlib Backend
TkAgg
Python version
3.10.0
Jupyter version
No response
Installation
pip