legend is eating up huge amounts of memory #19345

Closed
brennmat opened this issue Jan 23, 2021 · 12 comments

@brennmat

brennmat commented Jan 23, 2021

Bug report

Bug summary

I have a Python program that gets data from a measurement instrument and plots the data using matplotlib. A separate thread is used to trigger updating of the plots at fixed time intervals.

After a while, the program will take up huge amounts of memory (gigabytes after a few hours). The simplified code below is a self-contained example that illustrates the effect. The issue does not happen if the code is modified to skip the axes.legend(...) call on line 92, i.e., if no legend is drawn.

If I put some pressure on the system by running another application that consumes a lot of memory, my Python program will at some point start to release the excess memory used by matplotlib (ending up at about 60 MB or so). Releasing the memory does not seem to have any negative effect on the operation of my program. This tells me that the large chunk of memory used by matplotlib is not vital to my application, and may well be useless.

I believe the unnecessary memory usage related to the legend is a bug, or at least something very obscure that seems wrong to me. How can this issue be avoided or fixed?

Code for reproduction

import wx
import matplotlib
import matplotlib.pyplot as plt
from matplotlib.backends.backend_wxagg import FigureCanvasWxAgg as FigureCanvas
import time
from random import random
from threading import Thread
from setproctitle import setproctitle


print ( 'Matplotlib version: ' + matplotlib.__version__ )


class measurement_instrument():
# measurement instrument

	def __init__(self):
		self.t = []
		self.val = []
		self.max_values = 1000
		
	def read(self):
		# get new data from the instrument:
		self.t.append( time.time() )
		self.val.append( random() )
		
		# truncate data length to max_values:
		self.t = self.t[-self.max_values:]
		self.val = self.val[-self.max_values:]


class measurement_thread(Thread):
# background thread that takes measurements at fixed time intervals

	def __init__(self, instrument):
		Thread.__init__(self)
		self._instrument = instrument
		self.start()
		
	def run(self):
		while True:
			self._instrument.read()
			time.sleep(0.01)


class plots_frame(wx.Frame):

	def __init__(self,instrument):

		# Access to measurement instrument (to get data for plotting):
		self._instrument = instrument
		
		# Create the window:		
		wx.Frame.__init__(self, parent=None, title='Instrument Data', style=wx.DEFAULT_FRAME_STYLE)
		
		# Create a wx panel to hold the plots
		p  = wx.Panel(self)
		p.SetBackgroundColour(wx.NullColour)
		
		# Set up figure:
		self.figure = plt.figure()
		self.axes = self.figure.add_subplot(1,1,1)
		self.axes.grid()	
		self.canvas = FigureCanvas(p, -1, self.figure)

		# wx sizer / widget arrangement within the window:
		sizer = wx.BoxSizer(wx.VERTICAL)
		sizer.Add(self.canvas, 1, wx.EXPAND)
		p.SetSizer(sizer) # apply the sizer to the panel
		sizer.SetSizeHints(self) # inform frame about size hints for the panel (min. plot canvas size and stuff)

		# install wx.Timer to update the data plot:
		self.Bind(wx.EVT_TIMER, self.on_plot_data) # bind to TIMER event
		self.plot_timer = wx.Timer(self) # timer to trigger plots
		wx.CallAfter(self.plot_timer.Start,100)
		
		# Show the wx.Frame
		self.Show()

	def on_plot_data(self,event):

		# Remove all the lines that are currently in the plot:
		while len(self.axes.lines) > 0: self.axes.lines[0].remove()
		
		# Plot the instrument data:
		self.axes.plot(self._instrument.t, self._instrument.val, label='data', color='r' )

		# x-axis limits:
		self.axes.set_xlim( [ self._instrument.t[0], self._instrument.t[-1] ] )

		# Legend (commenting out this line will avoid the memory issue!):
		self.axes.legend(loc='best')

		# Refresh the plot window:
		self.canvas.draw()
		self.Refresh()


# main:
setproctitle('memory_looser')
app = wx.App()
instrument   = measurement_instrument()
dataplots    = plots_frame(instrument)
measurements = measurement_thread(instrument)

app.MainLoop()

Matplotlib version

  • Debian Unstable
  • Matplotlib version 3.3.3
  • Python version: 3.9.1

Matplotlib and all other software used here were installed from Debian's repositories, using the Debian package management system.

The attached screenshot shows the memory situation after running the code with and without the legend for a few minutes. The version without the legend stays at the memory usage shown, whereas the memory usage of the version with the legend keeps increasing until it approaches the limits of the system.
[Image: Screenshot from 2021-01-23 10-46-19]

@brennmat brennmat changed the title from "legend is eating up huge amounts of memory!" to "legend is eating up huge amounts of memory" Jan 23, 2021
@jklymak
Member

jklymak commented Jan 23, 2021

You are removing the lines on each call but not the legend, so you end up with thousands of legends.

I'm going to close but feel free to ask for more help at discourse.matplotlib.org

@jklymak jklymak closed this as completed Jan 23, 2021
@brennmat
Author

brennmat commented Jan 23, 2021

I already tried removing the legend every time the plot gets updated, but that did not help in any way. Also, I'd consider it a bug if a plot could have more than one legend.

Should I submit a new bug report, or will you re-open this issue?

@jklymak
Member

jklymak commented Jan 23, 2021

Again, I think discourse would be the more appropriate place to discuss this until we know what the problem is. Certainly Matplotlib was not designed for legend to be called thousands of times. If you are just trying to animate a line, you should probably not be removing it, but rather setting its x and y data, in which case the legend is called once at setup rather than at each draw.
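
The suggestion above could look roughly like the following. This is only a sketch under assumptions: it uses the headless Agg backend so it runs without a display, the `update` helper is a made-up name, and a fixed legend location is chosen here because the legend is placed before any data exists.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np

fig, ax = plt.subplots()

# Create the line and the legend once, at setup time.
(line,) = ax.plot([], [], label="data", color="r")
legend = ax.legend(loc="upper right")


def update(t, val):
    """Hypothetical helper: swap the data of the existing line in place."""
    line.set_data(t, val)
    ax.relim()            # recompute data limits from the new data
    ax.autoscale_view()   # apply the new limits to the view
    fig.canvas.draw()


for _ in range(5):
    update(np.arange(100), np.random.rand(100))

n_lines = len(ax.lines)                  # still a single Line2D
same_legend = ax.get_legend() is legend  # legend object was never recreated
```

In the wx example, `update` would take the place of the remove/plot/legend sequence inside `on_plot_data`.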

@mbrennwa

Ok, I posted this at https://discourse.matplotlib.org/t/legend-is-eating-up-a-lot-of-memory

@anntzer
Contributor

anntzer commented Jan 24, 2021

I can reproduce the leak (which is quite slow, you need to run the code for minutes at least to see it). https://pypi.org/project/memory-profiler/ is a good way to measure such things.
Interestingly, calling gc.collect() (the usual remedy in this case) is not sufficient to make the leak go away.

One thing to try would be first to check whether this is wx-specific, or whether this also happens with other GUI toolkits.
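
To check whether growth survives a `gc.collect()`, something like the following might be used as a quick probe. It is a sketch using the standard-library `tracemalloc` module rather than memory-profiler; `growth_after_gc`, `leaky_step` and `clean_step` are made-up names, and the two step functions are stand-ins, not Matplotlib code. Note that `tracemalloc` only sees memory allocated through Python's allocators, so it may miss growth inside C extensions or the GUI toolkit.

```python
import gc
import tracemalloc


def growth_after_gc(step, iterations=200):
    """Return the net change in traced bytes after calling step() repeatedly."""
    gc.collect()
    tracemalloc.start()
    before, _ = tracemalloc.get_traced_memory()
    for _ in range(iterations):
        step()
    gc.collect()  # rule out garbage that is merely waiting to be collected
    after, _ = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return after - before


_retained = []  # hypothetical stand-in for a registry that never forgets


def leaky_step():
    _retained.append(bytearray(1024))  # stays referenced, so gc cannot free it


def clean_step():
    bytearray(1024)  # dropped immediately


leaky = growth_after_gc(leaky_step)
clean = growth_after_gc(clean_step)
print(f"leaky: {leaky} bytes, clean: {clean} bytes")
```

Passing one iteration of the plotting loop as `step` would show whether the growth is in objects that are still referenced somewhere, which is what a leak that survives `gc.collect()` suggests.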

@jklymak
Member

jklymak commented Jan 24, 2021

As discussed on https://discourse.matplotlib.org/t/legend-is-eating-up-a-lot-of-memory it would be nice to strip this from the app and see if the problem persists.

@mbrennwa

Here is a very small example without wx. Just try running it with / without the ax.legend() call and see for yourself.

import matplotlib.pyplot as plt
import numpy as np

fig = plt.figure()
ax  = fig.add_subplot(1,1,1)
plt.ion()

while True:
	while len(ax.lines) > 0: ax.lines[0].remove()
	ax.plot(np.random.rand(100), label='data')
	ax.legend()
	plt.pause(0.01)

@jklymak
Member

jklymak commented Jan 25, 2021

Thanks a lot for the minimal example.

I agree that legend seems to be leaking. Even leg.remove() does not seem to free up the memory. ax.clear() does free up the memory. I guess that means that legend is modifying something on the axes that leg.remove() doesn't catch, but ax.clear() does. A quick skim of the code doesn't tell me what that may be.
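
Based on the observation above, a possible stopgap is to clear the axes on every update and rebuild its contents. The sketch below assumes the headless Agg backend and a made-up `redraw` helper; it has not been measured here, and it only mirrors the remark that `ax.clear()` frees the memory.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np

fig, ax = plt.subplots()


def redraw(values):
    """Hypothetical helper: rebuild the axes contents from scratch."""
    ax.clear()   # removes old lines and the legend, and resets the axes
    ax.grid()    # clear() also resets the grid, so it has to be reapplied
    ax.plot(values, label="data", color="r")
    ax.legend(loc="best")
    fig.canvas.draw()


for _ in range(5):
    redraw(np.random.rand(100))

n_lines = len(ax.lines)
has_legend = ax.get_legend() is not None
```

The cost is that anything set on the axes (grid, labels, limits) must be set again after each `clear()`.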

@anntzer
Contributor

anntzer commented Jan 26, 2021

I believe I figured out at least some of the problem: we're not clearing out unit-handling callbacks properly when gc'ing Line2Ds. At least the following should help; can you confirm?

diff --git i/lib/matplotlib/cbook/__init__.py w/lib/matplotlib/cbook/__init__.py
index 31055dcd7..755a4c8ea 100644
--- i/lib/matplotlib/cbook/__init__.py
+++ w/lib/matplotlib/cbook/__init__.py
@@ -217,7 +217,8 @@ class CallbackRegistry:
             return
         for signal, proxies in list(self._func_cid_map.items()):
             try:
-                del self.callbacks[signal][proxies[proxy]]
+                cid = proxies.pop(proxy)
+                del self.callbacks[signal][cid]
             except KeyError:
                 pass
             if len(self.callbacks[signal]) == 0:
@@ -241,6 +242,7 @@ class CallbackRegistry:
                         if value == cid:
                             del functions[function]
                 return
+        self._pickled_cids.discard(cid)
 
     def process(self, s, *args, **kwargs):
         """
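
The kind of bookkeeping error the first hunk addresses can be illustrated with a toy model. This is a sketch, not Matplotlib's `CallbackRegistry`: `ToyRegistry`, `ref_to_cid`, `Line` and `stale_entries` are made-up names, and the real class keeps one such reverse map per signal. The point is only that deleting the callback while leaving the dead weak reference in the reverse map makes that map grow by one entry per collected object.

```python
import gc
import weakref


class ToyRegistry:
    """Hypothetical, simplified registry; not Matplotlib's implementation."""

    def __init__(self, pop_dead_proxies):
        self.callbacks = {}    # cid -> weak reference to a bound method
        self.ref_to_cid = {}   # weak reference -> cid (reverse lookup)
        self._next_cid = 0
        self._pop = pop_dead_proxies

    def connect(self, method):
        cid = self._next_cid
        self._next_cid += 1
        # _on_dead is called when the object owning the method is destroyed.
        ref = weakref.WeakMethod(method, self._on_dead)
        self.callbacks[cid] = ref
        self.ref_to_cid[ref] = cid
        return cid

    def _on_dead(self, ref):
        if self._pop:
            cid = self.ref_to_cid.pop(ref)  # forget the dead reference too
        else:
            cid = self.ref_to_cid[ref]      # stale entry stays behind
        del self.callbacks[cid]


class Line:
    """Stand-in for an artist whose method is registered as a callback."""

    def on_units_changed(self):
        pass


def stale_entries(pop_dead_proxies, n=1000):
    reg = ToyRegistry(pop_dead_proxies)
    for _ in range(n):
        line = Line()
        reg.connect(line.on_units_changed)
        del line  # the object is gone, only the weak reference remains
    gc.collect()
    return len(reg.ref_to_cid)


leaky_count = stale_entries(pop_dead_proxies=False)
fixed_count = stale_entries(pop_dead_proxies=True)
print(leaky_count, fixed_count)
```

With popping disabled, the reverse map keeps one dead entry per discarded `Line`; with popping enabled it ends up empty.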

@tacaswell tacaswell added this to the v3.4.0 milestone Jan 26, 2021
@mbrennwa

I do not know how to apply this code modification, so I cannot test this. I guess others who are more into the code should be able to help.

@vallsv
Contributor

vallsv commented Feb 8, 2021

@mbrennwa thanks a lot for the example, it was very easy to check.

I did #19480 to fix another leak, which in the end is the same problem.

Thanks @anntzer for pointing that out to me.

It is really weird to get stuck at the same time, from 2 different issues, on a bug that has almost always been in the base code.

@QuLogic
Member

QuLogic commented Feb 19, 2021

Should be fixed by #19480.

@QuLogic QuLogic closed this as completed Feb 19, 2021