Bug summary
I work with a large 1D dataset. For a statistical analysis, I am using the bootstrap method, resampling it many times.
I loop over all resamplings in order to plot a specific result for each of them on a single figure.
However, memory issues occur (e.g. memory not freed before the very end of the script, or even leaks).
Here I document some things that at least partially address the issue. None is fully satisfactory though.
I am running the same script both from Python and as a Jupyter notebook (synchronised via jupytext). I am trying to get rid of the memory issues in both cases (the RAM usage easily reaches 16–32 GB once I start playing with enough data).
Code for reproduction
import matplotlib.pyplot as plt
import numpy as np

da = np.sort(np.random.random(int(3e6)))  # Beware: please lower this if your system has less than 32 GB

def custom_plot(da, **kwargs):
    """da is a 1-D ordered xarray.DataArray or numpy array (containing tons of data)"""
    plt.yscale('log')
    n = len(da)
    return plt.plot(da, 1 - (np.arange(1, n+1) / (n+1)), **kwargs);

def resampling_method(da, case):
    """
    A complex thing in reality but for the MWE, let us simply return da itself.
    It will lead to the same memory problem.
    """
    return da

plt.figure(figsize=(15, 8), dpi=150)
plt.ylim((1e-10,1))
for case in np.arange(50):
    custom_plot(resampling_method(da, case))  # each time getting the curve for a different resampling of da
custom_plot(da)  # curve for the original da
plt.savefig("output.png")
plt.show()

import gc
gc.collect();

print("Technically the programme would continue with more calculations.")
print("Notice how the memory won't be freed however until the entire script is finished.")
import time
time.sleep(120)  # This simulates the fact that the programme continues with other calculations.
print("Now the programme exits")
Actual outcome
Memory issues are taking place no matter what I've tried so far. Depending on what is being attempted, it can lead to the memory either not being freed after the plot has been shown/is closed, or even memory leaks and massive swap usage.
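To tell apart "the object is still referenced somewhere" from "the object is gone but the process has not handed the memory back", a weak reference can be used as a probe. The following is only a minimal sketch: `Holder` is a hypothetical stand-in for a figure-like object (it is not a matplotlib class), and `is_collected` is a helper name introduced here for illustration.

```python
import gc
import weakref


class Holder:
    """Hypothetical stand-in for a figure-like object that owns a large buffer."""

    def __init__(self, n):
        self.data = bytearray(n)
        self.me = self  # reference cycle: only the cycle collector can free it


def is_collected(ref):
    """Run a full collection and report whether the referent has been freed."""
    gc.collect()
    return ref() is None


obj = Holder(10_000_000)
ref = weakref.ref(obj)

still_alive = not is_collected(ref)  # the name `obj` still holds a strong reference
del obj
freed = is_collected(ref)            # no strong reference is left

print(still_alive, freed)
```

The same probe could be pointed at a real figure (`weakref.ref(fig)`) before closing it. If `ref()` is still not `None` after `plt.close('all')` and `gc.collect()`, something is still holding the figure.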
Expected outcome
Memory freed well before the end of the programme. I would expect it to be freed soon after the figure is closed.
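To check when (or whether) the memory is actually returned, it helps to log the resident set size of the process at a few points in the script. A minimal sketch, assuming Linux (the report is from Ubuntu) and reading `/proc/self/status`; `rss_mb` is a helper name introduced here, and the `bytearray` is only a stand-in for the plotting step.

```python
import gc


def rss_mb():
    """Resident set size of the current process in MiB (Linux only, via /proc)."""
    with open("/proc/self/status") as fh:
        for line in fh:
            if line.startswith("VmRSS:"):
                return int(line.split()[1]) / 1024  # the kernel reports kB
    raise RuntimeError("VmRSS not found in /proc/self/status")


before = rss_mb()
buf = bytearray(b"x") * (50 * 1024 * 1024)  # stand-in for the plotting step (~50 MiB)
during = rss_mb()
del buf
gc.collect()
after = rss_mb()

print(f"before={before:.0f} MiB  during={during:.0f} MiB  after={after:.0f} MiB")
```

In the reproduction script, the same three calls would go before `plt.figure(...)`, right after `plt.show()`, and after `gc.collect()`.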
Additional information
NB: I also tried many other things (incl. plt.cla and the like), as well as changing the backend (notably "Agg" and "Qt5Agg"), but that did not solve the problem in the slightest, so I won't document them.
Things that have some effect
- If you do plt.show() and then gc.collect():
  - It will show the plot in a new window when run from the terminal with Python, but the memory used by the figure won't be freed after that. It will remain in use until the end of the entire script.
  - In Jupyter, however, it will be freed soon after the figure is displayed.

plt.show()
import gc
gc.collect();
- If you use block=False, time.sleep and close('all'), the memory will be freed after the plot has been created, both with Jupyter and Python. However, in Python, a window will be created (stealing focus) and nothing will ever appear in it (it will be closed after 5 seconds). It would therefore be tempting to comment out plt.show(block=False), but if you do, Jupyter will no longer clear the memory...

plt.show(block=False)  ## If you comment this out, then Jupyter will not clear the memory..
import time
time.sleep(5)
plt.close('all')
import gc
gc.collect();
- Given what precedes, let us check whether Jupyter or Python is being used.
def type_of_script():
    """source: https://stackoverflow.com/a/47428575/452522"""
    try:
        ipy_str = str(type(get_ipython()))
        if 'zmqshell' in ipy_str:
            return 'jupyter'
        if 'terminal' in ipy_str:
            return 'ipython'
    except NameError:
        # get_ipython is not defined when running under plain Python
        return 'terminal'

if type_of_script() == "jupyter":
    plt.show(block=False)  ## If you comment this out, then Jupyter will not clear the memory..
else:
    pass

import time
time.sleep(5)
plt.close('all')
import gc
gc.collect();
With this:
- Jupyter will create a file and will also display the figure inline.
- Python will only create a file and won't try to show a window.
Both will free the memory after the figure has been closed (or rather, 5 seconds after).
This is the most satisfactory option, though it is not exactly nice.
Should one want the figure displayed when running Python from the CLI, however, I haven't found a method where the memory doesn't remain in use until the very end of the entire script.
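One workaround that might cover the CLI case is to do the plotting in a child process, so that whatever matplotlib allocates is returned to the OS when the child exits. This is a sketch under stated assumptions, not something tested against this report: it assumes a Unix system where the "fork" start method is available, and `run_isolated` and `make_figure` are hypothetical names introduced here.

```python
import multiprocessing as mp


def run_isolated(func, *args, **kwargs):
    """Run func(*args, **kwargs) in a forked child process and wait for it to exit."""
    ctx = mp.get_context("fork")  # assumption: Unix; the child sees the parent's data copy-on-write
    proc = ctx.Process(target=func, args=args, kwargs=kwargs)
    proc.start()
    proc.join()
    return proc.exitcode  # 0 if the child finished without an uncaught exception


def make_figure(da, path="output.png", show=False):
    """Hypothetical plotting step; everything it allocates disappears with the child."""
    import matplotlib.pyplot as plt  # imported inside the child only
    import numpy as np

    n = len(da)
    fig, ax = plt.subplots(figsize=(15, 8), dpi=150)
    ax.set_yscale("log")
    ax.set_ylim(1e-10, 1)
    ax.plot(da, 1 - np.arange(1, n + 1) / (n + 1))
    fig.savefig(path)
    if show:
        plt.show()  # blocks only the child; closing the window lets the child exit


# Usage (not executed here):
# exit_code = run_isolated(make_figure, da, show=True)
```

The parent only resumes once the window is closed and the child has exited. Forking after the parent has already opened GUI windows may misbehave with some toolkits, so it is probably safest for the parent not to create any figure itself.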
Some further notes:
- There are known memory issues with matplotlib and looping, such as: http://datasideoflife.com/?p=1443 but here I do not create a figure at each iteration of a loop; I accumulate plots from a loop and plot the end result. The solution put forward there (i.e. use plt.close(fig)) does not work in this case.
- This is also distinct from "Memory leak in plt.close() when unshown figures in GUI backends are closed" (#20300).
Operating system
Ubuntu
Matplotlib Version
3.7.3
Matplotlib Backend
module://matplotlib_inline.backend_inline (default)
Python version
3.8.10
Jupyter version
6.5.2
Installation
pip