[Bug]: Memory not freed as expected after plotting heavy plot involving looping #27138

Closed as not planned
@spacescientist

Solution

#27138 (comment)

Bug summary

I work with a large 1D dataset. For a statistical analysis, I am using the bootstrap method, resampling it many times.

I want to loop over all resamplings and draw a specific result for each of them on a single figure.
However, memory issues arise (e.g. memory not freed until the very end of the script, or even leaks).

Here I document some things that at least partially address the issue. None is fully satisfactory though.
I am running the same script both from Python and as a Jupyter notebook (synchronised via jupytext). I am trying to get rid of the memory issues in both cases (the RAM usage easily reaches 16–32 GB once I start playing with enough data).

Code for reproduction

import matplotlib.pyplot as plt
import numpy as np

da = np.sort(np.random.random(int(3e6))) # Beware: please lower this if your system has less than 32 GB

def custom_plot(da, **kwargs):
    """da is a 1-D ordered xarray.DataArray or numpy array (containing tons of data)"""
    plt.yscale('log')
    n = len(da)
    return plt.plot(da, 1 - (np.arange(1, n+1) / (n+1)), **kwargs)

def resampling_method(da, case):
    """
    A complex thing in reality but for the MWE, let us simply return da itself.
    It will lead to the same memory problem.
    """
    return da

plt.figure(figsize=(15, 8), dpi=150)
plt.ylim((1e-10,1))

for case in np.arange(50):
    custom_plot(resampling_method(da, case)) # each time getting the curve for a different resampling of da
custom_plot(da) # curve for the original da

plt.savefig("output.png")
plt.show()

import gc
gc.collect();

print("Technically the programme would continue with more calculations.")
print("Notice how the memory won't be freed however until the entire script is finished.")
import time
time.sleep(120) # This simulates the fact that the programme continues with other calculations.
print("Now the programme exits")
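
To make the retained memory visible from inside the script rather than from a system monitor, one option is the standard-library tracemalloc module. The following is only a minimal sketch, not part of the original report: the helper names `traced_mib` and `measure_plot_memory`, the reduced data size, and the use of the "Agg" backend (chosen so that it runs without a display) are all assumptions. It only counts allocations that tracemalloc can trace, so the numbers will differ from the process RSS.

```python
import gc
import tracemalloc

import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np


def traced_mib():
    """Return the memory currently traced by tracemalloc, in MiB."""
    current, _peak = tracemalloc.get_traced_memory()
    return current / 2**20


def measure_plot_memory(n_points=200_000, n_curves=5):
    """Plot n_curves curves, close the figure, and report traced memory at each stage."""
    tracemalloc.start()
    da = np.sort(np.random.random(n_points))
    baseline = traced_mib()

    fig, ax = plt.subplots()
    ax.set_yscale("log")
    for _ in range(n_curves):
        ax.plot(da, 1 - np.arange(1, n_points + 1) / (n_points + 1))
    with_figure = traced_mib()

    plt.close(fig)  # drop pyplot's own reference to the figure
    del fig, ax     # drop the local references as well
    gc.collect()    # collect reference cycles between figure, axes and artists
    after_close = traced_mib()

    tracemalloc.stop()
    return baseline, with_figure, after_close


baseline, with_figure, after_close = measure_plot_memory()
print(f"baseline: {baseline:.1f} MiB, "
      f"with figure: {with_figure:.1f} MiB, "
      f"after close: {after_close:.1f} MiB")
```

Printing the three values at the relevant points of a longer script may help narrow down which step keeps the arrays alive.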

Actual outcome

Memory issues occur no matter what I have tried so far. Depending on the approach, either the memory is not freed after the plot has been shown or closed, or there are memory leaks and massive swap usage.

Expected outcome

Memory freed well before the end of the programme. I would expect it to be freed soon after the figure is closed.
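
The expectation that memory is released once the figure is closed can be written down explicitly with a small context manager. This is a hedged sketch, not a confirmed fix: `closing_figure` and `save_plot` are hypothetical names, the "Agg" backend is assumed so that the example runs headless, and it does not address the case where an interactive window must be shown.

```python
import gc
import io
from contextlib import contextmanager

import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, no window is shown
import matplotlib.pyplot as plt
import numpy as np


@contextmanager
def closing_figure(**fig_kwargs):
    """Create a pyplot figure and always close it (then run the GC) on exit."""
    fig = plt.figure(**fig_kwargs)
    try:
        yield fig
    finally:
        plt.close(fig)  # remove the figure from pyplot's global registry
        gc.collect()    # collect cycles once pyplot no longer holds the figure


def save_plot(da, target):
    """Draw one curve and save it; local references vanish when this returns."""
    n = len(da)
    with closing_figure(figsize=(15, 8), dpi=150) as fig:
        ax = fig.add_subplot()
        ax.set_yscale("log")
        ax.plot(da, 1 - np.arange(1, n + 1) / (n + 1))
        fig.savefig(target, format="png")


buffer = io.BytesIO()  # in-memory target; a filename such as "output.png" also works
save_plot(np.sort(np.random.random(10_000)), buffer)
open_figures = plt.get_fignums()  # figure numbers still registered with pyplot
print(open_figures)
```

Keeping the plotting inside a function matters here: any `fig` or `ax` variable left at module level would keep the figure and its data alive even after `plt.close`.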

Additional information

NB: I also tried many other things (including plt.cla and the like), as well as changing the backend (notably "Agg" and "Qt5Agg"), but none of that made any difference, so I won't document them.

Things that have some effect

  1. If you call plt.show():
  • When run from the terminal with Python, it shows the plot in a new window, but the memory used by the figure is not freed afterwards. It remains in use until the end of the entire script.
  • In Jupyter, however, the memory is freed soon after the figure is displayed.
plt.show()
import gc
gc.collect();
  2. If you use plt.show(block=False), time.sleep and plt.close('all'), the memory is freed after the plot has been created, both with Jupyter and with Python. However, in Python a window is created (stealing focus) and nothing ever appears in it (it is closed after 5 seconds). It is therefore tempting to comment out plt.show(block=False), but if you do, Jupyter no longer clears the memory...
plt.show(block=False) ## If you comment this out, then Jupyter will not clear the memory..
import time
time.sleep(5)
plt.close('all')
import gc
gc.collect();
  3. Given the above, let us check whether the script is running under Jupyter or plain Python.
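def type_of_script():
    """Return 'jupyter', 'ipython' or 'terminal' depending on where the script runs.

    source: https://stackoverflow.com/a/47428575/452522
    """
    try:
        # get_ipython is only defined when running under IPython or Jupyter
        ipy_str = str(type(get_ipython()))
    except NameError:
        return 'terminal'
    if 'zmqshell' in ipy_str:
        return 'jupyter'
    if 'terminal' in ipy_str:
        return 'ipython'
    return 'terminal'  # unknown shell type: fall back instead of returning None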

if type_of_script() == "jupyter":
    plt.show(block=False) ## If you comment this out, then Jupyter will not clear the memory..
else:
    pass
import time
time.sleep(5)
plt.close('all')
import gc
gc.collect();

With this:

  • Jupyter will create a file and will also display the figure inline.
  • Python will only create a file and won't try to show a window.

Both will free the memory after the figure has been closed (or rather, 5 seconds after).
This is the most satisfactory option, though not exactly a nice one.
If one wants the figure displayed when running Python from the CLI, however, I have not found a method
where the memory does not remain in use until the very end of the entire script.
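
For the file-only case (no window, no inline display), another possibility is to bypass pyplot altogether and build the figure from matplotlib.figure.Figure, so that no global figure registry holds a reference. This is a sketch under assumptions and was not tested against the original data sizes: `render_curves` is a hypothetical helper, and the PNG is written to an in-memory buffer here only to keep the example self-contained.

```python
import gc
import io

import numpy as np
from matplotlib.figure import Figure


def render_curves(curves, dpi=150):
    """Render one curve per array to PNG bytes without touching pyplot."""
    fig = Figure(figsize=(15, 8), dpi=dpi)  # not registered with pyplot
    ax = fig.subplots()
    ax.set_yscale("log")
    ax.set_ylim(1e-10, 1)
    for da in curves:
        n = len(da)
        ax.plot(da, 1 - np.arange(1, n + 1) / (n + 1))
    buf = io.BytesIO()
    fig.savefig(buf, format="png")  # a filename such as "output.png" also works
    fig.clear()                     # drop the artists and the data they hold
    return buf.getvalue()


da = np.sort(np.random.random(10_000))  # small stand-in for the large dataset
png_bytes = render_curves([da] * 3)
gc.collect()  # collect any remaining reference cycles from the figure
print(len(png_bytes) > 0)
```

Because the figure only lives inside the function, it becomes unreachable as soon as `render_curves` returns, independently of whether the script runs under Jupyter or plain Python.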

Some further notes:

Operating system

Ubuntu

Matplotlib Version

3.7.3

Matplotlib Backend

module://matplotlib_inline.backend_inline (default)

Python version

3.8.10

Jupyter version

6.5.2

Installation

pip
