
[Bug]: Savefig slow with subplots #26150


Closed
Hvass-Labs opened this issue Jun 19, 2023 · 13 comments

@Hvass-Labs

Hvass-Labs commented Jun 19, 2023

Bug summary

There are apparently 3 problems which combine to make savefig slow: (1) the use of many sub-plots, (2) the use of bbox_inches='tight', and (3) the use of sharex='col'. Unfortunately I need to use all three.

Code for reproduction

%matplotlib inline
from io import BytesIO
import numpy as np
from matplotlib.figure import Figure

# Random Number Generator.
rng = np.random.default_rng()

# Constants.
figsize = (10, 6)
ncols = 3
nrows = 10
size = 100
size_total = ncols * nrows * size

# Figure with many subplots.
fig_many = Figure(figsize=figsize)
axs_many = fig_many.subplots(ncols=ncols, nrows=nrows)

# Figure with many subplots and sharex='col'.
fig_many_sharex = Figure(figsize=figsize)
axs_many_sharex = fig_many_sharex.subplots(ncols=ncols, nrows=nrows, sharex='col')

# Figure with a single axes.
fig_single = Figure(figsize=figsize)
ax_single = fig_single.subplots()

# Helper-function: Generate random line-plots in the many subplots.
def generate_fig_many(axs):
    for row in range(nrows):
        for col in range(ncols):
            ax = axs[row, col]
            x = rng.normal(loc=row+1, scale=col+1, size=size)
            y = rng.normal(loc=col+1, scale=row+1, size=size)
            x = np.sort(x)
            ax.plot(x, y)
            ax.set_yticks([])

# Generate fig_many 
generate_fig_many(axs=axs_many)
fig_many.tight_layout()

# Generate fig_many_sharex
generate_fig_many(axs=axs_many_sharex)
fig_many_sharex.tight_layout()

# Generate fig_single
x = rng.normal(size=size_total)
y = rng.normal(size=size_total)
x = np.sort(x)
ax_single.plot(x, y)
fig_single.tight_layout()

# The following code-chunks were run in individual Jupyter cells.

%%timeit
stream = BytesIO()
fig_single.savefig(stream, format='svg')
s = stream.getvalue()
# 29.2 ms ± 168 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

%%timeit
stream = BytesIO()
fig_single.savefig(stream, format='svg', bbox_inches='tight')
s = stream.getvalue()
# 102 ms ± 6.03 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%%timeit
stream = BytesIO()
fig_many.savefig(stream, format='svg')
s = stream.getvalue()
# 374 ms ± 4.17 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%%timeit
stream = BytesIO()
fig_many.savefig(stream, format='svg', bbox_inches='tight')
s = stream.getvalue()
# 1.4 s ± 12.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%%timeit
stream = BytesIO()
fig_many_sharex.savefig(stream, format='svg')
s = stream.getvalue()
# 565 ms ± 5.58 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%%timeit
stream = BytesIO()
fig_many_sharex.savefig(stream, format='svg', bbox_inches='tight')
s = stream.getvalue()
# 2.22 s ± 20.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%%timeit
stream = BytesIO()
fig_many_sharex.savefig(stream, format='jpg', bbox_inches='tight')
s = stream.getvalue()
# 2.17 s ± 21 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%%timeit
stream = BytesIO()
fig_many_sharex.savefig(stream, format='png', bbox_inches='tight')
s = stream.getvalue()
# 2.19 s ± 31.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
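The `%%timeit` cells above only run inside Jupyter/IPython. For a plain Python script, a rough equivalent might look like the sketch below. The helper name `time_savefig` and the `repeats` parameter are made up for illustration, and the timings it prints will differ from the ones quoted in this issue.

```python
import time
from io import BytesIO

from matplotlib.figure import Figure


def time_savefig(fig, repeats=3, **savefig_kwargs):
    """Return the best wall-clock time (seconds) over `repeats` in-memory saves."""
    best = float("inf")
    for _ in range(repeats):
        stream = BytesIO()
        start = time.perf_counter()
        fig.savefig(stream, **savefig_kwargs)
        best = min(best, time.perf_counter() - start)
    return best


# Same grid shape as fig_many_sharex in the reproduction code.
fig = Figure(figsize=(10, 6))
axs = fig.subplots(nrows=10, ncols=3, sharex="col")
for ax in axs.flat:
    ax.plot([0, 1], [0, 1])
    ax.set_yticks([])

t_plain = time_savefig(fig, format="svg")
t_tight = time_savefig(fig, format="svg", bbox_inches="tight")
print(f"plain: {t_plain * 1e3:.0f} ms, bbox_inches='tight': {t_tight * 1e3:.0f} ms")
```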

Actual outcome

The test results are summarized in the table below. All times are in milliseconds and are for the SVG format; the few tests made above for the JPG and PNG formats gave similar results.

| Figure | no bbox | bbox_inches='tight' | layout='constrained' | layout='tight' |
|---|---|---|---|---|
| fig_single | 29 | 102 | 30 | 30 |
| fig_many | 374 | 1,400 | 1,410 | 1,340 |
| fig_many_sharex | 565 | 2,220 | 2,220 | 2,110 |

Edit: Added time-usage for setting either layout='constrained' or 'tight' when creating the Figure objects.

Expected outcome

I would like it to run like this (you asked for a visual example).

Additional information

Thanks for making Matplotlib, I've been using it for many open-source projects in the past!

I am currently building a web-app where Matplotlib will be generating many SVG plots on a server that is running in the cloud. My own functions for generating the data are very fast, but unfortunately the plotting itself is very slow. For example, a figure with 3 columns and 10 rows of sub-plots takes 7 seconds to run savefig - even though most of the sub-plots only have a simple text-string such as "Same as previous", and the few other sub-plots are either line-plots or fill_between that are generated from just 100 data-points each.

I have tried simulating this problem in the sample code above, where fig_many has many sub-plots, and fig_single has a single plot with the same total number of data-points. I also tried using a profiler on this code, but it would take me forever to try and understand what the problem is in Matplotlib's code, and whether it's even fixable.

Please tell me if it might be possible to improve the speed, or if it's not possible then please explain the technical reason, and whether there is a work-around.

Thanks!

Operating system

Kubuntu 22

Matplotlib Version

3.7.1

Matplotlib Backend

module://matplotlib_inline.backend_inline

Python version

3.9.12

Jupyter version

6.4.12 (through VSCode)

Installation

pip

@rcomer
Member

rcomer commented Jun 19, 2023

I am curious about why you use both tight_layout and bbox_inches="tight" as they are two different solutions for the same problem: tight_layout expands the content so that it fills up the figure. bbox_inches="tight" shrinks the figure so that it fits better around the content.

I did enjoy the video 😁
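The distinction described above can be checked with a small sketch. It assumes a 4x3 inch figure at 100 dpi and reads the pixel size back from the PNG header; `png_size` is a hypothetical helper, not a Matplotlib function.

```python
import struct
from io import BytesIO

from matplotlib.figure import Figure


def png_size(fig, **savefig_kwargs):
    """Save `fig` as an in-memory PNG and return (width, height) in pixels."""
    stream = BytesIO()
    fig.savefig(stream, format="png", **savefig_kwargs)
    # The PNG IHDR chunk stores width and height as big-endian uint32
    # values in bytes 16..24 of the file.
    return struct.unpack(">II", stream.getvalue()[16:24])


fig = Figure(figsize=(4, 3), dpi=100)
ax = fig.subplots()
ax.plot([0, 1], [0, 1])

# bbox_inches='tight' changes the size of the saved image ...
size_plain = png_size(fig)
size_tight = png_size(fig, bbox_inches="tight")

# ... whereas tight_layout moves the axes inside a figure of unchanged size.
pos_before = ax.get_position().bounds
fig.tight_layout()
pos_after = ax.get_position().bounds
size_inches = tuple(fig.get_size_inches())

print(size_plain, size_tight)
print(pos_before, pos_after, size_inches)
```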

@Hvass-Labs
Author

@rcomer because this is Matplotlib: You have to do redundant things at least twice! :-)

Seriously though, I don't think they're doing the exact same thing - at least when you're having many sub-plots, where tight_layout apparently also adjusts the space between the sub-plots, while bbox_inches='tight' only seems to expand the figure.

But maybe you're right that I could perhaps omit bbox_inches='tight' if I had already called tight_layout. I will have to experiment with that. Matplotlib's semantics are one of life's great and elusive mysteries.

Also note that tight_layout takes me 3 seconds to run, but there's a simple way to avoid that if the layout is mostly always the same, see #16550 (comment)

@jklymak
Member

jklymak commented Jun 19, 2023

You can have two of fast, simple, and robust. Simple and robust: constrained layout. Fast and simple: manually placing the axes. Fast and robust: coming up with your own heuristics for when an axes needs to be resized.

In your case I'd go for fast and simple and hardcode the axes locations using subplotpars or manually.

Note that by calling plt.tight_layout() you are forcing an extra draw. Better to do fig, axs = plt.subplots(10, 3, layout='tight'), though preferred is layout='constrained'.

The only time bbox_inches='tight' may be useful is if your axes have a set aspect ratio or if you have things like a manually placed suptitle that are not part of the subplot layout. But you should just use constrained layout instead.

I'm going to close this, and suggest discussion at https://discourse.matplotlib.org as I don't think there are any bugs per-se here.
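As a rough illustration of the "fast and simple" option, the sketch below fixes the outer margins in inches and converts them to the figure fractions that `subplots_adjust` expects, so the same numbers can be used for any figure size or grid shape. The helper name and every default value are assumptions chosen for this example; they would need tuning to the actual tick labels and titles.

```python
from matplotlib.figure import Figure


def adjust_with_inch_margins(fig, left=0.3, right=0.1, bottom=0.4, top=0.1,
                             wspace=0.05, hspace=0.2):
    """Set outer margins given in inches; wspace/hspace stay relative."""
    width, height = fig.get_size_inches()
    fig.subplots_adjust(
        left=left / width,
        right=1 - right / width,
        bottom=bottom / height,
        top=1 - top / height,
        wspace=wspace,
        hspace=hspace,
    )


fig = Figure(figsize=(10, 6))
axs = fig.subplots(nrows=10, ncols=3, sharex="col")
for ax in axs.flat:
    ax.plot([0, 1], [0, 1])
    ax.set_yticks([])

adjust_with_inch_margins(fig)  # no draw or text measurement is needed here
corner = axs[-1, 0].get_position()  # bottom-left axes
print(corner.x0, corner.y0)
```

Because nothing is measured, this cannot react to unusually long labels, which is the robustness being traded away.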

jklymak closed this as not planned on Jun 19, 2023
@tacaswell
Member

It is odd that sharex takes longer.

@jklymak
Member

jklymak commented Jun 19, 2023

Well maybe I closed too soon. It's possible the sharing code is going down a rabbit hole with so many axes.

jklymak reopened this on Jun 19, 2023
@jklymak
Member

jklymak commented Jun 19, 2023

Let's reopen for the sharex issue. That does make things slower and I'm not sure why. I suspect some cross-checking of bounds that cascades on each plot call.

@Hvass-Labs
Author

@jklymak Thanks for the quick and detailed reply!

I don't think I can call plt.subplots(..., layout='constrained') because I am not calling through matplotlib.pyplot as plt but rather through a Figure object and it doesn't seem to have the argument layout in fig.subplots(..., layout='constrained'):

TypeError: subplots() got an unexpected keyword argument 'layout'

I use Figure objects in my actual application, because I am not plotting anything to screen, but rather creating an SVG string that I send to the user's web-browser. And I think I read somewhere in the Matplotlib docs that I should use Figure objects for efficiency, when not actually showing plots on screen.

The problem I am having with setting subplotpars manually is that the number of subplots is generated dynamically. I suppose that I could try to calculate subplotpars for all combinations of nrows and ncols that I might need, but it would be a nightmare to do that, and I imagine that changes in axis-labels etc. might make those pre-calculated subplotpars slightly invalid.

So I really need a way to adjust the sub-plots automatically, to make the layout look nice and tight without too much white-space everywhere.

PS: I don't consider this to be a 'bug' per-se either, but there was no option to open this as a performance issue.

@rcomer
Member

rcomer commented Jun 20, 2023

layout="constrained" is passed to Figure on instantiation.
https://matplotlib.org/stable/api/figure_api.html#matplotlib.figure.Figure

plt.subplots creates the Figure and then calls subplots on it, so it takes the arguments of both Figure.__init__ and Figure.subplots.
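A minimal sketch of what this looks like without pyplot, assuming a Matplotlib version that has `Figure.get_layout_engine` (3.6 or later):

```python
from io import BytesIO

from matplotlib.figure import Figure

# The layout is chosen when the Figure is created, not in fig.subplots().
fig = Figure(figsize=(10, 6), layout="constrained")
axs = fig.subplots(nrows=10, ncols=3, sharex="col")
for ax in axs.flat:
    ax.plot([0, 1], [0, 1])
    ax.set_yticks([])

# The layout engine runs as part of the draw that savefig triggers.
stream = BytesIO()
fig.savefig(stream, format="svg")
svg_text = stream.getvalue().decode("utf-8")

engine_name = type(fig.get_layout_engine()).__name__
print(engine_name)
```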

@Hvass-Labs
Author

@rcomer thanks! I have updated the table above with time-usage for these. They're just as slow as bbox_inches='tight'. In my actual application they are significantly slower! And layout='constrained' doesn't always have uniformly sized white-spacing between sub-plots in different columns, e.g. try and plot fig_many above.

@tacaswell
Member

My hunch of where to check is to see how many times we check get_window_extent on the tick labels.
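One way to test such a hunch is to count the calls by temporarily wrapping `Text.get_window_extent`. This is only a sketch: the helper name is made up, the counter covers every `Text` artist rather than tick labels alone, and the absolute numbers depend on the Matplotlib version.

```python
import functools
from io import BytesIO

from matplotlib.figure import Figure
from matplotlib.text import Text


def count_text_extent_calls(fig, **savefig_kwargs):
    """Count Text.get_window_extent calls made during one savefig."""
    calls = 0
    original = Text.get_window_extent

    @functools.wraps(original)
    def counting(self, *args, **kwargs):
        nonlocal calls
        calls += 1
        return original(self, *args, **kwargs)

    Text.get_window_extent = counting
    try:
        fig.savefig(BytesIO(), **savefig_kwargs)
    finally:
        Text.get_window_extent = original  # always undo the patch
    return calls


fig = Figure(figsize=(10, 6))
axs = fig.subplots(nrows=3, ncols=2, sharex="col")
for ax in axs.flat:
    ax.plot([0, 1], [0, 1])

n_plain = count_text_extent_calls(fig, format="svg")
n_tight = count_text_extent_calls(fig, format="svg", bbox_inches="tight")
print(f"plain: {n_plain} calls, bbox_inches='tight': {n_tight} calls")
```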

@rcomer
Member

rcomer commented Sep 24, 2023

#26899 seems to help with this. Using

%%timeit
stream = BytesIO()
fig_many.savefig(stream, format='svg', bbox_inches='tight')
s = stream.getvalue()

With main I get
2.35 s ± 88.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
with my branch I get
1.64 s ± 93.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Using

%%timeit
stream = BytesIO()
fig_many_sharex.savefig(stream, format='svg', bbox_inches='tight')
s = stream.getvalue()

with main
3.25 s ± 90.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
with my branch
2.31 s ± 46.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

@rcomer
Member

rcomer commented May 26, 2024

I believe I have understood why sharex is slower. When the x-axis is shared across the columns, the xtick labels are removed from all but the last row. So, for all but the last row, when we call ax.xaxis.get_ticks_position() we get "unknown" because all of the other options involve the tick labels being visible either top or bottom. This causes a call to ax.xaxis.get_tightbbox() within ax._update_title_position, which just returns None, so is useless in this context.

If we add a line in the generate_fig_many loop to put the tick labels on top

ax.xaxis.set_ticks_position("top")

then fig_many.savefig(stream, format='svg', bbox_inches='tight') and fig_many_sharex.savefig(stream, format='svg', bbox_inches='tight') are equally slow, and slower than the existing sharex case. But if the tick labels are on top we do need the get_tightbbox call.

So I think it would be good to change how _update_title_position checks whether the ticklabels are on top. I'm not sure whether that should involve an extra return option in get_ticks_position or whether we should just directly check the visibility of Tick.label2.
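The tick state described in this comment can be inspected with a short sketch. The public method is `XAxis.get_ticks_position` (singular). The comment reports "unknown" for every row except the last; the exact string returned for those rows may depend on the Matplotlib version, so no expected output is shown here.

```python
from matplotlib.figure import Figure

fig = Figure(figsize=(10, 6))
axs = fig.subplots(nrows=3, ncols=2, sharex="col")

rows = []
for row in range(axs.shape[0]):
    xaxis = axs[row, 0].xaxis
    tick = xaxis.get_major_ticks()[0]
    rows.append((
        row,
        xaxis.get_ticks_position(),
        tick.label1.get_visible(),  # bottom tick label
        tick.label2.get_visible(),  # top tick label
    ))

for row, position, bottom_label, top_label in rows:
    print(f"row {row}: position={position!r}, "
          f"label1 visible={bottom_label}, label2 visible={top_label}")
```

In the upper rows neither label is visible, which is the case where checking `label2` directly would show that no top tick labels need to be accounted for.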

@rcomer
Member

rcomer commented May 29, 2024

This was kept open for the question of why sharex caused savefig to take longer. Since #28300, savefig is quicker with sharex than without (~220 ms vs ~320 ms on my laptop, saving with bbox_inches='tight').

So I think this is now done. Please comment and/or re-open if you disagree.
