fig.savefig gives OverflowError for high DPI's #19325
Comments
I can reproduce that, but it works fine with a much more modest chunk size of 2000. By chunking with N larger than your data set, you are basically not chunking. I'll be honest, I don't know why you get an overflow. The limit is somewhere near 29350 on my machine before I have to chunk. I am somewhat suspicious we have a short-int-sized buffer in there somewhere, which seems unnecessary, but maybe that is an agg limitation. No doubt @anntzer will tell us that cairo has no such limitation :-)
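For reference, a minimal sketch of that workaround. `agg.path.chunksize` only takes effect when it is smaller than the number of points in the path, so a value larger than the data set means no chunking at all; a modest value actually engages it:

```python
# Sketch of the modest-chunk-size workaround: a value smaller than the
# number of points makes Agg split the path into pieces, so the rasterizer
# limit is never hit. 2000 is the value mentioned in the comment above.
import matplotlib
matplotlib.rcParams["agg.path.chunksize"] = 2000  # engages chunking for paths > 2000 points
```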
I think(?) getting rid of that limit is just a matter of:

```diff
diff --git i/src/_backend_agg.cpp w/src/_backend_agg.cpp
index 6ae11167e..565cc9a62 100644
--- i/src/_backend_agg.cpp
+++ w/src/_backend_agg.cpp
@@ -45,7 +45,7 @@ RendererAgg::RendererAgg(unsigned int width, unsigned int height, double dpi)
       rendererBase(),
       rendererAA(),
       rendererBin(),
-      theRasterizer(8192),
+      theRasterizer(-1),
       lastclippath(NULL),
       _fill_color(agg::rgba(1, 1, 1, 0))
 {
```

(The argument is an unsigned upper bound, so -1 means "until ints overflow" :)) But of course that means that you'll run out of memory, instead of cleanly raising an exception (much) earlier, if you try to draw something too big. (Perhaps that's better?)
Or raise the limit, given that it is likely tailored to relatively old machines? Like we don't want someone to thrash, but 100,000 data points seems small?
Agreed, but I'm not going to pick a limit for you here :)
Unfortunately I have no idea what that number means. Agg isn't very well documented so far as I can tell...
I assume it's the (max) size of some internal rasterization buffer, but I don't know either.
Running out of memory is definitely worse.

This is documented in an odd place: https://matplotlib.org/tutorials/introductory/usage.html#splitting-lines-into-smaller-chunks

An additional wrinkle is that the number of vertices we feed the renderer is actually data dependent, as internally the (x, y) data you provide is simplified to remove redundant vertices (also documented in a surprising place: https://matplotlib.org/tutorials/introductory/usage.html#line-segment-simplification). Thus if you change the data to

```python
x = np.linspace(0, 100, N := 100_000)
y = np.ones(N)
```

both save fine, because even though you are giving Matplotlib 100k points, internally we are probably only passing 2 points into the rasterizer (way down at the bottom). This also means that for "smooth" data we get some automatic smart down-sampling. As of 2.1 (via #8776) we picked up "back tracking" simplification as well, so if you have data that is just going up and down within a single column of the rendered image we only keep the vertices at the top and bottom. In the normal case that is enough to get us under the threshold, but not for the 500 dpi case. Tweaking the simplification threshold to be higher, e.g.

```python
matplotlib.rcParams['path.simplify_threshold'] = 1
```

also makes the noise example render.

In #4404 @anntzer suggested making the chunk size automatic (e.g. if it fails, we reduce it and try again until it works), but I was skeptical then and am more skeptical now, given that the simplification logic is another knob to reach for. At a minimum I am going to put in a PR making that error message give better direction as to how to fix the problem!
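A small sketch of the vertex-count effect described above, using the public `Path.iter_segments` API with simplification enabled. This is not the exact internal code path `savefig` takes, and the data sets are illustrative assumptions:

```python
# Count how many segments survive simplification for "smooth" vs. noisy
# data. A flat line collapses to a handful of segments; irregular noise
# keeps far more vertices, which is what pressures the rasterizer limit.
import numpy as np
from matplotlib.path import Path

N = 100_000
x = np.linspace(0, 100, N)

smooth = Path(np.column_stack([x, np.ones(N)]))
noisy = Path(np.column_stack([x, np.random.default_rng(0).standard_normal(N)]))

for name, path in [("smooth", smooth), ("noisy", noisy)]:
    n_segments = sum(1 for _ in path.iter_segments(simplify=True))
    print(name, n_segments)
```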
I don't think allowing memory to fill is a good idea. OTOH if we can allow larger chunks because on-board memory has increased an order of magnitude, I don't see why we should burden users with twiddling anything. 100,000 points doesn't sound like a very big dataset in 2021.
The issue isn't the number of points in the dataset; it is how much irreducible texture it has. I chased this through and it is an upper bound on an internal cell count (see matplotlib/extern/agg24-svn/include/agg_rasterizer_cells_aa.h, lines 181 to 196 at 752c376).
We are already upping the limit from the default value by a factor of 8, which was last doubled in #4408. That we have this knob at all seems to be a local patch we are carrying (via 1b937a4 in #3777). I have a patch to make the error messages more actionable and am becoming convinced that an 8-16x bump on the max is in order.
One more comment while testing this: if we make the size a bit bigger we can get a 100k pathological data set to draw at 400 dpi, but by setting the chunk_size to 10k we get a ~5x speed-up in draw time.
So I think the main trade-off in upping the chunk limit is less about memory usage and more about run time. With a good error message we can hopefully direct the users who run into this to the chunk size, with a sense of what "good" values are, and save them run time. Whereas if we push the limit higher and higher we are going to get bug reports of "Matplotlib is slow", which are harder to diagnose and point people to solutions for. Another option could be to set the default chunk size to something like 10k or 50k. That would, with an upped limit, have been enough to save the OP out of the box, and should be a no-op for the vast majority of users (who I assume are mostly plotting things shorter than 50k).
Is there a way to check how much memory each case is using? The 8192 number seemed ridiculously small, but I don't know how that corresponds to memory use.
To quote myself from 2014, #3777 (comment):
I'm not fully sure what "cells" in this context are. But observationally there is some super-linear scaling in the rasterization time with the number of cells (I suspect quadratic, as I think each cell requires you to scan all of the other cells, based on vague memories of conversations with Mike a few years ago). I am starting to think that the original limit was more about run time than memory usage (which is probably where we should have started, given that the chunksize is documented under a "performance" heading).
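A rough timing sketch of the run-time trade-off discussed above. The 400 dpi / 100k-point setup mirrors the earlier comment; the actual timings, and whether the unchunked case overflows, will vary by machine and Matplotlib version:

```python
# Draw the same noisy line with different chunk sizes and compare wall
# time. chunksize=0 disables chunking entirely, which may exceed the
# rasterizer cell limit, so that case is wrapped in try/except.
import time
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 100, 100_000)
y = rng.standard_normal(100_000)

for chunk in (0, 10_000, 100_000):
    matplotlib.rcParams["agg.path.chunksize"] = chunk
    fig, ax = plt.subplots()
    ax.plot(x, y)
    t0 = time.perf_counter()
    try:
        fig.savefig("out.png", dpi=400)
        print(f"chunksize={chunk}: {time.perf_counter() - t0:.2f}s")
    except OverflowError as e:
        print(f"chunksize={chunk}: OverflowError ({e})")
    plt.close(fig)
```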
The error messages now provide better instruction on how to fix the problem. There are some paths we cannot split (ones with hatching or fills, or that claim they cannot be simplified), so tell the user that. For paths we could otherwise have split: if the chunksize is set to less than 100 (which disables the feature), ask the user to make it bigger or make the path simplification more aggressive; if the chunksize is bigger than the data, ask the user to make it smaller or make the path simplification more aggressive; if the chunksize is smaller than the data but still too big, ask the user to make it smaller or make the path simplification more aggressive.

closes #19325

Co-authored-by: Antony Lee <anntzer.lee@gmail.com>
Bug report
Even with short timeseries, `fig.savefig()` throws an `OverflowError` for high DPIs. The example below uses `'Agg'` with a significantly increased `chunksize`, yet it still throws an error for DPIs > 250. The output here is:
whereas I expected both to be successfully plotted, given that the chunksize OverflowError should only show up when the number of points exceeds the chunksize.
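The original example and its output were not preserved in this copy of the thread. A minimal sketch consistent with the description above (Agg backend, a large `agg.path.chunksize`, saving at dpi 250 and 500); the data size, chunk size, and DPI values are all assumptions, not the reporter's actual script:

```python
# Hypothetical reconstruction of the reported setup. Because the chunk
# size exceeds the number of points, no chunking happens, and high-DPI
# saves can hit the rasterizer cell limit.
import matplotlib
matplotlib.use("Agg")
matplotlib.rcParams["agg.path.chunksize"] = 1_000_000  # "significantly increased"
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
t = np.arange(100_000)
y = rng.standard_normal(100_000)

fig, ax = plt.subplots()
ax.plot(t, y)
for dpi in (250, 500):
    try:
        fig.savefig(f"ts_{dpi}.png", dpi=dpi)
        print(dpi, "ok")
    except OverflowError as e:
        print(dpi, "OverflowError:", e)
```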
Matplotlib version