
memory error using savefig with ylim to create pdf of box plots #10889


Closed
bu22dee opened this issue Mar 27, 2018 · 16 comments · Fixed by #10919

Comments

@bu22dee

bu22dee commented Mar 27, 2018

Bug report

Bug summary

This is my first post. I have been working with Python for 3 weeks and I am completely new to this topic.

Calling savefig on box plots with ylim set sometimes raises a MemoryError. Saving as PNG or SVG, or not setting ylim, works fine.

My guess: even though ylim limits the plot, savefig just "zooms" in, which can create very long distances that are no longer part of the plot and may hit some PDF limitation. In this case there is a data point at 1e39. When I use ylim(-2, 2), there is a huge distance outside the visible plot; maybe this causes the memory error.

RAM usage was fine the whole time.

I hope this helps.

Code is in the comments below.

Actual outcome

PS D:\Users\user\Documents\s_line_messdaten\vorlage> py .\vorlage.py
1_box_plot: quantity1
2_box_plot: quantity2
Traceback (most recent call last):
  File ".\vorlage.py", line 173, in <module>
    box_plot(data)
  File ".\vorlage.py", line 146, in box_plot
    plt.savefig("./boxplot/"+simplename[quantity]+'.pdf', dpi=2)
  File "C:\Program Files (x86)\Python36-32\lib\site-packages\matplotlib\pyplot.py", line 701, in savefig
    res = fig.savefig(*args, **kwargs)
  File "C:\Program Files (x86)\Python36-32\lib\site-packages\matplotlib\figure.py", line 1834, in savefig
    self.canvas.print_figure(fname, **kwargs)
  File "C:\Program Files (x86)\Python36-32\lib\site-packages\matplotlib\backend_bases.py", line 2267, in print_figure
    **kwargs)
  File "C:\Program Files (x86)\Python36-32\lib\site-packages\matplotlib\backends\backend_pdf.py", line 2592, in print_pdf
    self.figure.draw(renderer)
  File "C:\Program Files (x86)\Python36-32\lib\site-packages\matplotlib\artist.py", line 55, in draw_wrapper
    return draw(artist, renderer, *args, **kwargs)
  File "C:\Program Files (x86)\Python36-32\lib\site-packages\matplotlib\figure.py", line 1299, in draw
    renderer, self, artists, self.suppressComposite)
  File "C:\Program Files (x86)\Python36-32\lib\site-packages\matplotlib\image.py", line 138, in _draw_list_compositing_images
    a.draw(renderer)
  File "C:\Program Files (x86)\Python36-32\lib\site-packages\matplotlib\artist.py", line 55, in draw_wrapper
    return draw(artist, renderer, *args, **kwargs)
  File "C:\Program Files (x86)\Python36-32\lib\site-packages\matplotlib\axes\_base.py", line 2437, in draw
    mimage._draw_list_compositing_images(renderer, self, artists)
  File "C:\Program Files (x86)\Python36-32\lib\site-packages\matplotlib\image.py", line 138, in _draw_list_compositing_images
    a.draw(renderer)
  File "C:\Program Files (x86)\Python36-32\lib\site-packages\matplotlib\artist.py", line 55, in draw_wrapper
    return draw(artist, renderer, *args, **kwargs)
  File "C:\Program Files (x86)\Python36-32\lib\site-packages\matplotlib\lines.py", line 840, in draw
    rgbaFace)
  File "C:\Program Files (x86)\Python36-32\lib\site-packages\matplotlib\backends\backend_pdf.py", line 1799, in draw_markers
    path, trans, rgbFace)
  File "C:\Program Files (x86)\Python36-32\lib\site-packages\matplotlib\backend_bases.py", line 328, in draw_markers
    rgbFace)
  File "C:\Program Files (x86)\Python36-32\lib\site-packages\matplotlib\backends\backend_pdf.py", line 1719, in draw_path
    gc.get_sketch_params())
  File "C:\Program Files (x86)\Python36-32\lib\site-packages\matplotlib\backends\backend_pdf.py", line 1521, in writePath
    sketch=sketch)
  File "C:\Program Files (x86)\Python36-32\lib\site-packages\matplotlib\backends\backend_pdf.py", line 1511, in pathOperations
    True))]
MemoryError: Memory error

Expected outcome

PDF files for all plots.

Matplotlib version

  • Operating system: Win 7
  • Matplotlib version: 2.1.2
  • Matplotlib backend (print(matplotlib.get_backend())): TkAgg
  • Python version: 3.6
  • Jupyter version (if applicable): ---
  • Other libraries: numpy (but should not be relevant)

I am at work; I think it was installed via pip.

@ImportanceOfBeingErnest
Member

Can you please provide a minimal runnable example. See http://sscce.org.

@bu22dee
Author

bu22dee commented Mar 27, 2018

Like this?

Try changing datatype to svg or png. Without ylim there are no problems either. It looks like this problem only shows up with very large negative numbers.

import matplotlib.pyplot as plt


data = [[2,3,1,2,3,10e50], [2.3, 1, -2, -10e34]]

# Without ylim: saving to PDF works
def box_plot(data, datatype):
    plt.figure("nozoom")
    plt.boxplot(data, labels = ["data1", "data2"], sym="o", meanline=True, notch=False)
    plt.savefig('nofail.'+datatype)
    plt.clf()

box_plot(data, 'pdf')

# With ylim: saving to PDF raises MemoryError
def box_plot_ylim(data, ylim, datatype):
    plt.figure("zoom")
    plt.boxplot(data, labels = ["data1", "data2"], sym="o", meanline=True, notch=False)
    plt.ylim(ylim)
    plt.savefig('fail.'+datatype)
    plt.clf()

box_plot_ylim(data, (-5, 5), 'pdf')

@ImportanceOfBeingErnest
Member

I can reproduce the memory error with Python 2.7, matplotlib 2.2.2, TkAgg as well as Qt4Agg. I can also reproduce it with Python 3.6, master, Qt5Agg. Both on Windows 8.1.

What's odd is that it works fine when the last data value is set to -1e34 or -100e34, yet it fails for -10e34.

@phobson
Member

phobson commented Mar 27, 2018

I cannot reproduce this with Python 3.6, MPL v2.2.2 on (Windows subsystem for) Linux

@phobson
Member

phobson commented Mar 27, 2018

I also cannot reproduce this on vanilla Windows with Python 3.6 and MPL v2.1.2 & v2.2.2 on my admittedly pretty high-end work machine.

I can report however that the test case with the zoomed limits does seem to take much longer to render than it should.

@jklymak
Member

jklymak commented Mar 27, 2018

In general, we don't guarantee to filter every bad value out of the input. In this case I can only assume the very large numbers are meant to represent bad data. Masking your arrays before passing them to boxplot (or any other Matplotlib function) is the right way to mark data as bad; otherwise you are bound to hit floating-point problems somewhere along the line.
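As a sketch of that preconditioning, assuming the huge values really are sentinels for bad data rather than measurements, one could mask them with NumPy and pass only the remaining values on. The threshold `SENTINEL_LIMIT` and the helper `drop_sentinels` are hypothetical names chosen for illustration, not Matplotlib API.

```python
import numpy as np

# Hypothetical threshold: magnitudes above this are treated as "bad data"
# markers, not real measurements. Choose it to suit your own data set.
SENTINEL_LIMIT = 1e30


def drop_sentinels(values, limit=SENTINEL_LIMIT):
    """Mask values whose magnitude exceeds `limit` and return the rest."""
    arr = np.asarray(values, dtype=float)
    masked = np.ma.masked_where(np.abs(arr) > limit, arr)
    # compressed() returns only the unmasked entries as a plain 1-D ndarray.
    return masked.compressed()


data = [[2, 3, 1, 2, 3, 10e50], [2.3, 1, -2, -10e34]]
clean = [drop_sentinels(d) for d in data]
print([c.tolist() for c in clean])
# [[2.0, 3.0, 1.0, 2.0, 3.0], [2.3, 1.0, -2.0]]
```

The cleaned lists could then be passed to `plt.boxplot`. Dropping values changes the box statistics, so this is only appropriate when the large values are genuinely not data.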

@ImportanceOfBeingErnest
Member

@jklymak -1e34 is a perfectly valid value. It should by no means be filtered out, since that would change the boxplot statistics. And using such a value should not result in a memory error.

@phobson
Member

phobson commented Mar 27, 2018

One thing to check: inspect the artists that represent the whiskers and see how many points are in their paths.
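A minimal sketch of that check, assuming a recent Matplotlib and the non-interactive Agg backend (the `labels` argument from the report is omitted for brevity): `ax.boxplot` returns a dict of artists, and each entry under `"whiskers"` is a `Line2D` whose path can be inspected directly.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; no GUI window needed
import matplotlib.pyplot as plt

data = [[2, 3, 1, 2, 3, 10e50], [2.3, 1, -2, -10e34]]

fig, ax = plt.subplots()
bp = ax.boxplot(data, sym="o", meanline=True, notch=False)
ax.set_ylim(-5, 5)

# Each whisker is a Line2D; its path holds the data-space vertices that a
# backend transforms and serializes when the figure is saved.
for i, whisker in enumerate(bp["whiskers"]):
    verts = whisker.get_path().vertices
    print(f"whisker {i}: {len(verts)} vertices, y = {verts[:, 1].tolist()}")
```

With two boxes this prints four short whiskers, each a single two-point segment, so a huge vertex count is not what triggers the error.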

@jklymak
Member

jklymak commented Mar 27, 2018

@ImportanceOfBeingErnest Yes, -1e34 is a perfectly valid floating-point number, but you can't do valid floating-point arithmetic with it unless the other numbers are very large as well.

Do this:

import numpy as np

a = np.array([1e10], dtype=np.float64)
b = np.array([5.], dtype=np.float64)
c = a-b
d = c-a
print(d)

you get [ -5.] as the output.

Now do

import numpy as np

a = np.array([1e17], dtype=np.float64)
b = np.array([5.], dtype=np.float64)
c = a-b
d = c-a
print(d)

[ 0.]

So yes, -1e34 is a perfectly valid floating-point number, but you can't mix it with much smaller values in floating-point arithmetic and expect to get anything sensible out unless the other values are on the order of 1e19 or larger.

I don't know that that's the problem here, but I do know that mixing -5 and -1e34 is asking for trouble, and users shouldn't expect miracles. We can try to work around such issues, or folks can be cognizant of how floating-point arithmetic works and precondition their data in the way that makes sense for their data set.
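To put a number on this (a sketch, not from the thread): `np.spacing` gives the gap between a float64 and the next representable value. Near 1e34 that gap is 2**60, about 1.15e18, so much smaller addends vanish entirely.

```python
import numpy as np

big = np.float64(-1e34)

# Gap between 1e34 and its neighbouring float64: 2**60, about 1.15e18.
print(np.spacing(1e34))

# Adding anything far below half that gap leaves the value unchanged.
print(big + 5 == big)     # True
print(big + 1e19 == big)  # False: 1e19 is several gaps wide, so it registers
```

The same reasoning explains the 1e17 experiment above: there the gap is 16, so subtracting 5 is rounded away.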

@ImportanceOfBeingErnest
Member

I don't quite get the point. Surely any maths on those numbers is accurate within the limits of floating point accuracy. Compared to 1e34 it does not actually matter if some result is +5 or -5. But this issue is not about the wrong mean shown on the plot or similar.

@jklymak
Member

jklymak commented Mar 27, 2018

Because the error is only triggered when zooming in to -5 to +5...

@ImportanceOfBeingErnest
Member

Ok, agreed, I mean it's just a memory error from a plot with 2 boxes and 10 lines. Can't be that serious after all.

@tacaswell tacaswell added this to the v2.2.3 milestone Mar 28, 2018
@tacaswell
Member

This is super weird. The exception is coming from

matplotlib/src/_path.h

Lines 1184 to 1232 in a9a495e

template <class PathIterator>
int convert_to_string(PathIterator &path,
                      agg::trans_affine &trans,
                      agg::rect_d &clip_rect,
                      bool simplify,
                      SketchParams sketch_params,
                      int precision,
                      char **codes,
                      bool postfix,
                      char **buffer,
                      size_t *buffersize)
{
    typedef agg::conv_transform<py::PathIterator> transformed_path_t;
    typedef PathNanRemover<transformed_path_t> nan_removal_t;
    typedef PathClipper<nan_removal_t> clipped_t;
    typedef PathSimplifier<clipped_t> simplify_t;
    typedef agg::conv_curve<simplify_t> curve_t;
    typedef Sketch<curve_t> sketch_t;
    bool do_clip = (clip_rect.x1 < clip_rect.x2 && clip_rect.y1 < clip_rect.y2);
    transformed_path_t tpath(path, trans);
    nan_removal_t nan_removed(tpath, true, path.has_curves());
    clipped_t clipped(nan_removed, do_clip && !path.has_curves(), clip_rect);
    simplify_t simplified(clipped, simplify, path.simplify_threshold());
    *buffersize = path.total_vertices() * (precision + 5) * 4;
    if (*buffersize == 0) {
        return 0;
    }
    if (sketch_params.scale != 0.0) {
        *buffersize *= 10;
    }
    *buffer = (char *)malloc(*buffersize);
    if (*buffer == NULL) {
        return 1;
    }
    if (sketch_params.scale == 0.0) {
        return __convert_to_string(simplified, precision, codes, postfix, buffer, buffersize);
    } else {
        curve_t curve(simplified);
        sketch_t sketch(curve, sketch_params.scale, sketch_params.length, sketch_params.randomness);
        return __convert_to_string(sketch, precision, codes, postfix, buffer, buffersize);
    }
}
But if you drop into the debugger to look at the input, there are only 26 vertices handed to C from Python, and the precision looks like it is 6 (hard-coded).
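A back-of-the-envelope sketch of why the buffer may still need to grow: the initial size is `total_vertices * (precision + 5) * 4`, i.e. 44 bytes per vertex at precision 6, while one huge coordinate printed in fixed-point notation can take roughly that much on its own. The helpers `estimated_buffer_size` and `format_coord` are hypothetical; `format_coord` only imitates the `"f"` formatting and trailing-zero stripping done by `__add_number`, and the value -1e35 is illustrative, since real coordinates are first transformed to display space.

```python
def estimated_buffer_size(total_vertices, precision=6, sketch=False):
    """Initial byte budget, mirroring the formula in convert_to_string."""
    size = total_vertices * (precision + 5) * 4
    if sketch:
        size *= 10  # sketched paths get ten times the budget
    return size


def format_coord(val, precision=6):
    """Fixed-point format, then strip trailing zeros and a trailing '.'."""
    s = f"{val:.{precision}f}"
    s = s.rstrip("0")
    if s.endswith("."):
        s = s[:-1]
    return s


budget = estimated_buffer_size(26)  # 26 vertices, as seen in the debugger
print(budget)                       # 1144
print(budget // 26)                 # 44
print(format_coord(12.5))           # 12.5
print(len(format_coord(-1e35)))     # dozens of characters for one number
```

A couple of such coordinates are enough to exceed the initial budget, which sends execution into the `realloc` branch of `__append_to_string`.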

@bu22dee
Copy link
Author

bu22dee commented Mar 28, 2018

BTW, I got that error after I changed the order of the boxes. Maybe the error depends on the position of the very large negative number (while zoomed in).

@tacaswell
Copy link
Member

I did not dig deep enough:

matplotlib/src/_path.h

Lines 1047 to 1182 in a9a495e

char *__append_to_string(char *p, char **buffer, size_t *buffersize,
                         const char *content)
{
    int buffersize_int = (int)*buffersize;
    for (const char *i = content; *i; ++i) {
        if (p < *buffer) {
            /* This is just an internal error */
            return NULL;
        }
        if (p - *buffer >= buffersize_int) {
            int diff = p - *buffer;
            *buffersize *= 2;
            *buffer = (char *)realloc(*buffer, *buffersize);
            if (*buffer == NULL) {
                return NULL;
            }
            p = *buffer + diff;
        }
        *p++ = *i;
    }
    return p;
}
char *__add_number(double val, const char *format, int precision,
                   char **buffer, char *p, size_t *buffersize)
{
    char *result;
    char *str;
    str = PyOS_double_to_string(val, format[0], precision, 0, NULL);
    // Delete trailing zeros and decimal point
    char *q = str;
    for (; *q != 0; ++q) {
        // Find the end of the string
    }
    --q;
    for (; q >= str && *q == '0'; --q) {
        // Rewind through all the zeros
    }
    // If the end is a decimal qoint, delete that too
    if (q >= str && *q == '.') {
        --q;
    }
    // Truncate the string
    ++q;
    *q = 0;
    if ((result = __append_to_string(p, buffer, buffersize, str)) == NULL) {
        PyMem_Free(str);
        return NULL;
    }
    PyMem_Free(str);
    return result;
}
template <class PathIterator>
int __convert_to_string(PathIterator &path,
                        int precision,
                        char **codes,
                        bool postfix,
                        char **buffer,
                        size_t *buffersize)
{
    const char *format = "f";
    char *p = *buffer;
    double x[3];
    double y[3];
    double last_x = 0.0;
    double last_y = 0.0;
    const int sizes[] = { 1, 1, 2, 3 };
    int size = 0;
    unsigned code;
    while ((code = path.vertex(&x[0], &y[0])) != agg::path_cmd_stop) {
        if (code == 0x4f) {
            if ((p = __append_to_string(p, buffer, buffersize, codes[4])) == NULL) return 1;
        } else if (code < 5) {
            size = sizes[code - 1];
            for (int i = 1; i < size; ++i) {
                unsigned subcode = path.vertex(&x[i], &y[i]);
                if (subcode != code) {
                    return 2;
                }
            }
            /* For formats that don't support quad curves, convert to
               cubic curves */
            if (code == CURVE3 && codes[code - 1][0] == '\0') {
                quad2cubic(last_x, last_y, x[0], y[0], x[1], y[1], x, y);
                code++;
                size = 3;
            }
            if (!postfix) {
                if ((p = __append_to_string(p, buffer, buffersize, codes[code - 1])) == NULL) return 1;
                if ((p = __append_to_string(p, buffer, buffersize, " ")) == NULL) return 1;
            }
            for (int i = 0; i < size; ++i) {
                if ((p = __add_number(x[i], format, precision, buffer, p, buffersize)) == NULL) return 1;
                if ((p = __append_to_string(p, buffer, buffersize, " ")) == NULL) return 1;
                if ((p = __add_number(y[i], format, precision, buffer, p, buffersize)) == NULL) return 1;
                if ((p = __append_to_string(p, buffer, buffersize, " ")) == NULL) return 1;
            }
            if (postfix) {
                if ((p = __append_to_string(p, buffer, buffersize, codes[code - 1])) == NULL) return 1;
            }
            last_x = x[size - 1];
            last_y = y[size - 1];
        } else {
            // Unknown code value
            return 2;
        }
        if ((p = __append_to_string(p, buffer, buffersize, "\n")) == NULL) return 1;
    }
    *buffersize = p - *buffer;
    return 0;
}
That whole set of functions can return 1, which gets turned into a MemoryError by:
if (status) {
    free(buffer);
    if (status == 1) {
        PyErr_SetString(PyExc_MemoryError, "Memory error");
    } else if (status == 2) {
        PyErr_SetString(PyExc_ValueError, "Malformed path codes");
    }
    return NULL;
}

@QuLogic
Member

QuLogic commented Mar 28, 2018

The problem is that buffersize_int is not updated to match the new buffersize when the buffer is realloc'd. Once p passes the original size, the check stays true, so the buffer is doubled again for every additional character appended in that call. I'm not sure why we even have buffersize_int in the first place.
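A small Python model makes the growth concrete. It is a simulation only (no real allocation, and `append_buggy` / `append_fixed` are hypothetical names) of one call to `__append_to_string` before and after the fix. Because `buffersize_int` is re-read at the start of each call, the runaway doubling is confined to a single call, but `__add_number` appends a whole formatted coordinate in one call.

```python
def append_buggy(used, buffersize, content):
    """Model of the pre-fix loop for one call.

    Returns (bytes_used, final_buffersize, number_of_reallocs).
    """
    buffersize_int = buffersize  # snapshot taken once, never refreshed
    reallocs = 0
    for _ in content:
        if used >= buffersize_int:  # stale bound: stays true once exceeded
            buffersize *= 2
            reallocs += 1
        used += 1
    return used, buffersize, reallocs


def append_fixed(used, buffersize, content):
    """Same loop, but compare against the current buffer size."""
    reallocs = 0
    for _ in content:
        if used >= buffersize:
            buffersize *= 2
            reallocs += 1
        used += 1
    return used, buffersize, reallocs


# A 36-character number appended when 1140 of 1144 budgeted bytes are used.
number = "-" + "9" * 35
print(append_buggy(1140, 1144, number))  # (1176, 4913442586624, 32)
print(append_fixed(1140, 1144, number))  # (1176, 2288, 1)
```

In the buggy model, 32 characters past the boundary mean 32 doublings, a request of roughly 4.9 TB, so `realloc` would fail and the C code would return status 1, surfacing as `MemoryError`. Whether the boundary happens to fall inside a long number or at a short separator may also be why the failure depends on the exact data values.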

QuLogic added a commit to QuLogic/matplotlib that referenced this issue Mar 30, 2018
The int version of the buffer size was not updated when the buffer was
resized. It's there to prevent a signed/unsigned comparison warning, but
it's simpler just to cast the other side of the comparison. There's no
problem with the signed-to-unsigned cast since we already know that the
result is positive due to the previous check.

Fixes matplotlib#10889.
anntzer pushed a commit to anntzer/matplotlib that referenced this issue Mar 31, 2018

6 participants