1.4.0 RC1+7: *** glibc detected *** python: corrupted double-linked list #3304
It is odd that the tests don't trigger this. |
Well, the tests are probably short ones, right? I plot hundreds of figures and allocate huge numpy arrays etc. And it only happens after a while. I guess this bug depends on the current memory layout to trigger an error. |
Can you provide a standalone script to reproduce the problem? |
I can try it next week, but the main problem is that 7b39e78 changes several things, so I guess it would be most useful if I apply each change on its own (the |
If the rasterization code path was broken when you ran these tests, that code path should not have been hit (as there were no temporary rasterized buffers). |
I ran a modified version of one of the demos (examples/api/collections_demo.py, see below) and let it run 18k times before I killed it.
|
I think I found the mistake. I couldn't produce a small test script though. The problem is the rewrite of code in So, what I think is happening is that the data object gets freed, as the reference count goes to zero. And then, by chance, everything works as usual, but equally it can also happen that something overwrites this piece of memory when it's allocated again, resulting in random corruption. Using I'd like to strongly emphasize that I think the previous version, using the slightly ugly Python 2/3 switch, is in many ways safer and more easily understandable (and thus changeable). By switching from PyCXX to the raw Python API you deal with I propose that the previous version of |
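The lifetime rule described above (an object is freed as soon as its reference count reaches zero) can be observed from pure Python. This is only a minimal sketch of the general CPython behaviour, not the code path in question: `Buffer` and `make_and_drop` are hypothetical names, and the immediate deallocation shown is specific to CPython's reference counting.

```python
import weakref


class Buffer:
    """Hypothetical stand-in for the object that owns the image data."""

    def __init__(self, payload):
        self.payload = payload


def make_and_drop():
    """Return whether the object is alive before and after its last reference is dropped."""
    buf = Buffer(b"\x00" * 16)
    probe = weakref.ref(buf)  # a weak reference does not keep the object alive
    alive_before = probe() is not None
    del buf  # drops the only strong reference, so the refcount reaches zero
    alive_after = probe() is not None
    return alive_before, alive_after


print(make_and_drop())
```

In C, nothing like the weak reference protects a raw pointer: if the owning object is freed this way while a pointer to its data is still in use, the pointer silently dangles, which would match the random corruption described.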
Thanks for getting to the bottom of this. The previous version leaked memory, however, which is the problem the change endeavoured to solve. We need to have a solution that neither leaks nor over-zealously frees memory. I'm fine with using the PyCXX API for this, but it didn't appear possible to get it to not leak in this particular case. |
This at least seems correct to me:
The change to passing |
I was still seeing a memory leak with 493ded8 in the test suite. For whatever reason, the string ended with a refcnt of 2 when exiting the function. |
Ok, I think this is because the tuple index assignment operator increases the refcnt:

    virtual void setItem (sequence_index_type offset, const Object&ob)
    {
        // note PyTuple_SetItem is a thief...
        if(PyTuple_SetItem (ptr(), offset, new_reference_to(ob)) == -1)
        {
            throw Exception();
        }
    }

Then it makes sense that the previous version had a refcnt of 2, one from PyBytes_AsString and one from the Tuple. So that would mean if we in 493ded8 add
EDIT: The above is probably not correct, see below. |
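Independently of whether the conclusion above holds, the general rule that a tuple keeps its own reference to each element can be checked from pure Python. The following is a rough sketch on CPython; the function name is hypothetical and it does not exercise PyCXX or the C API directly.

```python
import sys


def refcount_change_when_stored_in_tuple():
    """Return (change while held by a tuple, change after the tuple is gone)."""
    payload = bytes(bytearray(1024))  # a fresh bytes object created at run time
    before = sys.getrefcount(payload)
    holder = (payload,)  # the tuple takes its own reference to payload
    held = sys.getrefcount(payload)
    del holder  # destroying the tuple releases that reference again
    after = sys.getrefcount(payload)
    return held - before, after - before


print(refcount_change_when_stored_in_tuple())
```

Only the differences are meaningful here, since `sys.getrefcount` itself temporarily adds to the count it reports.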
That might work. If the refcnt is 1 on the way out, and valgrind is happy, and it fixes what you're seeing, I'm not opposed. |
I just got an idea. When PyCXX puts data into the tuple with |
I just tested all three versions and see absolutely no difference in memory behaviour, both using top and valgrind. But I found another thing which may be related to the strange crashes. There is a check |
Oh yeah, I'm hitting this case! I think this is the root of all evil here. What would be the correct thing to do in the else clause? If it's throwing an exception then I've probably found a bug which is higher up the hierarchy, right? |
I just created a reproducible test case for hitting the else clause. I cannot post it yet as it's using my own things, but I will try to simplify it. I will now debug a bit to see how this condition might occur. |
Got it!

    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    # The circle is centred at (-10, 10), outside the axes limits set below.
    circle1 = plt.Circle((-10, 10), rasterized=True)
    ax.add_artist(circle1)
    ax.set_xlim(0, 1)
    ax.set_ylim(0, 1)
    fig.savefig('test.svg')
    plt.close(fig)

When an object is placed outside the plot limits while saving to svg with rasterized=True, it apparently tries to create a zero-size rasterized image. Note that I put an exception in the else-clause to let it fail. So, including it as a test case might be tricky, as without the else-clause it just produces random crashes depending on which PyCXX style (or raw Python API) is used to put the uninitialized pointer into the tuple. |
@neothemachine Thanks so much for tracking this down! Random crashes are not super informative, but they are something that will show up on travis and get the attention of the devs. |
Perhaps that if-statement should be an assert? Or maybe if the statement is |
I don't think emitting a warning is the correct thing to do. The case where this happens is when a rasterized artist doesn't happen to be in the data limits; there is nothing wrong with this situation. I suspect that the right thing to do is to return a zero-sized array, but I don't know this section of the code well enough to know if that will cause problems elsewhere. |
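The idea of returning a well-defined empty result can be sketched in Python. This is only an illustration under stated assumptions: `rasterized_region_to_tuple`, its `(rows, cols, data)` return shape, and the 4-bytes-per-pixel default are hypothetical and are not taken from the matplotlib source.

```python
def rasterized_region_to_tuple(data, width, height, channels=4):
    """Hypothetical helper: package a pixel buffer as (rows, cols, bytes).

    A zero-size region yields an empty but valid bytes object, so the
    result never refers to uninitialized memory.
    """
    if width < 0 or height < 0:
        raise ValueError("width and height must be non-negative")
    if data is None or width == 0 or height == 0:
        # Nothing was drawn inside the limits: return an empty result
        # instead of leaving the data slot undefined.
        return (0, 0, b"")
    expected = width * height * channels
    if len(data) != expected:
        raise ValueError(f"expected {expected} bytes, got {len(data)}")
    return (height, width, bytes(data))


print(rasterized_region_to_tuple(None, 0, 0))
```

Treating the empty case as valid rather than as an error follows the point made above that an artist outside the data limits is a normal situation.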
I think we should add an else clause that sets |
I'll try to whip something up for this. |
Although I said earlier that using PyCXX would be more understandable I think it's ok to leave it as is here, mostly because there's just one API function (PyBytes_FromStringAndSize) that's used. But I would still replace |
One more thing, the exception message points to the wrong function name |
BUG : fixes memory corruption issues with zero-size rasterized artists Fix #3304.
I just spent a day figuring out where
*** glibc detected *** .../bin/python: corrupted double-linked list: 0x00000000069a01a0 ***
was coming from. I installed the latest 1.4.x branch version, so I thought that something must have happened there. Bisecting the commits and testing different versions, I found that 7b39e78 introduced the problem. The errors happen randomly and the stacktraces are different most of the time. Some examples:
So, in summary: up until 760b2fc everything is fine, but with the following commit 7b39e78, my program crashes randomly when drawing things with matplotlib, usually after a few minutes of drawing plots.
As the crashes appear randomly, I cannot be absolutely sure that the problem really does not appear up until 760b2fc, but as I let it run for an hour without crashes, I'm pretty confident.
I hope you can find out what's going on; I'm certainly not an expert in these things. My guess would be that something writes past the end of the string array, and that the overwritten memory sometimes happens to be a doubly linked list, which then gets corrupted.