segv under heavy load #161

jcupitt · 2015-04-23T19:56:52Z

Hi, I'm seeing a segv with 3.4.0 under heavy load. This is 50 threads all calling openslide_detect_vendor() in parallel:

https://gist.github.com/jcupitt/8ec613adcd6c000e82d7

I'm seeing:

$ for i in {1..10}; do ./a.out ~/pics/k2.jpg; done
Segmentation fault (core dumped)
Segmentation fault (core dumped)
*** Error in `./a.out': free(): invalid next size (normal): 0x00007f47000036e0 ***
Aborted (core dumped)
... etc.

I'll try with master as well.


jcupitt · 2015-04-23T20:18:42Z

I see similar signs of memory corruption with master.

$ for i in {1..10}; do ./a.out ~/pics/k2.jpg; done

(a.out:32152): GLib-ERROR **: creating thread '': Error creating thread: Resource temporarily unavailable
Trace/breakpoint trap (core dumped)

(a.out:310): GLib-ERROR **: creating thread '': Error creating thread: Resource temporarily unavailable
Trace/breakpoint trap (core dumped)
*** Error in `./a.out': free(): invalid next size (normal): 0x00007f725c0036c0 ***
Aborted (core dumped)

Not sure what's causing the glib error.

agoode · 2015-04-23T20:48:33Z

Hi John,

Can you try building with asan? Try adding this gcc flag:
-fsanitize=address

agoode · 2015-04-23T20:51:09Z

You can also do tsan (it may need to be in a separate run):
-fsanitize=thread

For both of these, you have to pass the flag during both the compile and link steps.
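
For reference, a small stress harness of the kind discussed here might look like the sketch below. It is not the program from the gist: `probe()` is a hypothetical stand-in for the library call under test, and the file name `stress.c` in the build comments is an assumption. The comments show the sanitizer flag being passed at both the compile and the link step.

```c
/* Build with AddressSanitizer (flag on both steps):
 *   gcc -g -fsanitize=address -c stress.c
 *   gcc -fsanitize=address stress.o -o stress -pthread
 * For ThreadSanitizer, use -fsanitize=thread in both commands instead.
 */
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

#define N_THREADS 50

/* Hypothetical stand-in for the call under test: it copies the path
 * to the heap and frees it again, so the sanitizer has work to check. */
static int probe(const char *path) {
    char *copy = malloc(strlen(path) + 1);
    if (copy == NULL)
        return -1;
    strcpy(copy, path);
    int len = (int) strlen(copy);
    free(copy);
    return len;
}

static void *worker(void *arg) {
    /* Every thread only reads the shared path string. */
    probe((const char *) arg);
    return NULL;
}

/* Start N_THREADS workers on the same path and wait for all of them.
 * Returns the number of threads that were started and joined. */
int run_stress(const char *path) {
    pthread_t threads[N_THREADS];
    int started = 0;

    assert(path != NULL);
    for (int i = 0; i < N_THREADS; i++) {
        if (pthread_create(&threads[i], NULL, worker, (void *) path) != 0)
            break;
        started++;
    }
    for (int i = 0; i < started; i++)
        pthread_join(threads[i], NULL);
    return started;
}
```

Calling `run_stress("k2.jpg")` from `main()` in a loop gives the sanitizer repeated chances to catch an intermittent fault.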

bgilbert · 2015-04-24T02:50:41Z

Hi John, thanks for your report. I can't reproduce the segfault on Fedora 21 x86_64, with either 3.4.0 or master, but I am seeing the SIGTRAP (from inside glib, on a failed pthread_setspecific() call). However, the trap also happens if I comment out the openslide_detect_vendor() call.

Your test program is leaking GThread *. If I modify it to g_thread_unref() each new thread, it fixes the SIGTRAP for me. Do you still see the problem with that modification in place?
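
The leak can be illustrated without glib. The sketch below uses plain POSIX threads as a stand-in for GThread (an assumption made to keep it self-contained; it is not the gist's code, and `spawn_and_release()` is a hypothetical name). A thread that is never joined or detached keeps its resources after it exits, and enough of them can make thread creation fail with EAGAIN, which is reported as "Resource temporarily unavailable". In glib terms, the matching cleanup is g_thread_unref() (or g_thread_join(), which also consumes the reference) on each GThread * returned by g_thread_new().

```c
#include <assert.h>
#include <pthread.h>

static void *noop(void *arg) {
    (void) arg;
    return NULL;
}

/* Create `count` short-lived threads one after another and release
 * each one. Returns the number of pthread_create() failures. */
int spawn_and_release(int count) {
    int failures = 0;

    assert(count >= 0);
    for (int i = 0; i < count; i++) {
        pthread_t t;
        if (pthread_create(&t, NULL, noop, NULL) != 0) {
            failures++;
            continue;
        }
        /* Without this join (or a pthread_detach), the finished
         * thread's stack and bookkeeping would stay allocated. */
        pthread_join(t, NULL);
    }
    return failures;
}
```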

jcupitt · 2015-04-24T07:55:17Z

Oops, you're right. I've updated the test program to remove the vips dependency and clean up. It fixes the glib errors, but I still see random heap corruption:

$ for i in {1..10}; do ./a.out ~/pics/k2.jpg; done
*** Error in `./a.out': free(): invalid next size (fast): 0x00007fc504001580 ***
Aborted (core dumped)
*** Error in `./a.out': free(): invalid next size (normal): 0x00007f58ec003220 ***
Aborted (core dumped)
*** Error in `./a.out': double free or corruption (out): 0x00007fb9c40015a0 ***
Aborted (core dumped)

Roughly one run in three. That's with git master openslide. The crashes go away if I comment out openslide_detect_vendor().

I'll try with asan, as you suggested @agoode.

jcupitt · 2015-04-24T08:00:19Z

I got an error from asan:

$ for i in {1..10}; do ../a.out ~/pics/k2.jpg; done
=================================================================
==4314== ERROR: AddressSanitizer: attempting free on address which was not malloc()-ed: 0x60240002fac0
    #0 0x7fd1ac27133a (/usr/lib/x86_64-linux-gnu/libasan.so.0.0.0+0x1533a)
    #1 0x7fd1abf8ba87 (/lib/x86_64-linux-gnu/libglib-2.0.so.0.4002.0+0x37a87)
0x60240002fac0 is located 0 bytes inside of 183328-byte region [0x60240002fac0,0x60240005c6e0)
==4314== AddressSanitizer CHECK failed: ../../../../src/libsanitizer/asan/asan_thread_registry.cc:153 "((tid < n_threads_)) != (0)" (0x0, 0x0)
    #0 0x7fd1ac26e31d (/usr/lib/x86_64-linux-gnu/libasan.so.0.0.0+0x1231d)
    #1 0x7fd1ac275133 (/usr/lib/x86_64-linux-gnu/libasan.so.0.0.0+0x19133)
    #2 0x7fd1ac26f8c8 (/usr/lib/x86_64-linux-gnu/libasan.so.0.0.0+0x138c8)
    #3 0x7fd1ac272e01 (/usr/lib/x86_64-linux-gnu/libasan.so.0.0.0+0x16e01)
    #4 0x7fd1ac273733 (/usr/lib/x86_64-linux-gnu/libasan.so.0.0.0+0x17733)
    #5 0x7fd1ac264a3f (/usr/lib/x86_64-linux-gnu/libasan.so.0.0.0+0x8a3f)
    #6 0x7fd1ac271367 (/usr/lib/x86_64-linux-gnu/libasan.so.0.0.0+0x15367)
    #7 0x7fd1abf8ba87 (/lib/x86_64-linux-gnu/libglib-2.0.so.0.4002.0+0x37a87)
    #8 0x7fd1abf8c003 (/lib/x86_64-linux-gnu/libglib-2.0.so.0.4002.0+0x38003)
    #9 0x7fd1abf8c106 (/lib/x86_64-linux-gnu/libglib-2.0.so.0.4002.0+0x38106)
    #10 0x7fd1abf7a1ff (/lib/x86_64-linux-gnu/libglib-2.0.so.0.4002.0+0x261ff)
    #11 0x7fd1abf931fb (/lib/x86_64-linux-gnu/libglib-2.0.so.0.4002.0+0x3f1fb)
    #12 0x7fd1abf94cfc (/lib/x86_64-linux-gnu/libglib-2.0.so.0.4002.0+0x40cfc)
    #13 0x7fd1abd07461 (/home/john/vips/lib/libopenslide.so.0.4.1+0x36461)
    #14 0x7fd1abce3ffd (/home/john/vips/lib/libopenslide.so.0.4.1+0x12ffd)
    #15 0x7fd1abce43f8 (/home/john/vips/lib/libopenslide.so.0.4.1+0x133f8)
    #16 0x400ac2 (/home/john/try/a.out+0x400ac2)
    #17 0x7fd1abfc1f04 (/lib/x86_64-linux-gnu/libglib-2.0.so.0.4002.0+0x6df04)
    #18 0x7fd1ac274b97 (/usr/lib/x86_64-linux-gnu/libasan.so.0.0.0+0x18b97)
    #19 0x7fd1ab6f6181 (/lib/x86_64-linux-gnu/libpthread-2.19.so+0x8181)
    #20 0x7fd1aba0647c (/lib/x86_64-linux-gnu/libc-2.19.so+0xfa47c)

Not sure how helpful that is.

jcupitt · 2015-04-24T08:05:15Z

I forgot to say, I'm on Ubuntu 14.04, the current long-term support release.

jcupitt · 2015-04-24T08:47:52Z

I tried with valgrind, but that ran cleanly; it's not very good with threads. -fsanitize=thread doesn't seem to work on 14.04.

bgilbert · 2015-04-25T07:41:24Z

Reproduced on Fedora 21; thanks for the update. The problem is inside g_get_language_names(), called during GKeyFile initialization via the VMS/VMU sniffer. Trying to track it down.

jcupitt · 2015-04-25T08:01:51Z

Oh, possibly a glib error? That's interesting. It took me a week to track it down this far :-( good luck.

bgilbert · 2015-04-26T05:30:51Z

It turns out that a helper function of g_get_language_names() performs unsynchronized initialization of a static GHashTable the first time it is called, and has been doing so since the code was added to glib in 2004. The bug can be avoided by calling g_get_language_names() before any threads are started. I will add that call to OpenSlide's shared library constructor, but you may also want to add it to VIPS to avoid the bug with older OpenSlide.
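
The shape of the bug and of the workaround can be sketched in plain C. This is an illustration only, not glib's or OpenSlide's code: `get_table()` is a hypothetical stand-in for a helper that lazily initializes static state without locking, and `warm_up()` plays the role of the early call. The `constructor` attribute is a GCC/Clang extension; in the real workaround the body of such a constructor would call g_get_language_names().

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define TABLE_SIZE 16
#define MAX_WORKERS 64

static int table_storage[TABLE_SIZE];
static const int *table = NULL;   /* set on first use */

/* Hypothetical helper with lazy, unsynchronized initialization.
 * If the first calls arrive from several threads at once, they
 * can all run the initialization concurrently. */
static const int *get_table(void) {
    if (table == NULL) {
        for (int i = 0; i < TABLE_SIZE; i++)
            table_storage[i] = i * i;
        table = table_storage;
    }
    return table;
}

/* Runs when the program or shared library is loaded, before the
 * worker threads below exist, so initialization happens exactly
 * once and on a single thread. */
__attribute__((constructor))
static void warm_up(void) {
    (void) get_table();
}

static void *worker(void *arg) {
    int *out = arg;
    *out = get_table()[3];   /* read-only access after warm-up */
    return NULL;
}

/* Start n workers; returns how many read the expected value (3 * 3). */
int run_workers(int n) {
    pthread_t threads[MAX_WORKERS];
    int results[MAX_WORKERS];
    int started = 0, ok = 0;

    assert(n >= 0 && n <= MAX_WORKERS);
    for (int i = 0; i < n; i++) {
        results[i] = -1;
        if (pthread_create(&threads[i], NULL, worker, &results[i]) != 0)
            break;
        started++;
    }
    for (int i = 0; i < started; i++) {
        pthread_join(threads[i], NULL);
        if (results[i] == 9)
            ok++;
    }
    return ok;
}
```

An application that cannot rely on a constructor can get the same effect by making the warm-up call at the top of its own init function, before it starts any threads.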


jcupitt · 2015-04-26T09:33:16Z

I've added a g_get_language_names() to vips_init() and it's fixed my crash too. Thanks for finding this @bgilbert.

bgilbert · 2016-05-14T03:41:52Z

glib 2.48.1 fixes the thread safety of g_get_language_names().

jcupitt mentioned this issue Apr 23, 2015

Cache warning: linecache error reading tile libvips/libvips#261

Closed

bgilbert closed this as completed in ba57bcd Apr 26, 2015

jcupitt added a commit to libvips/libvips that referenced this issue Apr 26, 2015

init glib langs on startup

ffdd9ff

works around a crash under heavy load, see openslide/openslide#161

bgilbert added defect format-vms Hamamatsu VMS format-vmu Hamamatsu VMU labels Sep 24, 2015
