
Fix intermittent SIGBUS on x64 Linux #822


Closed
wants to merge 3 commits into from

Conversation

csfrancis

This fixes an intermittent SIGBUS observed on Linux x64. Possibly fixes https://bugs.ruby-lang.org/issues/10626.

The root problem is that if ASLR places the heap and stack regions too close together, the C stack can potentially grow into the heap. The Linux kernel handles this by raising a SIGBUS signal and terminating the process: http://lxr.free-electrons.com/source/mm/memory.c?v=3.16#L2635. Because Ruby C stacks grow dynamically, and Linux process stacks are not reserved/committed up front, it's possible for the heap to prevent the stack from growing to the size allowed by ulimit.

I have written a small proof of concept that can reproduce the problem on Ruby 2.1.5: https://gist.github.com/csfrancis/46e360d401609275246c

My solution is to fully reserve the stack virtual address space at process start up. I do this by determining the current stack bounds, and then subtracting the available stack space (according to the process' rlimit) from the current rsp. Touching memory at that value reserves the address range for the stack (but only commits one physical page of memory).
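
The address arithmetic described above can be sketched in C as follows. This is a minimal illustration, not the code from the patch: `stack_reserve_target` is a hypothetical helper name, and it assumes a downward-growing stack as on x86-64.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not from the patch): given the current stack
 * pointer and the number of bytes the stack is still allowed to grow,
 * return the lowest address the stack may reach. Assumes the stack
 * grows toward lower addresses. Returns 0 if the subtraction would
 * wrap around. */
static uintptr_t
stack_reserve_target(uintptr_t sp, size_t available)
{
    assert(sp != 0);
    if (available >= sp)
        return 0;               /* limit larger than the address itself */
    return sp - available;
}
```

Touching one byte at the returned address is what would commit a single page there while leaving the rest of the range merely reserved.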

Right now, this is Linux x64 only (other platforms will just no-op), but it wouldn't be too difficult to implement the same solution for i386 as well.

@nobu
Member

nobu commented Jan 30, 2015

Thank you for the investigation and the patch.

Does the following patch work?

diff --git i/thread_pthread.c w/thread_pthread.c
index 3ef316c..b7d0bf2 100644
--- i/thread_pthread.c
+++ w/thread_pthread.c
@@ -675,4 +675,7 @@ ruby_init_stack(volatile VALUE *addr
        native_main_thread.stack_maxsize = size;
        native_main_thread.stack_start = stackaddr;
+# if defined(__linux__) && (defined(__x86_64__) || defined(__i386__))
+       *(volatile unsigned long long *)((char *)stackaddr - size) = 0;
+# endif
        return;
    }

@nobu
Member

nobu commented Jan 30, 2015

It didn't work outside gdb.

diff --git a/thread_pthread.c b/thread_pthread.c
index 3ef316c..94b3ef7 100644
--- a/thread_pthread.c
+++ b/thread_pthread.c
@@ -653,6 +653,30 @@ space_size(size_t stack_size)
     }
 }

+#ifdef __linux__
+static __attribute__((noinline)) void
+reserve_stack(int dir, volatile char *limit, size_t size)
+{
+    volatile char buf[0x100];
+    size -= sizeof(buf);   /* margin */
+    if (STACK_GROW_DIRECTION > 0 || dir > 0) {
+        limit += size;
+        if (limit > buf + sizeof(buf)) {
+            size = limit - (buf + sizeof(buf));
+            limit = alloca(size);
+            limit[size-1] = 0;
+        }
+    }
+    else {
+        limit -= size;
+        if (buf > limit) {
+            limit = alloca(buf - limit);
+            limit[0] = 0;
+        }
+    }
+}
+#endif
+
 #undef ruby_init_stack
 /* Set stack bottom of Ruby implementation.
  *
@@ -674,6 +698,9 @@ ruby_init_stack(volatile VALUE *addr
    if (get_main_stack(&stackaddr, &size) == 0) {
        native_main_thread.stack_maxsize = size;
        native_main_thread.stack_start = stackaddr;
+# if defined(__linux__)
+       reserve_stack(STACK_UPPER((VALUE *)(void *)&addr, +1, -1), stackaddr, size);
+# endif
        return;
    }
     }

@csfrancis
Author

@nobu what was wrong with your first patch? Why doesn't that work?

@fbogsany

First patch doesn't work because the Linux kernel appears to check if the access is related to rsp or if it is just a wild pointer.
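
A rough model of that check, written from memory of the x86 page-fault handler in kernels of that generation, so treat the exact slack value as an assumption: a fault below the stack mapping is only treated as stack growth if it lies within roughly 64 KiB of the stack pointer. `looks_like_stack_growth` and `SP_SLACK` are hypothetical names used for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed slack below the stack pointer that the modelled check
 * tolerates: 64 KiB plus room for 32 machine words. */
#define SP_SLACK (65536u + 32u * sizeof(unsigned long))

/* Userspace model only: returns true if a fault at fault_addr would be
 * accepted as legitimate stack growth for a stack pointer of sp. */
static bool
looks_like_stack_growth(uintptr_t fault_addr, uintptr_t sp)
{
    assert(sp != 0);
    return fault_addr + SP_SLACK >= sp;
}
```

Under this model, a direct store 8 MB below rsp is rejected as a wild pointer, while the same store made after alloca has moved rsp down to that address is accepted. That would explain why the second patch moves the stack pointer before touching memory.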

This reverts commit 038c7508c51e682983d3b436aa9636e0750e8bb6.
@csfrancis
Author

@nobu I've created a new patch, based on your suggested patch.

The problem with your patch is that it doesn't check if RLIMIT_STACK is RLIM_INFINITY. In my development environment this was causing the build to fail because make sets RLIMIT_STACK to RLIM_INFINITY.
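
A minimal sketch of such a guard, assuming the reservation is simply skipped when the limit is unlimited. The helper names `stack_limit_is_finite` and `should_reserve_stack` are hypothetical; `getrlimit`, `RLIMIT_STACK` and `RLIM_INFINITY` are the standard POSIX interfaces.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/resource.h>

/* True when the soft limit is a real number of bytes. */
static bool
stack_limit_is_finite(const struct rlimit *rl)
{
    assert(rl != NULL);
    return rl->rlim_cur != RLIM_INFINITY;
}

/* True only when RLIMIT_STACK could be read and is finite; with an
 * unlimited stack there is no fixed range to reserve. */
static bool
should_reserve_stack(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_STACK, &rl) != 0)
        return false;
    return stack_limit_is_finite(&rl);
}
```

Running `ulimit -s unlimited` in a shell before starting the process is one way to exercise the unlimited branch.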

@nobu
Member

nobu commented Jan 31, 2015

It seems that this patch increases memory usage even in the case it is unnecessary, doesn't it?

@csfrancis
Author

It seems that this patch increases memory usage even in the case it is unnecessary, doesn't it?

It does, but very slightly. Even though we're fully reserving the entire stack region, only a single additional page is used (the memory that is touched by limit[0] = 0;):

7fffe43ee000-7fffe4bed000 rw-p 00000000 00:00 0                          [stack]
Size:               8192 kB
Rss:                  44 kB
Pss:                  44 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:        44 kB
Referenced:           44 kB
Anonymous:            44 kB
AnonHugePages:         0 kB
Swap:                  0 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Locked:                0 kB
VmFlags: rd wr mr mw me gd ac

This is from my development machine with an 8 MB stack. Notice that Rss is only 44 kB, even though 8 MB is reserved.
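
To compare reserved and resident size programmatically, the fields can be pulled out of the smaps text. The following is a minimal sketch that works on a string already read from `/proc/<pid>/smaps`; `smaps_field_kb` is a hypothetical helper name.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Find a line of the form "<key>:   <n> kB" in smaps-style text and
 * return n. Returns -1 if no line starts with that key. */
static long
smaps_field_kb(const char *text, const char *key)
{
    size_t keylen;
    const char *line = text;

    assert(text != NULL && key != NULL);
    keylen = strlen(key);
    while (line != NULL && *line != '\0') {
        long kb;

        /* The key must start the line and be followed by a colon. */
        if (strncmp(line, key, keylen) == 0 && line[keylen] == ':' &&
            sscanf(line + keylen + 1, "%ld kB", &kb) == 1)
            return kb;
        line = strchr(line, '\n');
        if (line != NULL)
            line++;
    }
    return -1;
}
```

Applied to the output above, the `Size` field gives the reserved range and `Rss` the part that is actually backed by physical pages.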

@nobu
Member

nobu commented Jan 31, 2015

The get_stack() change caused a SEGV on my VirtualBox Ubuntu 14.

@nobu nobu closed this in 8fe95fe Jan 31, 2015
@csfrancis
Author

Awesome @nobu - would you be able to backport this to 2.2?

@jaroslawr

@csfrancis Wow, I spent significant time on this issue and would love to hear some more words how did you run into the issue and how did you approach diagnosing it further and finding a fix. I am also curious about two things:

  1. Why doesn't this problem then affect ordinary C programs? I mean, your description doesn't mention anything very specific to the interpreter, except maybe "Ruby C stacks grow dynamically", but I am not really sure what you mean here.
  2. Do pthreads fit into this picture in any way? My impression was that the Ruby process relies to some extent on pthreads for stack handling (there are always at least two threads running in the Ruby interpreter, from what I understand, the timer and the core thread, with their own separate stacks).

@csfrancis
Author

how did you run into the issue

We receive notifications when Ruby processes crash in our production environment (and they dump core).

how did you approach diagnosing it further

The core dump for this particular problem looked strange to me for a couple of reasons:

  • It crashed with a SIGBUS and not a SEGV, which is pretty unusual.
  • It crashed in a core VM function, vm_call0_cfunc_with_frame, which is called frequently.
  • The instruction that caused the SIGBUS was a call:
    • => 0x00007ffef02b47ca <+698>: callq *0x18(%r13)
    • I verified that the target address was valid.

I spent a bit of time investigating various theories, but finally found one that stood out. Given that the target address was valid and it was the callq instruction itself that was generating the SIGBUS, I thought about what a call instruction actually does: it pushes the return address (the address of the next instruction) onto the stack and does an absolute jump to the target address. The value of the stack pointer in the faulting frame was:

(gdb) p $rsp
$2 = (void *) 0x7fff006b9000

I found this to be suspicious because it was 4k aligned, that is, it sat exactly on a page boundary.

A core file is just a memory dump of a process, and it holds a wealth of information. Using readelf you can see the memory region layout of a given core file. It shows how each memory region in the core file maps to a virtual address region in the process. For example:

[vagrant] ~ $ readelf -a core
<...snip...>
Program Headers:
  Type           Offset             VirtAddr           PhysAddr
                 FileSiz            MemSiz              Flags  Align
<...snip...>
  LOAD           0x000000001f6a0000 0x00007ffef1bb7000 0x0000000000000000
                 0x000000000eb01000 0x000000000eb01000  RW     1000
  LOAD           0x000000002e1a1000 0x00007fff006b8000 0x0000000000000000
                 0x0000000000052000 0x0000000000052000  RW     1000

So, what this tells me is that the mapping at 0x00007ffef1bb7000 is 0xeb01000 bytes in size, which means it ranges from 0x7ffef1bb7000-0x7fff006b8000. I determined that this was the heap for a few reasons:

  • It is one of the largest mappings in the process, and it doesn't map to any executable code.
  • There are a number of Ruby objects that live in this region. The Ruby GC uses malloc to allocate backend memory, so all Ruby objects live on the heap.

We can tell the mapping after the heap is the stack, because it starts at 0x7fff006b8000 (which you'll notice is 0x1000 away from rsp at the time of the SIGBUS). I'm not sure why, but I think the core file format coalesces the stack guard page into the main stack region.
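
The address arithmetic in this analysis can be checked mechanically. A minimal sketch using the VirtAddr and MemSiz values from the readelf output above; the struct and function names are hypothetical, and a 4 KiB page size is assumed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* One LOAD program header from the core file: VirtAddr and MemSiz. */
struct segment {
    uint64_t vaddr;
    uint64_t memsz;
};

/* First address past the end of the segment. */
static uint64_t
segment_end(struct segment s)
{
    assert(s.memsz > 0);
    return s.vaddr + s.memsz;
}

/* True when `upper` begins exactly where `lower` ends (no gap). */
static bool
segments_adjacent(struct segment lower, struct segment upper)
{
    return segment_end(lower) == upper.vaddr;
}

/* True when addr sits on a 4 KiB page boundary (assumed page size). */
static bool
is_page_aligned(uint64_t addr)
{
    return (addr & UINT64_C(0xfff)) == 0;
}
```

With the heap segment `{0x7ffef1bb7000, 0xeb01000}` and the stack segment `{0x7fff006b8000, 0x52000}`, the heap ends exactly where the stack mapping begins, and the faulting rsp of 0x7fff006b9000 is page aligned and 0x1000 above the start of the stack mapping.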

My theory at this point was that the stack needed to grow in order to call a function, but couldn't because it was running into the heap. To prove that, I dug into the Linux kernel to figure out what happens when the stack needs to grow. Sure enough, that led me to the URL I pasted in the PR description: http://lxr.free-electrons.com/source/mm/memory.c?v=3.16#L2635. If you look at the function that's being called there, check_stack_guard_page, you'll see it fails if there's a region above/below that prevents the stack from growing: http://lxr.free-electrons.com/source/mm/memory.c?v=3.16#L2589. And when the stack can't grow, the kernel will raise a SIGBUS.
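
The logic of that kernel check can be approximated in userspace. This is a simplified model reconstructed from a reading of the linked 3.16 source, not kernel code; the `model_` names, the struct layout and the 4 KiB page size are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_PAGE_SIZE 0x1000u    /* assumed page size */

/* Simplified stand-in for a kernel memory mapping (VMA). */
struct model_vma {
    uintptr_t start;               /* first address of the mapping */
    uintptr_t end;                 /* first address past the mapping */
    bool grows_down;               /* true for a stack mapping */
    const struct model_vma *prev;  /* mapping just below, if any */
};

/* Returns 0 if a fault at `address` may proceed, or -1 if growth is
 * refused because a non-stack mapping sits directly below the lowest
 * (guard) page. The kernel reports that refusal as SIGBUS. */
static int
model_check_stack_guard_page(const struct model_vma *vma, uintptr_t address)
{
    assert(vma != NULL);
    address &= ~(uintptr_t)(MODEL_PAGE_SIZE - 1);
    if (vma->grows_down && address == vma->start) {
        const struct model_vma *prev = vma->prev;

        if (prev != NULL && prev->end == address)
            return prev->grows_down ? 0 : -1;
    }
    return 0;
}
```

In this model, the callq at rsp = 0x7fff006b9000 pushes its return address at 0x7fff006b8ff8, which falls in the lowest page of the stack mapping. Because the heap ends at exactly 0x7fff006b8000, the check fails.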

After coming to this conclusion I wrote a proof of concept script. The script calls itself repeatedly, on each execution checking the distance between the heap and stack segments. If they're within a certain threshold, it allocates a large number of Ruby objects, and then calls a recursive function that will force the stack to grow. Because ASLR randomly places the heap and stack segments, this can take a long time to manifest on x64 (it took over 18k executions for me to hit it). However, because i386 has a much smaller virtual address space, I think this problem would occur much faster on that platform.
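
The distance check can be sketched as follows. This is not the logic from the gist, which is not reproduced here; it is a minimal C illustration that takes the text of `/proc/<pid>/maps` and reports the gap between the `[stack]` mapping and whatever mapping lies directly below it. `gap_below_stack` is a hypothetical name, and the sketch relies on the maps file listing mappings in ascending address order.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Returns the number of bytes between the start of the [stack] mapping
 * and the end of the mapping listed just before it, or -1 if the text
 * contains no [stack] line. */
static long long
gap_below_stack(const char *maps)
{
    unsigned long long start, end, prev_end = 0;
    const char *line = maps;

    assert(maps != NULL);
    while (*line != '\0') {
        const char *eol = strchr(line, '\n');
        size_t len = eol ? (size_t)(eol - line) : strlen(line);

        /* Each line begins with "<start>-<end>" in hexadecimal. */
        if (len > 0 && sscanf(line, "%llx-%llx", &start, &end) == 2) {
            if (len >= 7 && strncmp(line + len - 7, "[stack]", 7) == 0)
                return (long long)(start - prev_end);
            prev_end = end;
        }
        line += len;
        if (*line == '\n')
            line++;
    }
    return -1;
}
```

A gap of zero corresponds to the layout seen in the core file, where the heap mapping ends exactly where the stack mapping begins.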

Why doesn't this problem then affect ordinary C programs?

This could totally affect ordinary C programs if they use a similar workload. What I meant regarding Ruby C stacks growing dynamically is that MRI does not have its own dedicated stack for Ruby code. Because Ruby code and C code share the same stack, if Ruby code calls a C function, it can cause the native stack to grow.

Do pthreads fit into this picture in any way?

Native threads do have their own stacks. I haven't really investigated how pthreads handle stack allocation, but I suspect they could suffer from the same problem. In our case (Shopify), we're basically using Ruby in a single threaded fashion, so I doubt it would be an issue for us.

@jaroslawr

@csfrancis Wow, thank you so much for taking the time to write it up! If I understand you right, isn't this then in the end a flaw in the Linux memory management that perhaps should be reported in the Linux bug tracker?

I went through some of the same steps you did, except I discovered this behaviour locally and was fortunate to start already with a semi-reproducible test case (in the form of a huge app, but still). I ran the test case in valgrind and saw that it did not crash there; then I ran it in gdb, where it did not crash either; finally I figured out a way of also reproducing it in gdb by using:

set disable-randomization off

which is quite telling now with your explanation in mind. Anyway, I was able to get a disassembly and things like that; see my later comments here:

https://bugs.ruby-lang.org/issues/10626

I also validated that the stack pointer is actually aligned and that the stack is actually way smaller than the ulimit -s value. I just could not go further than that, and concluded it was most likely related to pthreads: even if you do not use threads at the Ruby level, the Ruby interpreter runs with two pthread threads, one for the timer and one executing the main Ruby thread, and pthreads does some of its own magic regarding stack handling (e.g. the Ruby interpreter calls pthread_attr_setstacksize). It would never have occurred to me that a SIGBUS could be triggered in the way you presented. That is just crazy stuff that I don't think is documented anywhere, but of course I think you are right.

Having spent so much time on this, respect, man.

nurse added a commit that referenced this pull request Feb 12, 2015
	* thread_pthread.c (reserve_stack): fix intermittent SIGBUS on
	  Linux, by reserving the stack virtual address space at process
	  start up so that it will not clash with the heap space.
	  [Fix GH-822]


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/branches/ruby_2_2@49578 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
hsbt pushed a commit that referenced this pull request Apr 13, 2015
	* thread_pthread.c (reserve_stack): fix intermittent SIGBUS on
	  Linux, by reserving the stack virtual address space at process
	  start up so that it will not clash with the heap space.
	  [Fix GH-822]


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/branches/ruby_2_1@50289 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
mmasaki pushed a commit to mmasaki/ruby that referenced this pull request May 30, 2015
* thread_pthread.c (reserve_stack): fix intermittent SIGBUS on
  Linux, by reserving the stack virtual address space at process
  start up so that it will not clash with the heap space.
  [Fix rubyGH-822]

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@49452 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
@KJTsanaktsidis
Contributor

@jaroslawr Hello from the future! I don't know if you care about this almost 10 years later, but

If I understand you right, isn't this then in the end a flaw in the Linux memory management that perhaps should be reported in the Linux bug tracker?

Yes, I believe this was a bug in the kernel, and it was fixed in 2017 here: torvalds/linux@c204d21
