
Commit dfa37dc

aagit authored and torvalds committed
userfaultfd: allow signals to interrupt a userfault
This is only simple to achieve if the userfault is going to return to userland (not to the kernel), because we can avoid returning VM_FAULT_RETRY even though we temporarily released the mmap_sem: the fault is simply retried by userland. This is safe at least on x86 and powerpc (the two archs with the syscall implemented so far).

Hint to verify for which archs this is safe: after handle_mm_fault returns, the fault code in arch/*/mm/fault.c must not touch any data structure protected by the mmap_sem until up_read(&mm->mmap_sem) is called.

This has two main benefits: signals can run with lower latency in production (signals aren't blocked by userfaults, and userfaults are immediately repeated after signal processing), and gdb can trivially debug threads blocked in this kind of userfault coming directly from userland.

On a side note: while gdb needs signals to be processed, coredumps have always worked perfectly with userfaults, no matter whether the userfault is triggered by GUP, a kernel copy_user, or directly from userland.

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
1 parent e6485a4 commit dfa37dc

File tree

1 file changed: +32 −3 lines changed

fs/userfaultfd.c

Lines changed: 32 additions & 3 deletions
@@ -262,7 +262,7 @@ int handle_userfault(struct vm_area_struct *vma, unsigned long address,
 	struct userfaultfd_ctx *ctx;
 	struct userfaultfd_wait_queue uwq;
 	int ret;
-	bool must_wait;
+	bool must_wait, return_to_userland;
 
 	BUG_ON(!rwsem_is_locked(&mm->mmap_sem));
 
@@ -327,6 +327,9 @@ int handle_userfault(struct vm_area_struct *vma, unsigned long address,
 	uwq.msg = userfault_msg(address, flags, reason);
 	uwq.ctx = ctx;
 
+	return_to_userland = (flags & (FAULT_FLAG_USER|FAULT_FLAG_KILLABLE)) ==
+		(FAULT_FLAG_USER|FAULT_FLAG_KILLABLE);
+
 	spin_lock(&ctx->fault_pending_wqh.lock);
 	/*
 	 * After the __add_wait_queue the uwq is visible to userland
@@ -338,21 +341,47 @@ int handle_userfault(struct vm_area_struct *vma, unsigned long address,
 	 * following the spin_unlock to happen before the list_add in
 	 * __add_wait_queue.
 	 */
-	set_current_state(TASK_KILLABLE);
+	set_current_state(return_to_userland ? TASK_INTERRUPTIBLE :
+			  TASK_KILLABLE);
 	spin_unlock(&ctx->fault_pending_wqh.lock);
 
 	must_wait = userfaultfd_must_wait(ctx, address, flags, reason);
 	up_read(&mm->mmap_sem);
 
 	if (likely(must_wait && !ACCESS_ONCE(ctx->released) &&
-		   !fatal_signal_pending(current))) {
+		   (return_to_userland ? !signal_pending(current) :
+		    !fatal_signal_pending(current)))) {
 		wake_up_poll(&ctx->fd_wqh, POLLIN);
 		schedule();
 		ret |= VM_FAULT_MAJOR;
 	}
 
 	__set_current_state(TASK_RUNNING);
 
+	if (return_to_userland) {
+		if (signal_pending(current) &&
+		    !fatal_signal_pending(current)) {
+			/*
+			 * If we got a SIGSTOP or SIGCONT and this is
+			 * a normal userland page fault, just let
+			 * userland return so the signal will be
+			 * handled and gdb debugging works. The page
+			 * fault code immediately after we return from
+			 * this function is going to release the
+			 * mmap_sem and it's not depending on it
+			 * (unlike gup would if we were not to return
+			 * VM_FAULT_RETRY).
+			 *
+			 * If a fatal signal is pending we still take
+			 * the streamlined VM_FAULT_RETRY failure path
+			 * and there's no need to retake the mmap_sem
+			 * in such case.
+			 */
+			down_read(&mm->mmap_sem);
+			ret = 0;
+		}
+	}
+
 	/*
 	 * Here we race with the list_del; list_add in
 	 * userfaultfd_ctx_read(), however because we don't ever run
