
Commit f4eafd8

toshikani authored and Ingo Molnar committed
x86/mm: Fix vmalloc_fault() to handle large pages properly
A kernel page fault oops with the call stack below was observed when a read syscall was made to a pmem device after a huge amount (>512GB) of vmalloc ranges had been allocated by ioremap() on an x86_64 system:

  BUG: unable to handle kernel paging request at ffff880840000ff8
  IP: vmalloc_fault+0x1be/0x300
  PGD c7f03a067 PUD 0
  Oops: 0000 [#1] SM
  Call Trace:
    __do_page_fault+0x285/0x3e0
    do_page_fault+0x2f/0x80
    ? put_prev_entity+0x35/0x7a0
    page_fault+0x28/0x30
    ? memcpy_erms+0x6/0x10
    ? schedule+0x35/0x80
    ? pmem_rw_bytes+0x6a/0x190 [nd_pmem]
    ? schedule_timeout+0x183/0x240
    btt_log_read+0x63/0x140 [nd_btt]
     :
    ? __symbol_put+0x60/0x60
    ? kernel_read+0x50/0x80
    SyS_finit_module+0xb9/0xf0
    entry_SYSCALL_64_fastpath+0x1a/0xa4

Since v4.1, ioremap() supports large page (pud/pmd) mappings on x86_64 and PAE. vmalloc_fault(), however, assumes that the vmalloc range is limited to pte mappings.

vmalloc faults do not normally happen in ioremap'd ranges, since ioremap() sets up the kernel page tables, which are shared by user processes: pgd_ctor() copies the kernel's PGD entries into the user's PGD during fork(). When allocation of the vmalloc ranges crosses a 512GB boundary, however, ioremap() allocates a new pud table and updates the kernel PGD entry to point to it. If a user process's PGD entry does not have this update yet, a read/write syscall to the range causes a vmalloc fault, which hits the Oops above because vmalloc_fault() does not handle a large page properly.

The following changes are made to vmalloc_fault().

64-bit:
- No change for the PGD sync operation, as it already handles large pages.
- Add pud_huge() and pmd_huge() to the validation code to handle large pages.
- Change pud_page_vaddr() to pud_pfn(), since an ioremap range is not directly mapped (the if-statement itself still works with the resulting bogus address).
- Change pmd_page() to pmd_pfn(), since an ioremap range is not backed by struct page (the if-statement itself still works with the resulting bogus address).

32-bit:
- No change for the sync operation, since the index3 PGD entry covers the entire vmalloc range and is always valid. (A separate change to sync the PGD entry is necessary if this memory layout is changed, regardless of the page size.)
- Add pmd_huge() to the validation code to handle large pages. This is for completeness, since vmalloc_fault() won't happen in ioremap'd ranges as their PGD entry is always valid.

Reported-by: Henning Schild <henning.schild@siemens.com>
Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
Acked-by: Borislav Petkov <bp@alien8.de>
Cc: <stable@vger.kernel.org> # 4.1+
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luis R. Rodriguez <mcgrof@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Toshi Kani <toshi.kani@hp.com>
Cc: linux-mm@kvack.org
Cc: linux-nvdimm@lists.01.org
Link: http://lkml.kernel.org/r/1455758214-24623-1-git-send-email-toshi.kani@hpe.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
1 parent 4e7f9df commit f4eafd8


1 file changed (+11, -4 lines)


arch/x86/mm/fault.c

Lines changed: 11 additions & 4 deletions
@@ -287,6 +287,9 @@ static noinline int vmalloc_fault(unsigned long address)
 	if (!pmd_k)
 		return -1;
 
+	if (pmd_huge(*pmd_k))
+		return 0;
+
 	pte_k = pte_offset_kernel(pmd_k, address);
 	if (!pte_present(*pte_k))
 		return -1;
@@ -360,8 +363,6 @@ void vmalloc_sync_all(void)
  * 64-bit:
  *
  * Handle a fault on the vmalloc area
- *
- * This assumes no large pages in there.
  */
 static noinline int vmalloc_fault(unsigned long address)
 {
@@ -403,17 +404,23 @@ static noinline int vmalloc_fault(unsigned long address)
 	if (pud_none(*pud_ref))
 		return -1;
 
-	if (pud_none(*pud) || pud_page_vaddr(*pud) != pud_page_vaddr(*pud_ref))
+	if (pud_none(*pud) || pud_pfn(*pud) != pud_pfn(*pud_ref))
 		BUG();
 
+	if (pud_huge(*pud))
+		return 0;
+
 	pmd = pmd_offset(pud, address);
 	pmd_ref = pmd_offset(pud_ref, address);
 	if (pmd_none(*pmd_ref))
 		return -1;
 
-	if (pmd_none(*pmd) || pmd_page(*pmd) != pmd_page(*pmd_ref))
+	if (pmd_none(*pmd) || pmd_pfn(*pmd) != pmd_pfn(*pmd_ref))
 		BUG();
 
+	if (pmd_huge(*pmd))
+		return 0;
+
 	pte_ref = pte_offset_kernel(pmd_ref, address);
 	if (!pte_present(*pte_ref))
 		return -1;
