Commit 3eb5b89

Merge branch 'x86-mpx-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 MPX support from Thomas Gleixner:
 "This enables support for x86 MPX.

  MPX is a new debug feature for bound checking in user space. It
  requires kernel support to handle the bound tables and decode the
  bound violating instruction in the trap handler"

* 'x86-mpx-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  asm-generic: Remove asm-generic arch_bprm_mm_init()
  mm: Make arch_unmap()/bprm_mm_init() available to all architectures
  x86: Cleanly separate use of asm-generic/mm_hooks.h
  x86 mpx: Change return type of get_reg_offset()
  fs: Do not include mpx.h in exec.c
  x86, mpx: Add documentation on Intel MPX
  x86, mpx: Cleanup unused bound tables
  x86, mpx: On-demand kernel allocation of bounds tables
  x86, mpx: Decode MPX instruction to get bound violation information
  x86, mpx: Add MPX-specific mmap interface
  x86, mpx: Introduce VM_MPX to indicate that a VMA is MPX specific
  x86, mpx: Add MPX to disabled features
  ia64: Sync struct siginfo with general version
  mips: Sync struct siginfo with general version
  mpx: Extend siginfo structure to include bound violation information
  x86, mpx: Rename cfg_reg_u and status_reg
  x86: mpx: Give bndX registers actual names
  x86: Remove arbitrary instruction size limit in instruction decoder
2 parents 9e66645 + 9f7789f commit 3eb5b89

File tree

35 files changed: +1591 −47 lines changed

Documentation/x86/intel_mpx.txt

Lines changed: 234 additions & 0 deletions
@@ -0,0 +1,234 @@

1. Intel(R) MPX Overview
========================

Intel(R) Memory Protection Extensions (Intel(R) MPX) is a new capability
introduced into Intel Architecture. Intel MPX provides hardware features
that can be used in conjunction with compiler changes to check memory
references, for those references whose compile-time normal intentions are
usurped at runtime due to buffer overflow or underflow.

For more information, please refer to the Intel(R) Architecture Instruction
Set Extensions Programming Reference, Chapter 9: Intel(R) Memory Protection
Extensions.

Note: Currently no hardware with the MPX ISA is available, but it is always
possible to use SDE (Intel(R) Software Development Emulator) instead, which
can be downloaded from
http://software.intel.com/en-us/articles/intel-software-development-emulator

2. How to get the advantage of MPX
==================================

For MPX to work, changes are required in the kernel, binutils and the
compiler. No source changes are required for applications, just a
recompile.

There are a lot of moving parts that must all work together. The
following is how we expect the compiler, application and kernel to
cooperate.

1) The application developer compiles with -fmpx. The compiler adds the
   instrumentation as well as some setup code that is called early after
   the app starts. The new instruction prefixes are noops on old CPUs.
2) That setup code allocates (virtual) space for the "bounds directory",
   points the "bndcfgu" register to the directory and notifies the kernel
   (via the new prctl(PR_MPX_ENABLE_MANAGEMENT)) that the app will be
   using MPX.
3) The kernel detects that the CPU has MPX, allows the new prctl() to
   succeed, and notes the location of the bounds directory. Userspace is
   expected to keep the bounds directory at that location. We note it
   instead of reading it each time because the 'xsave' operation needed
   to access the bounds directory register is an expensive operation.
4) If the application needs to spill bounds out of the 4 registers, it
   issues a bndstx instruction. Since the bounds directory is empty at
   this point, a bounds fault (#BR) is raised, the kernel allocates a
   bounds table (in the user address space) and makes the relevant entry
   in the bounds directory point to the new table.
5) If the application violates the bounds specified in the bounds
   registers, a separate kind of #BR is raised which will deliver a
   signal with information about the violation in the 'struct siginfo'.
6) Whenever memory is freed, we know that it can no longer contain valid
   pointers, and we attempt to free the associated space in the bounds
   tables. If an entire table becomes unused, we will attempt to free
   the table and remove the entry in the directory.
To summarize, there are essentially three things interacting here:

GCC with -fmpx:
 * enables annotation of code with MPX instructions and prefixes
 * inserts code early in the application to call in to the "gcc runtime"
GCC MPX Runtime:
 * checks for hardware MPX support in the cpuid leaf
 * allocates virtual space for the bounds directory (essentially a malloc())
 * points the hardware BNDCFGU register at the directory
 * calls the new prctl(PR_MPX_ENABLE_MANAGEMENT) to notify the kernel
   to start managing the bounds directories
Kernel MPX Code:
 * checks for hardware MPX support in the cpuid leaf
 * handles #BR exceptions and sends SIGSEGV to the app when it violates
   bounds, like during a buffer overflow
 * when bounds are spilled into an unallocated bounds table, the kernel
   notices in the #BR exception, allocates the virtual space, then
   updates the bounds directory to point to the new table. It keeps
   special track of this memory with the VM_MPX flag.
 * frees unused bounds tables at the time that the memory they describe
   is unmapped

3. How does MPX kernel code work
================================

Handling #BR faults caused by MPX
---------------------------------

When MPX is enabled, there are 2 new situations that can generate
#BR faults:
 * new bounds tables (BT) need to be allocated to save bounds.
 * a bounds violation is caused by an MPX instruction.

We hook the #BR handler to handle these two new situations.
On-demand kernel allocation of bounds tables
--------------------------------------------

MPX only has 4 hardware registers for storing bounds information. If
MPX-enabled code needs more than these 4 registers, it needs to spill
them somewhere. It has two special instructions for this which allow
the bounds to be moved between the bounds registers and some new "bounds
tables".

#BR exceptions are a new class of exceptions just for MPX. They are
similar conceptually to a page fault and will be raised by the MPX
hardware both during bounds violations and when the tables are not
present. The kernel handles those #BR exceptions for not-present tables
by carving the space out of the normal process's address space and then
pointing the bounds directory over to it.

The tables need to be accessed and controlled by userspace because
the instructions for moving bounds in and out of them are extremely
frequent. They potentially happen every time a register points to
memory. Any direct kernel involvement (like a syscall) to access the
tables would obviously destroy performance.
Why not do this in userspace? MPX does not strictly require anything in
the kernel. It can theoretically be done completely from userspace. Here
are a few ways this could be done. We don't think any of them are
practical in the real world, but here they are.

Q: Can virtual space simply be reserved for the bounds tables so that we
   never have to allocate them?
A: An MPX-enabled application will possibly create a lot of bounds tables
   in its address space to save bounds information. These tables can take
   up huge swaths of memory (as much as 80% of the memory on the system)
   even if we clean them up aggressively. In the worst-case scenario, the
   tables can be 4x the size of the data structure being tracked. IOW, a
   1-page structure can require 4 bounds-table pages. An X-GB virtual
   area needs 4*X GB of virtual space, plus 2GB for the bounds directory.
   If we were to preallocate them for the 128TB of user virtual address
   space, we would need to reserve 512TB+2GB, which is larger than the
   entire virtual address space today. This means they can not be
   reserved ahead of time. Also, a single process's pre-populated bounds
   directory consumes 2GB of virtual *AND* physical memory. IOW, it's
   completely infeasible to prepopulate bounds directories.

Q: Can we preallocate bounds table space at the same time memory is
   allocated which might contain pointers that might eventually need
   bounds tables?
A: This would work if we could hook the site of each and every memory
   allocation syscall. This can be done for small, constrained
   applications. But, it isn't practical at a larger scale since a given
   app has no way of controlling how all the parts of the app might
   allocate memory (think libraries). The kernel is really the only
   place to intercept these calls.

Q: Could a bounds fault be handed to userspace and the tables allocated
   there in a signal handler instead of in the kernel?
A: mmap() is not on the list of async-signal-safe functions, and even
   if mmap() would work it still requires locking or nasty tricks to
   keep track of the allocation state there.

Having ruled out all of the userspace-only approaches for managing
bounds tables that we could think of, we create them on demand in
the kernel.
Decoding MPX instructions
-------------------------

When a #BR is generated due to a bounds violation caused by MPX, we
need to decode the MPX instruction to get the violation address and
set this address into the extended struct siginfo.

The _sigfault field of struct siginfo is extended as follows:

/* SIGILL, SIGFPE, SIGSEGV, SIGBUS */
struct {
	void __user *_addr;	/* faulting insn/memory ref. */
#ifdef __ARCH_SI_TRAPNO
	int _trapno;		/* TRAP # which caused the signal */
#endif
	short _addr_lsb;	/* LSB of the reported address */
	struct {
		void __user *_lower;
		void __user *_upper;
	} _addr_bnd;
} _sigfault;

The '_addr' field refers to the violation address, and the new
'_addr_bnd' field refers to the lower/upper bounds in effect when the
#BR was caused.

Glibc will also be updated to support this new siginfo, so users
can get the violation address and bounds when bounds violations occur.
Cleanup unused bounds tables
----------------------------

When a BNDSTX instruction attempts to save bounds to a bounds directory
entry marked as invalid, a #BR is generated. This is an indication that
no bounds table exists for this entry. In this case the fault handler
will allocate a new bounds table on demand.

Since the kernel allocated those tables on demand without userspace
knowledge, it is also responsible for freeing them when the associated
mappings go away.

The solution is to hook do_munmap() to check whether the process is
MPX-enabled. If it is, the bounds tables covered by the virtual address
region being unmapped are freed as well.
Adding new prctl commands
-------------------------

Two new prctl commands are added to enable and disable MPX bounds table
management in the kernel:

#define PR_MPX_ENABLE_MANAGEMENT	43
#define PR_MPX_DISABLE_MANAGEMENT	44

The runtime library in userspace is responsible for allocating the
bounds directory, so the kernel has to use the XSAVE instruction to get
the base of the bounds directory from the BNDCFG register.

But XSAVE is expected to be very expensive. As a performance
optimization, we read the base of the bounds directory once, during
PR_MPX_ENABLE_MANAGEMENT, and save it into struct mm_struct for
future use.

4. Special rules
================

1) If userspace is requesting help from the kernel to do the management
of bounds tables, it may not create or modify entries in the bounds
directory.

Certainly users can allocate bounds tables and forcibly point the
bounds directory at them through the XSAVE instruction, and then set
the valid bit of a bounds entry to make it valid. But, the kernel will
decline to assist in managing these tables.

2) Userspace may not take multiple bounds directory entries and point
them at the same bounds table.

This is allowed architecturally. See more information in the "Intel(R)
Architecture Instruction Set Extensions Programming Reference" (9.3.4).

However, if users did this, the kernel might be fooled into unmapping
an in-use bounds table since it does not recognize sharing.

arch/ia64/include/uapi/asm/siginfo.h

Lines changed: 6 additions & 2 deletions
@@ -63,6 +63,10 @@ typedef struct siginfo {
 			unsigned int _flags;	/* see below */
 			unsigned long _isr;	/* isr */
 			short _addr_lsb;	/* lsb of faulting address */
+			struct {
+				void __user *_lower;
+				void __user *_upper;
+			} _addr_bnd;
 		} _sigfault;

@@ -110,9 +114,9 @@ typedef struct siginfo {
 /*
  * SIGSEGV si_codes
  */
-#define __SEGV_PSTKOVF	(__SI_FAULT|3)	/* paragraph stack overflow */
+#define __SEGV_PSTKOVF	(__SI_FAULT|4)	/* paragraph stack overflow */
 #undef NSIGSEGV
-#define NSIGSEGV	3
+#define NSIGSEGV	4

 #undef NSIGTRAP
 #define NSIGTRAP	4

arch/mips/include/uapi/asm/siginfo.h

Lines changed: 4 additions & 0 deletions
@@ -92,6 +92,10 @@ typedef struct siginfo {
 			int _trapno;	/* TRAP # which caused the signal */
 #endif
 			short _addr_lsb;
+			struct {
+				void __user *_lower;
+				void __user *_upper;
+			} _addr_bnd;
 		} _sigfault;

 	/* SIGPOLL, SIGXFSZ (To do ...) */

arch/s390/include/asm/mmu_context.h

Lines changed: 11 additions & 0 deletions
@@ -120,4 +120,15 @@ static inline void arch_exit_mmap(struct mm_struct *mm)
 {
 }

+static inline void arch_unmap(struct mm_struct *mm,
+			struct vm_area_struct *vma,
+			unsigned long start, unsigned long end)
+{
+}
+
+static inline void arch_bprm_mm_init(struct mm_struct *mm,
+				     struct vm_area_struct *vma)
+{
+}
+
 #endif /* __S390_MMU_CONTEXT_H */

arch/um/include/asm/mmu_context.h

Lines changed: 19 additions & 5 deletions
@@ -10,7 +10,26 @@
 #include <asm/mmu.h>

 extern void uml_setup_stubs(struct mm_struct *mm);
+/*
+ * Needed since we do not use the asm-generic/mm_hooks.h:
+ */
+static inline void arch_dup_mmap(struct mm_struct *oldmm, struct mm_struct *mm)
+{
+	uml_setup_stubs(mm);
+}
 extern void arch_exit_mmap(struct mm_struct *mm);
+static inline void arch_unmap(struct mm_struct *mm,
+			struct vm_area_struct *vma,
+			unsigned long start, unsigned long end)
+{
+}
+static inline void arch_bprm_mm_init(struct mm_struct *mm,
+				     struct vm_area_struct *vma)
+{
+}
+/*
+ * end asm-generic/mm_hooks.h functions
+ */

 #define deactivate_mm(tsk,mm)	do { } while (0)

@@ -41,11 +60,6 @@ static inline void switch_mm(struct mm_struct *prev, struct mm_struct *next,
 	}
 }

-static inline void arch_dup_mmap(struct mm_struct *oldmm, struct mm_struct *mm)
-{
-	uml_setup_stubs(mm);
-}
-
 static inline void enter_lazy_tlb(struct mm_struct *mm,
 				  struct task_struct *tsk)
 {

arch/unicore32/include/asm/mmu_context.h

Lines changed: 11 additions & 0 deletions
@@ -86,4 +86,15 @@ static inline void arch_dup_mmap(struct mm_struct *oldmm,
 {
 }

+static inline void arch_unmap(struct mm_struct *mm,
+			struct vm_area_struct *vma,
+			unsigned long start, unsigned long end)
+{
+}
+
+static inline void arch_bprm_mm_init(struct mm_struct *mm,
+				     struct vm_area_struct *vma)
+{
+}
+
 #endif

arch/x86/Kconfig

Lines changed: 4 additions & 0 deletions
@@ -248,6 +248,10 @@ config HAVE_INTEL_TXT
 	def_bool y
 	depends on INTEL_IOMMU && ACPI

+config X86_INTEL_MPX
+	def_bool y
+	depends on CPU_SUP_INTEL
+
 config X86_32_SMP
 	def_bool y
 	depends on X86_32 && SMP

arch/x86/include/asm/disabled-features.h

Lines changed: 7 additions & 1 deletion
@@ -10,6 +10,12 @@
  * cpu_feature_enabled().
  */

+#ifdef CONFIG_X86_INTEL_MPX
+# define DISABLE_MPX	0
+#else
+# define DISABLE_MPX	(1<<(X86_FEATURE_MPX & 31))
+#endif
+
 #ifdef CONFIG_X86_64
 # define DISABLE_VME		(1<<(X86_FEATURE_VME & 31))
 # define DISABLE_K6_MTRR	(1<<(X86_FEATURE_K6_MTRR & 31))
@@ -34,6 +40,6 @@
 #define DISABLED_MASK6	0
 #define DISABLED_MASK7	0
 #define DISABLED_MASK8	0
-#define DISABLED_MASK9	0
+#define DISABLED_MASK9	(DISABLE_MPX)

 #endif /* _ASM_X86_DISABLED_FEATURES_H */
