
Commit 51580e7

Authored by Alexei Starovoitov, committed by davem330
bpf: verifier (add docs)
this patch adds all of eBPF verifier documentation and empty bpf_check()

The end goal for the verifier is to statically check safety of the program.

Verifier will catch:
- loops
- out of range jumps
- unreachable instructions
- invalid instructions
- uninitialized register access
- uninitialized stack access
- misaligned stack access
- out of range stack access
- invalid calling convention

More details in Documentation/networking/filter.txt

Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
1 parent 0a542a8 commit 51580e7

File tree

5 files changed, +361 -2 lines changed


Documentation/networking/filter.txt

Lines changed: 224 additions & 0 deletions
@@ -1001,6 +1001,99 @@ instruction that loads 64-bit immediate value into a dst_reg.

Classic BPF has similar instruction: BPF_LD | BPF_W | BPF_IMM which loads
32-bit immediate value into a register.

eBPF verifier
-------------
The safety of the eBPF program is determined in two steps.

First step does DAG check to disallow loops and other CFG validation.
In particular it will detect programs that have unreachable instructions.
(though classic BPF checker allows them)
Second step starts from the first insn and descends all possible paths.
It simulates execution of every insn and observes the state change of
registers and stack.

At the start of the program the register R1 contains a pointer to context
and has type PTR_TO_CTX.
If verifier sees an insn that does R2=R1, then R2 has now type
PTR_TO_CTX as well and can be used on the right hand side of expression.
If R1=PTR_TO_CTX and insn is R2=R1+R1, then R2=UNKNOWN_VALUE,
since addition of two valid pointers makes invalid pointer.
(In 'secure' mode verifier will reject any type of pointer arithmetic to make
sure that kernel addresses don't leak to unprivileged users)
If register was never written to, it's not readable:
  bpf_mov R0 = R2
  bpf_exit
will be rejected, since R2 is unreadable at the start of the program.

After kernel function call, R1-R5 are reset to unreadable and
R0 has a return type of the function.

Since R6-R9 are callee saved, their state is preserved across the call.
  bpf_mov R6 = 1
  bpf_call foo
  bpf_mov R0 = R6
  bpf_exit
is a correct program. If there was R1 instead of R6, it would have
been rejected.
load/store instructions are allowed only with registers of valid types, which
are PTR_TO_CTX, PTR_TO_MAP, FRAME_PTR. They are bounds and alignment checked.
For example:
  bpf_mov R1 = 1
  bpf_mov R2 = 2
  bpf_xadd *(u32 *)(R1 + 3) += R2
  bpf_exit
will be rejected, since R1 doesn't have a valid pointer type at the time of
execution of instruction bpf_xadd.

At the start R1 type is PTR_TO_CTX (a pointer to generic 'struct bpf_context').
A callback is used to customize verifier to restrict eBPF program access to only
certain fields within ctx structure with specified size and alignment.
For example, the following insn:
  bpf_ld R0 = *(u32 *)(R6 + 8)
intends to load a word from address R6 + 8 and store it into R0.
If R6=PTR_TO_CTX, via is_valid_access() callback the verifier will know
that offset 8 of size 4 bytes can be accessed for reading, otherwise
the verifier will reject the program.
If R6=FRAME_PTR, then access should be aligned and be within
stack bounds, which are [-MAX_BPF_STACK, 0). In this example offset is 8,
so it will fail verification, since it's out of bounds.
The verifier will allow eBPF program to read data from stack only after
it wrote into it.
Classic BPF verifier does similar check with M[0-15] memory slots.
For example:
  bpf_ld R0 = *(u32 *)(R10 - 4)
  bpf_exit
is invalid program.
Though R10 is correct read-only register and has type FRAME_PTR
and R10 - 4 is within stack bounds, there were no stores into that location.

Pointer register spill/fill is tracked as well, since four (R6-R9)
callee saved registers may not be enough for some programs.
Allowed function calls are customized with bpf_verifier_ops->get_func_proto().
The eBPF verifier will check that registers match argument constraints.
After the call register R0 will be set to return type of the function.

Function calls are the main mechanism to extend functionality of eBPF programs.
Socket filters may let programs call one set of functions, whereas tracing
filters may allow a completely different set.

If a function is made accessible to eBPF program, it needs to be thought through
from safety point of view. The verifier will guarantee that the function is
called with valid arguments.

seccomp vs socket filters have different security restrictions for classic BPF.
Seccomp solves this by two stage verifier: classic BPF verifier is followed
by seccomp verifier. In case of eBPF one configurable verifier is shared for
all use cases.

See details of eBPF verifier in kernel/bpf/verifier.c
eBPF maps
---------
'maps' is a generic storage of different types for sharing data between kernel
@@ -1040,6 +1133,137 @@ The map is defined by:
. key size in bytes
. value size in bytes

Understanding eBPF verifier messages
------------------------------------

The following are a few examples of invalid eBPF programs and verifier error
messages as seen in the log:

Program with unreachable instructions:
  static struct bpf_insn prog[] = {
  BPF_EXIT_INSN(),
  BPF_EXIT_INSN(),
  };
Error:
  unreachable insn 1
Program that reads uninitialized register:
  BPF_MOV64_REG(BPF_REG_0, BPF_REG_2),
  BPF_EXIT_INSN(),
Error:
  0: (bf) r0 = r2
  R2 !read_ok
Program that doesn't initialize R0 before exiting:
  BPF_MOV64_REG(BPF_REG_2, BPF_REG_1),
  BPF_EXIT_INSN(),
Error:
  0: (bf) r2 = r1
  1: (95) exit
  R0 !read_ok
Program that accesses stack out of bounds:
  BPF_ST_MEM(BPF_DW, BPF_REG_10, 8, 0),
  BPF_EXIT_INSN(),
Error:
  0: (7a) *(u64 *)(r10 +8) = 0
  invalid stack off=8 size=8
Program that doesn't initialize stack before passing its address into function:
  BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
  BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
  BPF_LD_MAP_FD(BPF_REG_1, 0),
  BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
  BPF_EXIT_INSN(),
Error:
  0: (bf) r2 = r10
  1: (07) r2 += -8
  2: (b7) r1 = 0x0
  3: (85) call 1
  invalid indirect read from stack off -8+0 size 8
Program that uses invalid map_fd=0 while calling to map_lookup_elem() function:
  BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
  BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
  BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
  BPF_LD_MAP_FD(BPF_REG_1, 0),
  BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
  BPF_EXIT_INSN(),
Error:
  0: (7a) *(u64 *)(r10 -8) = 0
  1: (bf) r2 = r10
  2: (07) r2 += -8
  3: (b7) r1 = 0x0
  4: (85) call 1
  fd 0 is not pointing to valid bpf_map
Program that doesn't check return value of map_lookup_elem() before accessing
map element:
  BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
  BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
  BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
  BPF_LD_MAP_FD(BPF_REG_1, 0),
  BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
  BPF_ST_MEM(BPF_DW, BPF_REG_0, 0, 0),
  BPF_EXIT_INSN(),
Error:
  0: (7a) *(u64 *)(r10 -8) = 0
  1: (bf) r2 = r10
  2: (07) r2 += -8
  3: (b7) r1 = 0x0
  4: (85) call 1
  5: (7a) *(u64 *)(r0 +0) = 0
  R0 invalid mem access 'map_value_or_null'
Program that correctly checks map_lookup_elem() returned value for NULL, but
accesses the memory with incorrect alignment:
  BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
  BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
  BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
  BPF_LD_MAP_FD(BPF_REG_1, 0),
  BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
  BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 1),
  BPF_ST_MEM(BPF_DW, BPF_REG_0, 4, 0),
  BPF_EXIT_INSN(),
Error:
  0: (7a) *(u64 *)(r10 -8) = 0
  1: (bf) r2 = r10
  2: (07) r2 += -8
  3: (b7) r1 = 1
  4: (85) call 1
  5: (15) if r0 == 0x0 goto pc+1
   R0=map_ptr R10=fp
  6: (7a) *(u64 *)(r0 +4) = 0
  misaligned access off 4 size 8
Program that correctly checks map_lookup_elem() returned value for NULL and
accesses memory with correct alignment in one side of 'if' branch, but fails
to do so in the other side of 'if' branch:
  BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
  BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
  BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
  BPF_LD_MAP_FD(BPF_REG_1, 0),
  BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
  BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 2),
  BPF_ST_MEM(BPF_DW, BPF_REG_0, 0, 0),
  BPF_EXIT_INSN(),
  BPF_ST_MEM(BPF_DW, BPF_REG_0, 0, 1),
  BPF_EXIT_INSN(),
Error:
  0: (7a) *(u64 *)(r10 -8) = 0
  1: (bf) r2 = r10
  2: (07) r2 += -8
  3: (b7) r1 = 1
  4: (85) call 1
  5: (15) if r0 == 0x0 goto pc+2
   R0=map_ptr R10=fp
  6: (7a) *(u64 *)(r0 +0) = 0
  7: (95) exit

  from 5 to 8: R0=imm0 R10=fp
  8: (7a) *(u64 *)(r0 +0) = 1
  R0 invalid mem access 'imm'
Testing
-------

include/linux/bpf.h

Lines changed: 2 additions & 0 deletions
@@ -83,5 +83,7 @@ struct bpf_prog_aux {

 void bpf_prog_put(struct bpf_prog *prog);
 struct bpf_prog *bpf_prog_get(u32 ufd);
+/* verify correctness of eBPF program */
+int bpf_check(struct bpf_prog *fp, union bpf_attr *attr);

 #endif /* _LINUX_BPF_H */

kernel/bpf/Makefile

Lines changed: 1 addition & 1 deletion
@@ -1 +1 @@
-obj-y := core.o syscall.o
+obj-y := core.o syscall.o verifier.o

kernel/bpf/syscall.c

Lines changed: 1 addition & 1 deletion
@@ -507,7 +507,7 @@ static int bpf_prog_load(union bpf_attr *attr)
 		goto free_prog;

 	/* run eBPF verifier */
-	/* err = bpf_check(prog, tb); */
+	err = bpf_check(prog, attr);

 	if (err < 0)
 		goto free_used_maps;

kernel/bpf/verifier.c

Lines changed: 133 additions & 0 deletions
@@ -0,0 +1,133 @@
/* Copyright (c) 2011-2014 PLUMgrid, http://plumgrid.com
 *
 * This program is free software; you can redistribute it and/or
 * modify it under the terms of version 2 of the GNU General Public
 * License as published by the Free Software Foundation.
 *
 * This program is distributed in the hope that it will be useful, but
 * WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
 * General Public License for more details.
 */
#include <linux/kernel.h>
#include <linux/types.h>
#include <linux/slab.h>
#include <linux/bpf.h>
#include <linux/filter.h>
#include <net/netlink.h>
#include <linux/file.h>
#include <linux/vmalloc.h>
/* bpf_check() is a static code analyzer that walks eBPF program
 * instruction by instruction and updates register/stack state.
 * All paths of conditional branches are analyzed until 'bpf_exit' insn.
 *
 * The first pass is depth-first-search to check that the program is a DAG.
 * It rejects the following programs:
 * - larger than BPF_MAXINSNS insns
 * - if loop is present (detected via back-edge)
 * - unreachable insns exist (shouldn't be a forest. program = one function)
 * - out of bounds or malformed jumps
 * The second pass is all possible path descent from the 1st insn.
 * Since it's analyzing all paths through the program, the length of the
 * analysis is limited to 32k insn, which may be hit even if total number of
 * insn is less than 4K, but there are too many branches that change stack/regs.
 * Number of 'branches to be analyzed' is limited to 1k
 *
 * On entry to each instruction, each register has a type, and the instruction
 * changes the types of the registers depending on instruction semantics.
 * If instruction is BPF_MOV64_REG(BPF_REG_1, BPF_REG_5), then type of R5 is
 * copied to R1.
 *
 * All registers are 64-bit.
 * R0 - return register
 * R1-R5 argument passing registers
 * R6-R9 callee saved registers
 * R10 - frame pointer read-only
 *
 * At the start of BPF program the register R1 contains a pointer to bpf_context
 * and has type PTR_TO_CTX.
 *
 * Verifier tracks arithmetic operations on pointers in case:
 *    BPF_MOV64_REG(BPF_REG_1, BPF_REG_10),
 *    BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, -20),
 * 1st insn copies R10 (which has FRAME_PTR) type into R1
 * and 2nd arithmetic instruction is pattern matched to recognize
 * that it wants to construct a pointer to some element within stack.
 * So after 2nd insn, the register R1 has type PTR_TO_STACK
 * (and -20 constant is saved for further stack bounds checking).
 * Meaning that this reg is a pointer to stack plus known immediate constant.
 *
 * Most of the time the registers have UNKNOWN_VALUE type, which
 * means the register has some value, but it's not a valid pointer.
 * (like pointer plus pointer becomes UNKNOWN_VALUE type)
 *
 * When verifier sees load or store instructions the type of base register
 * can be: PTR_TO_MAP_VALUE, PTR_TO_CTX, FRAME_PTR. These are the three pointer
 * types recognized by check_mem_access() function.
 *
 * PTR_TO_MAP_VALUE means that this register is pointing to 'map element value'
 * and the range of [ptr, ptr + map's value_size) is accessible.
 *
 * registers used to pass values to function calls are checked against
 * function argument constraints.
 *
 * ARG_PTR_TO_MAP_KEY is one of such argument constraints.
 * It means that the register type passed to this function must be
 * PTR_TO_STACK and it will be used inside the function as
 * 'pointer to map element key'
 *
 * For example the argument constraints for bpf_map_lookup_elem():
 *   .ret_type = RET_PTR_TO_MAP_VALUE_OR_NULL,
 *   .arg1_type = ARG_CONST_MAP_PTR,
 *   .arg2_type = ARG_PTR_TO_MAP_KEY,
 *
 * ret_type says that this function returns 'pointer to map elem value or null'
 * function expects 1st argument to be a const pointer to 'struct bpf_map' and
 * 2nd argument should be a pointer to stack, which will be used inside
 * the helper function as a pointer to map element key.
 *
 * On the kernel side the helper function looks like:
 * u64 bpf_map_lookup_elem(u64 r1, u64 r2, u64 r3, u64 r4, u64 r5)
 * {
 *    struct bpf_map *map = (struct bpf_map *) (unsigned long) r1;
 *    void *key = (void *) (unsigned long) r2;
 *    void *value;
 *
 *    here kernel can access 'key' and 'map' pointers safely, knowing that
 *    [key, key + map->key_size) bytes are valid and were initialized on
 *    the stack of eBPF program.
 * }
 *
 * Corresponding eBPF program may look like:
 *    BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),  // after this insn R2 type is FRAME_PTR
 *    BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -4), // after this insn R2 type is PTR_TO_STACK
 *    BPF_LD_MAP_FD(BPF_REG_1, map_fd),      // after this insn R1 type is CONST_PTR_TO_MAP
 *    BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
 * here verifier looks at prototype of map_lookup_elem() and sees:
 * .arg1_type == ARG_CONST_MAP_PTR and R1->type == CONST_PTR_TO_MAP, which is ok,
 * Now verifier knows that this map has key of R1->map_ptr->key_size bytes
 *
 * Then .arg2_type == ARG_PTR_TO_MAP_KEY and R2->type == PTR_TO_STACK, ok so far,
 * Now verifier checks that [R2, R2 + map's key_size) are within stack limits
 * and were initialized prior to this call.
 * If it's ok, then verifier allows this BPF_CALL insn and looks at
 * .ret_type which is RET_PTR_TO_MAP_VALUE_OR_NULL, so it sets
 * R0->type = PTR_TO_MAP_VALUE_OR_NULL which means bpf_map_lookup_elem() function
 * returns either a pointer to map value or NULL.
 *
 * When type PTR_TO_MAP_VALUE_OR_NULL passes through 'if (reg != 0) goto +off'
 * insn, the register holding that pointer in the true branch changes state to
 * PTR_TO_MAP_VALUE and the same register changes state to CONST_IMM in the false
 * branch. See check_cond_jmp_op().
 *
 * After the call R0 is set to return type of the function and registers R1-R5
 * are set to NOT_INIT to indicate that they are no longer readable.
 */
int bpf_check(struct bpf_prog *prog, union bpf_attr *attr)
{
	int ret = -EINVAL;

	return ret;
}
