@@ -1001,6 +1001,99 @@ instruction that loads 64-bit immediate value into a dst_reg.
1001
1001
Classic BPF has similar instruction: BPF_LD | BPF_W | BPF_IMM which loads
1002
1002
32-bit immediate value into a register.
1003
1003
1004
+ eBPF verifier
1005
+ -------------
1006
+ The safety of the eBPF program is determined in two steps.
1007
+
1008
+ First step does DAG check to disallow loops and other CFG validation.
1009
+ In particular it will detect programs that have unreachable instructions.
1010
+ (though classic BPF checker allows them)
1011
+
1012
+ Second step starts from the first insn and descends all possible paths.
1013
+ It simulates execution of every insn and observes the state change of
1014
+ registers and stack.
1015
+
1016
+ At the start of the program the register R1 contains a pointer to context
1017
+ and has type PTR_TO_CTX.
1018
+ If verifier sees an insn that does R2=R1, then R2 has now type
1019
+ PTR_TO_CTX as well and can be used on the right hand side of expression.
1020
+ If R1=PTR_TO_CTX and insn is R2=R1+R1, then R2=UNKNOWN_VALUE,
1021
+ since addition of two valid pointers makes invalid pointer.
1022
+ (In 'secure' mode verifier will reject any type of pointer arithmetic to make
1023
+ sure that kernel addresses don't leak to unprivileged users)
1024
+
1025
+ If register was never written to, it's not readable:
1026
+ bpf_mov R0 = R2
1027
+ bpf_exit
1028
+ will be rejected, since R2 is unreadable at the start of the program.
1029
+
1030
+ After kernel function call, R1-R5 are reset to unreadable and
1031
+ R0 has a return type of the function.
1032
+
1033
+ Since R6-R9 are callee saved, their state is preserved across the call.
1034
+ bpf_mov R6 = 1
1035
+ bpf_call foo
1036
+ bpf_mov R0 = R6
1037
+ bpf_exit
1038
+ is a correct program. If there was R1 instead of R6, it would have
1039
+ been rejected.
1040
+
1041
+ load/store instructions are allowed only with registers of valid types, which
1042
+ are PTR_TO_CTX, PTR_TO_MAP, FRAME_PTR. They are bounds and alignment checked.
1043
+ For example:
1044
+ bpf_mov R1 = 1
1045
+ bpf_mov R2 = 2
1046
+ bpf_xadd *(u32 *)(R1 + 3) += R2
1047
+ bpf_exit
1048
+ will be rejected, since R1 doesn't have a valid pointer type at the time of
1049
+ execution of instruction bpf_xadd.
1050
+
1051
+ At the start R1 type is PTR_TO_CTX (a pointer to generic 'struct bpf_context')
1052
+ A callback is used to customize verifier to restrict eBPF program access to only
1053
+ certain fields within ctx structure with specified size and alignment.
1054
+
1055
+ For example, the following insn:
1056
+ bpf_ld R0 = *(u32 *)(R6 + 8)
1057
+ intends to load a word from address R6 + 8 and store it into R0
1058
+ If R6=PTR_TO_CTX, via is_valid_access() callback the verifier will know
1059
+ that offset 8 of size 4 bytes can be accessed for reading, otherwise
1060
+ the verifier will reject the program.
1061
+ If R6=FRAME_PTR, then access should be aligned and be within
1062
+ stack bounds, which are [-MAX_BPF_STACK, 0). In this example offset is 8,
1063
+ so it will fail verification, since it's out of bounds.
1064
+
1065
+ The verifier will allow eBPF program to read data from stack only after
1066
+ it wrote into it.
1067
+ Classic BPF verifier does similar check with M[0-15] memory slots.
1068
+ For example:
1069
+ bpf_ld R0 = *(u32 *)(R10 - 4)
1070
+ bpf_exit
1071
+ is invalid program.
1072
+ Though R10 is correct read-only register and has type FRAME_PTR
1073
+ and R10 - 4 is within stack bounds, there were no stores into that location.
1074
+
1075
+ Pointer register spill/fill is tracked as well, since four (R6-R9)
1076
+ callee saved registers may not be enough for some programs.
1077
+
1078
+ Allowed function calls are customized with bpf_verifier_ops->get_func_proto()
1079
+ The eBPF verifier will check that registers match argument constraints.
1080
+ After the call register R0 will be set to return type of the function.
1081
+
1082
+ Function calls is a main mechanism to extend functionality of eBPF programs.
1083
+ Socket filters may let programs to call one set of functions, whereas tracing
1084
+ filters may allow completely different set.
1085
+
1086
+ If a function made accessible to eBPF program, it needs to be thought through
1087
+ from safety point of view. The verifier will guarantee that the function is
1088
+ called with valid arguments.
1089
+
1090
+ seccomp vs socket filters have different security restrictions for classic BPF.
1091
+ Seccomp solves this by two stage verifier: classic BPF verifier is followed
1092
+ by seccomp verifier. In case of eBPF one configurable verifier is shared for
1093
+ all use cases.
1094
+
1095
+ See details of eBPF verifier in kernel/bpf/verifier.c
1096
+
1004
1097
eBPF maps
1005
1098
---------
1006
1099
'maps' is a generic storage of different types for sharing data between kernel
@@ -1040,6 +1133,137 @@ The map is defined by:
1040
1133
. key size in bytes
1041
1134
. value size in bytes
1042
1135
1136
+ Understanding eBPF verifier messages
1137
+ ------------------------------------
1138
+
1139
+ The following are few examples of invalid eBPF programs and verifier error
1140
+ messages as seen in the log:
1141
+
1142
+ Program with unreachable instructions:
1143
+ static struct bpf_insn prog[] = {
1144
+ BPF_EXIT_INSN(),
1145
+ BPF_EXIT_INSN(),
1146
+ };
1147
+ Error:
1148
+ unreachable insn 1
1149
+
1150
+ Program that reads uninitialized register:
1151
+ BPF_MOV64_REG(BPF_REG_0, BPF_REG_2),
1152
+ BPF_EXIT_INSN(),
1153
+ Error:
1154
+ 0: (bf) r0 = r2
1155
+ R2 !read_ok
1156
+
1157
+ Program that doesn't initialize R0 before exiting:
1158
+ BPF_MOV64_REG(BPF_REG_2, BPF_REG_1),
1159
+ BPF_EXIT_INSN(),
1160
+ Error:
1161
+ 0: (bf) r2 = r1
1162
+ 1: (95) exit
1163
+ R0 !read_ok
1164
+
1165
+ Program that accesses stack out of bounds:
1166
+ BPF_ST_MEM(BPF_DW, BPF_REG_10, 8, 0),
1167
+ BPF_EXIT_INSN(),
1168
+ Error:
1169
+ 0: (7a) *(u64 *)(r10 +8) = 0
1170
+ invalid stack off=8 size=8
1171
+
1172
+ Program that doesn't initialize stack before passing its address into function:
1173
+ BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
1174
+ BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
1175
+ BPF_LD_MAP_FD(BPF_REG_1, 0),
1176
+ BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
1177
+ BPF_EXIT_INSN(),
1178
+ Error:
1179
+ 0: (bf) r2 = r10
1180
+ 1: (07) r2 += -8
1181
+ 2: (b7) r1 = 0x0
1182
+ 3: (85) call 1
1183
+ invalid indirect read from stack off -8+0 size 8
1184
+
1185
+ Program that uses invalid map_fd=0 while calling to map_lookup_elem() function:
1186
+ BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
1187
+ BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
1188
+ BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
1189
+ BPF_LD_MAP_FD(BPF_REG_1, 0),
1190
+ BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
1191
+ BPF_EXIT_INSN(),
1192
+ Error:
1193
+ 0: (7a) *(u64 *)(r10 -8) = 0
1194
+ 1: (bf) r2 = r10
1195
+ 2: (07) r2 += -8
1196
+ 3: (b7) r1 = 0x0
1197
+ 4: (85) call 1
1198
+ fd 0 is not pointing to valid bpf_map
1199
+
1200
+ Program that doesn't check return value of map_lookup_elem() before accessing
1201
+ map element:
1202
+ BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
1203
+ BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
1204
+ BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
1205
+ BPF_LD_MAP_FD(BPF_REG_1, 0),
1206
+ BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
1207
+ BPF_ST_MEM(BPF_DW, BPF_REG_0, 0, 0),
1208
+ BPF_EXIT_INSN(),
1209
+ Error:
1210
+ 0: (7a) *(u64 *)(r10 -8) = 0
1211
+ 1: (bf) r2 = r10
1212
+ 2: (07) r2 += -8
1213
+ 3: (b7) r1 = 0x0
1214
+ 4: (85) call 1
1215
+ 5: (7a) *(u64 *)(r0 +0) = 0
1216
+ R0 invalid mem access 'map_value_or_null'
1217
+
1218
+ Program that correctly checks map_lookup_elem() returned value for NULL, but
1219
+ accesses the memory with incorrect alignment:
1220
+ BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
1221
+ BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
1222
+ BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
1223
+ BPF_LD_MAP_FD(BPF_REG_1, 0),
1224
+ BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
1225
+ BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 1),
1226
+ BPF_ST_MEM(BPF_DW, BPF_REG_0, 4, 0),
1227
+ BPF_EXIT_INSN(),
1228
+ Error:
1229
+ 0: (7a) *(u64 *)(r10 -8) = 0
1230
+ 1: (bf) r2 = r10
1231
+ 2: (07) r2 += -8
1232
+ 3: (b7) r1 = 1
1233
+ 4: (85) call 1
1234
+ 5: (15) if r0 == 0x0 goto pc+1
1235
+ R0=map_ptr R10=fp
1236
+ 6: (7a) *(u64 *)(r0 +4) = 0
1237
+ misaligned access off 4 size 8
1238
+
1239
+ Program that correctly checks map_lookup_elem() returned value for NULL and
1240
+ accesses memory with correct alignment in one side of 'if' branch, but fails
1241
+ to do so in the other side of 'if' branch:
1242
+ BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
1243
+ BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
1244
+ BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
1245
+ BPF_LD_MAP_FD(BPF_REG_1, 0),
1246
+ BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
1247
+ BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 2),
1248
+ BPF_ST_MEM(BPF_DW, BPF_REG_0, 0, 0),
1249
+ BPF_EXIT_INSN(),
1250
+ BPF_ST_MEM(BPF_DW, BPF_REG_0, 0, 1),
1251
+ BPF_EXIT_INSN(),
1252
+ Error:
1253
+ 0: (7a) *(u64 *)(r10 -8) = 0
1254
+ 1: (bf) r2 = r10
1255
+ 2: (07) r2 += -8
1256
+ 3: (b7) r1 = 1
1257
+ 4: (85) call 1
1258
+ 5: (15) if r0 == 0x0 goto pc+2
1259
+ R0=map_ptr R10=fp
1260
+ 6: (7a) *(u64 *)(r0 +0) = 0
1261
+ 7: (95) exit
1262
+
1263
+ from 5 to 8: R0=imm0 R10=fp
1264
+ 8: (7a) *(u64 *)(r0 +0) = 1
1265
+ R0 invalid mem access 'imm'
1266
+
1043
1267
Testing
1044
1268
-------
1045
1269
0 commit comments