Commit 5ba5780

Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Daniel Borkmann says:

====================
pull-request: bpf 2019-04-04

The following pull-request contains BPF updates for your *net* tree.

The main changes are:

1) Batch of fixes to the existing BPF flow dissector API to support
   calling BPF programs from the eth_get_headlen context (support for
   latter is planned to be added in bpf-next), from Stanislav.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2 parents 3baf5c2 + 5eed789 commit 5ba5780

6 files changed

+208
-26
lines changed

Documentation/networking/bpf_flow_dissector.rst

Lines changed: 126 additions & 0 deletions
@@ -0,0 +1,126 @@
1+
.. SPDX-License-Identifier: GPL-2.0
2+
3+
==================
4+
BPF Flow Dissector
5+
==================
6+
7+
Overview
8+
========
9+
10+
Flow dissector is a routine that parses metadata out of the packets. It's
11+
used in the various places in the networking subsystem (RFS, flow hash, etc).
12+
13+
BPF flow dissector is an attempt to reimplement C-based flow dissector logic
14+
in BPF to gain all the benefits of BPF verifier (namely, limits on the
15+
number of instructions and tail calls).
16+
17+
API
18+
===
19+
20+
BPF flow dissector programs operate on an ``__sk_buff``. However, only the
21+
limited set of fields is allowed: ``data``, ``data_end`` and ``flow_keys``.
22+
``flow_keys`` is ``struct bpf_flow_keys`` and contains flow dissector input
23+
and output arguments.
24+
25+
The inputs are:
26+
* ``nhoff`` - initial offset of the networking header
27+
* ``thoff`` - initial offset of the transport header, initialized to nhoff
28+
* ``n_proto`` - L3 protocol type, parsed out of L2 header
29+
30+
Flow dissector BPF program should fill out the rest of the ``struct
31+
bpf_flow_keys`` fields. Input arguments ``nhoff/thoff/n_proto`` should be
32+
also adjusted accordingly.
33+
34+
The return code of the BPF program is either BPF_OK to indicate successful
35+
dissection, or BPF_DROP to indicate parsing error.
36+
37+
__sk_buff->data
38+
===============
39+
40+
In the VLAN-less case, this is what the initial state of the BPF flow
41+
dissector looks like::
42+
43+
+------+------+------------+-----------+
44+
| DMAC | SMAC | ETHER_TYPE | L3_HEADER |
45+
+------+------+------------+-----------+
46+
^
47+
|
48+
+-- flow dissector starts here
49+
50+
51+
.. code:: c
52+
53+
skb->data + flow_keys->nhoff point to the first byte of L3_HEADER
54+
flow_keys->thoff = nhoff
55+
flow_keys->n_proto = ETHER_TYPE
56+
57+
In case of VLAN, flow dissector can be called with the two different states.
58+
59+
Pre-VLAN parsing::
60+
61+
+------+------+------+-----+-----------+-----------+
62+
| DMAC | SMAC | TPID | TCI |ETHER_TYPE | L3_HEADER |
63+
+------+------+------+-----+-----------+-----------+
64+
^
65+
|
66+
+-- flow dissector starts here
67+
68+
.. code:: c
69+
70+
skb->data + flow_keys->nhoff point to the first byte of TCI
71+
flow_keys->thoff = nhoff
72+
flow_keys->n_proto = TPID
73+
74+
Please note that TPID can be 802.1AD and, hence, BPF program would
75+
have to parse VLAN information twice for double tagged packets.
76+
77+
78+
Post-VLAN parsing::
79+
80+
+------+------+------+-----+-----------+-----------+
81+
| DMAC | SMAC | TPID | TCI |ETHER_TYPE | L3_HEADER |
82+
+------+------+------+-----+-----------+-----------+
83+
^
84+
|
85+
+-- flow dissector starts here
86+
87+
.. code:: c
88+
89+
skb->data + flow_keys->nhoff point to the first byte of L3_HEADER
90+
flow_keys->thoff = nhoff
91+
flow_keys->n_proto = ETHER_TYPE
92+
93+
In this case VLAN information has been processed before the flow dissector
94+
and BPF flow dissector is not required to handle it.
95+
96+
97+
The takeaway here is as follows: a BPF flow dissector program can be called
98+
with or without a VLAN header and should gracefully handle both cases: when a
99+
single or double VLAN tag is present and when none is present. The same
100+
program is called in both cases, so it has to be written carefully to
101+
cover each of them.
102+
103+
104+
Reference Implementation
105+
========================
106+
107+
See ``tools/testing/selftests/bpf/progs/bpf_flow.c`` for the reference
108+
implementation and ``tools/testing/selftests/bpf/flow_dissector_load.[hc]``
109+
for the loader. bpftool can be used to load BPF flow dissector program as well.
110+
111+
The reference implementation is organized as follows:
112+
* ``jmp_table`` map that contains sub-programs for each supported L3 protocol
113+
* ``_dissect`` routine - entry point; it does input ``n_proto`` parsing and
114+
does ``bpf_tail_call`` to the appropriate L3 handler
115+
116+
Since BPF at this point doesn't support looping (or any jumping back),
117+
jmp_table is used instead to handle multiple levels of encapsulation (and
118+
IPv6 options).
119+
120+
121+
Current Limitations
122+
===================
123+
BPF flow dissector doesn't support exporting all the metadata that in-kernel
124+
C-based implementation can export. Notable example is single VLAN (802.1Q)
125+
and double VLAN (802.1AD) tags. Please refer to the ``struct bpf_flow_keys``
126+
for the set of information that can currently be exported from the BPF context.
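
The three start states described in the new document can be modelled in plain userspace C. The sketch below is not the kernel or selftest code: `sketch_keys`, `sketch_skip_vlan`, `read_be16` and the `SKETCH_*` return codes are hypothetical names, `n_proto` is kept in host byte order for readability, and `pkt` is assumed to start at the position `skb->data` would point to. It only illustrates how a program might advance `nhoff`/`thoff` and update `n_proto` when it is entered before VLAN parsing, and do nothing when it is entered after it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ETH_P_IP     0x0800
#define ETH_P_8021Q  0x8100
#define ETH_P_8021AD 0x88A8
#define VLAN_HLEN    4 /* 2 bytes of TCI + 2 bytes of encapsulated protocol */

/* Hypothetical stand-ins for BPF_OK / BPF_DROP. */
enum { SKETCH_OK = 0, SKETCH_DROP = 1 };

/* Hypothetical userspace mirror of the three input fields. */
struct sketch_keys {
	uint16_t nhoff;
	uint16_t thoff;
	uint16_t n_proto; /* host byte order in this sketch */
};

/* Read a big-endian 16-bit value at off; return -1 if it is out of bounds. */
static int read_be16(const uint8_t *pkt, size_t len, size_t off)
{
	if (off + 2 > len)
		return -1;
	return (pkt[off] << 8) | pkt[off + 1];
}

/* Consume one VLAN tag: nhoff points at TCI, the next protocol follows it. */
static int skip_one_tag(const uint8_t *pkt, size_t len, struct sketch_keys *k)
{
	int proto = read_be16(pkt, len, (size_t)k->nhoff + 2);

	if (proto < 0)
		return SKETCH_DROP;
	k->nhoff += VLAN_HLEN;
	k->thoff += VLAN_HLEN;
	k->n_proto = (uint16_t)proto;
	return SKETCH_OK;
}

/* Handle "no VLAN", "single tag" and "802.1AD + 802.1Q" in one routine. */
static int sketch_skip_vlan(const uint8_t *pkt, size_t len,
			    struct sketch_keys *k)
{
	assert(pkt != NULL && k != NULL);

	if (k->n_proto == ETH_P_8021AD) {
		if (skip_one_tag(pkt, len, k) != SKETCH_OK)
			return SKETCH_DROP;
		/* An outer 802.1AD tag must be followed by an 802.1Q tag. */
		if (k->n_proto != ETH_P_8021Q)
			return SKETCH_DROP;
	}

	if (k->n_proto == ETH_P_8021Q) {
		if (skip_one_tag(pkt, len, k) != SKETCH_OK)
			return SKETCH_DROP;
		/* No triple tagging. */
		if (k->n_proto == ETH_P_8021Q || k->n_proto == ETH_P_8021AD)
			return SKETCH_DROP;
	}

	return SKETCH_OK; /* n_proto now names the L3 protocol */
}
```

In the post-VLAN state `n_proto` is already an L3 protocol, so neither branch runs and the offsets stay untouched, which is the "same program, both cases" property the document asks for.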

Documentation/networking/index.rst

Lines changed: 1 addition & 0 deletions
@@ -9,6 +9,7 @@ Contents:
99
netdev-FAQ
1010
af_xdp
1111
batman-adv
12+
bpf_flow_dissector
1213
can
1314
can_ucan_protocol
1415
device_drivers/freescale/dpaa2/index

net/core/filter.c

Lines changed: 3 additions & 13 deletions
@@ -6613,14 +6613,8 @@ static bool flow_dissector_is_valid_access(int off, int size,
66136613
const struct bpf_prog *prog,
66146614
struct bpf_insn_access_aux *info)
66156615
{
6616-
if (type == BPF_WRITE) {
6617-
switch (off) {
6618-
case bpf_ctx_range_till(struct __sk_buff, cb[0], cb[4]):
6619-
break;
6620-
default:
6621-
return false;
6622-
}
6623-
}
6616+
if (type == BPF_WRITE)
6617+
return false;
66246618

66256619
switch (off) {
66266620
case bpf_ctx_range(struct __sk_buff, data):
@@ -6632,11 +6626,7 @@ static bool flow_dissector_is_valid_access(int off, int size,
66326626
case bpf_ctx_range_ptr(struct __sk_buff, flow_keys):
66336627
info->reg_type = PTR_TO_FLOW_KEYS;
66346628
break;
6635-
case bpf_ctx_range(struct __sk_buff, tc_classid):
6636-
case bpf_ctx_range(struct __sk_buff, data_meta):
6637-
case bpf_ctx_range_till(struct __sk_buff, family, local_port):
6638-
case bpf_ctx_range(struct __sk_buff, tstamp):
6639-
case bpf_ctx_range(struct __sk_buff, wire_len):
6629+
default:
66406630
return false;
66416631
}
66426632
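
The net effect of this hunk is a default-deny policy: every write is rejected, and any context field not named in the switch falls into `default: return false`. A minimal userspace model of that policy is sketched below. The enum names and `sketch_is_valid_access` are hypothetical, the field list is an illustrative subset, and the generic checks the real function may run after the switch are not visible in the hunk and are not modelled.

```c
#include <stdbool.h>

/* Hypothetical access kinds, standing in for BPF_READ / BPF_WRITE. */
enum sketch_access { SKETCH_READ, SKETCH_WRITE };

/* Hypothetical, illustrative subset of __sk_buff fields. */
enum sketch_field {
	F_DATA,
	F_DATA_END,
	F_FLOW_KEYS,
	F_CB,
	F_PROTOCOL,
	F_VLAN_PRESENT,
	F_TSTAMP,
};

/* Default-deny: no writes at all, reads only for the three listed fields. */
static bool sketch_is_valid_access(enum sketch_field field,
				   enum sketch_access type)
{
	if (type == SKETCH_WRITE)
		return false;

	switch (field) {
	case F_DATA:
	case F_DATA_END:
	case F_FLOW_KEYS:
		return true;
	default:
		return false;
	}
}
```

Switching from an explicit deny list to `default:` means a field added to `__sk_buff` later is rejected unless someone opts it in, which matches the documented rule that only `data`, `data_end` and `flow_keys` are allowed.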

net/core/flow_dissector.c

Lines changed: 3 additions & 1 deletion
@@ -707,6 +707,7 @@ bool __skb_flow_bpf_dissect(struct bpf_prog *prog,
707707
/* Pass parameters to the BPF program */
708708
memset(flow_keys, 0, sizeof(*flow_keys));
709709
cb->qdisc_cb.flow_keys = flow_keys;
710+
flow_keys->n_proto = skb->protocol;
710711
flow_keys->nhoff = skb_network_offset(skb);
711712
flow_keys->thoff = flow_keys->nhoff;
712713

@@ -716,7 +717,8 @@ bool __skb_flow_bpf_dissect(struct bpf_prog *prog,
716717
/* Restore state */
717718
memcpy(cb, &cb_saved, sizeof(cb_saved));
718719

719-
flow_keys->nhoff = clamp_t(u16, flow_keys->nhoff, 0, skb->len);
720+
flow_keys->nhoff = clamp_t(u16, flow_keys->nhoff,
721+
skb_network_offset(skb), skb->len);
720722
flow_keys->thoff = clamp_t(u16, flow_keys->thoff,
721723
flow_keys->nhoff, skb->len);
722724

tools/testing/selftests/bpf/prog_tests/flow_dissector.c

Lines changed: 68 additions & 0 deletions
@@ -39,6 +39,58 @@ static struct bpf_flow_keys pkt_v6_flow_keys = {
3939
.n_proto = __bpf_constant_htons(ETH_P_IPV6),
4040
};
4141

42+
#define VLAN_HLEN 4
43+
44+
static struct {
45+
struct ethhdr eth;
46+
__u16 vlan_tci;
47+
__u16 vlan_proto;
48+
struct iphdr iph;
49+
struct tcphdr tcp;
50+
} __packed pkt_vlan_v4 = {
51+
.eth.h_proto = __bpf_constant_htons(ETH_P_8021Q),
52+
.vlan_proto = __bpf_constant_htons(ETH_P_IP),
53+
.iph.ihl = 5,
54+
.iph.protocol = IPPROTO_TCP,
55+
.iph.tot_len = __bpf_constant_htons(MAGIC_BYTES),
56+
.tcp.urg_ptr = 123,
57+
.tcp.doff = 5,
58+
};
59+
60+
static struct bpf_flow_keys pkt_vlan_v4_flow_keys = {
61+
.nhoff = VLAN_HLEN,
62+
.thoff = VLAN_HLEN + sizeof(struct iphdr),
63+
.addr_proto = ETH_P_IP,
64+
.ip_proto = IPPROTO_TCP,
65+
.n_proto = __bpf_constant_htons(ETH_P_IP),
66+
};
67+
68+
static struct {
69+
struct ethhdr eth;
70+
__u16 vlan_tci;
71+
__u16 vlan_proto;
72+
__u16 vlan_tci2;
73+
__u16 vlan_proto2;
74+
struct ipv6hdr iph;
75+
struct tcphdr tcp;
76+
} __packed pkt_vlan_v6 = {
77+
.eth.h_proto = __bpf_constant_htons(ETH_P_8021AD),
78+
.vlan_proto = __bpf_constant_htons(ETH_P_8021Q),
79+
.vlan_proto2 = __bpf_constant_htons(ETH_P_IPV6),
80+
.iph.nexthdr = IPPROTO_TCP,
81+
.iph.payload_len = __bpf_constant_htons(MAGIC_BYTES),
82+
.tcp.urg_ptr = 123,
83+
.tcp.doff = 5,
84+
};
85+
86+
static struct bpf_flow_keys pkt_vlan_v6_flow_keys = {
87+
.nhoff = VLAN_HLEN * 2,
88+
.thoff = VLAN_HLEN * 2 + sizeof(struct ipv6hdr),
89+
.addr_proto = ETH_P_IPV6,
90+
.ip_proto = IPPROTO_TCP,
91+
.n_proto = __bpf_constant_htons(ETH_P_IPV6),
92+
};
93+
4294
void test_flow_dissector(void)
4395
{
4496
struct bpf_flow_keys flow_keys;
@@ -68,5 +120,21 @@ void test_flow_dissector(void)
68120
err, errno, retval, duration, size, sizeof(flow_keys));
69121
CHECK_FLOW_KEYS("ipv6_flow_keys", flow_keys, pkt_v6_flow_keys);
70122

123+
err = bpf_prog_test_run(prog_fd, 10, &pkt_vlan_v4, sizeof(pkt_vlan_v4),
124+
&flow_keys, &size, &retval, &duration);
125+
CHECK(size != sizeof(flow_keys) || err || retval != 1, "vlan_ipv4",
126+
"err %d errno %d retval %d duration %d size %u/%lu\n",
127+
err, errno, retval, duration, size, sizeof(flow_keys));
128+
CHECK_FLOW_KEYS("vlan_ipv4_flow_keys", flow_keys,
129+
pkt_vlan_v4_flow_keys);
130+
131+
err = bpf_prog_test_run(prog_fd, 10, &pkt_vlan_v6, sizeof(pkt_vlan_v6),
132+
&flow_keys, &size, &retval, &duration);
133+
CHECK(size != sizeof(flow_keys) || err || retval != 1, "vlan_ipv6",
134+
"err %d errno %d retval %d duration %d size %u/%lu\n",
135+
err, errno, retval, duration, size, sizeof(flow_keys));
136+
CHECK_FLOW_KEYS("vlan_ipv6_flow_keys", flow_keys,
137+
pkt_vlan_v6_flow_keys);
138+
71139
bpf_object__close(obj);
72140
}
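
The expected `nhoff`/`thoff` values in the two new test vectors follow from simple arithmetic over the header sizes. The sketch below reproduces it with hypothetical names (`sketch_expect`, `sketch_expected_offsets`). It assumes, based only on the expected values in the test, that offsets are measured from the end of the Ethernet header, since `nhoff` is `VLAN_HLEN` rather than the Ethernet header length plus `VLAN_HLEN`.

```c
#include <stdint.h>

#define SKETCH_VLAN_HLEN 4  /* one VLAN tag: TCI + encapsulated protocol */
#define SKETCH_IPV4_HLEN 20 /* sizeof(struct iphdr), i.e. ihl = 5 words */
#define SKETCH_IPV6_HLEN 40 /* sizeof(struct ipv6hdr), fixed header */

/* Hypothetical pair of offsets a test vector expects back. */
struct sketch_expect {
	uint16_t nhoff;
	uint16_t thoff;
};

/* nhoff skips the VLAN tags; thoff additionally skips the L3 header. */
static struct sketch_expect sketch_expected_offsets(unsigned int vlan_tags,
						    uint16_t l3_hlen)
{
	struct sketch_expect e;

	e.nhoff = (uint16_t)(vlan_tags * SKETCH_VLAN_HLEN);
	e.thoff = (uint16_t)(e.nhoff + l3_hlen);
	return e;
}
```

With one tag and IPv4 this gives `nhoff = 4`, `thoff = 24` (the `pkt_vlan_v4` case); with two tags and IPv6 it gives `nhoff = 8`, `thoff = 48` (the `pkt_vlan_v6` case).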

tools/testing/selftests/bpf/progs/bpf_flow.c

Lines changed: 7 additions & 12 deletions
@@ -92,7 +92,6 @@ static __always_inline int parse_eth_proto(struct __sk_buff *skb, __be16 proto)
9292
{
9393
struct bpf_flow_keys *keys = skb->flow_keys;
9494

95-
keys->n_proto = proto;
9695
switch (proto) {
9796
case bpf_htons(ETH_P_IP):
9897
bpf_tail_call(skb, &jmp_table, IP);
@@ -119,10 +118,9 @@ static __always_inline int parse_eth_proto(struct __sk_buff *skb, __be16 proto)
119118
SEC("flow_dissector")
120119
int _dissect(struct __sk_buff *skb)
121120
{
122-
if (!skb->vlan_present)
123-
return parse_eth_proto(skb, skb->protocol);
124-
else
125-
return parse_eth_proto(skb, skb->vlan_proto);
121+
struct bpf_flow_keys *keys = skb->flow_keys;
122+
123+
return parse_eth_proto(skb, keys->n_proto);
126124
}
127125

128126
/* Parses on IPPROTO_* */
@@ -336,35 +334,32 @@ PROG(VLAN)(struct __sk_buff *skb)
336334
{
337335
struct bpf_flow_keys *keys = skb->flow_keys;
338336
struct vlan_hdr *vlan, _vlan;
339-
__be16 proto;
340-
341-
/* Peek back to see if single or double-tagging */
342-
if (bpf_skb_load_bytes(skb, keys->thoff - sizeof(proto), &proto,
343-
sizeof(proto)))
344-
return BPF_DROP;
345337

346338
/* Account for double-tagging */
347-
if (proto == bpf_htons(ETH_P_8021AD)) {
339+
if (keys->n_proto == bpf_htons(ETH_P_8021AD)) {
348340
vlan = bpf_flow_dissect_get_header(skb, sizeof(*vlan), &_vlan);
349341
if (!vlan)
350342
return BPF_DROP;
351343

352344
if (vlan->h_vlan_encapsulated_proto != bpf_htons(ETH_P_8021Q))
353345
return BPF_DROP;
354346

347+
keys->nhoff += sizeof(*vlan);
355348
keys->thoff += sizeof(*vlan);
356349
}
357350

358351
vlan = bpf_flow_dissect_get_header(skb, sizeof(*vlan), &_vlan);
359352
if (!vlan)
360353
return BPF_DROP;
361354

355+
keys->nhoff += sizeof(*vlan);
362356
keys->thoff += sizeof(*vlan);
363357
/* Only allow 8021AD + 8021Q double tagging and no triple tagging.*/
364358
if (vlan->h_vlan_encapsulated_proto == bpf_htons(ETH_P_8021AD) ||
365359
vlan->h_vlan_encapsulated_proto == bpf_htons(ETH_P_8021Q))
366360
return BPF_DROP;
367361

362+
keys->n_proto = vlan->h_vlan_encapsulated_proto;
368363
return parse_eth_proto(skb, vlan->h_vlan_encapsulated_proto);
369364
}
370365
