Commit 143bdc2

Merge branch 'bpf-libbpf-af-xdp'
Magnus Karlsson says:

====================
This patch proposes to add AF_XDP support to libbpf. The main reason
for this is to facilitate writing applications that use AF_XDP by
offering higher-level APIs that hide many of the details of the AF_XDP
uapi. This is in the same vein as libbpf facilitates XDP adoption by
offering easy-to-use higher level interfaces of XDP functionality.
Hopefully this will facilitate adoption of AF_XDP, make applications
using it simpler and smaller, and finally also make it possible for
applications to benefit from optimizations in the AF_XDP user space
access code. Previously, people just copied and pasted the code from
the sample application into their application, which is not desirable.

The proposed interface is composed of two parts:

* Low-level access interface to the four rings and the packet
* High-level control plane interface for creating and setting up umems
  and AF_XDP sockets. This interface also loads a simple XDP program
  that routes all traffic on a queue up to the AF_XDP socket.

The sample program has been updated to use this new interface and in
that process it lost roughly 300 lines of code. I cannot detect any
performance degradations due to the use of this library instead of the
previous functions that were inlined in the sample application. But I
did measure this on a slower machine and not the Broadwell that we
normally use.

The rings are now called xsk_ring; when a producer operates on one it
is an xsk_ring_prod, and for a consumer it is an xsk_ring_cons. This
way we can get some compile time error checking that the rings are
used correctly.

Comments and contemplations:

* The current behaviour is that the library loads an XDP program (if
  requested to do so) but the clean up of this program is left to the
  application. It would be possible to implement this cleanup in the
  library, but it would require state to be kept on netdev level,
  which there is none at the moment, and the synchronization of this
  between processes. All of this adds complexity. But when we get an
  XDP program per queue id, then it becomes trivial to also remove the
  XDP program when the application exits. This proposal from Jesper,
  Björn and others will also improve the performance of libbpf, since
  most of the XDP program code can be removed when that feature is
  supported.

* In a future release, I am planning on adding a higher level data
  plane interface too. This will be based around recvmsg and sendmsg
  with the use of struct iovec for batching, without the user having
  to know anything about the underlying four rings of an AF_XDP
  socket. There will be one semantic difference though from the
  standard recvmsg and that is that the kernel will fill in the iovecs
  instead of the application. But the rest should be the same as the
  libc versions so that application writers feel at home.

Patch 1: adds AF_XDP support in libbpf
Patch 2: updates the xdpsock sample application to use the libbpf
         functions
Patch 3: Documentation update to help first time users

Changes v5 to v6:
* Fixed prog_fd bug found by Xiaolong Ye. Thanks!

Changes v4 to v5:
* Added a FAQ to the documentation
* Removed xsk_umem__get_data and renamed xsk_umem__get_dat_raw to
  xsk_umem__get_data
* Replaced the netlink code with bpf_get_link_xdp_id()
* Dynamic allocation of the map sizes. They are now sized after
  the max number of queues on the netdev in question.

Changes v3 to v4:
* Dropped the pr_*() patch in favor of Yonghong Song's patch set
* Addressed the review comments of Daniel Borkmann, mainly leaking
  of file descriptors at clean up and making the data plane APIs
  all static inline (with the exception of xsk_umem__get_data that
  uses an internal structure I do not want to expose).
* Fixed the netlink callback as suggested by Maciej Fijalkowski.
* Removed an unnecessary include in the sample program as spotted by
  Ilia Fillipov.

Changes v2 to v3:
* Added automatic loading of a simple XDP program that routes all
  traffic on a queue up to the AF_XDP socket. This program loading
  can be disabled.
* Updated function names to be consistent with the libbpf naming
  convention
* Moved all code to xsk.[ch]
* Removed all the XDP program loading code from the sample since
  this is now done by libbpf
* The initialization functions now return a handle as suggested by
  Alexei
* const statements added in the API where applicable.

Changes v1 to v2:
* Fixed cleanup of library state on error.
* Moved API to initial version
* Prefixed all public functions by xsk__ instead of xsk_
* Added comment about changed default ring sizes, batch size and
  umem size in the sample application commit message
* The library now only creates an Rx or Tx ring if the respective
  parameter is != NULL
====================

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2 parents 740f8a6 + 0f4a9b7 commit 143bdc2
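
The cover letter says that giving producer and consumer rings separate types (xsk_ring_prod and xsk_ring_cons) allows some compile-time checking that rings are used correctly. The sketch below illustrates that idea only; it is not the libbpf code. All names here (`toy_ring`, `toy_ring_prod`, `toy_ring_cons`, `toy_prod_push`, `toy_cons_pop`, `TOY_RING_SIZE`) are hypothetical, the layout is invented for illustration, and the ring is single-threaded with no memory barriers, which a real shared ring would need.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_RING_SIZE 8u /* hypothetical size; must be a power of two */

static_assert((TOY_RING_SIZE & (TOY_RING_SIZE - 1)) == 0,
              "TOY_RING_SIZE must be a power of two");

/* Shared ring state (invented layout, not libbpf's struct). */
struct toy_ring {
    uint32_t prod;                 /* free-running producer index */
    uint32_t cons;                 /* free-running consumer index */
    uint64_t desc[TOY_RING_SIZE];  /* descriptor slots */
};

/*
 * Two distinct wrapper types over the same ring. Handing a
 * struct toy_ring_cons * to a function that expects a
 * struct toy_ring_prod * is diagnosed by the compiler
 * (a warning or an error, depending on compiler and flags).
 */
struct toy_ring_prod { struct toy_ring *r; };
struct toy_ring_cons { struct toy_ring *r; };

/* Producer side: enqueue one descriptor, -1 if the ring is full. */
static inline int toy_prod_push(struct toy_ring_prod *p, uint64_t d)
{
    struct toy_ring *r = p->r;

    if (r->prod - r->cons == TOY_RING_SIZE)
        return -1;
    r->desc[r->prod & (TOY_RING_SIZE - 1)] = d;
    r->prod++;
    return 0;
}

/* Consumer side: dequeue one descriptor, -1 if the ring is empty. */
static inline int toy_cons_pop(struct toy_ring_cons *c, uint64_t *d)
{
    struct toy_ring *r = c->r;

    if (r->prod == r->cons)
        return -1;
    *d = r->desc[r->cons & (TOY_RING_SIZE - 1)];
    r->cons++;
    return 0;
}
```

With this split, a call such as `toy_prod_push(&cons_handle, 1)` does not pass silently: the argument type does not match, so the mistake surfaces at build time rather than as ring corruption at run time.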

13 files changed, +1376 -652 lines changed


Documentation/networking/af_xdp.rst

Lines changed: 35 additions & 1 deletion
@@ -295,6 +295,41 @@ using::
 For XDP_SKB mode, use the switch "-S" instead of "-N" and all options
 can be displayed with "-h", as usual.
 
+FAQ
+=======
+
+Q: I am not seeing any traffic on the socket. What am I doing wrong?
+
+A: When a netdev of a physical NIC is initialized, Linux usually
+   allocates one Rx and Tx queue pair per core. So on an 8-core system,
+   queue ids 0 to 7 will be allocated, one per core. In the AF_XDP
+   bind call or the xsk_socket__create libbpf function call, you
+   specify a specific queue id to bind to and it is only the traffic
+   towards that queue you are going to get on your socket. So in the
+   example above, if you bind to queue 0, you are NOT going to get any
+   traffic that is distributed to queues 1 through 7. If you are
+   lucky, you will see the traffic, but usually it will end up on one
+   of the queues you have not bound to.
+
+   There are a number of ways to solve the problem of getting the
+   traffic you want to the queue id you bound to. If you want to see
+   all the traffic, you can force the netdev to only have 1 queue, queue
+   id 0, and then bind to queue 0. You can use ethtool to do this::
+
+     sudo ethtool -L <interface> combined 1
+
+   If you want to only see part of the traffic, you can program the
+   NIC through ethtool to filter out your traffic to a single queue id
+   that you can bind your XDP socket to. Here is one example in which
+   UDP traffic to and from port 4242 is sent to queue 2::
+
+     sudo ethtool -N <interface> rx-flow-hash udp4 fn
+     sudo ethtool -N <interface> flow-type udp4 src-port 4242 dst-port \
+     4242 action 2
+
+   A number of other ways are possible, all up to the capabilities of
+   the NIC you have.
+
 Credits
 =======
 
@@ -309,4 +344,3 @@ Credits
 - Michael S. Tsirkin
 - Qi Z Zhang
 - Willem de Bruijn
-
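
The FAQ added above explains that an AF_XDP socket only receives traffic that the NIC steers to the queue id it is bound to. A toy model of that behaviour might look like the following. This is a sketch under stated assumptions: `toy_rx_queue` is a made-up stand-in for the NIC's receive hashing (real NICs use RSS or flow rules, not this arithmetic), and `struct toy_flow` and `toy_packets_seen` are hypothetical names that do not exist in libbpf or the kernel.

```c
#include <assert.h>
#include <stdint.h>

/* One UDP flow, identified only by its ports (hypothetical type). */
struct toy_flow {
    uint16_t src_port;
    uint16_t dst_port;
};

/*
 * Made-up stand-in for the NIC's receive-side flow distribution.
 * It only models "each flow lands on exactly one queue".
 */
static uint32_t toy_rx_queue(const struct toy_flow *f, uint32_t num_queues)
{
    assert(num_queues > 0);
    return ((uint32_t)f->src_port * 31u + f->dst_port) % num_queues;
}

/* Count the flows that reach a socket bound to bound_queue. */
static uint32_t toy_packets_seen(const struct toy_flow *flows, uint32_t n,
                                 uint32_t num_queues, uint32_t bound_queue)
{
    uint32_t seen = 0;

    for (uint32_t i = 0; i < n; i++)
        if (toy_rx_queue(&flows[i], num_queues) == bound_queue)
            seen++;
    return seen;
}
```

In this model, a socket bound to queue 0 of an 8-queue device sees only the flows that happen to map to queue 0, whereas with a single queue (the effect of `ethtool -L <interface> combined 1`) every flow maps to queue 0 and the bound socket sees all of them.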

samples/bpf/Makefile

Lines changed: 0 additions & 1 deletion
@@ -163,7 +163,6 @@ always += xdp2skb_meta_kern.o
 always += syscall_tp_kern.o
 always += cpustat_kern.o
 always += xdp_adjust_tail_kern.o
-always += xdpsock_kern.o
 always += xdp_fwd_kern.o
 always += task_fd_query_kern.o
 always += xdp_sample_pkts_kern.o

samples/bpf/xdpsock.h

Lines changed: 0 additions & 11 deletions
This file was deleted.

samples/bpf/xdpsock_kern.c

Lines changed: 0 additions & 56 deletions
This file was deleted.
