Hardware Accelerating Linux Network Functions
Part I: Virtual Switching Technologies in Linux
Toshiaki Makita
NTT Open Source Software Center
Proceedings of netdev 0.1, Feb 14-17, 2015, Ottawa, On, Canada
Copyright © 2015 NTT Corp. All Rights Reserved.
Part I topics
• Virtual switching technologies in Linux
• Software switches and NIC embedded switch
• Userland APIs and commands for bridge
• Introduction to recent features of bridge (and others)
• FDB manipulation
• VLAN filtering
• Learning / flooding control
• Non-promiscuous bridge
• VLAN filtering for 802.1ad (Q-in-Q)
• Demo
• Setting up non-promiscuous bridge
Who is Toshiaki Makita?
• Linux kernel engineer at NTT Open Source Software Center
• Technical support for NTT group companies
• Active patch submitter to the kernel networking subsystem
• bridge, vlan, etc.
Switching technologies in Linux
• Linux (kernel) has 3 types of software switches
• bridge
• macvlan
• Open vSwitch
• The NIC embedded switch in an SR-IOV device can also be used instead of a software switch
• These are often used as the network backend in server virtualization
bridge
• HW switch-like device (IEEE 802.1D)
• Has FDB (Forwarding DB), STP (Spanning Tree Protocol), etc.
• Uses promiscuous mode, which allows the NIC to receive all packets
• Without promiscuous mode, common NICs filter out unicast packets whose dst is not their own mac address
• Many NICs also filter multicast / vlan-tagged packets by default
[Figure: without a bridge, eth0 passes received packets straight to TCP/IP; with a bridge, eth0 and eth1 run in promiscuous mode and hand packets through the handler hook to bridge br0, which passes them to the upper layer (TCP/IP) if the dst mac is the bridge device]
bridge with KVM
• Used with tap device
• Tap device
• packet transmission -> file read
• file write -> packet reception
[Figure: bridge in the kernel connects eth0 and tap0; qemu/vhost reads/writes the tap0 fd through vfs on behalf of the guest's eth0]
macvlan
• VLAN using mac addresses instead of 802.1Q tags
• 4 types of mode
• private
• vepa
• bridge
• passthru
• Uses unicast filtering if supported, instead of promiscuous mode (except for passthru)
• Unicast filtering allows the NIC to receive packets for multiple mac addresses
[Figure: in the kernel, macvlan0 (MAC address A) and macvlan1 (MAC address B) sit on macvlan, which attaches through the handler hook to eth0; eth0 uses unicast filtering]
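The difference between promiscuous mode and unicast filtering can be pictured with a toy receive filter. This is only an illustrative sketch, not driver code: the `Nic` class, its fields, and the MAC addresses are hypothetical names made up for the example.

```python
BROADCAST = "ff:ff:ff:ff:ff:ff"


class Nic:
    """Toy model of a NIC receive filter (hypothetical, for illustration)."""

    def __init__(self, mac):
        self.mac = mac
        self.promisc = False
        # Extra unicast addresses programmed into the NIC, e.g. by macvlan.
        self.unicast_filter = set()

    def accepts(self, dst_mac):
        # Promiscuous mode receives everything; broadcast is always received.
        if self.promisc or dst_mac == BROADCAST:
            return True
        # Otherwise only the NIC's own address or a filtered address passes.
        return dst_mac == self.mac or dst_mac in self.unicast_filter


nic = Nic("00:00:00:00:00:01")
mac_a = "00:00:00:00:00:0a"          # assumed address of macvlan0
before = nic.accepts(mac_a)          # not yet in the unicast filter
nic.unicast_filter.add(mac_a)        # what creating macvlan0 would request
after = nic.accepts(mac_a)
```

With the filter entry in place the NIC receives traffic for MAC address A while still dropping other unknown unicast, which is why promiscuous mode is not needed.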
macvlan (bridge mode)
• Lightweight bridge
• No source learning
• No STP
• Only one uplink
• Allows traffic between macvlans (via macvlan stack)
[Figure: in the kernel, macvlan0 (MAC address A) and macvlan1 (MAC address B) sit on macvlan over eth0; eth0 is the single uplink to an external SW]
macvtap (private, vepa, bridge) with KVM
• macvtap
• tap-like macvlan variant
• packet reception -> file read
• file write -> packet transmission
[Figure: two guests, each with eth0, run on qemu/vhost; each qemu/vhost reads/writes the fd of macvtap0 or macvtap1, which sit on macvlan over eth0 in the kernel]
Open vSwitch
• Supports OpenFlow
• Can be used as a normal switch as well
• Has many features (VLAN tagging, VXLAN, Geneve, GRE, bonding, etc.)
• Flow based forwarding
• Control plane in user space
• A flow miss causes an upcall to the userspace daemon
[Figure: in user space, the daemon (ovs-vswitchd) forms the control plane, holds the flow table and FDB, and talks to an OpenFlow controller; in the kernel, openvswitch (datapath) forms the data plane, holds a flow table (cache), receives packets from eth0 and eth1 (promiscuous mode) through the handler hook, and makes an upcall to the daemon]
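The split between the kernel flow cache and the userspace control plane can be sketched as a cache with a slow-path callback. This is a simplified model under stated assumptions: `Datapath`, `vswitchd`, the flow key layout, and the action strings are hypothetical and do not reflect Open vSwitch's real data structures.

```python
class Datapath:
    """Toy kernel datapath: a flow cache that upcalls on a miss."""

    def __init__(self, slow_path):
        self.flow_cache = {}        # flow key -> action
        self.slow_path = slow_path  # stands in for the userspace daemon
        self.upcalls = 0

    def receive(self, flow_key):
        action = self.flow_cache.get(flow_key)
        if action is None:
            # Miss: ask userspace, then cache the answer for later packets.
            self.upcalls += 1
            action = self.slow_path(flow_key)
            self.flow_cache[flow_key] = action
        return action


def vswitchd(flow_key):
    """Hypothetical control-plane decision: forward to the other port."""
    in_port, _dst_mac = flow_key
    return "output:eth1" if in_port == "eth0" else "output:eth0"


dp = Datapath(vswitchd)
key = ("eth0", "aa:bb:cc:dd:ee:ff")
first = dp.receive(key)    # miss, handled by the slow path
second = dp.receive(key)   # hit, served from the cache
```

Only the first packet of a flow pays the cost of the upcall; later packets with the same key stay in the data plane.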
Open vSwitch with KVM
• Configuration is the same as bridge
• Used with tap device
[Figure: openvswitch in the kernel connects eth0 and tap0; qemu/vhost reads/writes the tap0 fd through vfs on behalf of the guest's eth0]
NIC embedded switch (SR-IOV)
• SR-IOV
• In addition to the normal PCI physical function (PF), allows adding lightweight virtual functions (VF)
• A VF appears as a network interface (eth0_0, eth0_1...)
• Some SR-IOV devices have switches in them
• Allows PF-VF / VF-VF communication
[Figure: the kernel sees PF eth0 and VFs eth0_0 and eth0_1, all attached to the embedded switch of the SR-IOV supported NIC]
NIC embedded switch (SR-IOV)
• SR-IOV with KVM
• Use PCI-passthrough to attach VF to guest
[Figure: two guests on qemu use VFs eth0_0 and eth0_1 directly; the kernel keeps PF eth0; all three attach to the embedded switch of the SR-IOV supported NIC]
Userland APIs and commands (bridge)
• Various APIs
• ioctl
• sysfs
• netlink
• Netlink is preferred for new features
• Because it is extensible
• sysfs is sometimes used
• Commands
• brctl (in bridge-utils, using ioctl / sysfs)
• ip / bridge (in iproute2, using netlink)
Userland APIs and commands (bridge)
• brctl
# brctl addbr <bridge> ... create new bridge
# brctl addif <bridge> <port> ... attach port to bridge
# brctl showmacs <bridge> ... show fdb entries
• These operations can also be performed by netlink-based commands (since kernel 3.0)
# ip link add <bridge> type bridge ... create new bridge
# ip link set <port> master <bridge> ... attach port
# bridge fdb show ... show fdb entries
• Recent features can only be used through the netlink-based commands or a direct sysfs write
# bridge fdb add
# bridge vlan add
etc...
Recent features of bridge (and others)
• FDB manipulation
• VLAN filtering
• Learning / flooding control
• Non-promiscuous bridge
• VLAN filtering for 802.1ad (Q-in-Q)
FDB manipulation
• FDB
• Forwarding database
• Learning: packet arrival triggers entry creation
• Source MAC address is used with incoming port
• Flood if no matching entry is found
• Flood: deliver packet to all ports but incoming one
[Figure: a packet arriving on eth0 from aa:bb:cc:dd:ee:ff makes the bridge learn an FDB entry (MAC address aa:bb:cc:dd:ee:ff, Dst eth0)]
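Learning and flooding as described above can be sketched in a few lines. This is a minimal model, not the kernel implementation: `LearningBridge` and its method names are hypothetical, and aging, VLANs, and STP are left out.

```python
class LearningBridge:
    """Toy bridge: learn source MACs, forward by FDB, flood on a miss."""

    def __init__(self, ports):
        self.ports = list(ports)
        self.fdb = {}  # MAC address -> port

    def receive(self, in_port, src_mac, dst_mac):
        # Learning: the source MAC address is recorded with the incoming port.
        self.fdb[src_mac] = in_port
        out_port = self.fdb.get(dst_mac)
        if out_port is None:
            # Flood: deliver to all ports but the incoming one.
            return [p for p in self.ports if p != in_port]
        # Known destination: deliver only there (never back to the sender).
        return [] if out_port == in_port else [out_port]


br = LearningBridge(["eth0", "eth1", "tap0"])
flooded = br.receive("eth0", "aa:bb:cc:dd:ee:ff", "11:22:33:44:55:66")
unicast = br.receive("eth1", "11:22:33:44:55:66", "aa:bb:cc:dd:ee:ff")
```

The first packet is flooded because its destination is unknown; the reply is delivered to eth0 only, because the first packet taught the bridge where aa:bb:cc:dd:ee:ff lives.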
FDB manipulation
• FDB manipulation commands
• Since kernel 3.0
# bridge fdb add <mac address> dev <port> master temp
# bridge fdb del <mac address> dev <port> master
[Figure: the added FDB entry maps the specified mac (MAC address) to the specified port (Dst), here eth0 of the bridge]
FDB manipulation
# bridge fdb add <mac address> dev <port> master temp
• What's "temp"?
• There are 3 types of FDB entries
• permanent (local)
• static
• others (dynamically learned by packet arrival)
• "temp" means static here
• "bridge fdb"'s default is permanent
• permanent here means "deliver to bridge device" (e.g. br0)
• permanent doesn't deliver to specified port
[Figure: if a packet matches a permanent entry, the bridge delivers it to the bridge device (br0), not to the specified port (eth0)]
FDB manipulation
# bridge fdb add <mac address> dev <port> master temp
• What's "master"?
• Remember this command?
# ip link set <port> master <bridge> ... attach port
• "bridge fdb"'s default is "self"
• It adds entry to specified port (eth0) itself!
[Figure: "master" targets the bridge above the specified port; "self" targets the specified port (eth0) itself]
FDB manipulation
• When to use "self"?
• Unicast / multicast filtering
• Use case: SR-IOV embedded SW
• VTEP-MAC mapping table (vxlan)
[Figure: "master" refers to the bridge above PF eth0; "self" refers to eth0 itself, whose entries reach the embedded switch of the SR-IOV supported NIC shared with VFs eth0_0 and eth0_1]
FDB manipulation
• Example: Intel 82599 (ixgbe)
• Some people think of using both bridge and SR-IOV because the number of VFs is limited
• bridge puts eth0 (PF) into promiscuous mode, but...
• A packet from a VF to a MAC address unknown to the embedded switch goes to the wire, not to the PF
[Figure: Guest 1 (MAC A) connects through eth1 and tap to the bridge on PF eth0 (MAC B); Guest 2 (MAC C) uses VF eth0_0; a packet with Dst. A sent from the VF leaves the embedded switch of the Intel 82599 (ixgbe) toward the wire instead of reaching the PF]
FDB manipulation
• Example: Intel 82599 (ixgbe)
• Type "bridge fdb add A dev eth0" on host
• Traffic to A will be forwarded to bridge
[Figure: same setup as before (Guest 1 with MAC A behind tap and bridge on PF eth0 with MAC B, Guest 2 with MAC C on VF eth0_0); after the fdb entry is added on eth0, the embedded switch of the Intel 82599 (ixgbe) delivers the packet with Dst. A to the PF and on to the bridge]
VLAN filtering
• 802.1Q Bridge
• Since kernel 3.9
• Filter packets according to vlan tag
• Forward packets according to vlan tag as well as mac address
• Insert / strip vlan tag
[Figure: the FDB gains a Vlan column (MAC address aa:bb:cc:dd:ee:ff, Vlan 10, Dst eth0); the bridge filters disallowed vlans and inserts / strips vlan tags between eth0 and eth1]
VLAN filtering
• Ingress / egress filtering policy
• Incoming / outgoing packets are filtered according to the filtering policy
• Per-port per-vlan policy
• Default is "disallow all vlans"
• Since kernel 3.18, vid 1 is allowed by default
• All packets are dropped except for untagged or vid 1
Filtering table
Port  Allowed Vlans
eth0  10, 20
eth1  20, 30
[Figure: the bridge filters by vlan at ingress and at egress; a packet with VID 10 is allowed at ingress on eth0 and disallowed at egress on eth1]
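The ingress / egress check can be sketched as two set lookups. This is an illustrative model only; the `allowed` table uses the slide's example values (eth0 allows 10 and 20, eth1 allows 20 and 30), and the function name and return strings are hypothetical.

```python
# Per-port allowed VLANs, taken from the example filtering table.
allowed = {"eth0": {10, 20}, "eth1": {20, 30}}


def forward(in_port, out_port, vid):
    """Return what happens to a tagged packet crossing the bridge."""
    if vid not in allowed.get(in_port, set()):
        return "dropped at ingress"
    if vid not in allowed.get(out_port, set()):
        return "dropped at egress"
    return "forwarded"


vid10 = forward("eth0", "eth1", 10)
vid20 = forward("eth0", "eth1", 20)
vid30 = forward("eth0", "eth1", 30)
```

A port with an empty set models the "disallow all vlans" default: every packet is dropped at ingress.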
VLAN filtering
• PVID (Port VID)
• Untagged (and VID 0) packet is assigned this VID
• Per-port configuration
• Default PVID is 1 (Since kernel 3.18)
• Egress policy untagged
• Outgoing packet that matches this policy gets untagged
• Per-port per-vlan policy
Filtering table
Port  Allowed Vlans  PVID  Egress Untag
eth0  10                   ✔
eth0  20             ✔     ✔
eth1  20             ✔     ✔
eth1  30
[Figure: an untagged packet gets the pvid applied at ingress (insert vid 20) and the untagged policy applied at egress (strip tag 20) as it crosses the bridge between eth0 and eth1]
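PVID and the egress untagged policy can be added to a filtering model as two more per-port settings. The sketch below is a simplified illustration, not kernel code: `PortVlans`, `ingress`, and `egress` are hypothetical names, and the port settings are example values chosen for the demonstration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class PortVlans:
    allowed: set
    pvid: Optional[int] = None                   # VID for untagged packets
    untagged: set = field(default_factory=set)   # VIDs sent without a tag


def ingress(port, tag):
    """Return the VID assigned to the packet, or None if it is dropped."""
    # Untagged (None) and VID 0 packets are assigned the port's PVID.
    vid = port.pvid if tag in (None, 0) else tag
    if vid is None or vid not in port.allowed:
        return None
    return vid


def egress(port, vid):
    """Return (sent, tag on the wire); the tag is None when stripped."""
    if vid not in port.allowed:
        return (False, None)
    return (True, None if vid in port.untagged else vid)


eth0 = PortVlans(allowed={10, 20}, pvid=20, untagged={10, 20})
eth1 = PortVlans(allowed={20, 30}, pvid=20, untagged={20})

vid = ingress(eth0, None)   # untagged packet arrives on eth0
result = egress(eth1, vid)  # and leaves through eth1
```

Inside the bridge every packet carries a VID; tags are only absent on the wire, before the PVID is applied or after the untagged policy strips them.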
VLAN filtering
• Commands
• Enable VLAN filtering (disabled by default)
# echo 1 > /sys/class/net/<bridge>/bridge/vlan_filtering
• Add / delete allowed vlan
# bridge vlan add vid <vid> dev <port>
# bridge vlan del vid <vid> dev <port>
• Set pvid / untagged
# bridge vlan add vid <vid> dev <port> [pvid] [untagged]
• Dump settings
# bridge vlan show
• Note: bridge device needs "self"
# bridge vlan add vid <vid> dev br0 self
# bridge vlan del vid <vid> dev br0 self
VLAN with KVM
• Traditional configuration
• Use vlan devices
• Needs bridges per vlan
• Low flexibility
• How many devices?
# ifconfig -s
Iface ...
eth0
eth0.10
br10
eth0.20
br20
eth0.30
br30
eth0.40
br40
...
[Figure: two guests on qemu connect their eth0 to tap0 and tap1; tap0 is a port of br10 and tap1 a port of br20; br10 and br20 sit on vlan devices eth0.10 and eth0.20 over eth0 in the kernel]
VLAN with KVM
• With VLAN filtering
• Simple
• Flexible
• Only one bridge
# ifconfig -s
Iface ...
eth0
br0
[Figure: two guests on qemu connect their eth0 to tap0 (pvid/untag vlan 10) and tap1 (pvid/untag vlan 20), both ports of the single bridge br0; eth0 carries vlan 10 / 20]
VLAN with KVM
• Other switches
• Open vSwitch
• Can also handle VLANs
# ovs-vsctl set Port <port> tag=<vid>
• NIC embedded switch
• Some of them support VLAN (e.g. Intel 82599)
# ip link set <PF> vf <VF_num> vlan <vid>
Learning / flooding control
• Limit the mac addresses a guest can use
• Reduce FDB size
• Used with static FDB entries ("bridge fdb" command)
• Disable FDB learning on particular port
• Since kernel 3.11
• No dynamic FDB entry
• Don't flood unknown mac to specified port
• Since kernel 3.11
• Control packet delivery to guests
• Commands
# bridge link set dev <port> learning off
# bridge link set dev <port> flood off
[Figure: guest ports tap0 and tap1 of the bridge are set to no learning and no flooding; the uplink eth0 keeps learning and flooding]
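The effect of the two per-port switches can be sketched by extending a toy bridge with `learning` and `flood` flags and a separate table of static entries. This is only a model under stated assumptions: the class and method names and the MAC addresses are hypothetical.

```python
class Port:
    def __init__(self, name, learning=True, flood=True):
        self.name, self.learning, self.flood = name, learning, flood


class Bridge:
    """Toy bridge with per-port learning / flooding control."""

    def __init__(self, ports):
        self.ports = {p.name: p for p in ports}
        self.static = {}   # entries added by the administrator
        self.dynamic = {}  # entries learned from packet arrival

    def receive(self, in_port, src_mac, dst_mac):
        # Learning happens only on ports that still have it enabled.
        if self.ports[in_port].learning and src_mac not in self.static:
            self.dynamic[src_mac] = in_port
        out = self.static.get(dst_mac, self.dynamic.get(dst_mac))
        if out is not None:
            return [] if out == in_port else [out]
        # Unknown destination: flood only to ports with flooding enabled.
        return [n for n, p in self.ports.items()
                if n != in_port and p.flood]


br = Bridge([Port("eth0"),
             Port("tap0", learning=False, flood=False),
             Port("tap1", learning=False, flood=False)])
br.static["52:54:00:00:00:01"] = "tap0"   # assumed guest MAC address

to_guest = br.receive("eth0", "00:00:00:00:00:99", "52:54:00:00:00:01")
unknown = br.receive("eth0", "00:00:00:00:00:99", "00:00:00:00:00:77")
spoofed = br.receive("tap0", "de:ad:be:ef:00:00", "00:00:00:00:00:77")
```

Guests receive only traffic for their static entries, and a guest sending from an unexpected source address does not create a dynamic FDB entry.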
Non-promiscuous bridge
• Since kernel 3.16
• If there is only one learning / flooding port, it can be non-promisc
• Instead of promisc mode, unicast filtering is set for static FDB entries
• Automatically enabled if meeting some conditions
• There is one or zero learning or flooding port
• bridge itself is not in promiscuous mode
• VLAN filtering is enabled
[Figure: guest ports tap0 and tap1 of the bridge are set to no learning and no flooding; the uplink eth0 keeps learning and flooding and runs non-promisc]
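The conditions above can be written down as a small decision function. This is a sketch of the rules as listed on the slide, not the kernel's code; the function name and the port dictionaries are hypothetical, and treating every port as non-promisc when no learning / flooding port exists is an assumption.

```python
def nonpromisc_ports(ports, bridge_promisc, vlan_filtering):
    """Return the set of port names that may leave promiscuous mode."""
    # Both bridge-level conditions must hold.
    if bridge_promisc or not vlan_filtering:
        return set()
    # A port counts as "learning or flooding" if either flag is on.
    auto = [name for name, p in ports.items()
            if p["learning"] or p["flood"]]
    if len(auto) == 0:
        return set(ports)   # assumption: all ports are statically known
    if len(auto) == 1:
        return set(auto)    # the single learning / flooding port
    return set()


ports = {
    "eth0": {"learning": True, "flood": True},
    "tap0": {"learning": False, "flood": False},
    "tap1": {"learning": False, "flood": False},
}
ok = nonpromisc_ports(ports, bridge_promisc=False, vlan_filtering=True)
no_vlan = nonpromisc_ports(ports, bridge_promisc=False, vlan_filtering=False)
```

In the example only eth0 still learns and floods, so it is the port that can rely on unicast filtering for the static FDB entries.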
802.1ad (Q-in-Q) support for bridge
• Since kernel 3.16
• 802.1ad allows stacked vlan tags
Frame layout: MAC | .1ad tag | .1Q tag | payload
• Outer 802.1ad tag can be used to separate
customers
• Example: Guest A, B -> Customer X
Guest C, D -> Customer Y
• Inner 802.1Q tag can be used inside customers
• Customer X and Y can use any 802.1Q tags
• Command
# echo 0x88a8 > /sys/class/net/<bridge>/bridge/vlan_protocol
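The stacked tag layout can be made concrete by building the header bytes by hand. This is an illustrative sketch: the helper names, MAC addresses, and VIDs are example values, while the TPIDs 0x88a8 (802.1ad) and 0x8100 (802.1Q) are the standard ones.

```python
import struct

TPID_8021AD = 0x88A8  # outer (service) tag
TPID_8021Q = 0x8100   # inner (customer) tag


def vlan_tag(tpid, vid, pcp=0):
    """Build a 4-byte VLAN tag: TPID followed by PCP/DEI/VID."""
    tci = (pcp << 13) | (vid & 0x0FFF)
    return struct.pack("!HH", tpid, tci)


def qinq_header(dst, src, outer_vid, inner_vid, ethertype):
    """MAC addresses, .1ad tag, .1Q tag, then the payload's EtherType."""
    return (dst + src
            + vlan_tag(TPID_8021AD, outer_vid)
            + vlan_tag(TPID_8021Q, inner_vid)
            + struct.pack("!H", ethertype))


dst = b"\xff" * 6
src = bytes.fromhex("525400000001")   # assumed guest MAC address
header = qinq_header(dst, src, outer_vid=10, inner_vid=30, ethertype=0x0800)
```

The outer tag (VID 10) identifies the customer, and the inner tag (VID 30) travels unchanged behind it.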
802.1ad (Q-in-Q) support for bridge
• Bridge preserves guest .1Q tag (vid 30) when inserting .1ad tag (vid 10)
• .1ad tag will be stripped at another end point of .1ad network
[Figure: Guest A sends packets with .1Q VID 30 from eth0.30 through tap0 (pvid/untag vlan 10); Guest C uses tap1 (pvid/untag vlan 20); the bridge (.1ad mode) sends them out of eth0 (vlan 10 / 20) with .1ad VID 10 in front of .1Q VID 30 into the .1ad network; at the customer's other site only .1Q VID 30 remains]
Demo
Non-promiscuous bridge
• Let's set up a non-promiscuous KVM environment!
• Steps
• Create bridge
• Enable vlan filtering
• Attach guests (by libvirt)
• Add FDB entries
• Set port attributes (learning / flooding)
[Figure: guest ports vnet0 and vnet1 of the bridge are set to no learning and no flooding; the uplink eth0 keeps learning and flooding and runs non-promisc]
Non-promiscuous bridge setup
• Commands
• Create bridge
# ip link add br0 up type bridge
# ip link set eth0 master br0
• Enable vlan filtering
# echo 1 > /sys/class/net/br0/bridge/vlan_filtering
• Attach guests
# virsh start guest1
# virsh start guest2
• Add FDB entries ("append" overwrites if exists)
# bridge fdb append 52:54:00:xx:xx:xx dev vnet0 master temp
# bridge fdb append 52:54:00:yy:yy:yy dev vnet1 master temp
• Set port attributes
# bridge link set dev vnet0 learning off flood off
# bridge link set dev vnet1 learning off flood off
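For more than a couple of guests, the per-port commands can be generated rather than typed. The sketch below only builds the command strings shown above and does not run them; the function name and the guest MAC addresses are hypothetical, and the guests are assumed to have been started already (e.g. with virsh) so that their vnet ports exist.

```python
def nonpromisc_bridge_commands(bridge, uplink, guest_ports):
    """Return the setup commands as strings (dry run, nothing is executed)."""
    cmds = [
        f"ip link add {bridge} up type bridge",
        f"ip link set {uplink} master {bridge}",
        f"echo 1 > /sys/class/net/{bridge}/bridge/vlan_filtering",
    ]
    for port, mac in guest_ports.items():
        # Static FDB entry for the guest, then disable learning / flooding.
        cmds.append(f"bridge fdb append {mac} dev {port} master temp")
        cmds.append(f"bridge link set dev {port} learning off flood off")
    return cmds


# Hypothetical guest MAC addresses, for illustration only.
guests = {"vnet0": "52:54:00:00:00:01", "vnet1": "52:54:00:00:00:02"}
commands = nonpromisc_bridge_commands("br0", "eth0", guests)
for line in commands:
    print("#", line)
```

Printing the commands first makes it easy to review them before running them as root.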
Non-promiscuous bridge via libvirt xml
• libvirt (>= 1.2.11 with kernel >= 3.17) can automatically handle these settings
• Network XML
# virsh net-edit <network>
...
<bridge name="br0" macTableManager="libvirt"/>
...
Some more useful commands...
• Filter FDB dump per bridge / port (Since 3.17)
• Filter per bridge
# bridge fdb show br <bridge>
• Filter per port
# bridge fdb show brport <port>
• VLAN range (Coming soon... 3.20?)
• Add vlans
# bridge vlan add vid <vid_begin>-<vid_end> dev <port>
• Show vlans in compressed format
# bridge -c vlan show
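The idea behind VLAN ranges and the compressed listing is to collapse consecutive VIDs into begin-end pairs. The sketch below shows that idea only; `compress_vids` is a hypothetical helper and its output format is not meant to reproduce what the bridge command prints.

```python
def compress_vids(vids):
    """Collapse consecutive VIDs into "begin-end" strings."""
    ranges = []
    for vid in sorted(set(vids)):
        if ranges and vid == ranges[-1][1] + 1:
            ranges[-1][1] = vid          # extend the current range
        else:
            ranges.append([vid, vid])    # start a new range
    return [str(a) if a == b else f"{a}-{b}" for a, b in ranges]


compressed = compress_vids([10, 11, 12, 20, 30, 31])
```

Each resulting string has the same `<vid_begin>-<vid_end>` shape that the range form of "bridge vlan add" takes as its vid argument.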
Summary
• Linux has several types of switches
• bridge, macvlan (macvtap), Open vSwitch
• SR-IOV NIC embedded switch can also be used
• Bridge's recent features
• FDB manipulation
• VLAN filtering
• Learning / Flooding control
• Non-promiscuous bridge
• 802.1ad (Q-in-Q) support