09 Networking
Receiving a packet in BSD
[Figure: applications above stream and datagram sockets; in the kernel, TCP, UDP and ICMP over IP, then the receive queue and the network interface.]
• 1. Interrupt
– 1.1 Allocate mbuf
– 1.2 Enqueue packet
– 1.3 Post s/w interrupt
• 2. S/W interrupt
– High priority
– Any process context
– Defragmentation
– TCP processing
– Enqueue on socket
• 3. Application
– Copy mbuf to user space
– Application process context
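The three receive stages (hardware interrupt, software interrupt, application read) can be sketched as a toy simulation. This is only an illustration under stated assumptions, not BSD code: `ToyKernel`, `Socket`, and the one-byte "port" header are hypothetical, and a plain `bytes` object stands in for an mbuf.

```python
from collections import deque

class Socket:
    """Toy socket: a queue of payloads waiting for the application."""
    def __init__(self):
        self.rx = deque()

class ToyKernel:
    """Hypothetical model of the three receive stages; not real BSD code."""
    def __init__(self):
        self.receive_queue = deque()   # filled by the hardware interrupt
        self.sockets = {}              # port number -> Socket
        self.softirq_pending = False

    def hardware_interrupt(self, frame):
        # 1.1 allocate a buffer (stand-in for an mbuf)
        mbuf = bytes(frame)
        # 1.2 enqueue the packet
        self.receive_queue.append(mbuf)
        # 1.3 post the software interrupt
        self.softirq_pending = True

    def software_interrupt(self):
        # 2. protocol processing, in whatever process context was interrupted
        while self.receive_queue:
            mbuf = self.receive_queue.popleft()
            port, payload = mbuf[0], mbuf[1:]   # assumed 1-byte "port" header
            sock = self.sockets.get(port)
            if sock is not None:
                sock.rx.append(payload)         # enqueue on socket
        self.softirq_pending = False

    def recv(self, port):
        # 3. application process context: copy the data out to "user space"
        sock = self.sockets[port]
        return bytes(sock.rx.popleft()) if sock.rx else None

kernel = ToyKernel()
kernel.sockets[7] = Socket()
kernel.hardware_interrupt(b"\x07hello")
kernel.software_interrupt()
print(kernel.recv(7))   # b'hello'
```

Note that stages 1 and 2 run regardless of whether any application ever calls `recv`, which is the property that later slides revisit.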
Sending a packet in BSD
[Figure: applications above stream and datagram sockets; in the kernel, IP, the receive queue and the network interface.]
• 1. Application
– Copy from user space to mbuf
– Call TCP code and process
– Possible enqueue on socket queue
• 3. Interrupt
– Send packet
– Free mbuf
A note on terminology
• In Unix (and most systems):
– Top half: called from user space (syscalls, etc.)
• Per‐process stack, synchronous traps, etc.
– Bottom half: hardware and software interrupts
• Dedicated stack, asynchronous w.r.t. top half, etc.
• In Linux:
– Top half: hardware interrupts
– Bottom half: software interrupts
Multiplexing
• Fit multiple traffic streams onto a single channel
• Occurs at most levels of the IP stack
• Protocols specify it with headers/encapsulation
• Operating Systems also have to schedule messages
– Which waiting packet do I send now?
• Examples:
– IP, IPX, AppleTalk, etc. over Ethernet
– TCP, UDP, ICMP, etc. over IP
– Multiple TCP connections (ports)
– ...
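A minimal sketch of multiplexing with a scheduling decision, assuming round-robin as the policy (the slides do not prescribe one). The function name and the flow ids are hypothetical; the flow id attached to each message plays the role of a protocol header.

```python
from collections import deque

def multiplex_round_robin(flows):
    """Merge per-flow queues onto one channel, tagging each message with its flow id.

    `flows` maps a flow id to a list of payloads. Round-robin is just one
    possible answer to "which waiting packet do I send now?".
    """
    queues = {fid: deque(msgs) for fid, msgs in flows.items()}
    channel = []
    while any(queues.values()):          # stop when every queue is empty
        for fid, q in queues.items():
            if q:
                channel.append((fid, q.popleft()))
    return channel

channel = multiplex_round_robin({"tcp:80": ["a1", "a2", "a3"], "udp:53": ["b1"]})
print(channel)
# [('tcp:80', 'a1'), ('udp:53', 'b1'), ('tcp:80', 'a2'), ('tcp:80', 'a3')]
```

The receiver needs only the tag to undo the interleaving, which is why the header and the scheduler can be designed independently.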
Demultiplexing
• Decide where to direct a packet arriving from a lower layer
– Lowest level: (physical!) network interface
– Highest level: socket (for Unix/Windows...)
• Traditional approach: demux at multiple layers
– IP, TCP, etc.
– Extreme case: run each layer in own thread/process
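The traditional layer-by-layer approach can be sketched as follows: each layer inspects only its own header and hands the packet up. This is an illustration, not a real stack; `demux`, `make_udp_frame`, and the socket table are hypothetical, while the EtherType and IP protocol numbers are the standard values.

```python
import struct

ETHERTYPE_IPV4 = 0x0800
IPPROTO_TCP, IPPROTO_UDP = 6, 17

def demux(frame, sockets):
    """Layered demux: EtherType -> IP protocol -> destination port.

    `sockets` maps (protocol number, destination port) to a socket name.
    Returns the socket name, or None if some layer finds no match.
    """
    # Ethernet layer: 6-byte dst MAC, 6-byte src MAC, 2-byte EtherType
    (ethertype,) = struct.unpack_from("!H", frame, 12)
    if ethertype != ETHERTYPE_IPV4:
        return None
    ip = frame[14:]
    # IP layer: header length is the low nibble of byte 0, protocol is byte 9
    ihl = (ip[0] & 0x0F) * 4
    proto = ip[9]
    if proto not in (IPPROTO_TCP, IPPROTO_UDP):
        return None
    # Transport layer: TCP and UDP both keep the destination port in bytes 2-3
    (dst_port,) = struct.unpack_from("!H", ip, ihl + 2)
    return sockets.get((proto, dst_port))

def make_udp_frame(dst_port):
    """Build a minimal Ethernet/IPv4/UDP frame (checksums left as zero)."""
    eth = bytes(12) + struct.pack("!H", ETHERTYPE_IPV4)
    ip = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 28, 0, 0, 64, IPPROTO_UDP, 0,
                     bytes([10, 0, 0, 1]), bytes([10, 0, 0, 2]))
    udp = struct.pack("!HHHH", 12345, dst_port, 8, 0)
    return eth + ip + udp

sockets = {(IPPROTO_UDP, 53): "dns-socket"}
print(demux(make_udp_frame(53), sockets))   # dns-socket
print(demux(make_udp_frame(99), sockets))   # None
```

Here the three lookups happen in one function; in a layered stack each would sit in a separate module, possibly with a queue between them.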
Sending a packet in BSD
[Figure: the BSD layer diagram (applications, stream and datagram sockets, IP, receive queue, network interface) with the multiplexing points marked.]
Layered vs. Early Multiplexing
[Tennenhouse, 1989]
[Figure: stream and datagram sockets above IP, the interface queue and the network interface, inside the kernel.]
LRP in practice
[Figure: applications above stream and datagram sockets; in the kernel, TCP, UDP and ICMP over IP, with an interface queue and multiple receive queues above the network interface.]
• Multiple receive queues
• NIC demuxes:
– Once (to a socket)
– Early (before processing)
• UDP processing:
– Lazy (when application requests it)
• TCP receive:
– Lazy (when application requests it)
– Accounted to dest. process
– At dest. priority
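The LRP (lazy receiver processing) ideas on these slides, early demux to a per-socket queue and protocol work deferred until the application asks, can be sketched as below. All names are hypothetical. The per-socket queue limit with early discard is an assumption added for the sketch, and `upper()` merely stands in for protocol processing.

```python
from collections import deque

class LazySocket:
    """Hypothetical per-socket receive queue in the style of LRP."""
    def __init__(self, limit):
        self.raw = deque()      # unprocessed packets, queued by the demux step
        self.limit = limit      # assumed bound on queued packets
        self.dropped = 0
        self.cpu_charged = 0    # protocol work accounted to the receiving process

class LrpStack:
    def __init__(self):
        self.sockets = {}       # port -> LazySocket

    def nic_receive(self, port, payload):
        # Early demux: choose the destination socket before any protocol processing.
        sock = self.sockets.get(port)
        if sock is None:
            return
        if len(sock.raw) >= sock.limit:
            sock.dropped += 1   # discard before any CPU is spent on the packet
            return
        sock.raw.append(payload)

    def recv(self, port):
        # Lazy processing: protocol work happens only when the application asks,
        # so its cost is charged to, and runs at the priority of, the receiver.
        sock = self.sockets[port]
        if not sock.raw:
            return None
        payload = sock.raw.popleft()
        sock.cpu_charged += 1   # stand-in for the per-packet protocol cost
        return payload.upper()  # stand-in for protocol processing

stack = LrpStack()
stack.sockets[80] = LazySocket(limit=2)
for p in ("a", "b", "c"):
    stack.nic_receive(80, p)
print(stack.recv(80), stack.sockets[80].dropped)   # A 1
```

A receiver that never calls `recv` costs the system only its bounded queue, rather than protocol processing time taken from other processes.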
How to demux early?
• [G&B] used ATM: demux on Virtual Circuit Identifiers
– Most ATM cards had per‐VC queues, including the SBA‐200
– Requires one TCP connection per circuit
– Also a good idea, but a story for another time...
• Ethernet cards (mostly) can’t do this...
– “SOFT‐LRP”: demux in interrupt handler
– Adds latency, which is hard to account for
• See later ...
LRP Isolation
LRP stability
Livelock
• Graph shows livelock in action
– As load increases, throughput decreases
– Interrupts and processing overwhelm system
• Early demux alone is insufficient to prevent livelock
– But helps somewhat under heavy load
• SOFT‐LRP delays the onset
• NI‐LRP (hardware) almost eliminates it
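A toy model of why throughput falls as load rises: if interrupt handling always pre-empts protocol processing, each extra arrival takes CPU away from delivering packets. The function name and all cost figures are made-up units chosen for illustration, not measurements.

```python
def delivered_per_tick(arrivals, cpu_budget=100, irq_cost=1, proc_cost=4):
    """Packets fully delivered per tick when interrupt work always runs first.

    Each arrival costs `irq_cost` in the interrupt handler; delivering a
    packet costs a further `proc_cost` at lower priority. Units are arbitrary.
    """
    irq_work = min(arrivals * irq_cost, cpu_budget)   # interrupts pre-empt everything
    left_over = cpu_budget - irq_work                 # CPU left for useful processing
    return min(arrivals, left_over // proc_cost)

for load in (10, 20, 40, 80, 100):
    print(load, delivered_per_tick(load))
# 10 10
# 20 20
# 40 15
# 80 5
# 100 0
```

In this model throughput peaks at 20 packets per tick and then declines to zero, at which point the system spends all its time taking interrupts for packets it will never deliver.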
Eliminating livelock
[Mogul and Ramakrishnan, 1997]
• Avoid livelock:
– Only use interrupts to initiate polling, then disable
– Round‐robin polling for input event sources
– Schedule packet processing properly
– Drop packets early
• Maintain performance:
– Re‐enable interrupts when no work pending to keep latency low
– Buffer bursts in the receiving interface
– Eliminate the IP input queue
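The techniques listed above can be sketched together: the interrupt only schedules polling and masks itself, polling is round-robin with a per-interface quota, bursts are buffered (and excess dropped) in the interface, and interrupts return once the ring is empty. `PolledNic`, `interrupt_handler`, `poll`, and the quota value are hypothetical names and parameters for this sketch.

```python
from collections import deque

class PolledNic:
    """Hypothetical NIC that buffers bursts and can have its interrupt masked."""
    def __init__(self, ring_size):
        self.ring = deque()
        self.ring_size = ring_size
        self.irq_enabled = True
        self.dropped = 0

    def packet_arrives(self, pkt):
        """Queue a packet; return True if an interrupt should be raised."""
        if len(self.ring) >= self.ring_size:
            self.dropped += 1          # drop early, in the interface, at no CPU cost
            return False
        self.ring.append(pkt)
        return self.irq_enabled

def interrupt_handler(nic, poll_list):
    # The interrupt only initiates polling, then disables itself.
    nic.irq_enabled = False
    if nic not in poll_list:
        poll_list.append(nic)

def poll(poll_list, quota, deliver):
    """One round-robin pass: at most `quota` packets per interface."""
    for nic in list(poll_list):
        for _ in range(quota):
            if not nic.ring:
                break
            deliver(nic.ring.popleft())   # processed to completion, no IP input queue
        if not nic.ring:
            nic.irq_enabled = True        # no work pending: re-enable for low latency
            poll_list.remove(nic)

nic, poll_list, out = PolledNic(ring_size=4), [], []
for i in range(6):
    if nic.packet_arrives(i):
        interrupt_handler(nic, poll_list)
poll(poll_list, quota=3, deliver=out.append)
print(out, nic.dropped, nic.irq_enabled)   # [0, 1, 2] 2 False
```

Only the first packet of the burst raises an interrupt; the rest are either picked up by polling or dropped in the ring before the host does any work on them.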
Results
Software demux: basic idea
• Packet filters: old idea for inspecting the network
– Each filter has an associated socket.
– When a packet arrives, every filter is run on the packet
– If the filter “passes” the packet, it’s delivered to the socket
• Not quite the same as a demultiplexer ...
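A minimal sketch of the packet filter model, assuming filters are plain Python predicates over the raw frame (real packet filters use a restricted language). `FilterSocket` and `run_filters` are hypothetical names.

```python
class FilterSocket:
    """A socket paired with the filter that decides what it receives."""
    def __init__(self, name, predicate):
        self.name = name
        self.predicate = predicate   # the filter: packet (bytes) -> bool
        self.delivered = []

def run_filters(packet, sockets):
    """Run every installed filter; deliver to each socket whose filter passes."""
    for sock in sockets:
        if sock.predicate(packet):
            sock.delivered.append(packet)

sockets = [
    FilterSocket("all-ipv4", lambda p: p[12:14] == b"\x08\x00"),
    FilterSocket("broadcast", lambda p: p[0:6] == b"\xff" * 6),
]
frame = b"\xff" * 6 + bytes(6) + b"\x08\x00" + b"payload"
run_filters(frame, sockets)
print([s.name for s in sockets if s.delivered])   # ['all-ipv4', 'broadcast']
```

The example shows the difference from a demultiplexer: a single packet can pass several filters (or none), and the cost grows with the number of installed filters because each one is run.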
Software demux: basic idea
• Installation:
– Write program in simple, high‐level language
– Compile to simple byte code (no jumps, etc.)
– Hand to the kernel
– Kernel installs
• Issues:
– Slower than hard‐coded demux (e.g. Linux main stack)
– Or is it? cf. DPF: efficient compilation of multiple filters
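The installation steps can be sketched with a toy compiler and interpreter. The instruction set (`LOAD`, `LIT`, `EQ`, `AND`) and the "list of (offset, expected byte)" filter description are invented for this illustration and do not correspond to any real packet filter's byte code.

```python
def compile_filter(conditions):
    """Compile [(offset, expected_byte), ...] into straight-line stack code."""
    code = []
    for i, (offset, value) in enumerate(conditions):
        code += [("LOAD", offset), ("LIT", value), ("EQ", None)]
        if i > 0:
            code.append(("AND", None))   # combine with the result so far
    return code

def run_filter(code, packet):
    """Interpret the code; with no jumps, run time is bounded by len(code)."""
    stack = []
    for op, arg in code:
        if op == "LOAD":
            # Out-of-range loads push -1, which never equals a byte value.
            stack.append(packet[arg] if arg < len(packet) else -1)
        elif op == "LIT":
            stack.append(arg)
        elif op == "EQ":
            b, a = stack.pop(), stack.pop()
            stack.append(a == b)
        elif op == "AND":
            b, a = stack.pop(), stack.pop()
            stack.append(a and b)
    return bool(stack.pop()) if stack else True   # empty filter passes everything

ipv4 = compile_filter([(12, 0x08), (13, 0x00)])
print(run_filter(ipv4, bytes(12) + b"\x08\x00"))   # True
print(run_filter(ipv4, bytes(12) + b"\x86\xdd"))   # False
```

Because the code is straight-line, the kernel can accept it from an untrusted process knowing it terminates; the price is an interpretation step per packet per filter.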
Hardware demux: Arsenic
[Pratt and Fraser, 2001]
[Figure: Arsenic architecture. Recoverable labels: standard kernel, IP, connection registry, OS protocol stack, connection creation and management, device manager, Linux kernel, device management interface, network interface, direct access library.]