Kubernetes Networking
Seattle Kubernetes Meetup
CJ Cullen <cjcullen@google.com>
Software Engineer
@cj_cullen
github.com/cjcullen
Docker Networking
Docker networking
[Figure: docker start ... creates the bridge docker0 (172.16.1.0/24). Each docker run ... adds a container whose eth0 is attached to docker0 through a veth device: 172.16.1.1 via vethAQ2IT, then 172.16.1.2 via vethS1LUI.]
Docker networking
[Figure: several hosts, each with its own docker0, so the same container addresses (172.16.1.1, 172.16.1.2) repeat from host to host. Traffic between containers on different hosts passes through NAT.]
Host ports
[Figure: containers A: 172.16.1.1, B: 172.16.1.2 and C: 172.16.1.1 talk to each other through ports mapped on their hosts (labels: 3306, 9376, 8000, 80, 11878) and SNAT. The final build stamps the picture REJECTED.]
Kubernetes Networking
Kubernetes networking
IPs are routable
• vs docker default private IP
Pods can reach each other without NAT
• even across nodes
No brokering of port numbers
• too complex, why bother?
This is a fundamental requirement
• can be L3 routed
• can be underlayed (cloud)
• can be overlayed (SDN)
Kubernetes networking
[Figure: three nodes, each with its own pod subnet (10.1.1.0/24, 10.1.2.0/24, 10.1.3.0/24), holding pods 10.1.1.1, 10.1.1.2, 10.1.2.1 and 10.1.3.1. Question posed by the final build: how does 10.1.1.2 reach 10.1.3.1?]
Kubernetes networking
On GCE/GKE
• GCE Advanced Routes (program the fabric)
• “Everything to 10.1.1.0/24, send to this VM”
Plenty of other ways
• AWS: Route Tables
• Weave
• Calico
• Flannel
• OVS
• OpenContrail
• Cisco Contiv
• Others...
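The GCE rule quoted above, “Everything to 10.1.1.0/24, send to this VM”, amounts to a route table keyed by pod subnet. A minimal sketch in Python, assuming one /24 per node as on the slides; the node names and the `next_hop` helper are hypothetical and only illustrate the lookup, not how any particular fabric implements it.

```python
import ipaddress

# Hypothetical route table: one pod subnet per node (VM).
ROUTES = {
    ipaddress.ip_network("10.1.1.0/24"): "node-a",
    ipaddress.ip_network("10.1.2.0/24"): "node-b",
    ipaddress.ip_network("10.1.3.0/24"): "node-c",
}


def next_hop(pod_ip):
    """Return the node whose pod subnet contains pod_ip, or None if no route matches."""
    addr = ipaddress.ip_address(pod_ip)
    for subnet, node in ROUTES.items():
        if addr in subnet:
            return node
    return None


print(next_hop("10.1.3.1"))
```

No NAT is involved: the destination address in the packet stays the pod's own IP, and the fabric only decides which VM receives it.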
Pods
Pods
Small group of containers & volumes
Tightly coupled
The atom of scheduling & placement
Shared namespace
• share IP address & localhost
• share IPC, etc.
Managed lifecycle
• bound to a node, restart in place
• can die, cannot be reborn with same ID
Example: data puller & web server
[Figure: a Pod with IP 10.1.1.2 holding a File Puller and a Web Server that share a Volume; a Content Manager and Consumers sit outside the pod. In the last build the containers c1 and c2 each run with --net=container:infra --ipc=container:infra, joining a container named infra.]
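The shared namespace can be pictured with plain docker flags: one container owns the network and IPC namespaces and the others join it. The sketch below only builds the command lines, it does not run docker; the image names and the `pod_docker_commands` helper are hypothetical, while `--net=container:<name>` and `--ipc=container:<name>` are the flags shown on the slide.

```python
def pod_docker_commands(infra_image, containers):
    """Build docker command lines for one pod (nothing is executed).

    containers maps a container name to its image.
    """
    # The infra container is started first and owns the namespaces.
    commands = [["docker", "run", "-d", "--name", "infra", infra_image]]
    for name, image in containers.items():
        commands.append([
            "docker", "run", "-d", "--name", name,
            "--net=container:infra",  # join infra's network namespace: same IP, same localhost
            "--ipc=container:infra",  # join infra's IPC namespace
            image,
        ])
    return commands


# Hypothetical image names.
cmds = pod_docker_commands("example/infra", {"c1": "example/puller", "c2": "example/web"})
for argv in cmds:
    print(" ".join(argv))
```

Because c1 and c2 share one network namespace, they can reach each other on localhost and must not bind the same port.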
Services
Services
A group of pods that work together
• grouped by a selector
Defines access policy
• “load balanced” or “headless”
Gets a stable virtual IP and port
• sometimes called the service portal
• also a DNS name
VIP is managed by kube-proxy
• watches all services
• updates iptables when backends change
Hides complexity - ideal for non-native apps
[Figure: a Client reaches the group of pods through the service's Virtual IP]
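“Grouped by a selector” means a pod belongs to the service when its labels contain every key/value pair of the selector. A minimal sketch of that matching rule, with hypothetical pod names and labels; `select_pods` is an illustrative helper, not a Kubernetes API.

```python
def select_pods(pods, selector):
    """Return names of pods whose labels contain every key/value pair in selector."""
    return [
        name
        for name, labels in pods.items()
        if all(labels.get(key) == value for key, value in selector.items())
    ]


# Hypothetical pods and labels.
PODS = {
    "web-1": {"app": "web", "tier": "frontend"},
    "web-2": {"app": "web", "tier": "frontend"},
    "db-1": {"app": "db", "tier": "backend"},
}

print(select_pods(PODS, {"app": "web"}))
```

The service's backends are whatever this match returns at the moment, which is why they can change while the virtual IP stays the same.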
kube-proxy
kube-proxy (legacy)
[Figure, built up step by step on Node X (kube-proxy, iptables) and the apiserver:
• kube-proxy watches the apiserver for services & endpoints
• kubectl run ... and the pods are scheduled
• kubectl expose ... and the apiserver pushes an update: new service!
• kube-proxy starts to listen on a port, then configures iptables for the VIP
• the apiserver pushes another update: new endpoints!
• a Client connects to the VIP; iptables hands the connection to kube-proxy, which forwards it to a backend]
kube-proxy (legacy)
Userspace proxy isn’t ideal
Burns CPU copying bytes
• “Proxy” is just parallel copy loops.
Loses source IP
• Everything looks like it’s from the node IP.
Userspace TCP listening = higher latency
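The “parallel copy loops” can be sketched in a few lines. This is a minimal illustration in Python of a userspace TCP proxy, not kube-proxy's code; `copy_loop` and `serve_proxy` are hypothetical names, and error handling and socket cleanup are kept to the bare minimum.

```python
import socket
import threading


def copy_loop(src, dst):
    """Copy bytes from src to dst until src closes, then half-close dst."""
    try:
        while True:
            data = src.recv(4096)
            if not data:
                break
            dst.sendall(data)
    except OSError:
        pass
    finally:
        try:
            dst.shutdown(socket.SHUT_WR)
        except OSError:
            pass


def serve_proxy(listener, backend_addr):
    """Accept clients forever and proxy each connection to backend_addr."""
    while True:
        client, _ = listener.accept()
        # The proxy opens its own connection to the backend.
        backend = socket.create_connection(backend_addr)
        # Two parallel copy loops, one per direction.
        threading.Thread(target=copy_loop, args=(client, backend), daemon=True).start()
        threading.Thread(target=copy_loop, args=(backend, client), daemon=True).start()
```

Every byte crosses into userspace and back, which is the CPU cost named above. Because the proxy opens its own connection to the backend, the backend sees the proxy's address as the peer, which is how the source IP gets lost.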
iptables kube-proxy
iptables kube-proxy
[Figure, built up step by step on Node X (kube-proxy, iptables) and the apiserver:
• kube-proxy watches the apiserver for services & endpoints
• kubectl run ... and the pods are scheduled
• kubectl expose ... and the apiserver pushes an update: new service!
• kube-proxy configures iptables for the VIP
• the apiserver pushes another update: new endpoints!
• kube-proxy configures iptables again
• a Client connects to the VIP; iptables sends the connection to a backend without passing through kube-proxy]
iptables kube-proxy
Mean Latency
contrib/for-tests/netperf-tester --number=1000
[Figure: bar chart of mean latency in microseconds, iptables kube-proxy vs legacy kube-proxy]
Services
Services are just an abstraction
• Only requirement: route (and maybe load balance) a virtual IP to a set of backends.
Kube-proxy is an implementation
• Kube-proxy watches apiserver.
• iptables is re-configured on changes.
There could be other ways
• Userspace, iptables, IPVS (IP Virtual Server)?
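One way rules that are tried in order can load balance evenly over n backends is to let the first rule match with probability 1/n, the next with 1/(n-1), and so on, with the last rule always matching. The sketch below checks that arithmetic. It is an assumption that an implementation would use exactly this scheme; the helper names are hypothetical, and the backend IPs are the ones from the walkthrough slides.

```python
from fractions import Fraction


def rule_probabilities(n):
    """Per-rule match probability so each of n backends is picked with overall chance 1/n."""
    return [Fraction(1, n - i) for i in range(n)]


def overall_chances(probs):
    """Chance that each rule ends up taking the connection when rules are tried in order."""
    chances = []
    remaining = Fraction(1)  # probability that no earlier rule has matched
    for p in probs:
        chances.append(remaining * p)
        remaining *= 1 - p
    return chances


backends = ["10.1.0.6", "10.1.3.1", "10.1.6.3"]
probs = rule_probabilities(len(backends))
for ip, p in zip(backends, probs):
    print(f"{ip}: match with probability {p}")
```

When the endpoints change, the whole list has to be regenerated, since every probability depends on the number of backends.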
DNS
Run SkyDNS as a pod in the cluster
• kube2sky bridges Kubernetes API -> SkyDNS
• Tell kubelets about it (static service IP)
Strictly optional, but practically required
• LOTS of things depend on it
• Probably will become more integrated
Or plug in your own!
[Figure: example names kubernetes, kubernetes.default, kubernetes.default.svc.cluster.local and foo.my-namespace.svc.cluster.local. The pod kube-dns-qxin runs skyDNS, kube2sky (which watches the apiserver) and etcd, and is reached at the service IP 10.0.0.10. Each pod's /etc/resolv.conf contains "nameserver 10.0.0.10" followed by "...".]
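Short names such as kubernetes or foo.my-namespace can resolve because the resolver appends search domains before asking the nameserver. A simplified sketch follows, assuming a search list of `<namespace>.svc.cluster.local`, `svc.cluster.local` and `cluster.local`. The slide elides the rest of resolv.conf, so that list is an assumption, and `candidate_names` is a hypothetical helper that ignores resolver details such as ndots.

```python
def candidate_names(name, namespace, cluster_domain="cluster.local"):
    """Fully qualified names a resolver would try for a short name, in order.

    The search list is an assumption; the slide elides the rest of resolv.conf.
    """
    search = [
        f"{namespace}.svc.{cluster_domain}",
        f"svc.{cluster_domain}",
        cluster_domain,
    ]
    return [f"{name}.{domain}" for domain in search]


for candidate in candidate_names("foo.my-namespace", "default"):
    print(candidate)
```

For a pod in the default namespace, kubernetes matches on the first candidate and foo.my-namespace on the second.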
Putting it Together
What happens when I...
$ curl foo.my-namespace
[Figure, built up step by step:
• the Client pod (10.1.0.1) reads /etc/resolv.conf: nameserver 10.0.0.10 ...
• it asks kube-dns-qxin (skyDNS, kube2sky, etcd) at 10.0.0.10 for foo.my-namespace and gets back 10.0.123.45
• the Client connects to 10.0.123.45, the service VIP
• iptables picks one of the backends 10.1.0.6, 10.1.3.1, 10.1.6.3 and rewrites the destination to 10.1.3.1
• the route 10.1.3.0/24 -> Node X carries the request to the backend
• the backend answers "Hello World!" to 10.1.0.1
• the route 10.1.0.0/24 -> Node Y carries the reply back to the Client]
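The whole walkthrough can be replayed as three table lookups: DNS name to VIP, VIP to backend, pod subnet to node. The sketch below uses the addresses from the slides. The tables, the `trace` helper and its `pick` parameter (a stand-in for the random backend choice made in iptables) are hypothetical.

```python
import ipaddress

DNS = {"foo.my-namespace": "10.0.123.45"}  # answered by the DNS pod at 10.0.0.10
ENDPOINTS = {"10.0.123.45": ["10.1.0.6", "10.1.3.1", "10.1.6.3"]}  # VIP -> backends
ROUTES = {"10.1.3.0/24": "Node X", "10.1.0.0/24": "Node Y"}  # pod subnet -> node


def route(ip):
    """Return the node whose pod subnet contains ip, or None."""
    addr = ipaddress.ip_address(ip)
    for subnet, node in ROUTES.items():
        if addr in ipaddress.ip_network(subnet):
            return node
    return None


def trace(name, client_ip, pick):
    """List the hops of one request; pick is the index of the chosen backend."""
    vip = DNS[name]
    backend = ENDPOINTS[vip][pick]
    return [
        f"resolve {name} -> {vip}",
        f"iptables rewrites {vip} -> {backend}",
        f"request routed to {route(backend)}",
        f"reply routed to {route(client_ip)}",
    ]


for hop in trace("foo.my-namespace", "10.1.0.1", pick=1):
    print(hop)
```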
What about external?
External Services
Service IPs are only available inside the cluster
Need to receive traffic from “the outside world”
Builtin: Service “type”
• nodePort: expose on a port on every node
• loadBalancer: provision a cloud load-balancer
DIY load-balancer solutions
• socat (for nodePort remapping)
• haproxy
• nginx
The Bleeding Edge
Ingress (L7)
Services are assumed L3/L4
Lots of apps want HTTP/HTTPS
Ingress maps incoming traffic to backend services
• by HTTP host headers
• by HTTP URL paths
HAProxy and GCE implementations
No SSL yet
Status: BETA in Kubernetes v1.1
[Figure: a Client sends requests for api.company.com to a URL Map, which routes api.company.com/foo, api.company.com/bar and othercompany.com/* to separate backend services]
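Mapping by host header and URL path boils down to a small rule table. A minimal sketch using the hosts and paths from the slide; the backend service names and the `pick_backend` helper are hypothetical, and the plain prefix match is a simplification of what a real implementation does.

```python
# Hypothetical rules: (host, path prefix, backend service name).
RULES = [
    ("api.company.com", "/foo", "foo-service"),
    ("api.company.com", "/bar", "bar-service"),
    ("othercompany.com", "/", "other-service"),
]


def pick_backend(host, path):
    """Return the backend of the first rule whose host matches and whose prefix starts the path."""
    for rule_host, prefix, backend in RULES:
        if host == rule_host and path.startswith(prefix):
            return backend
    return None


print(pick_backend("api.company.com", "/bar/items"))
```

A plain Service cannot express this, since at L3/L4 only IP addresses and ports are visible.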
Network Plugins
Network Plugins
Introduced in Kubernetes v1.0
• VERY experimental
Uses CNI (CoreOS) in v1.1
• Simple exec interface
• Not using Docker libnetwork
• but can defer to Docker for networking
Cluster admins can customize their installs
• DHCP, MACVLAN, Flannel, custom
[Figure: a box labeled net fanning out to three Plugin boxes]
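The “simple exec interface” means a CNI plugin is an executable that receives its parameters in environment variables and a JSON network configuration on stdin. The sketch below only prepares such an invocation and does not run a plugin; the `cni_invocation` helper, the container ID, the paths and the network name are hypothetical.

```python
import json


def cni_invocation(command, container_id, netns, ifname, plugin_dir, net_conf):
    """Return (environment, stdin text) for one CNI plugin call; nothing is executed."""
    env = {
        "CNI_COMMAND": command,           # ADD or DEL
        "CNI_CONTAINERID": container_id,
        "CNI_NETNS": netns,               # path to the container's network namespace
        "CNI_IFNAME": ifname,             # interface name to create inside the container
        "CNI_PATH": plugin_dir,           # where plugin executables are looked up
    }
    return env, json.dumps(net_conf)


# Hypothetical values for illustration only.
env, stdin_text = cni_invocation(
    "ADD", "abc123", "/var/run/netns/abc123", "eth0", "/opt/cni/bin",
    {"name": "mynet", "type": "bridge"},
)
print(env["CNI_COMMAND"], stdin_text)
```

A caller would then run the plugin executable named by the configuration's type field with this environment and stdin, and read the result from its stdout.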
Kubernetes is Open
- open community
- open design
- open source
- open to ideas
Networking is Hard
- help guide us!
http://kubernetes.io
https://github.com/kubernetes/kubernetes
slack: kubernetes twitter: @kubernetesio