# STUN and NAT

> [Session Traversal Utilities for NAT (STUN)](https://www.rfc-editor.org/rfc/rfc8489.html)
> is a protocol used to assist applications in establishing peer-to-peer
> communications across Network Address Translators (NATs) or firewalls.
>
> [Network Address Translation (NAT)](https://en.wikipedia.org/wiki/Network_address_translation)
> is commonly used in private networks to allow multiple devices to share a
> single public IP address. The vast majority of home and corporate internet
> connections use at least one level of NAT.

## Overview

For one application to connect to another across a network, the connecting
application needs to know the IP address and port at which the target
application is reachable. If both applications reside on the same network,
they can most likely connect directly to each other. For a Coder workspace
agent and client, this is generally not the case: agent and client will most
likely be running in different _private_ networks (e.g. `192.168.1.0/24`). In
this case, at least one of the two needs to know an IP address and port at
which it can reach its counterpart.

This problem is often referred to as NAT traversal, and Coder uses a standard
protocol named STUN to address it.

Inside a private network, packets from the agent or client show up with a
private source address such as `192.168.1.X:12345`. Outside of the private
network, after passing through a NAT, the same packets show up with a
different, public source address (for example, `12.3.4.56:54321`). This public
`ip:port` pair is what the counterpart needs in order to establish a direct
connection. Once communication succeeds in one direction, the receiver can
inspect the source address of the received packet to determine the return
address.

At a high level, STUN-based NAT traversal works like this:

> The following glosses over a lot of the complexity of traversing NATs. For a
> more in-depth technical explanation, see
> [How NAT traversal works (tailscale.com)](https://tailscale.com/blog/how-nat-traversal-works).

- **Discovery:** Both the client and agent send UDP traffic to one or more
  configured STUN servers. These STUN servers are generally located on the
  public internet, and respond with the public IP address and port from which
  the request came.
- **Coordination:** The client and agent then exchange this information through
  the Coder server. They then construct packets that should be able to
  traverse their counterpart's NATs successfully.
- **NAT Traversal:** The client and agent then send these crafted packets to
  their counterpart's public addresses. If all goes well, the NATs on the other
  end route these packets to the correct internal address.
- **Connection:** Once the packets reach the other side, the receiver sends a
  response back to the source `ip:port` of the packet. Again, the NATs should
  recognize these responses as belonging to an ongoing communication and
  forward them accordingly.

At this point, both the client and agent should be able to send traffic directly
to each other.

## Examples

### 1. Direct connections without NAT or STUN

In this example, both the client and agent are located on the network
`192.168.21.0/24`. Assuming no firewalls are blocking packets in either
direction, both client and agent are able to communicate directly with each
other's locally assigned IP address.

```mermaid
flowchart LR
  subgraph corpnet["Private Network\ne.g. Corp. LAN"]
    A[Client Workstation\n192.168.21.47:38297]
    C[Workspace Agent\n192.168.21.147:41563]
    A <--> C
  end
```

### 2. Direct connections with one layer of NAT

In this example, client and agent are located on different networks and connect
to each other over the public Internet. Both client and agent connect to a
configured STUN server located on the public Internet to determine the public IP
address and port on which they can be reached.

```mermaid
flowchart LR
  subgraph homenet["Network A"]
    client["Client workstation\n192.168.1.101:38297"]
    homenat["NAT\n??.??.??.??:?????"]
  end
  subgraph internet["Public Internet"]
    stun1["STUN server"]
  end
  subgraph corpnet["Network B"]
    agent["Workspace agent\n10.21.43.241:56812"]
    corpnat["NAT\n??.??.??.??:?????"]
  end

  client --- homenat
  agent --- corpnat
  corpnat -- "[I see 12.34.56.7:41563]" --> stun1
  homenat -- "[I see 65.4.3.21:29187]" --> stun1
```

They then exchange this information through the Coder server and can
communicate directly with each other through their respective NATs.

```mermaid
flowchart LR
  subgraph homenet["Network A"]
    client["Client workstation\n192.168.1.101:38297"]
    homenat["NAT\n65.4.3.21:29187"]
  end
  subgraph corpnet["Network B"]
    agent["Workspace agent\n10.21.43.241:56812"]
    corpnat["NAT\n12.34.56.7:41563"]
  end

  subgraph internet["Public Internet"]
  end

  client -- "[12.34.56.7:41563]" --- homenat
  agent -- "[65.4.3.21:29187]" --- corpnat
  corpnat -- "[65.4.3.21:29187]" --> internet
  homenat -- "[12.34.56.7:41563]" --> internet
```

### 3. Direct connections with VPN and NAT hairpinning

In this example, the client workstation must use a VPN to connect to the
corporate network. All traffic from the client will enter through the VPN entry
node and exit at the VPN exit node inside the corporate network. Traffic from
the client inside the corporate network will appear to be coming from the IP
address of the VPN exit node, `172.16.1.2`. Traffic from the client to the
public internet will appear to have the public IP address of the corporate
router, `12.34.56.7`.

The workspace agent is running on a Kubernetes cluster inside the corporate
network, which is behind its own layer of NAT. To anyone inside the corporate
network but outside the cluster network, its traffic will appear to be coming
from `172.16.1.254`. Services on the public internet, however, will see the
agent's traffic originating from the public IP address assigned to the
corporate router. Additionally, the corporate router will most likely have a
firewall configured to block traffic from the internet to the corporate
network.

If the client and agent both use the public STUN server, the addresses
discovered by STUN will both be the public IP address of the corporate router.
To route the traffic back correctly, the corporate router must route both:

- Traffic sent from the client to the external IP of the corporate router back
  to the cluster router, and
- Traffic sent from the agent to the external IP of the corporate router to the
  VPN exit node.

This behaviour is known as "hairpinning", and may not be supported in all
network configurations.

If hairpinning is not supported, deploying an internal STUN server can help
establish direct connections between client and agent. When the agent and
client query this internal STUN server, they will be able to determine the
addresses on the corporate network from which their traffic appears to
originate. Using these internal addresses is much more likely to result in a
successful direct connection.

```mermaid
flowchart TD
  subgraph homenet["Home Network"]
    client["Client workstation\n192.168.1.101"]
    homenat["Home Router/NAT\n65.4.3.21"]
  end

  subgraph internet["Public Internet"]
    stun1["Public STUN"]
    vpn1["VPN entry node"]
  end

  subgraph corpnet["Corp Network 172.16.1.0/24"]
    corpnat["Corp Router/NAT\n172.16.1.1\n12.34.56.7"]
    vpn2["VPN exit node\n172.16.1.2"]
    stun2["Private STUN"]

    subgraph cluster["Cluster Network 10.11.0.0/16"]
      clusternat["Cluster Router/NAT\n10.11.12.1\n172.16.1.254"]
      agent["Workspace agent\n10.11.12.34"]
    end
  end

  vpn1 === vpn2
  vpn2 --> stun2
  client === homenat
  homenat === vpn1
  homenat x-.-x stun1
  agent --- clusternat
  clusternat --- corpnat
  corpnat --> stun1
  corpnat --> stun2
```