Commit d1a522a

johnstcn, deansheather, and spikecurtis authored
chore(docs): add requirements re ports and stun server to docs (coder#12026)
Adds documentation on port requirements and a short overview of STUN with some example scenarios.

Co-authored-by: Dean Sheather <dean@deansheather.com>
Co-authored-by: Spike Curtis <spike@coder.com>
1 parent 2fc3064 commit d1a522a

3 files changed (+254, -0 lines)


docs/manifest.json

Lines changed: 5 additions & 0 deletions
@@ -292,6 +292,11 @@
"title": "Port Forwarding",
"description": "Learn how to forward ports in Coder",
"path": "./networking/port-forwarding.md"
},
{
"title": "STUN and NAT",
"description": "Learn how Coder establishes direct connections",
"path": "./networking/stun.md"
}
]
},

docs/networking/index.md

Lines changed: 49 additions & 0 deletions
@@ -13,6 +13,49 @@ user <-> workspace connections are end-to-end encrypted.

[Tailscale's open source](https://tailscale.com) backs our networking logic.

## Requirements

In order for clients and workspaces to be able to connect:

- All clients and agents must be able to establish a connection to the Coder
  server (`CODER_ACCESS_URL`) over HTTP/HTTPS.
- Any reverse proxy or ingress between the Coder control plane and
  clients/agents must support WebSockets.

In order for clients to be able to establish direct connections:

> **Note:** Direct connections via the web browser are not supported. To improve
> latency for browser-based applications running inside Coder workspaces in
> regions far from the Coder control plane, consider deploying one or more
> [workspace proxies](../admin/workspace-proxies.md).

- The client must connect using the CLI (e.g. `coder ssh` or
  `coder port-forward`). Note that the
  [VS Code extension](https://marketplace.visualstudio.com/items?itemName=coder.coder-remote),
  the [JetBrains plugin](https://plugins.jetbrains.com/plugin/19620-coder/), and
  [`ssh coder.<workspace>`](../cli/config-ssh.md) all use the CLI to establish a
  workspace connection.
- Either the client or the workspace agent must be able to discover a reachable
  `ip:port` of its counterpart. If the agent and client can communicate with
  each other using their locally assigned IP addresses, a direct connection can
  be established immediately. Otherwise, the client and agent contact
  [the configured STUN servers](../cli/server.md#--derp-server-stun-addresses)
  to try to determine which `ip:port` can be used to communicate with their
  counterpart. See [STUN and NAT](./stun.md) for more details on how this
  process works.
- All outbound UDP traffic must be allowed for both the client and the agent on
  **all ports** to each other's respective networks.
  - To establish a direct connection, both agent and client use STUN. This
    involves sending UDP packets outbound on `udp/3478` (or whichever port the
    configured [STUN server](../cli/server.md#--derp-server-stun-addresses)
    uses). If either the agent or the client is unable to exchange UDP packets
    with a STUN server, direct connections will not be possible.
  - Both agents and clients will then establish a
    [WireGuard](https://www.wireguard.com/) tunnel and send UDP traffic on
    ephemeral (high) ports. If a firewall between the client and the agent
    blocks this UDP traffic, direct connections will not be possible.

## coder server
Workspaces connect to the coder server via the server's external address, set
@@ -52,6 +95,12 @@ Direct connections are a straight line between the user and workspace, so there
is no special geo-distribution configuration. To speed up direct connections,
move the user and workspace closer together.

Establishing a direct connection can be an involved process because both the
client and workspace agent will likely be behind at least one level of NAT,
meaning that we need to use STUN to learn the IP address and port under which
the client and agent can both contact each other. See [STUN and NAT](./stun.md)
for more information on how this process works.

If a direct connection is not available (e.g. UDP traffic is blocked or NAT
traversal fails), Coder will use a relayed connection. By default,
[Coder uses Google's public STUN server](../cli/server.md#--derp-server-stun-addresses),

docs/networking/stun.md

Lines changed: 200 additions & 0 deletions
@@ -0,0 +1,200 @@
# STUN and NAT

> [Session Traversal Utilities for NAT (STUN)](https://www.rfc-editor.org/rfc/rfc8489.html)
> is a protocol used to assist applications in establishing peer-to-peer
> communications across Network Address Translators (NATs) or firewalls.
>
> [Network Address Translation (NAT)](https://en.wikipedia.org/wiki/Network_address_translation)
> is commonly used in private networks to allow multiple devices to share a
> single public IP address. The vast majority of home and corporate internet
> connections use at least one level of NAT.

## Overview

In order for one application to connect to another across a network, the
connecting application needs to know the IP address and port under which the
target application is reachable. If both applications reside on the same
network, then they can most likely connect directly to each other. In the
context of a Coder workspace agent and client, this is generally not the case,
as both agent and client will most likely be running in different _private_
networks (e.g. `192.168.1.0/24`). In this case, at least one of the two will
need to know an IP address and port under which it can reach its counterpart.

Inside such a private network, packets from the agent or client will show up as
having source address `192.168.1.X:12345`. However, outside of this private
network the source address will show up differently (for example,
`12.3.4.56:54321`), because the NAT rewrites it to its own public address and a
port of its choosing. In order for the Coder client and agent to establish a
direct connection with each other, one of them needs to know the `ip:port` pair
under which its counterpart can be reached. Once communication succeeds in one
direction, the receiver can inspect the source address of the received packet
to determine the return address.

This problem is often referred to as NAT traversal, and Coder uses a standard
protocol named STUN to address it.

At a high level, NAT traversal with STUN works like this:

> The following glosses over a lot of the complexity of traversing NATs. For a
> more in-depth technical explanation, see
> [How NAT traversal works (tailscale.com)](https://tailscale.com/blog/how-nat-traversal-works).

- **Discovery:** Both the client and agent send UDP traffic to one or more
  configured STUN servers. These STUN servers are generally located on the
  public internet, and respond with the public IP address and port from which
  the request came.
- **Coordination:** The client and agent then exchange this information through
  the Coder server. They then construct packets that should be able to traverse
  their counterpart's NATs.
- **NAT Traversal:** The client and agent then send these crafted packets to
  their counterpart's public addresses. If all goes well, the NATs on the other
  end should route these packets to the correct internal address.
- **Connection:** Once the packets reach the other side, the receiver sends a
  response back to the packet's source `ip:port`. Again, the NATs should
  recognize these responses as belonging to an ongoing communication, and
  forward them accordingly.

At this point, both the client and agent should be able to send traffic directly
to each other.

## Examples

### 1. Direct connections without NAT or STUN

In this example, both the client and agent are located on the network
`192.168.21.0/24`. Assuming no firewalls are blocking packets in either
direction, both client and agent are able to communicate directly with each
other's locally assigned IP address.

```mermaid
flowchart LR
  subgraph corpnet["Private Network\ne.g. Corp. LAN"]
    A[Client Workstation\n192.168.21.47:38297]
    C[Workspace Agent\n192.168.21.147:41563]
    A <--> C
  end
```

### 2. Direct connections with one layer of NAT

In this example, client and agent are located on different networks and connect
to each other over the public Internet. Both client and agent connect to a
configured STUN server located on the public Internet to determine the public IP
address and port on which they can be reached.

```mermaid
flowchart LR
  subgraph homenet["Network A"]
    client["Client workstation\n192.168.1.101:38297"]
    homenat["NAT\n??.??.??.??:?????"]
  end
  subgraph internet["Public Internet"]
    stun1["STUN server"]
  end
  subgraph corpnet["Network B"]
    agent["Workspace agent\n10.21.43.241:56812"]
    corpnat["NAT\n??.??.??.??:?????"]
  end

  client --- homenat
  agent --- corpnat
  corpnat -- "[I see 12.34.56.7:41563]" --> stun1
  homenat -- "[I see 65.4.3.21:29187]" --> stun1
```

They then exchange this information through the Coder server, and can then
communicate directly with each other through their respective NATs.

```mermaid
flowchart LR
  subgraph homenet["Network A"]
    client["Client workstation\n192.168.1.101:38297"]
    homenat["NAT\n65.4.3.21:29187"]
  end
  subgraph corpnet["Network B"]
    agent["Workspace agent\n10.21.43.241:56812"]
    corpnat["NAT\n12.34.56.7:41563"]
  end

  subgraph internet["Public Internet"]
  end

  client -- "[12.34.56.7:41563]" --- homenat
  agent -- "[65.4.3.21:29187]" --- corpnat
  corpnat -- "[65.4.3.21:29187]" --> internet
  homenat -- "[12.34.56.7:41563]" --> internet
```

### 3. Direct connections with VPN and NAT hairpinning

In this example, the client workstation must use a VPN to connect to the
corporate network. All traffic from the client will enter through the VPN entry
node and exit at the VPN exit node inside the corporate network. Traffic from
the client inside the corporate network will appear to be coming from the IP
address of the VPN exit node `172.16.1.2`. Traffic from the client to the public
internet will appear to have the public IP address of the corporate router
`12.34.56.7`.

The workspace agent is running on a Kubernetes cluster inside the corporate
network, which is behind its own layer of NAT. To anyone inside the corporate
network but outside the cluster network, its traffic will appear to be coming
from `172.16.1.254`. However, services on the public Internet will see the
agent's traffic as originating from the public IP address assigned to the
corporate router. Additionally, the corporate router will most likely have a
firewall configured to block unsolicited traffic from the internet to the
corporate network.

If the client and agent both use the public STUN server, the addresses
discovered by STUN will both be the public IP address of the corporate router.
For traffic between them to arrive, the corporate router must correctly route
both:

- Traffic sent from the client to the external IP of the corporate router back
  to the cluster router, and
- Traffic sent from the agent to the external IP of the corporate router back
  to the VPN exit node.

This behaviour is known as "hairpinning", and may not be supported in all
network configurations.

If hairpinning is not supported, deploying an internal STUN server can help
establish direct connections between client and agent. When the agent and
client query this internal STUN server, they will be able to determine the
addresses on the corporate network from which their traffic appears to
originate. Using these internal addresses is much more likely to result in a
successful direct connection.

```mermaid
flowchart TD
  subgraph homenet["Home Network"]
    client["Client workstation\n192.168.1.101"]
    homenat["Home Router/NAT\n65.4.3.21"]
  end

  subgraph internet["Public Internet"]
    stun1["Public STUN"]
    vpn1["VPN entry node"]
  end

  subgraph corpnet["Corp Network 172.16.1.0/24"]
    corpnat["Corp Router/NAT\n172.16.1.1\n12.34.56.7"]
    vpn2["VPN exit node\n172.16.1.2"]
    stun2["Private STUN"]

    subgraph cluster["Cluster Network 10.11.12.0/24"]
      clusternat["Cluster Router/NAT\n10.11.12.1\n172.16.1.254"]
      agent["Workspace agent\n10.11.12.34"]
    end
  end

  vpn1 === vpn2
  vpn2 --> stun2
  client === homenat
  homenat === vpn1
  homenat x-.-x stun1
  agent --- clusternat
  clusternat --- corpnat
  corpnat --> stun1
  corpnat --> stun2
```
