chore(docs): add requirements re ports and stun server to docs #12026

Merged
merged 19 commits into from
Feb 12, 2024
5 changes: 5 additions & 0 deletions docs/manifest.json
@@ -292,6 +292,11 @@
"title": "Port Forwarding",
"description": "Learn how to forward ports in Coder",
"path": "./networking/port-forwarding.md"
},
{
"title": "STUN and NAT",
"description": "Learn how Coder establishes direct connections",
"path": "./networking/stun.md"
}
]
},
49 changes: 49 additions & 0 deletions docs/networking/index.md
@@ -13,6 +13,49 @@ user <-> workspace connections are end-to-end encrypted.

[Tailscale's open source](https://tailscale.com) backs our networking logic.

## Requirements

In order for clients and workspaces to be able to connect:

- All clients and agents must be able to establish a connection to the Coder
server (`CODER_ACCESS_URL`) over HTTP/HTTPS.
- Any reverse proxy or ingress between the Coder control plane and
clients/agents must support WebSockets.

In order for clients to be able to establish direct connections:

> **Note:** Direct connections via the web browser are not supported. To improve
> latency for browser-based applications running inside Coder workspaces in
> regions far from the Coder control plane, consider deploying one or more
> [workspace proxies](../admin/workspace-proxies.md).

- The client is connecting using the CLI (e.g. `coder ssh` or
`coder port-forward`). Note that the
[VS Code extension](https://marketplace.visualstudio.com/items?itemName=coder.coder-remote),
the [JetBrains Plugin](https://plugins.jetbrains.com/plugin/19620-coder/), and
[`ssh coder.<workspace>`](../cli/config-ssh.md) all use the CLI to
establish a workspace connection.
- Either the client or the workspace agent is able to discover a reachable
`ip:port` of its counterpart. If the agent and client are able to
communicate with each other using their locally assigned IP addresses, then a
direct connection can be established immediately. Otherwise, the client and
agent will contact
[the configured STUN servers](../cli/server.md#--derp-server-stun-addresses) to
try to determine which `ip:port` can be used to communicate with their
counterpart. See [STUN and NAT](./stun.md) for more details on how this
process works.
- All outbound UDP traffic must be allowed for both the client and the agent on
**all ports** to each other's respective networks.
- To establish a direct connection, both agent and client use STUN. This
involves sending UDP packets outbound on `udp/3478` to the configured
[STUN server](../cli/server.md#--derp-server-stun-addresses). If either the
agent or the client is unable to exchange UDP packets with a STUN
server, then direct connections will not be possible.
- Both agents and clients will then establish a
[WireGuard](https://www.wireguard.com/) tunnel and send UDP traffic on
ephemeral (high) ports. If a firewall between the client and the agent
blocks this UDP traffic, direct connections will not be possible.
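
As a rough way to check the STUN requirement above from a client or workspace machine, the following Python sketch sends a single STUN Binding Request over UDP and reports whether a matching reply comes back. It is not part of Coder; the function names are hypothetical, the default port `3478` is taken from the text above, and the host you pass in is whatever STUN server your deployment is configured to use.

```python
import os
import socket
import struct
from typing import Optional

STUN_MAGIC_COOKIE = 0x2112A442  # fixed value defined by the STUN RFC
BINDING_REQUEST = 0x0001        # STUN message type for a Binding Request


def build_binding_request(transaction_id: Optional[bytes] = None) -> bytes:
    """Return a 20-byte STUN Binding Request with no attributes."""
    if transaction_id is None:
        transaction_id = os.urandom(12)
    if len(transaction_id) != 12:
        raise ValueError("transaction ID must be exactly 12 bytes")
    # Header layout: type (2 bytes), body length (2 bytes), magic cookie (4 bytes),
    # followed by the 12-byte transaction ID.
    return struct.pack("!HHI", BINDING_REQUEST, 0, STUN_MAGIC_COOKIE) + transaction_id


def stun_reachable(host: str, port: int = 3478, timeout: float = 2.0) -> bool:
    """Hypothetical helper: True if a reply echoing our transaction ID arrives."""
    request = build_binding_request()
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(request, (host, port))
            reply, _ = sock.recvfrom(2048)
        except OSError:  # covers timeouts, DNS failures and ICMP errors
            return False
    # A STUN response carries the same transaction ID in bytes 8..20.
    return len(reply) >= 20 and reply[8:20] == request[8:20]
```

A `False` result only indicates that no reply arrived within the timeout. It does not say which firewall or NAT dropped the traffic, and a `True` result does not by itself guarantee that a direct connection will succeed.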

Member

There should probably be some details about NATs and stuff but I really don't know what to write without getting super technical :/

Member Author

Honestly I'd prefer to just link to Tailscale's docs on this.

Member

IDK how well Tailscale's docs cut it, this one doesn't really explain much about NAT just about firewalls.

Member Author

I link to https://tailscale.com/blog/how-nat-traversal-works a bit above in the STUN section.

Member

This is a good blog post but it's not very good documentation since it's 9000 words long. We should probably dumb it down

Member Author

OK, I added a fairly high-level overview.

Contributor

I've Slack'd you some diagrams. If we're going to explain it, then those are my suggested examples.

Member Author

Added and moved to a separate page as this one is getting plenty big already.

## coder server

Workspaces connect to the coder server via the server's external address, set
@@ -52,6 +95,12 @@ Direct connections are a straight line between the user and workspace, so there
is no special geo-distribution configuration. To speed up direct connections,
move the user and workspace closer together.

Establishing a direct connection can be an involved process because both the
client and workspace agent will likely be behind at least one level of NAT,
meaning that we need to use STUN to learn the IP address and port under which
the client and agent can both contact each other. See [STUN and NAT](./stun.md)
for more information on how this process works.

If a direct connection is not available (e.g. client or server is behind NAT),
Coder will use a relayed connection. By default,
[Coder uses Google's public STUN server](../cli/server.md#--derp-server-stun-addresses),
200 changes: 200 additions & 0 deletions docs/networking/stun.md
@@ -0,0 +1,200 @@
# STUN and NAT

> [Session Traversal Utilities for NAT (STUN)](https://www.rfc-editor.org/rfc/rfc8489.html)
> is a protocol used to assist applications in establishing peer-to-peer
> communications across Network Address Translators (NATs) or firewalls.
>
> [Network Address Translation (NAT)](https://en.wikipedia.org/wiki/Network_address_translation)
> is commonly used in private networks to allow multiple devices to share a
> single public IP address. The vast majority of home and corporate internet
> connections use at least one level of NAT.

## Overview

In order for one application to connect to another across a network, the
connecting application needs to know the IP address and port under which the
target application is reachable. If both applications reside on the same
network, then they can most likely connect directly to each other. In the
context of a Coder workspace agent and client, this is generally not the case,
as both agent and client will most likely be running in different _private_
networks (e.g. `192.168.1.0/24`). In this case, at least one of the two will
need to know an IP address and port under which they can reach their
counterpart.

This problem is often referred to as NAT traversal, and Coder uses a standard
protocol named STUN to address this.

Inside a private network, packets from the agent or client will show up as
having a source address such as `192.168.1.X:12345`. However, outside of this
private network, the source address will show up differently (for example,
`12.3.4.56:54321`). In order for the Coder client and agent to establish a
direct connection with each other, one of them needs to know the `ip:port` pair
under which their counterpart can be reached. Once communication succeeds in
one direction, we can inspect the source address of the received packet to
determine the return address.
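
The last point can be illustrated with a small Python sketch using two UDP sockets on the loopback interface. The receiver is never told the sender's address; it learns it from the received packet and replies to it. The function name is hypothetical, and because loopback involves no NAT, the observed source address here is simply the sender's own address. Across a NAT, the receiver would instead see the translated public `ip:port`.

```python
import socket


def learn_return_address():
    """Send one UDP packet over loopback and reply to its observed source."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as receiver, \
         socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sender:
        receiver.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
        sender.bind(("127.0.0.1", 0))
        receiver.settimeout(2.0)
        sender.settimeout(2.0)

        # Communication in one direction: sender -> receiver.
        sender.sendto(b"hello", receiver.getsockname())
        payload, source = receiver.recvfrom(1024)

        # The receiver uses the packet's source address as the return address.
        receiver.sendto(b"reply to " + payload, source)
        reply, _ = sender.recvfrom(1024)
        return source, sender.getsockname(), reply
```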

At a high level, STUN works like this:

> The below glosses over a lot of the complexity of traversing NATs. For a more
> in-depth technical explanation, see
> [How NAT traversal works (tailscale.com)](https://tailscale.com/blog/how-nat-traversal-works).

- **Discovery:** Both the client and agent will send UDP traffic to one or more
configured STUN servers. These STUN servers are generally located on the
public internet, and respond with the public IP address and port from which
the request came.
- **Coordination:** The client and agent then exchange this information through
the Coder server. They will then construct packets that should be able to
traverse their counterpart's NATs successfully.
- **NAT Traversal:** The client and agent then send these crafted packets to
their counterpart's public addresses. If all goes well, the NATs on the other
end should route these packets to the correct internal address.
- **Connection:** Once the packets reach the other side, the recipient sends a
response back to the source `ip:port` of the received packet. Again, the NATs
should recognize these responses as belonging to an ongoing communication, and
forward them accordingly.

At this point, both the client and agent should be able to send traffic directly
to each other.
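
To make the discovery step more concrete, here is a minimal Python sketch of how a STUN response can be decoded to recover the public `ip:port`. It handles only the IPv4 `XOR-MAPPED-ADDRESS` attribute and skips validation a real client would do (transaction ID matching, message integrity). The function names are hypothetical, and `build_example_response` exists purely to fabricate a response for illustration; it is not how Coder or Tailscale implement this.

```python
import socket
import struct

MAGIC_COOKIE = 0x2112A442     # fixed value defined by the STUN RFC
XOR_MAPPED_ADDRESS = 0x0020   # attribute type carrying the reflexive address


def parse_xor_mapped_address(message: bytes):
    """Return (ip, port) from the first IPv4 XOR-MAPPED-ADDRESS, or None."""
    if len(message) < 20:
        return None
    _msg_type, msg_len, cookie = struct.unpack("!HHI", message[:8])
    if cookie != MAGIC_COOKIE:
        return None
    offset, end = 20, min(len(message), 20 + msg_len)
    while offset + 4 <= end:
        attr_type, attr_len = struct.unpack("!HH", message[offset:offset + 4])
        value = message[offset + 4:offset + 4 + attr_len]
        # value[1] is the address family; 0x01 means IPv4.
        if attr_type == XOR_MAPPED_ADDRESS and len(value) >= 8 and value[1] == 0x01:
            port = struct.unpack("!H", value[2:4])[0] ^ (MAGIC_COOKIE >> 16)
            addr = struct.unpack("!I", value[4:8])[0] ^ MAGIC_COOKIE
            return socket.inet_ntoa(struct.pack("!I", addr)), port
        # Attribute values are padded to a multiple of 4 bytes.
        offset += 4 + ((attr_len + 3) // 4) * 4
    return None


def build_example_response(ip: str, port: int) -> bytes:
    """Fabricate a Binding Success Response for illustration only."""
    xport = port ^ (MAGIC_COOKIE >> 16)
    xaddr = struct.unpack("!I", socket.inet_aton(ip))[0] ^ MAGIC_COOKIE
    attr = struct.pack("!HHBBHI", XOR_MAPPED_ADDRESS, 8, 0, 0x01, xport, xaddr)
    header = struct.pack("!HHI", 0x0101, len(attr), MAGIC_COOKIE) + b"\x00" * 12
    return header + attr
```

The address and port are XORed with the magic cookie so that NATs which rewrite IP addresses found inside packet payloads do not corrupt the value the STUN server reports.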

## Examples

### 1. Direct connections without NAT or STUN

In this example, both the client and agent are located on the network
`192.168.21.0/24`. Assuming no firewalls are blocking packets in either
direction, both client and agent are able to communicate directly with each
other's locally assigned IP address.

```mermaid
flowchart LR
subgraph corpnet["Private Network\ne.g. Corp. LAN"]
A[Client Workstation\n192.168.21.47:38297]
C[Workspace Agent\n192.168.21.147:41563]
A <--> C
end
```
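
As a quick sanity check for this scenario, the following Python sketch tests whether a set of addresses all fall inside one network, using the example addresses from the diagram. The helper name is hypothetical. Being on the same subnet is used here as a simplifying assumption; hosts on different routed subnets can also reach each other directly without NAT, and firewalls can still block hosts on the same subnet.

```python
import ipaddress


def on_same_network(network: str, *hosts: str) -> bool:
    """True if every host address belongs to the given network (CIDR)."""
    net = ipaddress.ip_network(network)
    return all(ipaddress.ip_address(host) in net for host in hosts)


# Client and agent from the example above share 192.168.21.0/24.
same_lan = on_same_network("192.168.21.0/24", "192.168.21.47", "192.168.21.147")
```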


### 2. Direct connections with one layer of NAT

In this example, client and agent are located on different networks and connect
to each other over the public Internet. Both client and agent connect to a
configured STUN server located on the public Internet to determine the public IP
address and port on which they can be reached.

```mermaid
flowchart LR
subgraph homenet["Network A"]
client["Client workstation\n192.168.1.101:38297"]
homenat["NAT\n??.??.??.??:?????"]
end
subgraph internet["Public Internet"]
stun1["STUN server"]
end
subgraph corpnet["Network B"]
agent["Workspace agent\n10.21.43.241:56812"]
corpnat["NAT\n??.??.??.??:?????"]
end

client --- homenat
agent --- corpnat
corpnat -- "[I see 12.34.56.7:41563]" --> stun1
homenat -- "[I see 65.4.3.21:29187]" --> stun1
```


They then exchange this information through the Coder server, and can then
communicate directly with each other through their respective NATs.

```mermaid
flowchart LR
subgraph homenet["Network A"]
client["Client workstation\n192.168.1.101:38297"]
homenat["NAT\n65.4.3.21:29187"]
end
subgraph corpnet["Network B"]
agent["Workspace agent\n10.21.43.241:56812"]
corpnat["NAT\n12.34.56.7:41563"]
end

subgraph internet["Public Internet"]
end

client -- "[12.34.56.7:41563]" --- homenat
agent -- "[10.21.43.241:56812]" --- corpnat
corpnat -- "[65.4.3.21:29187]" --> internet
homenat -- "[12.34.56.7:41563]" --> internet

```
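
The role of the NAT mappings in this example can be sketched with a toy model in Python. The `ToyNAT` class is hypothetical and heavily simplified: it assumes each internal `ip:port` gets one public port that is reused for every destination, and that any inbound packet to a mapped port is forwarded. Real NATs vary widely in how they allocate ports and filter inbound traffic, which is a large part of why NAT traversal can fail.

```python
class ToyNAT:
    """Simplified NAT: one public port per internal (ip, port) pair."""

    def __init__(self, public_ip, first_port):
        self.public_ip = public_ip
        self.next_port = first_port
        self.outbound = {}  # (internal_ip, internal_port) -> public_port
        self.inbound = {}   # public_port -> (internal_ip, internal_port)

    def send(self, internal_addr):
        """Translate an outbound source address, creating a mapping if needed."""
        if internal_addr not in self.outbound:
            self.outbound[internal_addr] = self.next_port
            self.inbound[self.next_port] = internal_addr
            self.next_port += 1
        return (self.public_ip, self.outbound[internal_addr])

    def receive(self, public_port):
        """Return the internal destination, or None if the packet is dropped."""
        return self.inbound.get(public_port)


# Addresses taken from the diagrams above; the starting ports are chosen so
# that the resulting mappings match them.
home_nat = ToyNAT("65.4.3.21", 29187)
corp_nat = ToyNAT("12.34.56.7", 41563)
client = ("192.168.1.101", 38297)
agent = ("10.21.43.241", 56812)

dropped = corp_nat.receive(41563)     # no mapping yet, so the packet is dropped
client_public = home_nat.send(client)  # mapping created by outbound traffic
agent_public = corp_nat.send(agent)
delivered = corp_nat.receive(agent_public[1])  # now forwarded to the agent
```

In this model, the outbound STUN request is what creates the mapping, and the address learned from STUN is what the counterpart later sends to.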
Member Author

flowchart LR
  subgraph homenet["Home Network"]
    direction LR
    client["Client workstation\n192.168.1.101:38297"]
    homenat["Home Router/NAT\n65.4.3.21:29187"]
  end
  subgraph corpnet["Corp Network"]
    direction LR
    agent["Workspace agent\n10.21.43.241:56812"]
    corpnat["Corp Router/NAT\n12.34.56.7:41563"]
  end
  subgraph internet["Public Internet"]
  end
  client -- "[12.34.56.7:41563]" --- homenat
  homenat -- "[12.34.56.7:41563]" --- internet
  internet -- "[12.34.56.7:41563]" --- corpnat
  corpnat -- "[10.21.43.241:56812]" --> agent

Contributor

It's unfortunate Mermaid lays these out differently. Presumably it's due to the arrow directions... does it look weird to keep the use of arrows consistent?

Member Author

Yeah, I'm honestly considering replacing with some manual drawings just so we can more easily fine-tune it.

Member Author

For now, fixed by changing the arrow directions.


### 3. Direct connections with VPN and NAT hairpinning

In this example, the client workstation must use a VPN to connect to the
corporate network. All traffic from the client will enter through the VPN entry
node and exit at the VPN exit node inside the corporate network. Traffic from
the client inside the corporate network will appear to be coming from the IP
address of the VPN exit node, `172.16.1.2`. Traffic from the client to the
public internet will appear to have the public IP address of the corporate
router, `12.34.56.7`.

The workspace agent is running on a Kubernetes cluster inside the corporate
network, which is behind its own layer of NAT. To anyone inside the corporate
network but outside the cluster network, its traffic will appear to be coming
from `172.16.1.254`. However, services on the public Internet will see the
agent's traffic as originating from the public IP address assigned to the
corporate router. Additionally, the corporate router will most likely have a
firewall configured to block traffic from the internet to the corporate
network.

If the client and agent both use the public STUN server, the addresses
discovered by STUN will both be the public IP address of the corporate router.
For a direct connection to work, the corporate router must then route both of
the following back into the corporate network:

- Traffic sent from the client to the external IP of the corporate router back
to the cluster router, and
- Traffic sent from the agent to the external IP of the corporate router back
to the VPN exit node.

This behaviour is known as "hairpinning", and may not be supported in all
network configurations.

If hairpinning is not supported, deploying an internal STUN server can aid in
establishing direct connections between client and agent. When the agent and
client query this internal STUN server, they will be able to determine the
addresses on the corporate network from which their traffic appears to
originate. Using these internal addresses is much more likely to result in a
successful direct connection.
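
One way to reason about this scenario is to compare the addresses that STUN reports for each side. The Python sketch below uses a hypothetical helper that flags the case where both peers appear behind the same IP address, which is the situation in which the router would need to hairpin. The IP addresses come from the example above, the ports are made up for illustration, and the helper assumes IPv4 `ip:port` strings.

```python
def needs_hairpinning(client_mapped: str, agent_mapped: str) -> bool:
    """True when both STUN-discovered addresses share the same IP address."""
    client_ip = client_mapped.rsplit(":", 1)[0]
    agent_ip = agent_mapped.rsplit(":", 1)[0]
    return client_ip == agent_ip


# Public STUN server: both peers appear as the corporate router.
via_public_stun = needs_hairpinning("12.34.56.7:41563", "12.34.56.7:29187")

# Internal STUN server: the peers appear as the VPN exit node and the
# cluster router, which are distinct addresses on the corporate network.
via_internal_stun = needs_hairpinning("172.16.1.2:41563", "172.16.1.254:29187")
```

In the first case the connection depends on the corporate router supporting hairpinning; in the second case it does not.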

```mermaid
flowchart TD
subgraph homenet["Home Network"]
client["Client workstation\n192.168.1.101"]
homenat["Home Router/NAT\n65.4.3.21"]
end

subgraph internet["Public Internet"]
stun1["Public STUN"]
vpn1["VPN entry node"]
end

subgraph corpnet["Corp Network 172.16.1.0/24"]
corpnat["Corp Router/NAT\n172.16.1.1\n12.34.56.7"]
vpn2["VPN exit node\n172.16.1.2"]
stun2["Private STUN"]

subgraph cluster["Cluster Network 10.11.12.0/24"]
clusternat["Cluster Router/NAT\n10.11.12.1\n172.16.1.254"]
agent["Workspace agent\n10.11.12.34"]
end
end

vpn1 === vpn2
vpn2 --> stun2
client === homenat
homenat === vpn1
homenat x-.-x stun1
agent --- clusternat
clusternat --- corpnat
corpnat --> stun1
corpnat --> stun2
```