docs: add high availability #4583

Merged: 4 commits, Oct 17, 2022
Starting to look good
ammario committed Oct 17, 2022
commit 7cf7e56c410a6c5b0f01c5a5bed3100794689ad7
56 changes: 40 additions & 16 deletions docs/admin/high-availability.md
@@ -2,38 +2,62 @@

High Availability (HA) mode solves for horizontal scalability and automatic failover
within a single region. When in HA mode, Coder continues using a single Postgres
-endpoint. [GCP](https://cloud.google.com/sql/docs/postgres/high-availability), [AWS](https://docs.aws.amazon.com/prescriptive-guidance/latest/saas-multitenant-managed-postgresql/availability.html), and others offer fully-managed HA Postgres services.
endpoint. [GCP](https://cloud.google.com/sql/docs/postgres/high-availability), [AWS](https://docs.aws.amazon.com/prescriptive-guidance/latest/saas-multitenant-managed-postgresql/availability.html),
and other cloud vendors offer fully-managed HA Postgres services that pair
nicely with Coder.

-For Coder to operate correctly, all Coder servers must be within 10ms of each other
For Coder to operate correctly, every node must be within 10 ms of every other
node and of Postgres. We make a best-effort attempt to warn the user when inter-node
latency is too high, but if requests start dropping, this is one metric to investigate.
Review comment (Member): Since we surface the database latency, should we document the endpoint/dashboard to check this?

Reply (Member): We should do that once the Admin Settings page is in. cc @kylecarbs

Note that this latency requirement applies _only_ to Coder services. Coder will
operate correctly even with a few seconds of latency on
workspace <-> Coder and user <-> Coder connections.

-## Automatic Setup
## Setup

-Coder automatically enters HA mode when multiple instances connect to the same
-Postgres endpoint. Thus, enabling HA is as simple as increasing the number
-of deployed Coder replicas.
Coder automatically enters HA mode when multiple instances simultaneously connect
to the same Postgres endpoint.

-## Kubernetes Setup
HA has one required configuration variable that you must set for each Coder
node: `CODER_DERP_SERVER_RELAY_URL`. The HA nodes use these URLs to communicate
with each other. Inter-node communication is only required while using the
embedded relay (default). If you're using [custom relays](../networking.md#custom-relays), Coder ignores `CODER_DERP_SERVER_RELAY_URL`, since Postgres is the sole rendezvous for the Coder nodes.
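
As a rough illustration of the per-node setting described above, here is a minimal sketch of one node's environment, assuming the node runs under Docker Compose on a VM. The Compose layout, image tag, and Postgres URL are assumptions for illustration and do not come from this PR; the addresses mirror the `coder-1` row of the example table in this section, with the table's `*:80` written as `0.0.0.0:80`.

```yaml
# docker-compose.yaml on the first VM (hypothetical layout, placeholder values)
services:
  coder:
    image: ghcr.io/coder/coder:latest # assumed image tag
    ports:
      - "80:80"
    environment:
      # Same on every node: the load-balanced URL that users and workspaces reach.
      CODER_ACCESS_URL: "https://coder.big.corp"
      # Same on every node: the single shared Postgres endpoint (placeholder URL).
      CODER_PG_CONNECTION_URL: "postgres://coder:change-me@db.internal:5432/coder"
      # Listen address for this node.
      CODER_ADDRESS: "0.0.0.0:80"
      # Unique per node: an address the other nodes can reach directly,
      # never the load balancer.
      CODER_DERP_SERVER_RELAY_URL: "http://10.0.0.1:80"
```

The second and third nodes would differ only in `CODER_DERP_SERVER_RELAY_URL`.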

-- Using our Helm, just increase `coder.replicaCount` in `values.yaml`
-- Custom Helm Chart:
-```
Since `CODER_ACCESS_URL` points at a load balancer in front of all Coder nodes,
`CODER_DERP_SERVER_RELAY_URL` must never be the same as `CODER_ACCESS_URL`.

Here's an example 3-node network configuration:

| Name | `CODER_ADDRESS` | `CODER_DERP_SERVER_RELAY_URL` | `CODER_ACCESS_URL` |
| ------- | --------------- | ----------------------------- | ----------------------- |
| `coder-1` | `*:80` | `http://10.0.0.1:80` | `https://coder.big.corp` |
| `coder-2` | `*:80` | `http://10.0.0.2:80` | `https://coder.big.corp` |
| `coder-3` | `*:80` | `http://10.0.0.3:80` | `https://coder.big.corp` |


## Kubernetes

If you installed Coder via
[our Helm Chart](../install/kubernetes.md#install-coder-with-helm), just
increase `coder.replicaCount` in `values.yaml`.


If you installed Coder by some other means, insert the relay URL via the
environment like so:

```yaml
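env:
  # Expose the pod's own IP so it can be referenced below.
  - name: POD_IP
    valueFrom:
      fieldRef:
        fieldPath: status.podIP
  # Each pod advertises its own address as the relay URL.
  - name: CODER_DERP_SERVER_RELAY_URL
    value: http://$(POD_IP)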
```

-## Virtual Machine Setup
-
-Set `CODER_DERP_SERVER_RELAY_URL` to an address that other instances can access:
-```
Then, increase the number of pods.
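
For a non-Helm install, the two steps above (set the relay URL from the pod IP, then scale out) might be combined as in the following sketch. The Deployment name, labels, and image tag are hypothetical placeholders; other settings a real deployment needs, such as the access URL and the Postgres connection, are omitted for brevity.

```yaml
# Hypothetical Deployment for a non-Helm install; names, labels, and the
# image tag are placeholders, not values taken from this PR.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: coder
spec:
  replicas: 3 # several pods on one Postgres endpoint put Coder in HA mode
  selector:
    matchLabels:
      app: coder
  template:
    metadata:
      labels:
        app: coder
    spec:
      containers:
        - name: coder
          image: ghcr.io/coder/coder:latest
          env:
            # POD_IP must be declared before it is referenced below.
            - name: POD_IP
              valueFrom:
                fieldRef:
                  fieldPath: status.podIP
            - name: CODER_DERP_SERVER_RELAY_URL
              value: http://$(POD_IP)
```

Kubernetes expands `$(POD_IP)` only for variables defined earlier in the same `env` list, which is why the order of the two entries matters.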

## Up next

- [Networking](../networking.md)
-- [Kubernetes](../install/kubernetes.md.md)
- [Kubernetes](../install/kubernetes.md)
- [Enterprise](./enterprise.md)