
docs: provide hardware recommendations for reference architectures #12534


Merged · 39 commits · Mar 15, 2024
51 changes: 51 additions & 0 deletions docs/admin/architectures/1k-users.md
# Reference Architecture: up to 1,000 users

The 1,000-user architecture is designed to cover a wide range of workflows. It
suits organizations such as medium-sized tech startups, educational
institutions, or small to mid-sized enterprises.

**Target load**: API: up to 180 RPS

**High Availability**: non-essential for small deployments

## Hardware recommendations

### Coderd nodes

| Users | Node capacity | Replicas | GCP | AWS | Azure |
| ----------- | ------------------- | ------------------- | --------------- | ---------- | ----------------- |
| Up to 1,000 | 2 vCPU, 8 GB memory | 1-2 nodes / 1 coderd each | `n1-standard-2` | `t3.large` | `Standard_D2s_v3` |

**Footnotes**:

- For small deployments (approximately 100 users, 10 concurrent workspace
  builds), it is acceptable to deploy provisioners on `coderd` nodes.
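
As a rough illustration, the sizing in the table above maps onto Kubernetes
resource requests as in the following sketch. The Deployment name, labels, and
image tag are assumptions for illustration, not the official Helm chart output:

```yaml
# Sketch: a coderd Deployment sized to match the table above.
# Name, labels, and image tag are illustrative; adapt to your deployment.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: coder
spec:
  replicas: 2 # 1-2 replicas, 1 coderd each
  selector:
    matchLabels:
      app: coder
  template:
    metadata:
      labels:
        app: coder
    spec:
      containers:
        - name: coder
          image: ghcr.io/coder/coder:latest
          resources:
            requests:
              cpu: "2" # 2 vCPU per replica
              memory: 8Gi # 8 GB per replica
            limits:
              cpu: "2"
              memory: 8Gi
```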

### Provisioner nodes

| Users | Node capacity | Replicas | GCP | AWS | Azure |
| ----------- | -------------------- | ------------------------------ | ---------------- | ------------ | ----------------- |
| Up to 1,000 | 8 vCPU, 32 GB memory | 2 nodes / 30 provisioners each | `t2d-standard-8` | `t3.2xlarge` | `Standard_D8s_v3` |

**Footnotes**:

- An external provisioner is deployed as a Kubernetes pod; a minimal sketch
  follows below.
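
The sketch below shows one way to run external provisioners as a Deployment
pinned to a dedicated node pool. The start command, authentication, and node
label are assumptions; consult the external provisioner documentation for the
exact invocation:

```yaml
# Sketch: external provisioners as a Kubernetes Deployment.
# Command, authentication, and node label are illustrative assumptions.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: coder-provisioner
spec:
  replicas: 60 # 2 nodes x 30 provisioners each, per the table above
  selector:
    matchLabels:
      app: coder-provisioner
  template:
    metadata:
      labels:
        app: coder-provisioner
    spec:
      nodeSelector:
        workload: provisioners # hypothetical label on the provisioner node pool
      containers:
        - name: provisionerd
          image: ghcr.io/coder/coder:latest
          args: ["provisionerd", "start"] # assumed invocation; verify for your version
          resources:
            requests:
              cpu: 250m # ~30 pods per 8 vCPU / 32 GB node
              memory: 1Gi
```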

### Workspace nodes

| Users | Node capacity | Replicas | GCP | AWS | Azure |
| ----------- | -------------------- | ----------------------- | ---------------- | ------------ | ----------------- |
| Up to 1,000 | 8 vCPU, 32 GB memory | 64 nodes / 16 workspaces each | `t2d-standard-8` | `t3.2xlarge` | `Standard_D8s_v3` |

**Footnotes**:

- We assume that a workspace user needs at minimum 2 GB of memory. We
  recommend against over-provisioning memory for developer workloads, as this
  may lead to OOMKiller invocations (see the sketch below).
- Maximum number of Kubernetes workspace pods per node: 256
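
As a hedged sketch, these two footnotes translate into the following
configuration. The pod spec is illustrative (real workspace pods come from
Coder templates), and `maxPods` is a kubelet setting that managed node pools
usually expose as a node-pool creation flag:

```yaml
# Sketch: per-workspace memory sizing. Setting the memory limit equal to
# the request keeps usage predictable, so the OOMKiller fires only if the
# workspace itself exceeds 2Gi, not because the node was over-committed.
apiVersion: v1
kind: Pod
metadata:
  name: workspace-example # illustrative; real workspaces come from templates
spec:
  containers:
    - name: dev
      image: ubuntu:22.04
      resources:
        requests:
          cpu: 500m # 16 workspaces per 8 vCPU node
          memory: 2Gi # minimum per workspace user
        limits:
          memory: 2Gi
---
# Sketch: kubelet-level cap on pods per node.
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
maxPods: 256
```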

### Database nodes

| Users | Node capacity | Replicas | Storage | GCP | AWS | Azure |
| ----------- | ------------------- | -------- | ------- | ------------------ | ------------- | ----------------- |
| Up to 1,000 | 2 vCPU, 8 GB memory | 1 | 512 GB | `db-custom-2-7680` | `db.t3.large` | `Standard_D2s_v3` |
59 changes: 59 additions & 0 deletions docs/admin/architectures/2k-users.md
# Reference Architecture: up to 2,000 users

In the 2,000-user architecture, there is a moderate increase in traffic,
suggesting a growing user base or expanding operations. This setup is well
suited for mid-sized companies experiencing growth or for universities
accommodating expanding user populations.

Users can be evenly distributed between two regions or attached to different
clusters.

**Target load**: API: up to 300 RPS

**High Availability**: The mode is _enabled_; multiple replicas provide higher
deployment reliability under load.

## Hardware recommendations

### Coderd nodes

| Users | Node capacity | Replicas | GCP | AWS | Azure |
| ----------- | -------------------- | ----------------------- | --------------- | ----------- | ----------------- |
| Up to 2,000 | 4 vCPU, 16 GB memory | 2 nodes / 1 coderd each | `n1-standard-4` | `t3.xlarge` | `Standard_D4s_v3` |
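
With HA enabled, it is worth ensuring the two `coderd` replicas never land on
the same node. A minimal sketch using pod anti-affinity (names, labels, and
image tag are illustrative):

```yaml
# Sketch: spread coderd replicas across distinct nodes so losing one
# node does not take down the whole control plane.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: coder
spec:
  replicas: 2
  selector:
    matchLabels:
      app: coder
  template:
    metadata:
      labels:
        app: coder
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchLabels:
                  app: coder
              topologyKey: kubernetes.io/hostname # one replica per node
      containers:
        - name: coder
          image: ghcr.io/coder/coder:latest
```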

### Provisioner nodes

| Users | Node capacity | Replicas | GCP | AWS | Azure |
| ----------- | -------------------- | ------------------------------ | ---------------- | ------------ | ----------------- |
| Up to 2,000 | 8 vCPU, 32 GB memory | 4 nodes / 30 provisioners each | `t2d-standard-8` | `t3.2xlarge` | `Standard_D8s_v3` |

**Footnotes**:

- An external provisioner is deployed as a Kubernetes pod.
- It is not recommended to run provisioner daemons on `coderd` nodes.
- Consider separating provisioners into different namespaces to support
  zero-trust or multi-cloud deployments; a minimal sketch follows below.
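
A minimal sketch of that namespace separation (names are illustrative):

```yaml
# Sketch: one namespace per trust boundary or cloud. Each namespace would
# run its own provisioner Deployment, matched to templates via provisioner tags.
apiVersion: v1
kind: Namespace
metadata:
  name: provisioners-cloud-a # illustrative name
---
apiVersion: v1
kind: Namespace
metadata:
  name: provisioners-cloud-b # illustrative name
```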

### Workspace nodes

| Users | Node capacity | Replicas | GCP | AWS | Azure |
| ----------- | -------------------- | ------------------------ | ---------------- | ------------ | ----------------- |
| Up to 2,000 | 8 vCPU, 32 GB memory | 128 nodes / 16 workspaces each | `t2d-standard-8` | `t3.2xlarge` | `Standard_D8s_v3` |

**Footnotes**:

- We assume that a workspace user needs at minimum 2 GB of memory.
- Maximum number of Kubernetes workspace pods per node: 256
- Nodes can be distributed across two regions, not necessarily evenly split,
  depending on developer team sizes; a minimal sketch follows below.
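
A hedged sketch of pinning a workspace to one of the two regions via the
well-known topology label (pod name, image, and region value are illustrative):

```yaml
# Sketch: constrain a workspace pod to a region using the standard
# topology label; the region value below is illustrative.
apiVersion: v1
kind: Pod
metadata:
  name: workspace-eu-example
spec:
  nodeSelector:
    topology.kubernetes.io/region: europe-west1
  containers:
    - name: dev
      image: ubuntu:22.04
```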

### Database nodes

| Users | Node capacity | Replicas | Storage | GCP | AWS | Azure |
| ----------- | -------------------- | -------- | ------- | ------------------- | -------------- | ----------------- |
| Up to 2,000 | 4 vCPU, 16 GB memory | 1 | 1 TB | `db-custom-4-15360` | `db.t3.xlarge` | `Standard_D4s_v3` |
**Review comment (Member):**
> We've been using a `db-custom-8-32768` for our 2,000-user scale tests, so the
> CPU numbers here may be slightly inaccurate. Granted, for a regular
> (non-scaletest) deployment, I think 4 vCPU is borderline sufficient, as it's
> the 2,000 active workspaces that push the DB CPU load up to ~80% (8 vCPU).

**Reply (Member, Author):**
> On the other hand, our scale-testing methodology was really aggressive, so I
> would expect a similar pattern on the user side 🤔
> What is your recommendation @maf? Should we switch to `db-custom-8-32768`?


**Footnotes**:

- Consider adding more replicas if workspace activity exceeds 500 workspace
  builds per day or to achieve higher RPS.
62 changes: 62 additions & 0 deletions docs/admin/architectures/3k-users.md
# Reference Architecture: up to 3,000 users

The 3,000-user architecture targets large-scale enterprises, possibly with
both on-premises networks and cloud deployments.

**Target load**: API: up to 550 RPS

**High Availability**: Typically, this scale requires a fully managed HA
PostgreSQL service and all Coder observability features enabled for
operational purposes.

**Observability**: Deploy monitoring solutions to gather Prometheus metrics and
visualize them with Grafana to gain detailed insights into infrastructure and
application behavior. This allows operators to respond quickly to incidents and
continuously improve the reliability and performance of the platform.
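
A minimal sketch of a Prometheus scrape job for `coderd`. It assumes metrics
export is enabled (for example via `CODER_PROMETHEUS_ENABLE=true`) and that the
metrics port is reachable in-cluster; the target, port, and path below are
illustrative:

```yaml
# Sketch: prometheus.yml scrape job for coderd metrics.
# Target, port, and metrics path are assumptions; adjust to your deployment.
scrape_configs:
  - job_name: coder
    static_configs:
      - targets: ["coder.coder.svc.cluster.local:2112"] # illustrative target
```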

## Hardware recommendations

### Coderd nodes

| Users | Node capacity | Replicas | GCP | AWS | Azure |
| ----------- | -------------------- | ----------------- | --------------- | ----------- | ----------------- |
| Up to 3,000 | 8 vCPU, 32 GB memory | 4 nodes / 1 coderd each | `n1-standard-8` | `t3.2xlarge` | `Standard_D8s_v3` |

### Provisioner nodes

| Users | Node capacity | Replicas | GCP | AWS | Azure |
| ----------- | -------------------- | ------------------------ | ---------------- | ------------ | ----------------- |
| Up to 3,000 | 8 vCPU, 32 GB memory | 8 nodes / 30 provisioners each | `t2d-standard-8` | `t3.2xlarge` | `Standard_D8s_v3` |

**Footnotes**:

- An external provisioner is deployed as a Kubernetes pod.
- It is strongly discouraged to run provisioner daemons on `coderd` nodes at
  this level of scale.
- Separate provisioners into different namespaces to support zero-trust or
  multi-cloud deployments.

### Workspace nodes

| Users | Node capacity | Replicas | GCP | AWS | Azure |
| ----------- | -------------------- | ------------------------------ | ---------------- | ------------ | ----------------- |
| Up to 3,000 | 8 vCPU, 32 GB memory | 256 nodes / 12 workspaces each | `t2d-standard-8` | `t3.2xlarge` | `Standard_D8s_v3` |

**Footnotes**:

- Assumed that a workspace user needs 2 GB memory to perform
- Maximum number of Kubernetes workspace pods per node: 256
- As workspace nodes can be distributed between regions, on-premises networks
and cloud areas, consider different namespaces in favor of zero-trust or
multi-cloud deployments.

### Database nodes

| Users | Node capacity | Replicas | Storage | GCP | AWS | Azure |
| ----------- | -------------------- | -------- | ------- | ------------------- | --------------- | ----------------- |
| Up to 3,000 | 8 vCPU, 32 GB memory | 2 | 1.5 TB | `db-custom-8-30720` | `db.t3.2xlarge` | `Standard_D8s_v3` |

**Footnotes**:

- Consider adding more replicas if workspace activity exceeds 1,500 workspace
  builds per day or to achieve higher RPS.