docs: provide hardware recommendations for reference architectures #12534

Merged
merged 39 commits into from
Mar 15, 2024
now workspaces
mtojek committed Mar 12, 2024
commit fa1215f11d7bd663dc7229da2512d24e08eac09b
2 changes: 1 addition & 1 deletion docs/admin/architectures/1k-users.md
@@ -12,6 +12,6 @@ tech startups, educational units, or small to mid-sized enterprises.

### Coderd nodes

| Users | Cluster capacity | Replicas | GCP | AWS | Azure |
| Users | Node capacity | Replicas | GCP | AWS | Azure |
| ----------- | ------------------- | -------- | --------------- | ---------- | ----------------- |
| Up to 1,000 | 2 vCPU, 8 GB memory | 2 | `n1-standard-2` | `t3.large` | `Standard_D2s_v3` |
2 changes: 1 addition & 1 deletion docs/admin/architectures/2k-users.md
@@ -17,6 +17,6 @@ enabling it for deployment reliability.

### Coderd nodes

| Users | Cluster capacity | Replicas | GCP | AWS | Azure |
| Users | Node capacity | Replicas | GCP | AWS | Azure |
| ----------- | -------------------- | -------- | --------------- | ----------- | ----------------- |
| Up to 2,000 | 4 vCPU, 16 GB memory | 2 | `n1-standard-4` | `t3.xlarge` | `Standard_D4s_v3` |
2 changes: 1 addition & 1 deletion docs/admin/architectures/3k-users.md
@@ -13,6 +13,6 @@ purposes.

### Coderd nodes

| Users | Cluster capacity | Replicas | GCP | AWS | Azure |
| Users | Node capacity | Replicas | GCP | AWS | Azure |
| ----------- | -------------------- | -------- | --------------- | ----------- | ----------------- |
| Up to 3,000 | 8 vCPU, 32 GB memory | 4 | `n1-standard-4` | `t3.xlarge` | `Standard_D4s_v3` |
55 changes: 44 additions & 11 deletions docs/admin/architectures/index.md
@@ -205,27 +205,60 @@ this option enabled unless there are compelling reasons to disable it.

Inactive users do not consume Coder resources.

#### HTTP API latency
#### Scaling formula

API latency/response time average number of HTTP requests
When determining scaling requirements, consider the following factors:

depending on database perf
- 1 vCPU x 2 GB memory x 250 users: A reasonable formula to determine resource
allocation based on the number of users and their expected usage patterns.
- API latency/response time: Monitor API latency and response times to ensure
optimal performance under varying loads.
- Average number of HTTP requests: Track the average number of HTTP requests to
gauge system usage and identify potential bottlenecks.
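
As a rough illustration of the first factor, the sketch below reads "1 vCPU x 2 GB memory x 250 users" as 1 vCPU and 2 GB of memory for every 250 users, rounded up. That reading is an assumption, and the function and constant names are hypothetical; treat the result as a starting point and validate it against measured API latency and request volume.

```python
import math

# Assumed reading of the formula: one "unit" of capacity per 250 users.
USERS_PER_UNIT = 250
VCPU_PER_UNIT = 1
MEMORY_GB_PER_UNIT = 2


def estimate_coderd_resources(users: int) -> tuple[int, int]:
    """Return (vCPU, memory in GB) suggested by the 1 vCPU x 2 GB x 250 users ratio."""
    if users < 0:
        raise ValueError("users must be non-negative")
    # Round up so a partially filled block of 250 users still gets a full unit.
    units = math.ceil(users / USERS_PER_UNIT)
    return units * VCPU_PER_UNIT, units * MEMORY_GB_PER_UNIT


if __name__ == "__main__":
    for users in (250, 1000, 2000, 3000):
        vcpu, memory_gb = estimate_coderd_resources(users)
        print(f"{users} users: {vcpu} vCPU, {memory_gb} GB memory")
```

The estimate is a total across all `coderd` replicas, not a per-node figure.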

TODO
**Node Autoscaling**

#### Scaling formula
We recommend disabling autoscaling for `coderd` nodes. Autoscaling can interrupt
user connections; see [Autoscaling](../scale.md#autoscaling) for more details.

### Workspaces

reasonable ratio/formula: CPU x memory x users reasonable ratio/formula:
provisionerd x users API latency/response time average number of HTTP requests
Assumptions:

TODO
workspaces also run on the same Kubernetes cluster (recommend a different
namespace/node pool)

### Workspaces
developers can pick between 4-8 CPU and 4-16 GB RAM workspaces (limits)

developers have a resource quota of 16 CPU and 32 GB RAM (2 maxed-out workspaces).

TODO
However, the Coder agent itself requires at minimum 0.1 CPU cores and 256 MB to
run inside a workspace.
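
The assumptions above can be checked with a small calculation. The sketch below is illustrative only: it reads the per-developer quota as 16 CPU cores and 32 GB RAM (an assumption), uses the 4-8 CPU and 4-16 GB workspace limits and the agent minimum of 0.1 CPU cores and 256 MB from the text, and all function names are hypothetical.

```python
# Agent minimum from the text: 0.1 CPU cores and 256 MB.
AGENT_CPU_MILLICORES = 100
AGENT_MEMORY_MB = 256


def workspaces_within_quota(
    quota_cpu: int, quota_memory_gb: int, ws_cpu: int, ws_memory_gb: int
) -> int:
    """Number of identically sized workspaces that fit in a developer's quota."""
    if ws_cpu <= 0 or ws_memory_gb <= 0:
        raise ValueError("workspace size must be positive")
    # The scarcer of the two resources decides how many workspaces fit.
    return min(quota_cpu // ws_cpu, quota_memory_gb // ws_memory_gb)


def usable_after_agent(ws_cpu: int, ws_memory_gb: int) -> tuple[int, int]:
    """Return (millicores, MB) left for user workloads once the agent minimum is reserved."""
    return (
        ws_cpu * 1000 - AGENT_CPU_MILLICORES,
        ws_memory_gb * 1024 - AGENT_MEMORY_MB,
    )


if __name__ == "__main__":
    # Largest workspace (8 CPU, 16 GB) against an assumed 16 CPU / 32 GB quota.
    print(workspaces_within_quota(16, 32, 8, 16))
    # Smallest workspace (4 CPU, 4 GB) after reserving the agent minimum.
    print(usable_after_agent(4, 4))
```

Under these assumptions the agent's footprint is small relative to even the smallest workspace, so the workspace limits rather than the agent drive node pool sizing.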

web microservice development use case: resources are mostly underutilized but
spike during builds

Case study:

In the architecture for up to 2,000+ users, developers are evenly split between
2 regions (a different cluster). In practice, this changes little besides the
diagram and the autoscaling configuration of the workspaces node pool, because
it still uses the central provisioner. Recommend multiple provisioner groups for
zero-trust and multi-cloud use cases. In the architecture for up to 3,000+
users, developers are also in an on-premises network. Document a provisioner
running in a different cloud environment, and the zero-trust benefits of that.

scaling formula

provisionerd x users: Another formula to consider, focusing on the capacity of
provisioner nodes relative to the number of workspace builds, triggered by
users.

### Database

TODO
PostgreSQL database

measure and document the impact of dbcrypt

###