
docs: provide hardware recommendations for reference architectures #12534


Merged

merged 39 commits into from Mar 15, 2024

Changes from 16 commits
43 changes: 43 additions & 0 deletions docs/admin/architectures/1k-users.md
@@ -0,0 +1,43 @@
# Reference Architecture: up to 1,000 users

The 1,000 users architecture is designed to cover a wide range of workflows.
Examples of organizations that might adopt this architecture include
medium-sized tech startups, educational institutions, and small to mid-sized
enterprises.

**Target load**: API: up to 180 RPS

**High Availability**: non-essential for small deployments

## Hardware recommendations

### Coderd nodes

| Users | Node capacity | Replicas | GCP | AWS | Azure |
| ----------- | ------------------- | -------- | --------------- | ---------- | ----------------- |
| Up to 1,000 | 2 vCPU, 8 GB memory | 2 | `n1-standard-2` | `t3.large` | `Standard_D2s_v3` |

**Footnotes**:

- For small deployments (approximately 100 users, 10 concurrent workspace
  builds), it is acceptable to deploy provisioners on `coderd` nodes.

### Provisioner nodes

| Users | Node capacity | Replicas | GCP | AWS | Azure |
| ----------- | -------------------- | ------------------------ | ---------------- | ------------ | ----------------- |
| Up to 1,000 | 8 vCPU, 32 GB memory | 2 / 30 provisioners each | `t2d-standard-8` | `t3.2xlarge` | `Standard_D8s_v3` |

**Footnotes**:

- An external provisioner is deployed as a Kubernetes pod.

### Workspace nodes

| Users | Node capacity | Replicas | GCP | AWS | Azure |
| ----------- | -------------------- | ----------------------- | ---------------- | ------------ | ----------------- |
| Up to 1,000 | 8 vCPU, 32 GB memory | 64 / 16 workspaces each | `t2d-standard-8` | `t3.2xlarge` | `Standard_D8s_v3` |

**Footnotes**:

- It is assumed that each workspace user needs 2 GB of memory to work
  effectively (see the cross-check sketch below)
- Maximum number of Kubernetes workspace pods per node: 256
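As a rough cross-check of the table above, the following sketch derives the
node count from the 2 GB-per-user assumption. It is an illustrative
calculation, not part of the official sizing; the helper name and the
parameters are ours.

```go
package main

import (
	"fmt"
	"math"
)

// workspaceNodes estimates how many workspace nodes are needed, assuming
// each workspace reserves perWorkspaceGB of memory on a node that offers
// nodeMemGB. The helper and its parameters are illustrative assumptions.
func workspaceNodes(users int, nodeMemGB, perWorkspaceGB float64) int {
	perNode := int(nodeMemGB / perWorkspaceGB) // 32 GB / 2 GB = 16 workspaces
	return int(math.Ceil(float64(users) / float64(perNode)))
}

func main() {
	// 1,000 users / 16 workspaces per node = 62.5, rounded up to 63.
	// The table above recommends 64 nodes, i.e. a small buffer on top.
	fmt.Println(workspaceNodes(1000, 32, 2)) // 63
}
```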
48 changes: 48 additions & 0 deletions docs/admin/architectures/2k-users.md
@@ -0,0 +1,48 @@
# Reference Architecture: up to 2,000 users

In the 2,000 users architecture, there is a moderate increase in traffic,
suggesting a growing user base or expanding operations. This setup is
well-suited for mid-sized companies experiencing growth or for universities
seeking to accommodate their expanding user populations.

Users can be evenly distributed between 2 regions or be attached to different
clusters.

**Target load**: API: up to 300 RPS

**High Availability**: The mode is _disabled_ for this architecture, but
administrators may consider enabling it for deployment reliability.

## Hardware recommendations

### Coderd nodes

| Users | Node capacity | Replicas | GCP | AWS | Azure |
| ----------- | -------------------- | -------- | --------------- | ----------- | ----------------- |
| Up to 2,000 | 4 vCPU, 16 GB memory | 2 | `n1-standard-4` | `t3.xlarge` | `Standard_D4s_v3` |

### Provisioner nodes

| Users | Node capacity | Replicas | GCP | AWS | Azure |
| ----------- | -------------------- | ------------------------ | ---------------- | ------------ | ----------------- |
| Up to 2,000 | 8 vCPU, 32 GB memory | 4 / 30 provisioners each | `t2d-standard-8` | `t3.2xlarge` | `Standard_D8s_v3` |

**Footnotes**:

- An external provisioner is deployed as a Kubernetes pod.
- It is not recommended to run provisioner daemons on `coderd` nodes.
- Consider separating provisioners into different namespaces to support
  zero-trust or multi-cloud deployments.

### Workspace nodes

| Users | Node capacity | Replicas | GCP | AWS | Azure |
| ----------- | -------------------- | ------------------------ | ---------------- | ------------ | ----------------- |
| Up to 2,000 | 8 vCPU, 32 GB memory | 128 / 16 workspaces each | `t2d-standard-8` | `t3.2xlarge` | `Standard_D8s_v3` |

**Footnotes**:

- It is assumed that each workspace user needs 2 GB of memory to work
  effectively
- Maximum number of Kubernetes workspace pods per node: 256
- Nodes can be distributed across 2 regions, not necessarily split evenly,
  depending on developer team sizes
45 changes: 45 additions & 0 deletions docs/admin/architectures/3k-users.md
@@ -0,0 +1,45 @@
# Reference Architecture: up to 3,000 users

The 3,000 users architecture targets large-scale enterprises, possibly with
on-premises network and cloud deployments.

**Target load**: API: up to 550 RPS

**High Availability**: Typically, this scale requires a fully-managed HA
PostgreSQL service and all Coder observability features enabled for
operational purposes.

## Hardware recommendations

### Coderd nodes

| Users | Node capacity | Replicas | GCP | AWS | Azure |
| ----------- | -------------------- | -------- | --------------- | ----------- | ----------------- |
| Up to 3,000 | 8 vCPU, 32 GB memory | 4 | `n1-standard-4` | `t3.xlarge` | `Standard_D4s_v3` |

### Provisioner nodes

| Users | Node capacity | Replicas | GCP | AWS | Azure |
| ----------- | -------------------- | ------------------------ | ---------------- | ------------ | ----------------- |
| Up to 3,000 | 8 vCPU, 32 GB memory | 8 / 30 provisioners each | `t2d-standard-8` | `t3.2xlarge` | `Standard_D8s_v3` |

**Footnotes**:

- An external provisioner is deployed as a Kubernetes pod.
- It is strongly discouraged to run provisioner daemons on `coderd` nodes.
- Separate provisioners into different namespaces to support zero-trust or
  multi-cloud deployments.

### Workspace nodes

| Users | Node capacity | Replicas | GCP | AWS | Azure |
| ----------- | -------------------- | ------------------------ | ---------------- | ------------ | ----------------- |
| Up to 3,000 | 8 vCPU, 32 GB memory | 256 / 12 workspaces each | `t2d-standard-8` | `t3.2xlarge` | `Standard_D8s_v3` |

**Footnotes**:

- It is assumed that each workspace user needs 2 GB of memory to work
  effectively
- Maximum number of Kubernetes workspace pods per node: 256
- As workspace nodes can be distributed across regions, on-premises networks,
  and cloud areas, consider using different namespaces to support zero-trust
  or multi-cloud deployments.
@@ -1,4 +1,4 @@
# Reference architectures
# Reference Architectures

This document provides prescriptive solutions and reference architectures to
support successful deployments of up to 2000 users and outlines at a high-level
@@ -156,8 +156,8 @@ Coder:

- Median CPU usage for _coderd_: 3 vCPU, peaking at 3.7 vCPU during dashboard
tests.
- Median API request rate: 350 req/s during dashboard tests, 250 req/s during
Web Terminal and workspace apps tests.
- Median API request rate: 350 RPS during dashboard tests, 250 RPS during Web
Terminal and workspace apps tests.
- 2000 agent API connections with latency: p90 at 60 ms, p95 at 220 ms.
- on average 2400 Web Socket connections during dashboard tests.

@@ -171,3 +171,141 @@ Database:
metadata.
- Memory utilization averages at 40%.
- `write_ops_count` between 6.7 and 8.4 operations per second.

## Available reference architectures

- [Up to 1,000 users](1k-users.md)
- [Up to 2,000 users](2k-users.md)
- [Up to 3,000 users](3k-users.md)

## Hardware recommendations

### Control plane: coderd

To ensure stability and reliability of the Coder control plane, it's essential
to focus on node sizing, resource limits, and the number of replicas. We
recommend referencing public cloud providers such as AWS, GCP, and Azure for
guidance on optimal configurations. A reasonable approach involves using scaling
formulas based on factors like CPU, memory, and the number of users.

While the minimum requirements specify 1 CPU core and 2 GB of memory per
`coderd` replica, it is recommended to allocate additional resources to ensure
deployment stability.

#### CPU and memory usage

Memory consumption may increase when agent stats collection by the optional
Prometheus metrics aggregator is enabled.

Enabling direct connections between users and workspace agents (apps or SSH
traffic) can help prevent an increase in CPU usage. It is recommended to keep
this option enabled unless there are compelling reasons to disable it.

Inactive users do not consume Coder resources.

#### Scaling formula

When determining scaling requirements, consider the following factors:

- `1 vCPU x 2 GB memory x 250 users`: A reasonable formula to determine
  resource allocation based on the number of users and their expected usage
  patterns (see the sketch after this list).

  > **Reviewer (Member):** I think this better matches our reference arch?
  >
  > Suggested change: `1 vCPU x 2 GB memory x 250 users` →
  > `0.5 vCPU x 2 GB memory x 250 users`
  >
  > **Author:** Thinking about the future, I would leave 1 vCPU, WDYT?
  >
  > **Reviewer:** We can probably argue for wiggle room here based on how
  > certain Terraform providers may be more CPU-intensive than others.
- API latency/response time: Monitor API latency and response times to ensure
optimal performance under varying loads.
- Average number of HTTP requests: Track the average number of HTTP requests to
gauge system usage and identify potential bottlenecks.
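
For illustration, here is a minimal sketch of how the per-250-users rule of
thumb could translate into totals. The function name is an assumption of ours,
not a Coder API, and the split across replicas is a separate decision.

```go
package main

import (
	"fmt"
	"math"
)

// coderdResources applies the rule of thumb above: one 1 vCPU / 2 GB
// "unit" per 250 users, rounded up. How the totals are split across
// replicas, and any extra memory headroom as in the reference tables,
// is a separate decision.
func coderdResources(users int) (vCPU, memGB int) {
	units := int(math.Ceil(float64(users) / 250.0))
	return units, units * 2
}

func main() {
	cpu, mem := coderdResources(1000)
	fmt.Printf("total: %d vCPU, %d GB memory\n", cpu, mem) // 4 vCPU, 8 GB
}
```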

**HTTP API latency**

For a reliable Coder deployment dealing with medium to high loads, it's
important that API calls for workspace/template queries and workspace build
operations respond within 300 ms. However, API template insights calls, which
involve browsing workspace agent stats and user activity data, may require more
time.

Also, if the Coder deployment expects traffic from developers spread across the
globe, keep in mind that customer-facing latency might be higher because of the
distance between users and the load balancer.
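
One simple way to keep an eye on the 300 ms budget is a timed probe against
the API. The sketch below is a hedged example rather than an official tool:
the `/api/v2/workspaces` path, the `Coder-Session-Token` header, and the
environment variable names are assumptions to adapt to your deployment.

```go
package main

import (
	"fmt"
	"net/http"
	"os"
	"time"
)

func main() {
	// Time a single authenticated API call and compare it against the
	// 300 ms budget discussed above. Endpoint and env vars are assumptions.
	req, err := http.NewRequest("GET", os.Getenv("CODER_URL")+"/api/v2/workspaces", nil)
	if err != nil {
		panic(err)
	}
	req.Header.Set("Coder-Session-Token", os.Getenv("CODER_SESSION_TOKEN"))

	start := time.Now()
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	elapsed := time.Since(start)
	fmt.Printf("status=%d latency=%s over_budget=%t\n",
		resp.StatusCode, elapsed, elapsed > 300*time.Millisecond)
}
```

A single probe is noisy; in practice you would run it periodically and track
percentiles, as in the p90/p95 figures reported earlier in this document.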

**Node Autoscaling**

We recommend disabling autoscaling for `coderd` nodes, as autoscaling can
cause interruptions for user connections. See
[Autoscaling](../scale.md#autoscaling) for more details.

### Control plane: provisionerd

Each provisioner can run a single concurrent workspace build. For example,
running 10 provisioner containers will allow 10 users to start workspaces at the
same time.

By default, the Coder server runs built-in provisioner daemons, but the
_Enterprise_ Coder release allows running external provisioners to offload
workspace provisioning from the `coderd` nodes.

#### Scaling formula

When determining scaling requirements, consider the following factors:

- `1 vCPU x 1 GB memory x 2 concurrent workspace builds`: A formula to
  determine resource allocation based on the number of concurrent workspace
  builds and the standard complexity of a Terraform template (see the sketch
  below). _Rule of thumb_: the more provisioners are free/available, the more
  concurrent workspace builds can be performed.
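
Below is a minimal sketch of that formula under the stated assumption of
standard-complexity templates; the helper is illustrative. The node tables
above pack 30 provisioner daemons onto an 8 vCPU node, which implicitly
assumes that only a fraction of them are building at any given moment.

```go
package main

import (
	"fmt"
	"math"
)

// provisionerResources applies the rule of thumb above: every 2 concurrent
// workspace builds need roughly 1 vCPU and 1 GB of memory for Terraform
// templates of standard complexity. Heavier templates need more per build.
func provisionerResources(concurrentBuilds int) (vCPU, memGB float64) {
	units := math.Ceil(float64(concurrentBuilds) / 2.0)
	return units, units
}

func main() {
	// An 8 vCPU / 32 GB node can sustain roughly 16 truly concurrent
	// builds, even if more idle provisioner daemons are packed onto it.
	cpu, mem := provisionerResources(16)
	fmt.Printf("%.0f vCPU, %.0f GB memory\n", cpu, mem) // 8 vCPU, 8 GB
}
```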

**Node Autoscaling**

Autoscaling provisioners is not an easy problem to solve unless you can
predict when the number of concurrent workspace builds will increase.

We recommend disabling autoscaling and adjusting the number of provisioners to
developer needs based on the workspace build queuing time.

### Data plane: Workspaces

To determine workspace resource limits and keep the best developer experience
for workspace users, administrators must be aware of a few assumptions.

- Workspace pods run on the same Kubernetes cluster, but possibly in a
  different namespace or node pool.
- Workspace limits (per workspace user):
  - Evaluate the workspace utilization pattern. For instance, regular web
    development does not require high CPU capacity all the time, but only
    during project builds or load tests.
  - Evaluate minimal limits for a single workspace. Include in the calculation
    the requirements for the Coder agent running in an idle workspace: 0.1
    vCPU and 256 MB of memory. For instance, developers can choose between
    0.5-8 vCPUs and 1-16 GB of memory.

#### Scaling formula

When determining scaling requirements, consider the following factors:

- `1 vCPU x 2 GB memory x 1 workspace`: A formula to determine resource
  allocation based on the minimal requirements for an idle workspace with a
  running Coder agent, plus occasional CPU and memory bursts for building
  projects (see the sketch below).
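
The sketch below applies this formula to the node size used in the tables
above, honoring the 256 pods-per-node limit; the helper and the
oversubscription remark are illustrative assumptions.

```go
package main

import "fmt"

// workspacesPerNode estimates how many workspaces fit on one node when each
// workspace reserves perWsVCPU and perWsMemGB, capped at the 256
// pods-per-node Kubernetes limit mentioned earlier.
func workspacesPerNode(nodeVCPU, nodeMemGB, perWsVCPU, perWsMemGB float64) int {
	byCPU := int(nodeVCPU / perWsVCPU)
	byMem := int(nodeMemGB / perWsMemGB)
	n := byCPU
	if byMem < n {
		n = byMem
	}
	if n > 256 {
		n = 256
	}
	return n
}

func main() {
	// An 8 vCPU / 32 GB node fits 8 workspaces by CPU and 16 by memory,
	// so CPU binds under this formula. The reference tables reach 12-16
	// workspaces per node by oversubscribing CPU, since bursts are rare.
	fmt.Println(workspacesPerNode(8, 32, 1, 2)) // 8
}
```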

**Node Autoscaling**

Workspace nodes can be set to operate in autoscaling mode to mitigate the risk
of prolonged high resource utilization.

One approach is to scale up workspace nodes when total CPU usage or memory
consumption reaches 80%. Another option is to scale based on metrics such as the
number of workspaces or active users. It's important to note that as new users
onboard, the autoscaling configuration should account for ongoing workspaces.
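
As a toy illustration of that policy, the decision function below encodes the
80% thresholds and the pending-workspace concern. In practice this logic lives
in your cluster autoscaler configuration; all inputs here are assumptions.

```go
package main

import "fmt"

// shouldScaleUp sketches the policy above: add a workspace node when total
// CPU or memory utilization crosses 80%, or when pending workspaces exceed
// the free capacity on existing nodes. Thresholds are illustrative.
func shouldScaleUp(cpuUtil, memUtil float64, pendingWorkspaces, freeSlots int) bool {
	return cpuUtil >= 0.80 || memUtil >= 0.80 || pendingWorkspaces > freeSlots
}

func main() {
	fmt.Println(shouldScaleUp(0.55, 0.83, 0, 10)) // true: memory above 80%
}
```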

Scaling down workspace nodes to zero is not recommended, as it will result in
longer wait times for workspace provisioning by users.

### Database

TODO:

- PostgreSQL database
- Measure and document the impact of `dbcrypt`
22 changes: 21 additions & 1 deletion docs/manifest.json
@@ -375,10 +375,30 @@
},
{
"title": "Scaling Coder",
"description": "Reference architecture and load testing tools",
"description": "Learn how to use load testing tools",
"path": "./admin/scale.md",
"icon_path": "./images/icons/scale.svg"
},
{
"title": "Reference Architectures",
"description": "Learn about reference architectures for Coder",
"path": "./admin/architectures/index.md",
"icon_path": "./images/icons/scale.svg",
"children": [
{
"title": "Up to 1,000 users",
"path": "./admin/architectures/1k-users.md"
},
{
"title": "Up to 2,000 users",
"path": "./admin/architectures/2k-users.md"
},
{
"title": "Up to 3,000 users",
"path": "./admin/architectures/3k-users.md"
}
]
},
{
"title": "External Provisioners",
"description": "Run provisioners isolated from the Coder server",