docs: scaling Coder #5206

Closed · wants to merge 11 commits

62 changes: 62 additions & 0 deletions docs/admin/scale/gke.md
@@ -0,0 +1,62 @@
# Scaling Coder on Google Kubernetes Engine (GKE)

This is a reference architecture for Coder on [Google Kubernetes Engine](#). We regularly load test these environments with a standard [kubernetes example](https://github.com/coder/coder/tree/main/examples/templates/kubernetes) template.

> Performance and ideal node sizing depend on many factors, including the workspace image and the [workspace sizes](https://github.com/coder/coder/issues/3519) you wish to give developers. Use Coder's [scale testing utility](./index.md#scale-testing-utility) to test your own deployment.

## 50 users

### Cluster configuration

- **Autoscaling profile**: `optimize-utilization`

- **Node pools**
  - Default
    - **Operating system**: `Ubuntu with containerd`
    - **Instance type**: `e2-highcpu-8`
    - **Min nodes**: `1`
    - **Max nodes**: `4`
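
For orientation, below is a minimal `gcloud` sketch of a comparable cluster. It is not the exact command used for these tests; the cluster name, project, and zone are placeholders, and you should confirm flag availability against your `gcloud` version.

```sh
# Illustrative only: a comparable 50-user cluster via gcloud (not the exact
# commands used for these tests). Cluster name, project, and zone are placeholders.
gcloud container clusters create coder-cluster \
  --project my-project \
  --zone us-central1-a \
  --autoscaling-profile optimize-utilization \
  --machine-type e2-highcpu-8 \
  --image-type UBUNTU_CONTAINERD \
  --enable-autoscaling \
  --num-nodes 1 \
  --min-nodes 1 \
  --max-nodes 4
```

The `optimize-utilization` profile tells the cluster autoscaler to scale down idle nodes more aggressively, which suits bursty workspace usage.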

### Coder settings

- **Replica count**: `1`
- **Provisioner daemons**: `30`
- **Template**: [kubernetes example](https://github.com/coder/coder/tree/main/examples/templates/kubernetes)
- **Coder server limits**:
  - CPU: `2 cores`
  - RAM: `4 GB`
- **Coder server requests**:
  - CPU: `2 cores`
  - RAM: `4 GB`
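
As a rough illustration (not the exact values we deploy with), these settings could be expressed as Helm values. The value paths (`coder.replicaCount`, `coder.env`, `coder.resources`), the `CODER_PROVISIONER_DAEMONS` variable, and the `coder-v2/coder` chart reference are assumptions to verify against your chart version.

```sh
# Rough sketch of the settings above as Helm values. The value paths
# (coder.replicaCount, coder.env, coder.resources), the CODER_PROVISIONER_DAEMONS
# variable, and the coder-v2/coder chart name are assumptions; check your chart's
# values.yaml before using.
cat <<'EOF' > coder-values.yaml
coder:
  replicaCount: 1
  env:
    - name: CODER_PROVISIONER_DAEMONS
      value: "30"
  resources:
    requests:
      cpu: "2"
      memory: 4Gi
    limits:
      cpu: "2"
      memory: 4Gi
EOF

helm upgrade --install coder coder-v2/coder \
  --namespace coder \
  --values coder-values.yaml
```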

## 100 users
Contributor

Index.md claims that we tested at 50, 100, and 1000 users, but this doc only has 50 and 100.

Member Author

Ah yea, we should disregard the numbers and instance types at the moment; these are all placeholder-ish to align on the format and info we want to display.


For deployments with 100+ users, we recommend running the Coder server in a separate node pool via taints, tolerations, and node selectors.
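
Here is a hedged sketch of that separation, assuming a GKE node pool dedicated via a taint and a chart that exposes `coder.nodeSelector` and `coder.tolerations` values; the pool name, label, and taint key are illustrative.

```sh
# Illustrative sketch: dedicate a node pool to the Coder server with a taint,
# then schedule Coder onto it with a matching nodeSelector and toleration.
# Pool name, label, taint key, and the Helm value paths are assumptions.
gcloud container node-pools create coder-server \
  --cluster coder-cluster \
  --machine-type e2-highcpu-4 \
  --node-labels dedicated=coder-server \
  --node-taints dedicated=coder-server:NoSchedule

cat <<'EOF' > coder-scheduling.yaml
coder:
  nodeSelector:
    dedicated: coder-server
  tolerations:
    - key: dedicated
      operator: Equal
      value: coder-server
      effect: NoSchedule
EOF

helm upgrade --install coder coder-v2/coder --values coder-scheduling.yaml
```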

### Cluster configuration
Member

If we're using Terraform or GDM for spawning these machines, can we please share and link it here? It might be easier for customers to bring up their own clusters.

Member Author

At this point, we're doing it manually (or with internal Terraform configs that are not ready for the public), but I agree we should eventually provide the Terraform config for many of these environments.

I'd hate for Terraform to be a prerequisite for us testing a specific environment (e.g. OpenShift, DigitalOcean), but I agree that it's highly beneficial for reproducibility and would let customers quickly spin up clusters.


- **Node pools**
  - Coder server
    - **Instance type**: `e2-highcpu-4`
    - **Operating system**: `Ubuntu with containerd`
Member

Ubuntu version is missing. We could also refer to the VM image.

Member Author

Totally agree about including versions and dates! GKE refers to this as `ubuntu_containerd`, which varies based on the Kubernetes version, so I'm using this as a mental note to include the cluster version too ;)

(screenshot attached)

    - **Autoscaling profile**: `optimize-utilization`
    - **Min nodes**: `2`
    - **Max nodes**: `4`
  - Workspaces
    - **Instance type**: `e2-highcpu-16`
    - **Operating system**: `Ubuntu with containerd`
    - **Autoscaling profile**: `optimize-utilization`
    - **Min nodes**: `3`
    - **Max nodes**: `10`

### Coder settings

- **Replica count**: `4`
- **Provisioner daemons**: `25`
- **Template**: [kubernetes example](https://github.com/coder/coder/tree/main/examples/templates/kubernetes)
- **Coder server limits**:
  - CPU: `4 cores`
  - RAM: `8 GB`
- **Coder server requests**:
  - CPU: `4 cores`
  - RAM: `8 GB`
42 changes: 42 additions & 0 deletions docs/admin/scale/index.md
@@ -0,0 +1,42 @@
We regularly scale-test Coder against various reference architectures. Additionally, we provide a [scale testing utility](#scale-testing-utility), which can be used in your own environment to give insight into how Coder scales with your deployment's specific templates, images, etc.
Member

> We regularly scale-test Coder

As a customer, I'd like to reproduce Coder's results, but the doc doesn't mention the release version used. It might be good to record the version and date of each load test.


## Reference Architectures

| Environment | Users | Last tested | Status |
| ------------------------------------------------- | ------------- | ------------ | -------- |
| [Google Kubernetes Engine (GKE)](./gke.md) | 50, 100, 1000 | Nov 29, 2022 | Complete |
Member

> Complete

It would be awesome if we could share more details here showing the autoscaling behavior and the duration of the tests.

BTW, this data could easily be converted into a set of blog posts about load testing.

Member

> Complete

Also, some reference data about API latencies might be helpful for us, so that we know whether Coder's performance has improved or degraded over time.

Contributor

Yeah, "Complete" as a test status is a very weak statement. Like, we could totally fail the test and still say, well, the test is complete.

Member Author

This was inspired by GitLab's Performance and Stability page. I wasn't sure of the best way, in a table view, to show that we've validated a Coder deployment with n users, but I agree that "Complete" isn't the best term.

Perhaps "Validated"?

Member Author

The column could also be omitted, and we could put a ✅ or ⌛ next to each user count. I'm not exactly sure of the best format for this information at the moment.

Member

I presume that we could at least add a legend: green-yellow-red.

- green: everything went smoothly 👍 SLAs (do we have any?) not impacted, platform performance not degraded
- yellow: users can operate, but in a few cases we observed an SLA being violated, for instance due to high latency. We should describe specifically what went wrong
- red: total disaster, platform misbehaves, not usable, etc.

In general, it would be awesome if we could automatically raise a GitHub issue for every performance test run and discuss the results there. BTW, this is a good moment to "build" the SRE attitude in Coder :)

Member

The green-yellow-red system is a bit overkill for what we need right now. As we develop our tests and automation we can start using it, but we're nowhere near there yet. We also don't have any SLAs or criteria to base a yellow on yet.

Member

Sounds like an action item for me 👍

| [AWS Elastic Kubernetes Service (EKS)](./eks.md) | 50, 100, 1000 | Nov 29, 2022 | Complete |
Contributor

🐛 [✖] ./eks.md → Status: 400

Member Author

Yep, I wanted to get a review on the GKE format before duplicating it. Ideally, we'd have a way of generating these, though.

| [Google Compute Engine + Docker](./gce-docker.md) | 15, 50 | Nov 29, 2022 | Complete |
| [Google Compute Engine + VMs](./gce-vms.md) | 1000 | Nov 29, 2022 | Complete |

## Scale testing utility

Since Coder's performance is highly dependent on the templates and workflows you support, we recommend using our scale testing utility against your own environments.

The following command will run the same scenario against your own Coder deployment. You can also specify a template name and any parameter values.

```sh
coder scaletest create-workspaces \
  --count 100 \
  --template "my-custom-template" \
  --parameter image="my-custom-image" \
  --run-command "sleep 2 && echo hello"

# Run `coder scaletest create-workspaces --help` for all usage
```

Contributor

Is this what we want the command to be like, or what it currently is?

Just tried to look this up, and I couldn't find `scaletest`. What I did find was `loadtest`, but that takes a config file, not flags.

Member Author (@bpmct, Dec 1, 2022)

This PR is blocked by #5202. I'll be sure to update the schema to whatever it changes to prior to merging.

> To avoid outages and orphaned resources, we recommend running scale tests on a secondary "staging" environment.

The test does the following:

- create `n` workspaces
- establish SSH connection to each workspace
- run `sleep 3 && echo hello` on each workspace via the web terminal
- close connections, attempt to delete all workspaces
- return results (e.g. `99 succeeded, 1 failed to connect`)
Comment on lines +30 to +36
Member Author

I should document what test is run inside each environment/architecture, similar to GitLab's.

Member Author

We can also do this via graphs (e.g. "workspaces created", etc.).


Workspace jobs run concurrently, meaning that the test will attempt to connect to each workspace as soon as it is provisioned instead of waiting for all 100 workspaces to be created.
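
For intuition only, the loop below approximates that flow with the Coder CLI, launching each workspace's lifecycle in the background so jobs overlap. It is not the scale testing utility (which drives connections via the API/web terminal), and whether `coder ssh` accepts a trailing command in this form is an assumption.

```sh
# Rough, illustrative approximation of the flow above using the Coder CLI.
# This is NOT the scale testing utility; it only sketches the per-workspace
# lifecycle. Whether `coder ssh <workspace> -- <command>` runs a remote command
# in your release is an assumption to verify.
N=10
for i in $(seq 1 "$N"); do
  (
    coder create "scaletest-$i" --template kubernetes --yes &&
      coder ssh "scaletest-$i" -- sh -c 'sleep 3 && echo hello'
    coder delete "scaletest-$i" --yes
  ) & # run each workspace lifecycle concurrently
done
wait
```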

## Troubleshooting

If a load test fails or if you are experiencing performance issues during day-to-day use, you can leverage Coder's [performance tracing](#) and [Prometheus metrics](../prometheus.md) to identify bottlenecks during scale tests. Additionally, you can use your existing cloud monitoring stack to measure load, view server logs, and more.
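
As a starting point, the sketch below enables the server's Prometheus and pprof endpoints and samples them while a test runs. The environment variable names and default ports are assumptions; confirm them against `coder server --help` for your release.

```sh
# Illustrative: enable metrics and profiling on the Coder server, then sample
# them while a scale test runs. Variable names and default ports are assumptions;
# confirm with `coder server --help` for your release.
export CODER_PROMETHEUS_ENABLE=true
export CODER_PROMETHEUS_ADDRESS=127.0.0.1:2112
export CODER_PPROF_ENABLE=true
export CODER_PPROF_ADDRESS=127.0.0.1:6060
coder server &

# Snapshot the Prometheus metrics and take a 30-second CPU profile
curl -s http://127.0.0.1:2112/metrics | head
go tool pprof -top http://127.0.0.1:6060/debug/pprof/profile?seconds=30
```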
1 change: 1 addition & 0 deletions docs/images/icons/scale.svg
(SVG icon; preview not available)
13 changes: 13 additions & 0 deletions docs/manifest.json
@@ -253,6 +253,19 @@
"icon_path": "./images/icons/plug.svg",
"path": "./admin/automation.md"
},
{
"title": "Scaling Coder",
"description": "Reference architecture and load testing tools",
"icon_path": "./images/icons/scale.svg",
"path": "./admin/scale/index.md",
"children": [
{
"title": "GKE",
"description": "Learn how to scale Coder on GKE",
"path": "./admin/scale/gke.md"
}
]
},
{
"title": "Audit Logs",
"description": "Learn how to use Audit Logs in your Coder deployment",