diff --git a/docs/admin/scale/gke.md b/docs/admin/scale/gke.md
new file mode 100644
index 0000000000000..fe291e10a18bf
--- /dev/null
+++ b/docs/admin/scale/gke.md
@@ -0,0 +1,62 @@
+# Scaling Coder on Google Kubernetes Engine (GKE)
+
+This is a reference architecture for Coder on [Google Kubernetes Engine](#). We regularly load test these environments with a standard [kubernetes example](https://github.com/coder/coder/tree/main/examples/templates/kubernetes) template.
+
+> Performance and ideal node sizing depend on many factors, including the workspace image and the [workspace sizes](https://github.com/coder/coder/issues/3519) you wish to give developers. Use Coder's [scale testing utility](./index.md#scale-testing-utility) to test your own deployment.
+
+## 50 users
+
+### Cluster configuration
+
+- **Autoscaling profile**: `optimize-utilization`
+
+- **Node pools**
+  - Default
+    - **Operating system**: `Ubuntu with containerd`
+    - **Instance type**: `e2-highcpu-8`
+    - **Min nodes**: `1`
+    - **Max nodes**: `4`
+
+### Coder settings
+
+- **Replica count**: `1`
+- **Provisioner daemons**: `30`
+- **Template**: [kubernetes example](https://github.com/coder/coder/tree/main/examples/templates/kubernetes)
+- **Coder server limits**:
+  - CPU: `2 cores`
+  - RAM: `4 GB`
+- **Coder server requests**:
+  - CPU: `2 cores`
+  - RAM: `4 GB`
+
+## 100 users
+
+For deployments with 100+ users, we recommend running the Coder server in a separate node pool via taints, tolerations, and node selectors.
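The pinning described above can be sketched with Helm values like the following. This is a minimal sketch, not a definitive configuration: the `cloud.google.com/gke-nodepool` label is applied by GKE automatically, but the pool name `coder-server` and the `dedicated=coder:NoSchedule` taint are hypothetical values you would define when creating the node pool, and the `coder.nodeSelector`/`coder.tolerations` keys should be checked against your Helm chart version.

```yaml
# values.yaml (sketch): pin the Coder server Deployment to a dedicated node pool.
coder:
  nodeSelector:
    # Label set automatically by GKE; "coder-server" is a hypothetical pool name.
    cloud.google.com/gke-nodepool: coder-server
  tolerations:
    # Matches a hypothetical taint added at pool creation time, e.g.
    # `gcloud container node-pools create coder-server --node-taints=dedicated=coder:NoSchedule ...`
    - key: "dedicated"
      operator: "Equal"
      value: "coder"
      effect: "NoSchedule"
```

Tainting the pool keeps workspace pods off the Coder server nodes, while the node selector and toleration let the Coder server schedule onto them.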
+
+### Cluster configuration
+
+- **Node pools**
+  - Coder server
+    - **Instance type**: `e2-highcpu-4`
+    - **Operating system**: `Ubuntu with containerd`
+    - **Autoscaling profile**: `optimize-utilization`
+    - **Min nodes**: `2`
+    - **Max nodes**: `4`
+  - Workspaces
+    - **Instance type**: `e2-highcpu-16`
+    - **Operating system**: `Ubuntu with containerd`
+    - **Autoscaling profile**: `optimize-utilization`
+    - **Min nodes**: `3`
+    - **Max nodes**: `10`
+
+### Coder settings
+
+- **Replica count**: `4`
+- **Provisioner daemons**: `25`
+- **Template**: [kubernetes example](https://github.com/coder/coder/tree/main/examples/templates/kubernetes)
+- **Coder server limits**:
+  - CPU: `4 cores`
+  - RAM: `8 GB`
+- **Coder server requests**:
+  - CPU: `4 cores`
+  - RAM: `8 GB`
diff --git a/docs/admin/scale/index.md b/docs/admin/scale/index.md
new file mode 100644
index 0000000000000..d0dbf758bfd2e
--- /dev/null
+++ b/docs/admin/scale/index.md
@@ -0,0 +1,42 @@
+We regularly scale-test Coder against various reference architectures. Additionally, we provide a [scale testing utility](#scale-testing-utility) which you can run in your own environment for insight into how Coder scales with your deployment's specific templates, images, etc.
+
+## Reference Architectures
+
+| Environment                                       | Users         | Last tested  | Status   |
+| ------------------------------------------------- | ------------- | ------------ | -------- |
+| [Google Kubernetes Engine (GKE)](./gke.md)        | 50, 100, 1000 | Nov 29, 2022 | Complete |
+| [AWS Elastic Kubernetes Service (EKS)](./eks.md)  | 50, 100, 1000 | Nov 29, 2022 | Complete |
+| [Google Compute Engine + Docker](./gce-docker.md) | 15, 50        | Nov 29, 2022 | Complete |
+| [Google Compute Engine + VMs](./gce-vms.md)       | 1000          | Nov 29, 2022 | Complete |
+
+## Scale testing utility
+
+Since Coder's performance is highly dependent on the templates and workflows you support, we recommend using our scale testing utility against your own environments.
+
+The following command will run the same scenario against your own Coder deployment. You can also specify a template name and any parameter values.
+
+```sh
+coder scaletest create-workspaces \
+  --count 100 \
+  --template "my-custom-template" \
+  --parameter image="my-custom-image" \
+  --run-command "sleep 2 && echo hello"
+
+# Run `coder scaletest create-workspaces --help` for all usage
+```
+
+> To avoid outages and orphaned resources, we recommend running scale tests on a secondary "staging" environment.
+
+The test does the following:
+
+- create `n` workspaces
+- establish an SSH connection to each workspace
+- run `sleep 2 && echo hello` on each workspace via the web terminal
+- close connections, attempt to delete all workspaces
+- return results (e.g. `99 succeeded, 1 failed to connect`)
+
+Workspace jobs run concurrently, meaning that the test attempts to connect to each workspace as soon as it is provisioned instead of waiting for all 100 workspaces to be created.
+
+## Troubleshooting
+
+If a load test fails or if you are experiencing performance issues during day-to-day use, you can leverage Coder's [performance tracing](#) and [Prometheus metrics](../prometheus.md) to identify bottlenecks during scale tests. Additionally, you can use your existing cloud monitoring stack to measure load, view server logs, etc.
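As a starting point, the commands below sketch how you might enable and spot-check the metrics endpoint while a scale test runs. The environment variable names and the default `:2112` port follow recent Coder releases, but verify them against your version's documentation before relying on them; the `grep` pattern is only an illustrative filter.

```sh
# Enable the Prometheus endpoint before starting the Coder server
# (variable names assumed from recent Coder releases; check your version).
export CODER_PROMETHEUS_ENABLE=true
export CODER_PROMETHEUS_ADDRESS="0.0.0.0:2112"

# During a scale test, confirm the endpoint is serving and skim key series:
curl -s http://localhost:2112/metrics | grep -E "coderd_|go_goroutines"
```

From there, point your existing Prometheus scraper at the same address and graph the series that move during the test.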
diff --git a/docs/images/icons/scale.svg b/docs/images/icons/scale.svg
new file mode 100644
index 0000000000000..3807fa5707081
--- /dev/null
+++ b/docs/images/icons/scale.svg
@@ -0,0 +1 @@
+
diff --git a/docs/manifest.json b/docs/manifest.json
index bac69202bbf1a..6937070c744b4 100644
--- a/docs/manifest.json
+++ b/docs/manifest.json
@@ -253,6 +253,19 @@
       "icon_path": "./images/icons/plug.svg",
       "path": "./admin/automation.md"
     },
+    {
+      "title": "Scaling Coder",
+      "description": "Reference architecture and load testing tools",
+      "icon_path": "./images/icons/scale.svg",
+      "path": "./admin/scale/index.md",
+      "children": [
+        {
+          "title": "GKE",
+          "description": "Learn how to scale Coder on GKE",
+          "path": "./admin/scale/gke.md"
+        }
+      ]
+    },
     {
       "title": "Audit Logs",
       "description": "Learn how to use Audit Logs in your Coder deployment",