| 1 | +We scale-test Coder with the [same utility](#scale-testing-utility) that you can run in your own environment to see how Coder scales with your infrastructure. |
| 2 | + |
| 3 | +## General concepts |
| 4 | + |
| 5 | +Coder runs workspace operations in a queue. The number of concurrent builds is limited to the total number of provisioner daemons across all coderd replicas. |
| 6 | + |
| 7 | +- **coderd**: Coder’s primary service. Learn more about [Coder’s architecture](../about/architecture.md) |
| 8 | +- **coderd replicas**: Replicas (often run via Kubernetes) for high availability. This is an [enterprise feature](../enterprise.md) |
| 9 | +- **concurrent workspace builds**: Workspace operations (e.g. create/stop/delete/apply) across all users |
| 10 | +- **concurrent connections**: Any connection to a workspace (e.g. SSH, web terminal, `coder_app`) |
| 11 | +- **provisioner daemons**: Coder runs one workspace build per provisioner daemon. One coderd replica can host many daemons |
| 12 | +- **scaletest**: Our scale-testing utility, built into the `coder` command line. |
| 13 | + |
| 14 | +```text |
| 15 | +2 coderd replicas * 30 provisioner daemons = 60 max concurrent workspace builds |
| 16 | +``` |
| 17 | + |
| 18 | +## Infrastructure recommendations |
| 19 | + |
| 20 | +### Concurrent workspace builds |
| 21 | + |
| 22 | +Workspace builds are CPU-intensive because they rely on Terraform. Various [Terraform providers](https://registry.terraform.io/browse/providers) have different resource requirements. When tested with our [kubernetes](https://github.com/coder/coder/tree/main/examples/templates/kubernetes) template, `coderd` consumes roughly 8 cores per 30 concurrent workspace builds. For effective provisioning, our Helm chart prefers to schedule [one coderd replica per node](https://github.com/coder/coder/blob/main/helm/values.yaml#L110-L121). |
| 23 | + |
| 24 | +To support 120 concurrent workspace builds, for example: |
| 25 | + |
| 26 | +- Create a cluster/nodepool with 4 nodes, 8 cores each (AWS: `t3.2xlarge`, GCP: `e2-highcpu-8`) |
| 27 | +- Run coderd with 4 replicas and 30 provisioner daemons each (`CODER_PROVISIONER_DAEMONS=30`) |
| 28 | +- Ensure Coder's [PostgreSQL server](./configure.md#postgresql-database) can use up to 1.5 cores |
| 29 | + |
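The sizing arithmetic above can be sketched as a few shell helpers. This is a rough sketch only: it assumes the figures quoted for the kubernetes template (about 8 cores per 30 concurrent builds, one coderd replica per node), and the function and variable names are hypothetical, not part of the `coder` CLI. Other templates and Terraform providers may need different numbers.

```sh
#!/bin/sh
# Hypothetical sizing helpers; the constants are the approximate figures
# quoted above for the kubernetes template, not guarantees.

DAEMONS_PER_REPLICA=30   # value of CODER_PROVISIONER_DAEMONS on each replica
CORES_PER_REPLICA=8      # approximate cores consumed by 30 concurrent builds

# max_concurrent_builds REPLICAS DAEMONS_PER_REPLICA
max_concurrent_builds() {
  echo $(( $1 * $2 ))
}

# replicas_needed TARGET_BUILDS (rounds up to a whole replica)
replicas_needed() {
  echo $(( ($1 + DAEMONS_PER_REPLICA - 1) / DAEMONS_PER_REPLICA ))
}

# cores_needed TARGET_BUILDS (one replica per node, CORES_PER_REPLICA each)
cores_needed() {
  echo $(( $(replicas_needed "$1") * CORES_PER_REPLICA ))
}

max_concurrent_builds 2 30   # prints 60
replicas_needed 120          # prints 4
cores_needed 120             # prints 32
```

Rounding up in `replicas_needed` means a target that is not a multiple of 30 still gets enough provisioner daemons. The PostgreSQL allowance mentioned above is separate and not included in `cores_needed`.
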
| 30 | +## Recent scale tests |
| 31 | + |
| 32 | +| Environment | Users | Concurrent builds | Concurrent connections (Terminal/SSH) | Coder Version | Last tested | |
| 33 | +| ------------------ | ----- | ----------------- | ------------------------------------- | ------------- | ------------ | |
| 34 | +| Kubernetes (GKE) | 1200 | 120 | 10,000 | `v0.14.2` | Jan 10, 2023 | |
| 35 | +| Docker (Single VM) | 500 | 50 | 10,000 | `v0.13.4` | Dec 20, 2022 | |
| 36 | + |
| 37 | +## Scale testing utility |
| 38 | + |
| 39 | +Since Coder's performance is highly dependent on the templates and workflows you support, we recommend using our scale testing utility against your own environments. |
| 40 | + |
| 41 | +The following command will run our scale test against your own Coder deployment. You can also specify a template name and any parameter values. |
| 42 | + |
| 43 | +```sh |
| 44 | +coder scaletest create-workspaces \ |
| 45 | + --count 1000 \ |
| 46 | + --template "kubernetes" \ |
| 47 | + --concurrency 0 \ |
| 48 | + --cleanup-concurrency 0 \ |
| 49 | + --parameter "home_disk_size=10" \ |
| 50 | + --run-command "sleep 2 && echo hello" |
| 51 | + |
| 52 | +# Run `coder scaletest create-workspaces --help` for all usage |
| 53 | +``` |
| 54 | + |
| 55 | +> To avoid potential outages and orphaned resources, we recommend running scale tests on a secondary "staging" environment. |
| 56 | + |
| 57 | +The test does the following: |
| 58 | + |
| 59 | +1. create `1000` workspaces |
| 60 | +1. establish SSH connection to each workspace |
| 61 | +1. run `sleep 2 && echo hello` on each workspace via the web terminal |
| 62 | +1. close connections, attempt to delete all workspaces |
| 63 | +1. return results (e.g. `998 succeeded, 2 failed to connect`) |
| 64 | + |
| 65 | +Concurrency is configurable. `--concurrency 0` means the scale test will attempt to create and connect to all workspaces immediately. |
| 66 | + |
| 67 | +## Troubleshooting |
| 68 | + |
| 69 | +If a load test fails or you experience performance issues during day-to-day use, you can use Coder's [Prometheus metrics](./prometheus.md) to identify bottlenecks. Additionally, you can use your existing cloud monitoring stack to measure load, view server logs, etc. |