From 70e8fb64140a900a36da1660b396d486084f8000 Mon Sep 17 00:00:00 2001 From: Cian Johnston Date: Thu, 3 Aug 2023 16:28:00 +0100 Subject: [PATCH 1/5] chore(docs): update scaling docs --- docs/admin/scale.md | 87 +++++++++++++++++++++++++++++++++++++++------ 1 file changed, 76 insertions(+), 11 deletions(-) diff --git a/docs/admin/scale.md b/docs/admin/scale.md index a3e6e490ea52a..7f357668a1ed9 100644 --- a/docs/admin/scale.md +++ b/docs/admin/scale.md @@ -1,4 +1,4 @@ -We scale-test Coder with the [same utility](#scaletest-utility) that can be used in your environment for insights into how Coder scales with your infrastructure. +We scale-test Coder with [a built-in utility](#scaletest-utility) that can be used in your environment for insights into how Coder scales with your infrastructure. ## General concepts @@ -17,11 +17,41 @@ Coder runs workspace operations in a queue. The number of concurrent builds will ## Infrastructure recommendations +> Note: The below are guidelines for planning your infrastructure. Your mileage may vary depending on your templates, workflows, and users. + +When planning your infrastructure, we recommend you consider the following: + +1. CPU and memory requirements for `coderd`. We recommend allocating 1 CPU core and 2 GB RAM per `coderd` replica at minimum. See [Concurrent users](#concurrent-users) for more details. +1. CPU and memory requirements for external provisioners, if applicable. We recommend allocating 1 CPU core and 1 GB RAM per 5 concurrent workspace builds to external provisioners. Note that this may vary depending on the template used. See [Concurrent workspace builds](#concurrent-workspace-builds) for more details. +1. CPU and memory requirements for the database used by `coderd`. We recommend allocating an additional 1 CPU core to the database used by Coder for every 1000 active users. +1. CPU and memory requirements for workspaces created by Coder. This will vary depending on users' needs. However, the Coder agent itself requires at minimum 0.1 CPU cores and 256 MB to run inside a workspace. + + +### Concurrent users + +We recommend allocating 2 CPU cores and 4 GB RAM per `coderd` replica per 1000 active users. +We also recommend allocating an additional 1 CPU core to the database used by Coder for every 1000 active users. +Inactive users do not consume Coder resources, although workspaces configured to auto-start will consume resources when they are built. + +Users' primary mode of accessing Coder will also affect resource requirements. +If users will be accessing workspaces primarily via Coder's HTTP interface, we recommend doubling the number of cores and RAM allocated per user. +For example, if you expect 1000 users accessing workspaces via the web, we recommend allocating 4 CPU cores and 8 GB RAM. + +Users accessing workspaces via SSH will consume fewer resources, as SSH connections are not proxied through Coder. + ### Concurrent workspace builds -Workspace builds are CPU-intensive, as it relies on Terraform. Various [Terraform providers](https://registry.terraform.io/browse/providers) have different resource requirements. When tested with our [kubernetes](https://github.com/coder/coder/tree/main/examples/templates/kubernetes) template, `coderd` will consume roughly 8 cores per 30 concurrent workspace builds. For effective provisioning, our helm chart prefers to schedule [one coderd replica per-node](https://github.com/coder/coder/blob/main/helm/values.yaml#L110-L121). +Workspace builds are CPU-intensive, as it relies on Terraform. Various [Terraform providers](https://registry.terraform.io/browse/providers) have different resource requirements. +When tested with our [kubernetes](https://github.com/coder/coder/tree/main/examples/templates/kubernetes) template, `coderd` will consume roughly 0.25 cores per concurrent workspace build. +For effective provisioning, our helm chart prefers to schedule [one coderd replica per-node](https://github.com/coder/coder/blob/main/helm/values.yaml#L110-L121). + +We recommend: -To support 120 concurrent workspace builds, for example: +- Running `coderd` on a dedicated set of nodes. This will prevent other workloads from interfering with workspace builds. You can use [node selectors](https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/#nodeselector), or [taints and tolerations](https://kubernetes.io/docs/concepts/scheduling-eviction/taint-and-toleration/) to achieve this. +- Disabling autoscaling for `coderd` nodes. Autoscaling can cause interruptions for users, see [Autoscaling](#autoscaling) for more details. +- (Enterprise-only) Running external provisioners instead of Coder's built-in provisioners (`CODER_PROVISIONER_DAEMONS=0`) will separate the load caused by workspace provisioning on the `coderd` nodes. For more details, see [External provisioners](../admin/provisioners.md#running-external-provisioners). + +For example, to support 120 concurrent workspace builds: - Create a cluster/nodepool with 4 nodes, 8-core each (AWS: `t3.2xlarge` GCP: `e2-highcpu-8`) - Run coderd with 4 replicas, 30 provisioner daemons each. (`CODER_PROVISIONER_DAEMONS=30`) @@ -29,19 +59,29 @@ To support 120 concurrent workspace builds, for example: ## Recent scale tests -| Environment | Users | Concurrent builds | Concurrent connections (Terminal/SSH) | Coder Version | Last tested | -| ------------------ | ----- | ----------------- | ------------------------------------- | ------------- | ------------ | -| Kubernetes (GKE) | 1200 | 120 | 10,000 | `v0.14.2` | Jan 10, 2022 | -| Docker (Single VM) | 500 | 50 | 10,000 | `v0.13.4` | Dec 20, 2022 | +| Environment | Database | Users | Concurrent builds | Concurrent connections (Terminal/SSH) | Coder Version | Last tested | +| ------------------ | ---------------- | ----- | ----------------- | ------------------------------------- | ------------- | ------------ | +| Kubernetes (GKE) | Embedded | 1200 | 120 | 10,000 | `v0.14.2` | Jan 10, 2022 | +| Docker (Single VM) | Embedded | 500 | 50 | 10,000 | `v0.13.4` | Dec 20, 2022 | +| Kubernetes (GKE) | db-f1-micro | 100 | 3 | 100 (40KB/s web terminal each) | `v0.24.1` | Jun 26, 2023 | +| Kubernetes (GKE) | db-n1-standard-1 | 1500 | 20 | 1,500 | `v0.24.1` | Jun 27, 2023 | +| Kubernetes (GKE) | db-n1-standard-1 | 500 | 20 | 500 (640KB/s SSH each) | `v0.27.2` | Jul 27, 2023 | + ## Scale testing utility -Since Coder's performance is highly dependent on the templates and workflows you support, we recommend using our scale testing utility against your own environments. +Since Coder's performance is highly dependent on the templates and workflows you support, you may wish to use our internal scale testing utility against your own environments. + +> Note: This utility is intended for internal use only. It is not subject to any compatibility guarantees, and may cause interruptions for your users. +> To avoid potential outages and orphaned resources, we recommend running scale tests on a secondary "staging" environment. +> Run it against a production environment at your own risk. + +### Workspace Creation The following command will run our scale test against your own Coder deployment. You can also specify a template name and any parameter values. ```sh -coder scaletest create-workspaces \ +coder exp scaletest create-workspaces \ --count 1000 \ --template "kubernetes" \ --concurrency 0 \ @@ -49,10 +89,9 @@ coder scaletest create-workspaces \ --parameter "home_disk_size=10" \ --run-command "sleep 2 && echo hello" -# Run `coder scaletest create-workspaces --help` for all usage +# Run `coder exp scaletest create-workspaces --help` for all usage ``` -> To avoid potential outages and orphaned resources, we recommend running scale tests on a secondary "staging" environment. The test does the following: @@ -64,6 +103,32 @@ The test does the following: Concurrency is configurable. `concurrency 0` means the scaletest test will attempt to create & connect to all workspaces immediately. +If you wish to leave the workspaces running for a period of time, you can specify `--no-cleanup` to skip the cleanup step. +You are responsible for deleting these resources later. + +### Traffic Generation + +Given an existing set of workspaces created previously with `create-workspaces`, the following command will generate traffic similar to that used by Coder's web terminal against those workspaces. + +```sh +coder exp scaletest workspace-traffic \ + --byes-per-tick 128 \ + --tick-interval 100ms \ + --concurrency 0 +``` + +To generate SSH traffic, add the `--ssh` flag. + +### Cleanup + +The scaletest utility will attempt to clean up all workspaces it creates. If you wish to clean up all workspaces, you can run the following command: + +```sh +coder exp scaletest cleanup +``` + +This will delete all workspaces and users with the prefix `scaletest-`. + ## Autoscaling We generally do not recommend using an autoscaler that modifies the number of coderd replicas. In particular, scale From fa0b83499b3242bc4f8c62e982968ed5c55915e9 Mon Sep 17 00:00:00 2001 From: Cian Johnston Date: Thu, 3 Aug 2023 17:00:31 +0100 Subject: [PATCH 2/5] fixup! chore(docs): update scaling docs --- docs/admin/scale.md | 3 --- 1 file changed, 3 deletions(-) diff --git a/docs/admin/scale.md b/docs/admin/scale.md index 7f357668a1ed9..ae5c5e01ecd37 100644 --- a/docs/admin/scale.md +++ b/docs/admin/scale.md @@ -26,7 +26,6 @@ When planning your infrastructure, we recommend you consider the following: 1. CPU and memory requirements for the database used by `coderd`. We recommend allocating an additional 1 CPU core to the database used by Coder for every 1000 active users. 1. CPU and memory requirements for workspaces created by Coder. This will vary depending on users' needs. However, the Coder agent itself requires at minimum 0.1 CPU cores and 256 MB to run inside a workspace. - ### Concurrent users We recommend allocating 2 CPU cores and 4 GB RAM per `coderd` replica per 1000 active users. @@ -67,7 +66,6 @@ For example, to support 120 concurrent workspace builds: | Kubernetes (GKE) | db-n1-standard-1 | 1500 | 20 | 1,500 | `v0.24.1` | Jun 27, 2023 | | Kubernetes (GKE) | db-n1-standard-1 | 500 | 20 | 500 (640KB/s SSH each) | `v0.27.2` | Jul 27, 2023 | - ## Scale testing utility Since Coder's performance is highly dependent on the templates and workflows you support, you may wish to use our internal scale testing utility against your own environments. @@ -92,7 +90,6 @@ coder exp scaletest create-workspaces \ # Run `coder exp scaletest create-workspaces --help` for all usage ``` - The test does the following: 1. create `1000` workspaces From 0950bb3e42967a1e00bea9229bc2a3a92ebc0ac5 Mon Sep 17 00:00:00 2001 From: Cian Johnston Date: Thu, 3 Aug 2023 17:08:10 +0100 Subject: [PATCH 3/5] Update docs/admin/scale.md --- docs/admin/scale.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/admin/scale.md b/docs/admin/scale.md index ae5c5e01ecd37..b2e74e351c6b8 100644 --- a/docs/admin/scale.md +++ b/docs/admin/scale.md @@ -105,7 +105,7 @@ You are responsible for deleting these resources later. ### Traffic Generation -Given an existing set of workspaces created previously with `create-workspaces`, the following command will generate traffic similar to that used by Coder's web terminal against those workspaces. +Given an existing set of workspaces created previously with `create-workspaces`, the following command will generate traffic similar to that of Coder's web terminal against those workspaces. ```sh coder exp scaletest workspace-traffic \ From 30131e0073c087e72e329f824b9cf8cd0d7962b3 Mon Sep 17 00:00:00 2001 From: Cian Johnston Date: Thu, 3 Aug 2023 17:39:58 +0100 Subject: [PATCH 4/5] update with compute resources provided to coder --- docs/admin/scale.md | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/docs/admin/scale.md b/docs/admin/scale.md index b2e74e351c6b8..37993eb03ad90 100644 --- a/docs/admin/scale.md +++ b/docs/admin/scale.md @@ -58,13 +58,13 @@ For example, to support 120 concurrent workspace builds: ## Recent scale tests -| Environment | Database | Users | Concurrent builds | Concurrent connections (Terminal/SSH) | Coder Version | Last tested | -| ------------------ | ---------------- | ----- | ----------------- | ------------------------------------- | ------------- | ------------ | -| Kubernetes (GKE) | Embedded | 1200 | 120 | 10,000 | `v0.14.2` | Jan 10, 2022 | -| Docker (Single VM) | Embedded | 500 | 50 | 10,000 | `v0.13.4` | Dec 20, 2022 | -| Kubernetes (GKE) | db-f1-micro | 100 | 3 | 100 (40KB/s web terminal each) | `v0.24.1` | Jun 26, 2023 | -| Kubernetes (GKE) | db-n1-standard-1 | 1500 | 20 | 1,500 | `v0.24.1` | Jun 27, 2023 | -| Kubernetes (GKE) | db-n1-standard-1 | 500 | 20 | 500 (640KB/s SSH each) | `v0.27.2` | Jul 27, 2023 | +| Environment | Coder CPU | Coder RAM | Database | Users | Concurrent builds | Concurrent connections (Terminal/SSH) | Coder Version | Last tested | +| ------------------ | --------- | --------- | ---------------- | ----- | ----------------- | ------------------------------------- | ------------- | ------------ | +| Kubernetes (GKE) | 16 cores | 64 GB | db-custom-1-3840 | 1200 | 120 | 10,000 | `v0.14.2` | Jan 10, 2022 | +| Docker (Single VM) | 16 cores | 64 GB | Embedded | 500 | 50 | 10,000 | `v0.13.4` | Dec 20, 2022 | +| Kubernetes (GKE) | 3 cores | 12 GB | db-f1-micro | 200 | 3 | 200 (40KB/s web terminal each) | `v0.24.1` | Jun 26, 2023 | +| Kubernetes (GKE) | 4 cores | 8 GB | db-custom-1-3840 | 1500 | 20 | 1,500 | `v0.24.1` | Jun 27, 2023 | +| Kubernetes (GKE) | 2 cores | 4 GB | db-custom-1-3840 | 500 | 20 | 500 (640KB/s SSH each) | `v0.27.2` | Jul 27, 2023 | ## Scale testing utility From 8aa52eae81ebc3bfb56a2556117184335f1f1cf5 Mon Sep 17 00:00:00 2001 From: Cian Johnston Date: Thu, 3 Aug 2023 21:47:29 +0100 Subject: [PATCH 5/5] address PR comments --- docs/admin/scale.md | 17 +++++++++-------- 1 file changed, 9 insertions(+), 8 deletions(-) diff --git a/docs/admin/scale.md b/docs/admin/scale.md index 37993eb03ad90..999a30aeae44a 100644 --- a/docs/admin/scale.md +++ b/docs/admin/scale.md @@ -22,7 +22,7 @@ Coder runs workspace operations in a queue. The number of concurrent builds will When planning your infrastructure, we recommend you consider the following: 1. CPU and memory requirements for `coderd`. We recommend allocating 1 CPU core and 2 GB RAM per `coderd` replica at minimum. See [Concurrent users](#concurrent-users) for more details. -1. CPU and memory requirements for external provisioners, if applicable. We recommend allocating 1 CPU core and 1 GB RAM per 5 concurrent workspace builds to external provisioners. Note that this may vary depending on the template used. See [Concurrent workspace builds](#concurrent-workspace-builds) for more details. +1. CPU and memory requirements for [external provisioners](../admin/provisioners.md#running-external-provisioners), if required. We recommend allocating 1 CPU core and 1 GB RAM per 5 concurrent workspace builds to external provisioners. Note that this may vary depending on the template used. See [Concurrent workspace builds](#concurrent-workspace-builds) for more details. By default, `coderd` runs 3 integrated provisioners. 1. CPU and memory requirements for the database used by `coderd`. We recommend allocating an additional 1 CPU core to the database used by Coder for every 1000 active users. 1. CPU and memory requirements for workspaces created by Coder. This will vary depending on users' needs. However, the Coder agent itself requires at minimum 0.1 CPU cores and 256 MB to run inside a workspace. @@ -49,6 +49,7 @@ We recommend: - Running `coderd` on a dedicated set of nodes. This will prevent other workloads from interfering with workspace builds. You can use [node selectors](https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/#nodeselector), or [taints and tolerations](https://kubernetes.io/docs/concepts/scheduling-eviction/taint-and-toleration/) to achieve this. - Disabling autoscaling for `coderd` nodes. Autoscaling can cause interruptions for users, see [Autoscaling](#autoscaling) for more details. - (Enterprise-only) Running external provisioners instead of Coder's built-in provisioners (`CODER_PROVISIONER_DAEMONS=0`) will separate the load caused by workspace provisioning on the `coderd` nodes. For more details, see [External provisioners](../admin/provisioners.md#running-external-provisioners). +- Alternatively, if increasing the number of integrated provisioner daemons in `coderd` (`CODER_PROVISIONER_DAEMONS>3`), allocate additional resources to `coderd` to compensate (approx. 0.25 cores and 256 MB per provisioner daemon). For example, to support 120 concurrent workspace builds: @@ -58,13 +59,13 @@ For example, to support 120 concurrent workspace builds: ## Recent scale tests -| Environment | Coder CPU | Coder RAM | Database | Users | Concurrent builds | Concurrent connections (Terminal/SSH) | Coder Version | Last tested | -| ------------------ | --------- | --------- | ---------------- | ----- | ----------------- | ------------------------------------- | ------------- | ------------ | -| Kubernetes (GKE) | 16 cores | 64 GB | db-custom-1-3840 | 1200 | 120 | 10,000 | `v0.14.2` | Jan 10, 2022 | -| Docker (Single VM) | 16 cores | 64 GB | Embedded | 500 | 50 | 10,000 | `v0.13.4` | Dec 20, 2022 | -| Kubernetes (GKE) | 3 cores | 12 GB | db-f1-micro | 200 | 3 | 200 (40KB/s web terminal each) | `v0.24.1` | Jun 26, 2023 | -| Kubernetes (GKE) | 4 cores | 8 GB | db-custom-1-3840 | 1500 | 20 | 1,500 | `v0.24.1` | Jun 27, 2023 | -| Kubernetes (GKE) | 2 cores | 4 GB | db-custom-1-3840 | 500 | 20 | 500 (640KB/s SSH each) | `v0.27.2` | Jul 27, 2023 | +> Note: the below information is for reference purposes only, and are not intended to be used as guidelines for infrastructure sizing. + +| Environment | Coder CPU | Coder RAM | Database | Users | Concurrent builds | Concurrent connections (Terminal/SSH) | Coder Version | Last tested | +| ---------------- | --------- | --------- | ---------------- | ----- | ----------------- | ------------------------------------- | ------------- | ------------ | +| Kubernetes (GKE) | 3 cores | 12 GB | db-f1-micro | 200 | 3 | 200 (40KB/s web terminal each) | `v0.24.1` | Jun 26, 2023 | +| Kubernetes (GKE) | 4 cores | 8 GB | db-custom-1-3840 | 1500 | 20 | 1,500 | `v0.24.1` | Jun 27, 2023 | +| Kubernetes (GKE) | 2 cores | 4 GB | db-custom-1-3840 | 500 | 20 | 500 (640KB/s SSH each) | `v0.27.2` | Jul 27, 2023 | ## Scale testing utility