docs: scaling Coder #5550
Merged
Changes from 17 commits (23 commits total)

- fc8839d docs: scaling Coder (bpmct)
- a587e45 change icon (bpmct)
- fdacfad Update docs/admin/scale/index.md (bpmct)
- 8cd6abb Update docs/admin/scale/index.md (bpmct)
- 7637f86 Update docs/admin/scale/index.md (bpmct)
- c1de2b4 add prom link (bpmct)
- e1b04a1 Merge branch 'bpmct/scale-docs' of github.com:coder/coder into bpmct/… (bpmct)
- 1cf65aa add plumbing for gke doc (bpmct)
- 933beac add limits/requests (bpmct)
- b31e813 changes from feedback (bpmct)
- b493aa9 change (bpmct)
- c5f5af4 simplify (bpmct)
- cdff43b changes from colin feedback (bpmct)
- cf15182 more edits from testing (bpmct)
- 0188d5e more fixes from Colin feedback (bpmct)
- 8a6d672 clarify providers have different resource requirments (bpmct)
- d00fa59 Merge branch 'main' into scale-docs-mvp (bpmct)
- 72ae527 kylecarbs feedback (bpmct)
- 0106b3b format (bpmct)
- e607729 explain concurrency (bpmct)
- 03653ad move doc (bpmct)
- b73dda0 consolidate table (bpmct)
- 7d62ee6 fix broken links (bpmct)
@@ -0,0 +1,69 @@
We regularly scale-test Coder with our [scale testing utility](#scale-testing-utility). You can run the same utility in your own environment to see how Coder performs with your specific templates, images, etc.

## General concepts

Coder runs workspace operations in a queue. The number of concurrent builds is limited to the total number of provisioner daemons across all coderd replicas.

```text
2 coderd replicas * 30 provisioner daemons = 60 max concurrent workspace builds
```
- **coderd**: Coder’s primary service. Learn more about [Coder’s architecture](../../about/architecture.md).
- **coderd replicas**: Replicas (often via Kubernetes) for high availability. This is an [enterprise feature](../../enterprise.md).
- **concurrent workspace builds**: Workspace operations (e.g. create/stop/delete/apply) across all users.
- **concurrent connections**: Any connection to a workspace (e.g. SSH, web terminal, `coder_app`).
- **provisioner daemons**: Coder runs one workspace build per provisioner daemon. One coderd replica can host many daemons.
- **scaletest**: Our scale-testing utility, built into the `coder` command line.

## Infrastructure recommendations

### Concurrent workspace builds
Workspace builds are CPU-intensive because they rely on Terraform. Different [Terraform providers](https://registry.terraform.io/browse/providers) have different resource requirements. When tested with our [kubernetes](https://github.com/coder/coder/tree/main/examples/templates/kubernetes) template, `coderd` consumes roughly 8 cores per 30 concurrent workspace builds. For effective provisioning, our Helm chart prefers to schedule [one coderd replica per node](https://github.com/coder/coder/blob/main/helm/values.yaml#L110-L121).

To support 120 concurrent workspace builds, for example:

- Create a cluster/nodepool with four 8-core nodes (AWS: `t3.2xlarge`, GCP: `e2-highcpu-8`)
- Run coderd with 4 replicas, 30 provisioner daemons each (`CODER_PROVISIONER_DAEMONS=30`)
- Ensure Coder's [PostgreSQL server](../../admin/configure.md#postgresql-database) can use up to 1.5 cores
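
The sizing in the list above can be sketched as a small calculation. This is only a rough illustration: it assumes the ratio observed with the kubernetes template (30 provisioner daemons, and roughly 8 cores, per coderd replica) also holds for your templates, which may not be the case. `replicas_for_builds` is a hypothetical helper, not part of the `coder` CLI.

```sh
#!/bin/sh
# Hypothetical helper (not part of the coder CLI): estimate how many coderd
# replicas are needed for a target number of concurrent workspace builds.
# Assumes one replica per 8-core node and 30 provisioner daemons per replica,
# the ratio observed with the kubernetes template.
replicas_for_builds() {
  target_builds=$1
  daemons_per_replica=${2:-30}
  # Ceiling division: a partially used replica still needs a full node.
  echo $(( (target_builds + daemons_per_replica - 1) / daemons_per_replica ))
}

replicas=$(replicas_for_builds 120)
echo "replicas (one 8-core node each): $replicas"
echo "total provisioner daemons: $(( replicas * 30 ))"
```

For a target of 120 builds this yields 4 replicas and 120 provisioner daemons, matching the example above. Rounding up is a deliberate choice: a target of 121 builds would need a fifth node.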
## Recent scale tests

| Environment        | Users | Concurrent builds | Concurrent connections (SSH) | Concurrent connections (web) | Last tested  |
| ------------------ | ----- | ----------------- | ---------------------------- | ---------------------------- | ------------ |
| Kubernetes (GKE)   | 1200  | 120               | 10,000                       | 10,000                       | Jan 10, 2022 |
| Docker (Single VM) | 500   | 50                | 10,000                       | 10,000                       | Dec 20, 2022 |
## Scale testing utility

Since Coder's performance depends heavily on the templates and workflows you support, we recommend running our scale testing utility against your own environments.

The following command runs our scale test against your own Coder deployment. You can also specify a template name and any parameter values.

```sh
coder scaletest create-workspaces \
  --count 1000 \
  --template "kubernetes" \
  --concurrency 0 \
  --cleanup-concurrency 0 \
  --parameter "home_disk_size=10" \
  --run-command "sleep 2 && echo hello"

# Run `coder scaletest create-workspaces --help` for all usage
```

> To avoid potential outages and orphaned resources, we recommend running scale tests on a secondary "staging" environment.
The test does the following:

- create `n` workspaces
- establish an SSH connection to each workspace
- run `sleep 2 && echo hello` on each workspace via the web terminal
- close connections and attempt to delete all workspaces
- return results (e.g. `99 succeeded, 1 failed to connect`)

Workspace jobs run concurrently: the test attempts to connect to each workspace as soon as it is provisioned, instead of waiting for all workspaces to be created.
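
The concurrency described above can be illustrated with a toy simulation. This sketch does not call `coder` or create any workspaces; `workspace_job` and `run_jobs` are hypothetical names, and the provisioning times are made up. It only shows the shape of the behavior: each job connects as soon as its own workspace is ready, regardless of the others.

```sh
#!/bin/sh
# Simulation only: no real workspaces are created and no coder commands run.
# Each job stands in for one workspace: provision, then connect right away.
workspace_job() {
  id=$1
  sleep $(( id % 2 ))   # hypothetical provisioning time: 0s or 1s
  echo "workspace-$id: provisioned"
  echo "workspace-$id: connected"
}

run_jobs() {
  for id in 1 2 3 4; do
    workspace_job "$id" &   # start every job without waiting for the others
  done
  wait                      # block until all background jobs have finished
}

run_jobs
```

The exact line order varies between runs, but workspaces 2 and 4 will typically report `connected` before workspaces 1 and 3 have finished provisioning.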
## Troubleshooting

If a load test fails, or if you experience performance issues during day-to-day use, you can use Coder's [Prometheus metrics](../prometheus.md) to identify bottlenecks. You can also use your existing cloud monitoring stack to measure load, view server logs, etc.
@@ -248,6 +248,12 @@
      "path": "./admin/automation.md",
      "icon_path": "./images/icons/plug.svg"
    },
    {
      "title": "Scaling Coder",
      "description": "Reference architecture and load testing tools",
      "icon_path": "./images/icons/scale.svg",
      "path": "./admin/scale/index.md"

> nit: do we expect more pages or would

> Down the road, we may provide reference architectures! I'll just make it

    },
    {
      "title": "Audit Logs",
      "description": "Learn how to use Audit Logs in your Coder deployment",