docs: reorganize scaling docs #13574

Merged: 6 commits, Jun 14, 2024
4 changes: 2 additions & 2 deletions docs/admin/architectures/validated-arch.md
@@ -2,8 +2,8 @@

Many customers operate Coder in complex organizational environments, consisting
of multiple business units, agencies, and/or subsidiaries. This can lead to
numerous Coder deployments, caused by discrepancies in regulatory compliance,
data sovereignty, and level of funding across groups. The Coder Validated
numerous Coder deployments, due to discrepancies in regulatory compliance, data
sovereignty, and level of funding across groups. The Coder Validated
Architecture (CVA) prescribes a Kubernetes-based deployment approach, enabling
your organization to deploy a stable Coder instance that is easier to maintain
and troubleshoot.
6 changes: 3 additions & 3 deletions docs/admin/provisioners.md
@@ -18,11 +18,11 @@ sometimes benefits to running external provisioner daemons:

- **Reduce server load**: External provisioners reduce load and build queue
times from the Coder server. See
[Scaling Coder](./scale.md#concurrent-workspace-builds) for more details.
[Scaling Coder](scaling/scale-utility.md#recent-scale-tests) for more details.

Each provisioner can run a single
[concurrent workspace build](./scale.md#concurrent-workspace-builds). For
example, running 30 provisioner containers will allow 30 users to start
[concurrent workspace build](scaling/scale-testing.md#control-plane-provisionerd).
For example, running 30 provisioner containers will allow 30 users to start
workspaces at the same time.

Provisioners are started with the
@@ -1,18 +1,20 @@
## Scale Testing
# Scale Testing

Scaling Coder involves planning and testing to ensure it can handle more load
without compromising service. This process encompasses infrastructure setup,
traffic projections, and aggressive testing to identify and mitigate potential
bottlenecks.

A dedicated Kubernetes cluster for Coder is Kubernetes cluster specifically
configured to host and manage Coder workloads. Kubernetes provides container
orchestration capabilities, allowing Coder to efficiently deploy, scale, and
manage workspaces across a distributed infrastructure. This ensures high
availability, fault tolerance, and scalability for Coder deployments. Coder is
deployed on this cluster using the
A dedicated Kubernetes cluster for Coder is recommended to configure, host and
manage Coder workloads. Kubernetes provides container orchestration
capabilities, allowing Coder to efficiently deploy, scale, and manage workspaces
across a distributed infrastructure. This ensures high availability, fault
tolerance, and scalability for Coder deployments. Coder is deployed on this
cluster using the
[Helm chart](../../install/kubernetes.md#install-coder-with-helm).

## Methodology

Our scale tests include the following stages:

1. Prepare environment: create expected users and provision workspaces.
@@ -33,7 +35,7 @@ Our scale tests include the following stages:

6. Cleanup: delete workspaces and users created in step 1.

### Infrastructure and setup requirements
## Infrastructure and setup requirements

The scale tests runner can distribute the workload to overlap single scenarios
based on the workflow configuration:
@@ -60,7 +62,7 @@ The test is deemed successful if users did not experience interruptions in their
workflows, `coderd` did not crash or require restarts, and no other internal
errors were observed.

### Traffic Projections
## Traffic Projections

In our scale tests, we simulate activity from 2000 users, 2000 workspaces, and
2000 agents, with two items of workspace agent metadata being sent every 10
@@ -88,11 +90,11 @@ Database:

## Available reference architectures

[Up to 1,000 users](1k-users.md)
[Up to 1,000 users](../architectures/1k-users.md)

[Up to 2,000 users](2k-users.md)
[Up to 2,000 users](../architectures/2k-users.md)

[Up to 3,000 users](3k-users.md)
[Up to 3,000 users](../architectures/3k-users.md)

## Hardware recommendation

@@ -151,8 +153,8 @@ with a deployment of Coder [workspace proxies](../workspace-proxies.md).
**Node Autoscaling**

We recommend disabling the autoscaling for `coderd` nodes. Autoscaling can cause
interruptions for user connections, see [Autoscaling](../scale.md#autoscaling)
for more details.
interruptions for user connections, see
[Autoscaling](scale-utility.md#autoscaling) for more details.

### Control plane: Workspace Proxies

10 changes: 6 additions & 4 deletions docs/admin/scale.md → docs/admin/scaling/scale-utility.md
@@ -1,17 +1,19 @@
# Scale Tests and Utilities

We scale-test Coder with [a built-in utility](#scale-testing-utility) that can
be used in your environment for insights into how Coder scales with your
infrastructure. For scale-testing Kubernetes clusters we recommend to install
and use the dedicated Coder template,
[scaletest-runner](https://github.com/coder/coder/tree/main/scaletest/templates/scaletest-runner).

Learn more about [Coder’s architecture](architectures/architecture.md) and our
[scale-testing methodology](architectures/scale-testing.md).
Learn more about [Coder’s architecture](../architectures/architecture.md) and
our [scale-testing methodology](scale-testing.md).

## Recent scale tests

> Note: the below information is for reference purposes only, and are not
> intended to be used as guidelines for infrastructure sizing. Review the
> [Reference Architectures](architectures/validated-arch.md#node-sizing) for
> [Reference Architectures](../architectures/validated-arch.md#node-sizing) for
> hardware sizing recommendations.

| Environment | Coder CPU | Coder RAM | Coder Replicas | Database | Users | Concurrent builds | Concurrent connections (Terminal/SSH) | Coder Version | Last tested |
@@ -247,6 +249,6 @@ an annotation on the coderd deployment.
## Troubleshooting

If a load test fails or if you are experiencing performance issues during
day-to-day use, you can leverage Coder's [Prometheus metrics](./prometheus.md)
day-to-day use, you can leverage Coder's [Prometheus metrics](../prometheus.md)
to identify bottlenecks during scale tests. Additionally, you can use your
existing cloud monitoring stack to measure load, view server logs, etc.
52 changes: 27 additions & 25 deletions docs/manifest.json
@@ -336,6 +336,30 @@
"path": "./admin/README.md",
"icon_path": "./images/icons/wrench.svg",
"children": [
{
"title": "Architecture",
"description": "Learn about validated and reference architectures for Coder",
"path": "./admin/architectures/architecture.md",
"icon_path": "./images/icons/container.svg",
"children": [
{
"title": "Validated Architecture",
"path": "./admin/architectures/validated-arch.md"
},
{
"title": "Up to 1,000 users",
"path": "./admin/architectures/1k-users.md"
},
{
"title": "Up to 2,000 users",
"path": "./admin/architectures/2k-users.md"
},
{
"title": "Up to 3,000 users",
"path": "./admin/architectures/3k-users.md"
}
]
},
{
"title": "Authentication",
"description": "Learn how to set up authentication using GitHub or OpenID Connect",
@@ -389,34 +413,12 @@
{
"title": "Scaling Coder",
"description": "Learn how to use load testing tools",
"path": "./admin/scale.md",
"icon_path": "./images/icons/scale.svg"
},
{
"title": "Architecture",
"description": "Learn about validated and reference architectures for Coder",
"path": "./admin/architectures/architecture.md",
"path": "./admin/scaling/scale-testing.md",
"icon_path": "./images/icons/scale.svg",
"children": [
{
"title": "Validated Architecture",
"path": "./admin/architectures/validated-arch.md"
},
{
"title": "Scale Testing",
"path": "./admin/architectures/scale-testing.md"
},
{
"title": "Up to 1,000 users",
"path": "./admin/architectures/1k-users.md"
},
{
"title": "Up to 2,000 users",
"path": "./admin/architectures/2k-users.md"
},
{
"title": "Up to 3,000 users",
"path": "./admin/architectures/3k-users.md"
"title": "Scaling Utility",
"path": "./admin/scaling/scale-utility.md"
}
]
},
2 changes: 1 addition & 1 deletion docs/platforms/aws.md
@@ -27,7 +27,7 @@ We recommend keeping the default instance type (`t2.xlarge`, 4 cores and 16 GB
memory) if you plan on provisioning Docker containers as workspaces on this EC2
instance. Keep in mind this platforms is intended for proof-of-concept
deployments and you should adjust your infrastructure when preparing for
production use. See: [Scaling Coder](../admin/scale.md)
production use. See: [Scaling Coder](../admin/scaling/scale-testing.md)

Be sure to add a keypair so that you can connect over SSH to further
[configure Coder](../admin/configure.md).
2 changes: 1 addition & 1 deletion docs/platforms/gcp.md
@@ -23,7 +23,7 @@ We recommend keeping the default instance type (`e2-standard-4`, 4 cores and 16
GB memory) if you plan on provisioning Docker containers as workspaces on this
VM instance. Keep in mind this platforms is intended for proof-of-concept
deployments and you should adjust your infrastructure when preparing for
production use. See: [Scaling Coder](../admin/scale.md)
production use. See: [Scaling Coder](../admin/scaling/scale-testing.md)

<video autoplay playsinline loop>
<source src="https://github.com/coder/coder/blob/main/docs/images/platforms/gcp/launch.mp4?raw=true" type="video/mp4">
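This PR moves the scaling pages under `docs/admin/scaling/` and rewrites relative links so they still resolve from each file's new location. The following is a minimal sketch for checking the rewritten links. It uses Python's `posixpath` to resolve each link from the diff against the directory of the file that contains it, assuming relative links resolve the way they do when GitHub renders Markdown. The `resolve` helper and the `LINKS`/`MANIFEST_PATHS` tables are hypothetical and are not part of the Coder repository. The location of `scale-testing.md` is inferred from the `manifest.json` changes, because that file's diff header is missing from this page.

```python
import posixpath


# Hypothetical helper (not part of Coder's docs tooling). Assumes a relative
# Markdown link resolves against the directory of the file containing it.
def resolve(source_file: str, link: str) -> str:
    target = link.split("#", 1)[0]  # drop an optional "#anchor" fragment
    base = posixpath.dirname(source_file)
    return posixpath.normpath(posixpath.join(base, target))


# New manifest.json paths touched by this PR, relative to docs/.
MANIFEST_PATHS = [
    "./admin/architectures/architecture.md",
    "./admin/architectures/validated-arch.md",
    "./admin/architectures/1k-users.md",
    "./admin/architectures/2k-users.md",
    "./admin/architectures/3k-users.md",
    "./admin/scaling/scale-testing.md",
    "./admin/scaling/scale-utility.md",
]
NAV = {posixpath.normpath(posixpath.join("docs", p)) for p in MANIFEST_PATHS}

# (containing file after the move, link text after this PR). The location of
# scale-testing.md is an assumption inferred from manifest.json.
LINKS = [
    ("docs/admin/provisioners.md", "scaling/scale-utility.md#recent-scale-tests"),
    ("docs/admin/provisioners.md", "scaling/scale-testing.md#control-plane-provisionerd"),
    ("docs/admin/scaling/scale-testing.md", "../../install/kubernetes.md#install-coder-with-helm"),
    ("docs/admin/scaling/scale-testing.md", "../architectures/1k-users.md"),
    ("docs/admin/scaling/scale-testing.md", "scale-utility.md#autoscaling"),
    ("docs/admin/scaling/scale-utility.md", "../architectures/architecture.md"),
    ("docs/admin/scaling/scale-utility.md", "scale-testing.md"),
    ("docs/admin/scaling/scale-utility.md", "../architectures/validated-arch.md#node-sizing"),
    ("docs/admin/scaling/scale-utility.md", "../prometheus.md"),
    ("docs/platforms/aws.md", "../admin/scaling/scale-testing.md"),
    ("docs/platforms/gcp.md", "../admin/scaling/scale-testing.md"),
]

if __name__ == "__main__":
    for src, link in LINKS:
        target = resolve(src, link)
        where = "listed in nav excerpt" if target in NAV else "outside this excerpt"
        print(f"{src}: {link} -> {target} ({where})")
```

Only the file paths and link text come from the diff. Whether Coder's docs site resolves links exactly this way is an assumption. A target reported as outside this excerpt, such as `docs/admin/prometheus.md`, is not necessarily broken. It just isn't one of the manifest entries this PR changes.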