docs: provide hardware recommendations for reference architectures #12534

mtojek · 2024-03-11T16:01:54Z

Related: #12426

This PR describes hardware recommendations for Coder reference architectures, including node sizing for coderd, workspaces, provisioners, and the database.

@cian @mafredri
I reviewed the latest state of the art for scale tests, Grafana dashboards, and the current version of Scaling Coder. Feel free to adjust the numbers to be more accurate. I admit that I tried to give extra CPU/mem capacity.


ericpaulsen · 2024-03-13T19:50:06Z

we may want to link/add to the architecture diagrams i've added here: #12584

mtojek · 2024-03-14T12:28:53Z

> we may want to link/add to the architecture diagrams i've added here: #12584

Thanks for linking the PR. I will update the architecture page once your PR is merged. BTW I wanted to elaborate more on these here.

mafredri

Really nice work on this @mtojek, looking great. Left some comments for minor adjustments, and also re-reviewed the text that was moved to index.md.

mafredri · 2024-03-15T08:09:16Z

docs/admin/architectures/2k-users.md

+
+| Users       | Node capacity        | Replicas | Storage | GCP                 | AWS            | Azure             |
+| ----------- | -------------------- | -------- | ------- | ------------------- | -------------- | ----------------- |
+| Up to 2,000 | 4 vCPU, 16 GB memory | 1        | 1 TB    | `db-custom-4-15360` | `db.t3.xlarge` | `Standard_D4s_v3` |


We've been using a db-custom-8-32768 for our 2000 user scaletests, so the CPU numbers here may be slightly inaccurate. Granted, for a regular (non-scaletest) deployment, I think 4 vCPU is borderline sufficient, as it's the 2000 active workspaces that push the DB CPU load up to ~80% (8 vCPU).

On the other hand, our scaletesting methodology was really aggressive, so I would expect a similar pattern on the user side 🤔
What is your recommendation @maf? Should we switch to db-custom-8-32768?
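
To make the arithmetic behind this thread concrete, here is a rough sketch in Python. It assumes the tier names follow the `db-custom-<vCPUs>-<memory MB>` pattern and that CPU demand carries over proportionally between instance sizes, which is a simplification. The ~80% figure is the scaletest observation quoted above, not a general measurement, and the helper names are hypothetical.

```python
import re


def parse_db_custom_tier(tier: str) -> tuple[int, int]:
    """Split a tier name like 'db-custom-4-15360' into (vCPUs, memory in MB)."""
    match = re.fullmatch(r"db-custom-(\d+)-(\d+)", tier)
    if match is None:
        raise ValueError(f"not a db-custom tier name: {tier!r}")
    return int(match.group(1)), int(match.group(2))


def busy_vcpus(tier: str, utilization: float) -> float:
    """Approximate number of fully busy vCPUs at the given CPU utilization."""
    vcpus, _memory_mb = parse_db_custom_tier(tier)
    return vcpus * utilization


# Scaletest observation from the comment: ~80% CPU on the 8 vCPU instance.
observed_busy = busy_vcpus("db-custom-8-32768", 0.80)

# Tier currently listed in the 2k-users table.
recommended_vcpus, recommended_mb = parse_db_custom_tier("db-custom-4-15360")

# Under the proportionality assumption, the scaletest load would not fit.
fits_recommended_tier = observed_busy <= recommended_vcpus
```

Under these assumptions the scaletest load keeps roughly 6.4 vCPUs busy, more than the 4 vCPUs in the table, which is why the thread treats 4 vCPU as borderline for anything close to scaletest intensity.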


mafredri · 2024-03-15T08:24:31Z

docs/admin/architectures/index.md

+
+When determining scaling requirements, consider the following factors:
+
+- `1 vCPU x 2 GB memory x 250 users`: A reasonable formula to determine resource


I think this better matches our reference arch?

Suggested change

- `1 vCPU x 2 GB memory x 250 users`: A reasonable formula to determine resource

- `0.5 vCPU x 2 GB memory x 250 users`: A reasonable formula to determine resource

Thinking about the future, I would leave 1 vCPU, WDYT?

We can probably argue for wiggle room here based on how certain Terraform providers may be more CPU-intensive than others.
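
For comparison, the two variants of the rule of thumb can be worked through with a small Python sketch. It reads the formula as "N vCPU and 2 GB of memory per block of 250 users" and rounds partial blocks up. That reading, the rounding, and the names `Sizing` and `size_for_users` are assumptions made for illustration, not something the docs define.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Sizing:
    vcpus: float
    memory_gb: float


def size_for_users(
    users: int,
    vcpu_per_block: float = 1.0,
    memory_gb_per_block: float = 2.0,
    users_per_block: int = 250,
) -> Sizing:
    """Scale a per-block allowance up to the given user count."""
    if users < 0:
        raise ValueError("users must be non-negative")
    # Assumption: a partially filled block still needs a full allowance.
    blocks = math.ceil(users / users_per_block)
    return Sizing(blocks * vcpu_per_block, blocks * memory_gb_per_block)


# Original wording: 1 vCPU x 2 GB memory x 250 users.
current = size_for_users(2000)

# Suggested change: 0.5 vCPU x 2 GB memory x 250 users.
suggested = size_for_users(2000, vcpu_per_block=0.5)
```

For 2,000 users this gives 8 vCPU and 16 GB with the original wording and 4 vCPU and 16 GB with the suggested change, so the two options differ only in CPU headroom.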


johnstcn

Lots of good stuff here. I think we're going to end up modifying this based on later feedback, but 👍 right now.

mtojek · 2024-03-15T14:07:54Z

Thanks for the reviews, folks. I'm going to merge it and we can implement the next changes in follow-ups.

docs: hardware recommendations for reference architectures

2dc5f2f

mtojek self-assigned this Mar 11, 2024

mtojek mentioned this pull request Mar 11, 2024

docs: Build public-facing documentation for Reference Architectures #12426

Closed

9 tasks

Achrs

d2e1f42

mtojek changed the title ~~docs: hardware recommendations for reference architectures~~ docs: provide hardware recommendations for reference architectures Mar 12, 2024

mtojek added 14 commits March 12, 2024 12:01

WIP

842ed58

remodelled

46f3dc2

subpages

59654fd

Target load

894cddb

now workspaces

fa1215f

HTTP API latency

43812e6

fix

f68ed34

WIP

2987193

More TODOs

1a4dfb9

WIP

4721204

2k

233866f

WIP

17e5431

1k 2k 3k

ab95ddd

Workspaces covered

0937f36

bpmct requested a review from ericpaulsen March 13, 2024 12:48

bpmct reviewed Mar 13, 2024


mtojek added 4 commits March 13, 2024 13:55

More TODOs

67c4604

Database requirements

11dbdd7

Fix

776d4c6

1k 2k 3k

6a87a93

mtojek marked this pull request as ready for review March 13, 2024 14:18

mtojek requested review from mafredri and johnstcn March 13, 2024 14:18

johnstcn reviewed Mar 13, 2024


mtojek added 8 commits March 14, 2024 11:14

WIP

cf29c26

WIP

18bd4d2

WIP

d36e893

WIP

066d6ff

WIP: long lived

d774ed5

CODER_BLOCK_DIRECT

395d300

scale.md

9ae4b61

mention proxies

701a205

mtojek requested a review from johnstcn March 14, 2024 12:28

Link to CLI for now

088395a

mafredri approved these changes Mar 15, 2024


johnstcn reviewed Mar 15, 2024


johnstcn reviewed Mar 15, 2024


johnstcn reviewed Mar 15, 2024


johnstcn reviewed Mar 15, 2024


johnstcn approved these changes Mar 15, 2024


mtojek added 9 commits March 15, 2024 11:45

WIP

34c4903

WIP: coderd each

13dee4c

WIP

bb26800

WIP

40def43

WIP

19ea381

WIP

627e26f

WIP

8d87b34

WIP

d0c9fd6

Observability

a34ae19

mtojek merged commit bed0d85 into 12426-main Mar 15, 2024

mtojek deleted the 12426-recommend-control-plane-3 branch March 15, 2024 14:08

github-actions bot locked and limited conversation to collaborators Mar 15, 2024

