docs: provide hardware recommendations for reference architectures #12534

Merged · 39 commits · Mar 15, 2024

Conversation

mtojek
Member

@mtojek mtojek commented Mar 11, 2024

Related: #12426

This PR describes hardware recommendations for Coder references architectures including node sizing for coderd, workspace, provisioners, and the database.

@cian @mafredri
I reviewed the latest state of the art for scale tests, the Grafana dashboards, and the current version of Scaling Coder. Feel free to adjust the numbers to be more accurate. I admit I erred on the side of extra CPU/memory capacity.

@mtojek mtojek changed the title docs: hardware recommendations for reference architectures docs: provide hardware recommendations for reference architectures Mar 12, 2024
@bpmct bpmct requested a review from ericpaulsen March 13, 2024 12:48
@mtojek mtojek marked this pull request as ready for review March 13, 2024 14:18
@mtojek mtojek requested review from mafredri and johnstcn March 13, 2024 14:18
@ericpaulsen
Member

we may want to link/add to the architecture diagrams i've added here: #12584

@mtojek
Member Author

mtojek commented Mar 14, 2024

> we may want to link/add to the architecture diagrams i've added here: #12584

Thanks for linking the PR. I will update the architecture page once your PR is merged. BTW I wanted to elaborate more on these here.

@mtojek mtojek requested a review from johnstcn March 14, 2024 12:28
Member

@mafredri mafredri left a comment


Really nice work on this @mtojek, looking great. I left some comments with minor adjustments, and also re-reviewed the text that was moved to index.md.


| Users | Node capacity | Replicas | Storage | GCP | AWS | Azure |
| ----------- | -------------------- | -------- | ------- | ------------------- | -------------- | ----------------- |
| Up to 2,000 | 4 vCPU, 16 GB memory | 1 | 1 TB | `db-custom-4-15360` | `db.t3.xlarge` | `Standard_D4s_v3` |
Member


We've been using a db-custom-8-32768 for our 2000-user scaletests, so the CPU numbers here may be slightly inaccurate. Granted, for a regular (non-scaletest) deployment, I think 4 vCPU is borderline sufficient; it's the 2000 active workspaces that push the DB CPU load up to ~80% (on 8 vCPU).

Member Author


On the other hand, our scaletesting methodology was really aggressive, so I would expect a similar pattern on the user side 🤔
What is your recommendation @maf? Should we switch to db-custom-8-32768?


When determining scaling requirements, consider the following factors:

- `1 vCPU x 2 GB memory x 250 users`: A reasonable formula to determine resource
Member


I think this better matches our reference arch?

Suggested change:

```diff
- - `1 vCPU x 2 GB memory x 250 users`: A reasonable formula to determine resource
+ - `0.5 vCPU x 2 GB memory x 250 users`: A reasonable formula to determine resource
```

Member Author


Thinking about the future, I would leave 1 vCPU, WDYT?

Member


We can probably argue for wiggle room here based on how certain Terraform providers may be more CPU-intensive than others.
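The per-user sizing rule under discussion can be sketched as a small calculation. This is an illustrative helper, not part of the Coder docs; the function name and defaults are assumptions, and the `vcpu_per_250` parameter captures the 1 vCPU vs. 0.5 vCPU variants debated above:

```python
import math

def coderd_resources(users: int,
                     vcpu_per_250: float = 1.0,
                     mem_gb_per_250: float = 2.0) -> tuple[float, float]:
    """Estimate coderd node resources from a user count.

    Applies the rule of thumb from the thread: 1 vCPU and 2 GB of
    memory per 250 users, rounding the user count up to a whole
    block of 250 before scaling.
    """
    blocks = math.ceil(users / 250)
    return blocks * vcpu_per_250, blocks * mem_gb_per_250

# 2,000 users with the 1 vCPU variant -> (8.0, 16.0)
vcpu, mem_gb = coderd_resources(2000)

# The suggested 0.5 vCPU variant halves only the CPU figure -> (4.0, 16.0)
vcpu_low, _ = coderd_resources(2000, vcpu_per_250=0.5)
```

Leaving headroom for CPU-intensive Terraform providers, as noted above, is an argument for keeping the more generous 1 vCPU default.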

Member

@johnstcn johnstcn left a comment


Lots of good stuff here. I think we're going to end up modifying this based on later feedback, but 👍 right now.

@mtojek
Member Author

mtojek commented Mar 15, 2024

Thanks for the reviews, folks. I'm going to merge it and we can implement the next changes in follow-ups.

@mtojek mtojek merged commit bed0d85 into 12426-main Mar 15, 2024
@mtojek mtojek deleted the 12426-recommend-control-plane-3 branch March 15, 2024 14:08
@github-actions github-actions bot locked and limited conversation to collaborators Mar 15, 2024