docs: provide hardware recommendations for reference architectures #12534
We may want to link or add to the architecture diagrams I've added here: #12584
Really nice work on this @mtojek, looking great. Left some comments for minor adjustments, and also re-reviewed the text that was moved to `index.md`.
| Users       | Node capacity        | Replicas | Storage | GCP                 | AWS            | Azure             |
| ----------- | -------------------- | -------- | ------- | ------------------- | -------------- | ----------------- |
| Up to 2,000 | 4 vCPU, 16 GB memory | 1        | 1 TB    | `db-custom-4-15360` | `db.t3.xlarge` | `Standard_D4s_v3` |
We've been using a `db-custom-8-32768` for our 2000-user scaletests, so the CPU numbers here may be slightly inaccurate. Granted, for a regular (non-scaletest) deployment, I think 4 vCPU is borderline sufficient, as it's the 2000 active workspaces that push the DB CPU load up to ~80% (8 vCPU).
On the other hand, our scaletesting methodology was really aggressive, so I would expect a similar pattern on the user side 🤔
What is your recommendation @maf? Should we switch to `db-custom-8-32768`?
When determining scaling requirements, consider the following factors:

- `1 vCPU x 2 GB memory x 250 users`: A reasonable formula to determine resource
I think this better matches our reference arch?
- `1 vCPU x 2 GB memory x 250 users`: A reasonable formula to determine resource
- `0.5 vCPU x 2 GB memory x 250 users`: A reasonable formula to determine resource
Thinking about the future, I would leave `1 vCPU`, WDYT?
We can probably argue for wiggle room here based on how certain Terraform providers may be more CPU-intensive than others.
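To make the difference between the two variants concrete, here is a rough sketch that evaluates the formula for a given user count. It assumes the formula reads as "for every 250 users, allocate N vCPU and 2 GB memory"; the function name `estimate_resources` and the choice to round up to whole blocks of 250 users are illustrative assumptions, not something the docs specify.

```python
import math


def estimate_resources(users, vcpu_per_block, mem_gb_per_block=2.0, users_per_block=250):
    """Return (total_vcpu, total_mem_gb) for a given number of users.

    Assumption: resources are allocated per started block of
    `users_per_block` users, so a partial block is rounded up.
    """
    blocks = math.ceil(users / users_per_block)
    return blocks * vcpu_per_block, blocks * mem_gb_per_block


if __name__ == "__main__":
    # Compare the current formula (1 vCPU) with the suggested one (0.5 vCPU).
    for label, vcpu in (("1 vCPU", 1.0), ("0.5 vCPU", 0.5)):
        total_vcpu, total_mem = estimate_resources(2000, vcpu)
        print(f"{label} per 250 users, 2000 users: {total_vcpu} vCPU, {total_mem} GB memory")
```

Under these assumptions, 2000 users work out to 8 vCPU and 16 GB memory with the `1 vCPU` variant, and 4 vCPU and 16 GB memory with the `0.5 vCPU` variant, so the choice only affects the CPU headroom.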
Lots of good stuff here. I think we're going to end up modifying this based on later feedback, but 👍 right now.
Thanks for the reviews, folks. I'm going to merge it and we can implement the next changes in follow-ups.
Related: #12426
This PR describes hardware recommendations for Coder reference architectures, including node sizing for `coderd`, workspaces, provisioners, and the database.
@cian @mafredri
I reviewed the latest state of the art for scale tests, Grafana dashboards, and the current version of Scaling Coder. Feel free to adjust the numbers to be more accurate. I admit that I tried to give extra CPU/mem capacity.