docs: Build public-facing documentation for Reference Architectures

Today, as part of our V2 Scale Tests, we are simulating a load much higher than current Coder usage, aiming for 2000 concurrent active users. This effort is crucial for identifying bugs and informing scaling recommendations in preparation for potential customer demand.

Given our customers' aggressive scaling goals, with some planning for up to 5k known users, they need assurance that Coder and their infrastructure can handle that load. Successful deployments for thousands of users depend on more than scalability alone: they also need sound configuration, observability, and operational practices.

Therefore, we need to develop public-facing documentation that outlines reference architectures and offers prescriptive solutions for different use cases. This documentation will not only benefit our customers but also streamline our onboarding process and provide a clearer roadmap for future scalability efforts.

_Architecture by Scale_ (phase 1):
- [x] https://github.com/coder/coder/pull/12438 Coder Glossary
  - admin, user, workspace, template, proxy, provisioner, registry, Kubernetes cluster for Coder
- [x] https://github.com/coder/coder/pull/12438 Describe scale tests methodology
  - infra/setup requirements, web terminal, applications
  - traffic projections during scale tests (continuous volume, req/s, users, aggressive approach)
- [x] https://github.com/coder/coder/pull/12534 Hardware recommendations for Coder control plane
  - node sizing, resource limits, number of replicas
    - refer to public cloud providers: AWS, GCP, Azure
  - reasonable ratio/formula: CPU x memory x users
  - reasonable ratio/formula: provisionerd x users
  - API latency/response time
  - average number of HTTP requests
  - advice: do not enable autoscaling
- [x] https://github.com/coder/coder/pull/12534 Hardware recommendations for Coder workspaces
  - Assumptions: 
    - workspaces also run on the same Kubernetes cluster (recommend a different namespace/node pool)
    - developers can pick between 4-8 CPU and 4-16 GB RAM workspaces (limits)
    - developers have a resource quota of 16 CPU and 32 GB RAM (two maxed-out workspaces).
    - web microservice development use case: resources are mostly underutilized but spike during builds
  - Case study:
    - In the architecture for up to 2000 users, developers are split evenly across 2 regions, each with its own cluster. In practice, this changes little beyond the diagram and the workspace node pool autoscaling config, since it still uses the central provisioner. Recommend multiple provisioner groups for zero-trust and multi-cloud use cases.
    - In the architecture for up to 3000 users, developers are additionally located in an on-premises network. Document a provisioner running in a different cloud environment and the zero-trust benefits of that setup.
- [x] https://github.com/coder/coder/pull/12534 Hardware sizing recommendations for PostgreSQL database
  - measure and document the impact of `dbcrypt`
- [x] https://github.com/coder/coder/pull/12643 Describe how to run scale tests against our own infrastructure
  - use the `coder` CLI and link the dedicated command docs
  - make Coder's internal scale-test template public
  - make the Grafana dashboards public
- [x] https://github.com/coder/coder/pull/12723 Operational Readiness
  - Configuration: Helm values, environment variables
  - Template configuration managed via CI/CD.
  - Observability, e.g. Prometheus/Grafana
  - Database backups
    - link scripts if we have any
  - Custom support links
  - Custom agent troubleshooting links
- [ ] Network recommendations for Coder
  - control plane ↔ workspaces ↔ clients
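For reference, the per-developer quota in the workspace assumptions is simply two workspaces at the upper limits (8 CPU, 16 GB RAM each). This is a sanity check on the stated numbers, not a sizing formula:

```math
\text{quota}_{\text{CPU}} = 2 \times 8\ \text{CPU} = 16\ \text{CPU}, \qquad
\text{quota}_{\text{RAM}} = 2 \times 16\ \text{GB} = 32\ \text{GB}
```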

TODO:
- [x] Mention that, as users onboard, the workspace autoscaling config should absorb the growing number of workspaces




docs: Build public-facing documentation for Reference Architectures #12426
