docs: Build public-facing documentation for Reference Architectures #12426

Closed
8 of 9 tasks
mtojek opened this issue Mar 5, 2024 · 0 comments · Fixed by #12723
Assignees
Labels
docs Area: coder.com/docs scaletest Issues related to scale testing.

mtojek commented Mar 5, 2024

Today, as part of our V2 Scale Tests, we are simulating a load much higher than current Coder usage, aiming for 2000 concurrent active users. This effort is crucial for identifying bugs and informing scaling recommendations in preparation for potential customer demand.

Considering our customers' aggressive scaling goals, including up to 5k known users, it's evident that our customers require assurance that Coder and their infrastructure can handle such user loads effectively. To support successful customer deployments, especially for thousands of users, it's essential to provide not only scalability but also other key ingredients.

Therefore, we need to develop public-facing documentation that outlines reference architectures and offers prescriptive solutions for different use cases. This documentation will not only benefit our customers but also streamline our onboarding process and provide a clearer roadmap for future scalability efforts.

Architecture by Scale (phase 1):

  • docs: update reference architecture: glossary, scale tests methodology #12438 Coder Glossary
    • admin, user, workspace, template, proxy, provisioner, registry, Kubernetes cluster for Coder
  • docs: update reference architecture: glossary, scale tests methodology #12438 Describe scale tests methodology
    • infra/setup requirements, web terminal, applications
    • traffic projections during scale tests (continuous volume, req/s, users, aggressive approach)
  • docs: provide hardware recommendations for reference architectures #12534 Hardware recommendations for Coder control plane
    • node sizing, resource limits, number of replicas
      • refer to public cloud providers: AWS, GCP, Azure
    • reasonable ratio/formula: CPU x memory x users
    • reasonable ratio/formula: provisionerd x users
    • API latency/response time
    • average number of HTTP requests
    • advice: do not enable autoscaling
  • docs: provide hardware recommendations for reference architectures #12534 Hardware recommendations for Coder workspaces
    • Assumptions:
      • workspaces also run on the same Kubernetes cluster (recommend a different namespace/node pool)
      • developers can pick between 4-8 CPU and 4-16 GB RAM workspaces (limits)
      • developers have a resource quota of 16 CPU and 32 GB RAM (2 maxed-out workspaces)
      • web microservice development use case: resources are mostly underutilized but spike during builds
    • Case study:
      • For the up to 2000+ users architecture, developers are evenly split across 2 regions (each a different cluster). In practice, this changes little besides the diagram and the workspaces node pool autoscaling config, as it still uses the central provisioner. Recommend multiple provisioner groups for zero-trust and multi-cloud use cases.
      • For the up to 3000+ users architecture, developers are also in an on-premises network. Document a provisioner running in a different cloud environment and its zero-trust benefits.
  • docs: provide hardware recommendations for reference architectures #12534 Hardware sizing recommendations for PostgreSQL database
    • measure and document the impact of dbcrypt
  • docs: use scale testing utility #12643 Describe how to run scale tests against our own infrastructure
    • use coder CLI, link dedicated command docs
    • publish the internal Coder template used for scale tests
    • publish the Grafana dashboards
  • docs: describe operational readiness #12723 Operational Readiness
    • Configuration: Helm values, environment variables
    • Template configuration managed via CI/CD.
    • Observability, e.g. Prometheus/Grafana
    • Database backups
      • link scripts if we have any
    • Custom support links
    • Custom agent troubleshooting links
  • Network recommendations for Coder
    • control plane ↔ workspaces ↔ clients
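
The workspace quota in the assumptions above corresponds to two maxed-out workspaces. A minimal sketch of that arithmetic follows, taking 8 CPU and 16 GB RAM as the largest workspace a developer can pick. The names `WorkspaceSize`, `quota_for`, and `worst_case_capacity` are hypothetical and not part of Coder. The worst-case figure is only a theoretical upper bound, not a sizing recommendation, since the stated use case expects resources to be mostly underutilized.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkspaceSize:
    cpu_cores: int
    memory_gb: int


# Largest workspace a developer can pick, per the assumptions above (limits).
MAX_WORKSPACE = WorkspaceSize(cpu_cores=8, memory_gb=16)


def quota_for(max_workspaces: int, size: WorkspaceSize = MAX_WORKSPACE) -> WorkspaceSize:
    """Per-developer quota needed to run `max_workspaces` maxed-out workspaces."""
    return WorkspaceSize(size.cpu_cores * max_workspaces, size.memory_gb * max_workspaces)


def worst_case_capacity(users: int, quota: WorkspaceSize) -> WorkspaceSize:
    """Upper bound on workspace resources if every user exhausts their quota at once.

    Assumption for illustration only: real node pools would be sized well below
    this and rely on autoscaling.
    """
    return WorkspaceSize(quota.cpu_cores * users, quota.memory_gb * users)


quota = quota_for(2)
print(quota)
print(worst_case_capacity(2000, quota))
```

Comparing this upper bound with utilization measured during scale tests is one way to arrive at the "reasonable ratio/formula" items listed under the hardware recommendations.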

TODO:

  • Mention that as users onboard, the autoscaling config should take care of ongoing workspaces