Commit 199e83d

docs: add validated architecture
1 parent e320661 commit 199e83d

4 files changed

+348
-193
lines changed

File renamed without changes.

docs/admin/architectures/index.md renamed to docs/admin/architectures/scale-testing.md

Lines changed: 2 additions & 181 deletions
@@ -1,90 +1,4 @@
1-
# Reference Architectures
2-
3-
This document provides prescriptive solutions and reference architectures to
4-
support successful deployments of up to 3000 users and outlines at a high-level
5-
the methodology currently used to scale-test Coder.
6-
7-
## General concepts
8-
9-
This section outlines core concepts and terminology essential for understanding
10-
Coder's architecture and deployment strategies.
11-
12-
### Administrator
13-
14-
An administrator is a user role within the Coder platform with elevated
15-
privileges. Admins have access to administrative functions such as user
16-
management, template definitions, insights, and deployment configuration.
17-
18-
### Coder
19-
20-
Coder, also known as _coderd_, is the main service recommended for deployment
21-
with multiple replicas to ensure high availability. It provides an API for
22-
managing workspaces and templates. Each _coderd_ replica has the capability to
23-
host multiple [provisioners](#provisioner).
24-
25-
### User
26-
27-
A user is an individual who utilizes the Coder platform to develop, test, and
28-
deploy applications using workspaces. Users can select available templates to
29-
provision workspaces. They interact with Coder using the web interface, the CLI
30-
tool, or directly calling API methods.
31-
32-
### Workspace
33-
34-
A workspace refers to an isolated development environment where users can write,
35-
build, and run code. Workspaces are fully configurable and can be tailored to
36-
specific project requirements, providing developers with a consistent and
37-
efficient development environment. Workspaces can be autostarted and
38-
autostopped, enabling efficient resource management.
39-
40-
Users can connect to workspaces using SSH or via workspace applications like
41-
`code-server`, facilitating collaboration and remote access. Additionally,
42-
workspaces can be parameterized, allowing users to customize settings and
43-
configurations based on their unique needs. Workspaces are instantiated using
44-
Coder templates and deployed on resources created by provisioners.
45-
46-
### Template
47-
48-
A template in Coder is a predefined configuration for creating workspaces.
49-
Templates streamline the process of workspace creation by providing
50-
pre-configured settings, tooling, and dependencies. They are built by template
51-
administrators on top of Terraform, allowing for efficient management of
52-
infrastructure resources. Additionally, templates can utilize Coder modules to
53-
leverage existing features shared with other templates, enhancing flexibility
54-
and consistency across deployments. Templates describe provisioning rules for
55-
infrastructure resources offered by Terraform providers.
56-
57-
### Workspace Proxy
58-
59-
A workspace proxy serves as a relay connection option for developers connecting
60-
to their workspace over SSH, a workspace app, or through port forwarding. It
61-
helps reduce network latency for geo-distributed teams by minimizing the
62-
distance network traffic needs to travel. Notably, workspace proxies do not
63-
handle dashboard connections or API calls.
64-
65-
### Provisioner
66-
67-
Provisioners in Coder execute Terraform during workspace and template builds.
68-
While the platform includes built-in provisioner daemons by default, there are
69-
advantages to employing external provisioners. These external daemons provide
70-
secure build environments and reduce server load, improving performance and
71-
scalability. Each provisioner can handle a single concurrent workspace build,
72-
allowing for efficient resource allocation and workload management.
73-
74-
### Registry
75-
76-
The Coder Registry is a platform where you can find starter templates and
77-
_Modules_ for various cloud services and platforms.
78-
79-
Templates help create self-service development environments using
80-
Terraform-defined infrastructure, while _Modules_ simplify template creation by
81-
providing common features like workspace applications, third-party integrations,
82-
or helper scripts.
83-
84-
Please note that the Registry is a hosted service and isn't available for
85-
offline use.
86-
87-
## Scale-testing methodology
1+
## Scale Testing
88 2

89 3
Scaling Coder involves planning and testing to ensure it can handle more load
90 4
without compromising service. This process encompasses infrastructure setup,
@@ -95,7 +9,7 @@ A dedicated Kubernetes cluster for Coder is Kubernetes cluster specifically
95 9
configured to host and manage Coder workloads. Kubernetes provides container
96 10
orchestration capabilities, allowing Coder to efficiently deploy, scale, and
97 11
manage workspaces across a distributed infrastructure. This ensures high
98-
availability, fault tolerance, and scalability for Coder deployments. Code is
12+
availability, fault tolerance, and scalability for Coder deployments. Coder is
99 13
deployed on this cluster using the
100 14
[Helm chart](../../install/kubernetes.md#install-coder-with-helm).
101 15

@@ -315,96 +229,3 @@ Scaling down workspace nodes to zero is not recommended, as it will result in
315 229
longer wait times for workspace provisioning by users. However, this may be
316 230
necessary for workspaces with special resource requirements (e.g. GPUs) that
317 231
incur significant cost overheads.
318-
319-
### Data plane: External database
320-
321-
While running in production, Coder requires a access to an external PostgreSQL
322-
database. Depending on the scale of the user-base, workspace activity, and High
323-
Availability requirements, the amount of CPU and memory resources required by
324-
Coder's database may differ.
325-
326-
#### Scaling formula
327-
328-
When determining scaling requirements, take into account the following
329-
considerations:
330-
331-
- `2 vCPU x 8 GB RAM x 512 GB storage`: A baseline for database requirements for
332-
Coder deployment with less than 1000 users, and low activity level (30% active
333-
users). This capacity should be sufficient to support 100 external
334-
provisioners.
335-
- Storage size depends on user activity, workspace builds, log verbosity,
336-
overhead on database encryption, etc.
337-
- Allocate two additional CPU core to the database instance for every 1000
338-
active users.
339-
- Enable _High Availability_ mode for database engine for large scale
340-
deployments.
341-
342-
If you enable [database encryption](../encryption.md) in Coder, consider
343-
allocating an additional CPU core to every `coderd` replica.
344-
345-
#### Performance optimization guidelines
346-
347-
We provide the following general recommendations for PostgreSQL settings:
348-
349-
- Increase number of vCPU if CPU utilization or database latency is high.
350-
- Allocate extra memory if database performance is poor, CPU utilization is low,
351-
and memory utilization is high.
352-
- Utilize faster disk options (higher IOPS) such as SSDs or NVMe drives for
353-
optimal performance enhancement and possibly reduce database load.
354-
355-
## Operational readiness
356-
357-
Operational readiness in Coder is about ensuring that everything is set up
358-
correctly before launching a platform into production. It involves making sure
359-
that the service is reliable, secure, and easily scales accordingly to user-base
360-
needs. Operational readiness is crucial because it helps prevent issues that
361-
could affect workspace users experience once the platform is live.
362-
363-
Learn about Coder design principles and architectural best practices described
364-
in the
365-
[Well-Architected Framework](https://coder.com/blog/coder-well-architected-framework).
366-
367-
### Configuration
368-
369-
1. Identify the required Helm values for configuration.
370-
1. Create `values.yaml` and add it to a version control system. _Note:_ it is
371-
highly recommended that you create a custom `values.yaml` as opposed to
372-
copying the entire default values.
373-
1. Determine the necessary environment variables.
374-
375-
### Template configuration
376-
377-
1. Establish a dedicated user account for the _Template Administrator_.
378-
1. Maintain Coder templates using version control.
379-
1. Consider implementing a GitOps workflow to automatically push new template.
380-
For example, on Github, you can use the
381-
[Update Coder Template](https://github.com/marketplace/actions/update-coder-template)
382-
action.
383-
1. Evaluate enabling automatic template updates upon workspace startup.
384-
385-
### Deployment
386-
387-
1. Leverage automation tooling to automate deployment and upgrades of Coder.
388-
389-
### Observability
390-
391-
1. Enable the Prometheus endpoint (environment variable:
392-
`CODER_PROMETHEUS_ENABLE`).
393-
1. Deploy a visual monitoring system such as Grafana for metrics visualization.
394-
1. Deploy a centralized logs aggregation solution to collect and monitor
395-
application logs.
396-
1. Review the [Prometheus response](../prometheus.md) and set up alarms on
397-
selected metrics.
398-
399-
### Database backups
400-
401-
1. Prepare internal scripts for dumping and restoring databases.
402-
1. Schedule regular database backups, especially before release upgrades.
403-
404-
### User support
405-
406-
1. Incorporate [support links](../appearance.md#support-links) into internal
407-
documentation accessible from the user context menu. Ensure that hyperlinks
408-
are valid and lead to up-to-date materials.
409-
1. Encourage the use of `coder support bundle` to allow workspace users to
410-
generate and provide network-related diagnostic data.
