docs: add validated architecture #13561

Merged
merged 7 commits into from
Jun 13, 2024

@@ -4,9 +4,6 @@ The Coder deployment model is flexible and offers various components that
platform administrators can deploy and scale depending on their use case. This
page describes possible deployments, challenges, and risks associated with them.

Learn more about our [Reference Architectures](../admin/architectures/index.md)
and platform scaling capabilities.

## Primary components

### coderd
@@ -29,7 +26,7 @@ _provisionerd_ is the execution context for infrastructure modifying providers.
At the moment, the only provider is Terraform (running `terraform`).

By default, the Coder server runs multiple provisioner daemons.
[External provisioners](../admin/provisioners.md) can be added for security or
[External provisioners](../provisioners.md) can be added for security or
scalability purposes.
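
The paired link lines in this diff differ only in their relative prefix, which is consistent with the page moving one directory deeper. A minimal sketch of how such a change can be sanity-checked follows. The old and new page paths are assumptions for illustration (this diff view does not show them), and `resolve_link` is a hypothetical helper, not part of any Coder tooling.

```python
import posixpath

# Assumed locations of the page before and after the move (not shown in this diff).
OLD_PAGE = "docs/about/architecture.md"
NEW_PAGE = "docs/admin/architectures/architecture.md"


def resolve_link(page_path: str, link: str) -> str:
    """Resolve a relative Markdown link against the directory of the page that contains it."""
    target = link.split("#", 1)[0]  # drop any #anchor before resolving
    base_dir = posixpath.dirname(page_path)
    return posixpath.normpath(posixpath.join(base_dir, target))


# Old and new link text should point at the same file once resolved.
old_target = resolve_link(OLD_PAGE, "../admin/provisioners.md")
new_target = resolve_link(NEW_PAGE, "../provisioners.md")
print(old_target, new_target, old_target == new_target)
```

Under these assumed paths, both forms resolve to the same file, which is the property a reviewer would want to hold for every rewritten link.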

### Agents
@@ -46,7 +43,7 @@ It offers the following services along with much more:
- `startup_script` automation

Templates are responsible for
[creating and running agents](../templates/index.md#coder-agent) within
[creating and running agents](../../templates/index.md#coder-agent) within
workspaces.

### Service Bundling
@@ -76,7 +73,7 @@ they're destroyed on workspace stop.

### Single region architecture

![Architecture Diagram](../images/architecture-single-region.png)
![Architecture Diagram](../../images/architecture-single-region.png)

#### Components

@@ -121,11 +118,11 @@ and _Coder workspaces_ deployed in the same region.

- Integrate with existing Single Sign-On (SSO) solutions used within the
organization via the supported OAuth 2.0 or OpenID Connect standards.
- Learn more about [Authentication in Coder](../admin/auth.md).
- Learn more about [Authentication in Coder](../auth.md).

### Multi-region architecture

![Architecture Diagram](../images/architecture-multi-region.png)
![Architecture Diagram](../../images/architecture-multi-region.png)

#### Components

@@ -171,7 +168,7 @@ disruptions. Additionally, multi-cloud deployment enables organizations to
leverage the unique features and capabilities offered by each cloud provider,
such as region availability and pricing models.

![Architecture Diagram](../images/architecture-multi-cloud.png)
![Architecture Diagram](../../images/architecture-multi-cloud.png)

#### Components

@@ -205,7 +202,7 @@ nearest region and technical specifications provided by the cloud providers.
**Workspace proxy**

- _Security recommendation_: Use the `coder` CLI to create
[authentication tokens for every workspace proxy](../admin/workspace-proxies.md#requirements),
[authentication tokens for every workspace proxy](../workspace-proxies.md#requirements),
and keep them in regional secret stores. Remember to distribute them using
a safe, encrypted communication channel.

@@ -226,8 +223,8 @@ nearest region and technical specifications provided by the cloud providers.
See how to deploy
[Coder on Azure Kubernetes Service](https://github.com/ericpaulsen/coder-aks).

Learn more about [security requirements](../install/kubernetes.md) for deploying
Coder on Kubernetes.
Learn more about [security requirements](../../install/kubernetes.md) for
deploying Coder on Kubernetes.

**Load balancer**

@@ -286,9 +283,9 @@ The key features of the air-gapped architecture include:
- _Secure data transfer_: Enable encrypted communication channels and robust
access controls to safeguard sensitive information.

Learn more about [offline deployments](../install/offline.md) of Coder.
Learn more about [offline deployments](../../install/offline.md) of Coder.

![Architecture Diagram](../images/architecture-air-gapped.png)
![Architecture Diagram](../../images/architecture-air-gapped.png)

#### Components

@@ -330,8 +327,8 @@ across multiple regions and diverse cloud platforms.
- Since the _Registry_ is isolated from the internet, platform engineers are
responsible for maintaining Workspace container images and conducting periodic
updates of base Docker images.
- It is recommended to keep [Dev Containers](../templates/dev-containers.md) up
to date with the latest released
- It is recommended to keep [Dev Containers](../../templates/dev-containers.md)
up to date with the latest released
[Envbuilder](https://github.com/coder/envbuilder) runtime.

**Mirror of Terraform Registry**
@@ -363,7 +360,7 @@ Learn more about
[Dev containers support](https://coder.com/docs/v2/latest/templates/dev-containers)
in Coder.

![Architecture Diagram](../images/architecture-devcontainers.png)
![Architecture Diagram](../../images/architecture-devcontainers.png)

#### Components

@@ -1,90 +1,4 @@
# Reference Architectures

This document provides prescriptive solutions and reference architectures to
support successful deployments of up to 3000 users and outlines at a high level
the methodology currently used to scale-test Coder.

## General concepts

This section outlines core concepts and terminology essential for understanding
Coder's architecture and deployment strategies.

### Administrator

An administrator is a user role within the Coder platform with elevated
privileges. Admins have access to administrative functions such as user
management, template definitions, insights, and deployment configuration.

### Coder

Coder, also known as _coderd_, is the main service recommended for deployment
with multiple replicas to ensure high availability. It provides an API for
managing workspaces and templates. Each _coderd_ replica has the capability to
host multiple [provisioners](#provisioner).

### User

A user is an individual who utilizes the Coder platform to develop, test, and
deploy applications using workspaces. Users can select available templates to
provision workspaces. They interact with Coder using the web interface, the CLI
tool, or directly calling API methods.

### Workspace

A workspace refers to an isolated development environment where users can write,
build, and run code. Workspaces are fully configurable and can be tailored to
specific project requirements, providing developers with a consistent and
efficient development environment. Workspaces can be autostarted and
autostopped, enabling efficient resource management.

Users can connect to workspaces using SSH or via workspace applications like
`code-server`, facilitating collaboration and remote access. Additionally,
workspaces can be parameterized, allowing users to customize settings and
configurations based on their unique needs. Workspaces are instantiated using
Coder templates and deployed on resources created by provisioners.

### Template

A template in Coder is a predefined configuration for creating workspaces.
Templates streamline the process of workspace creation by providing
pre-configured settings, tooling, and dependencies. They are built by template
administrators on top of Terraform, allowing for efficient management of
infrastructure resources. Additionally, templates can utilize Coder modules to
leverage existing features shared with other templates, enhancing flexibility
and consistency across deployments. Templates describe provisioning rules for
infrastructure resources offered by Terraform providers.

### Workspace Proxy

A workspace proxy serves as a relay connection option for developers connecting
to their workspace over SSH, a workspace app, or through port forwarding. It
helps reduce network latency for geo-distributed teams by minimizing the
distance network traffic needs to travel. Notably, workspace proxies do not
handle dashboard connections or API calls.

### Provisioner

Provisioners in Coder execute Terraform during workspace and template builds.
While the platform includes built-in provisioner daemons by default, there are
advantages to employing external provisioners. These external daemons provide
secure build environments and reduce server load, improving performance and
scalability. Each provisioner can handle a single concurrent workspace build,
allowing for efficient resource allocation and workload management.

### Registry

The Coder Registry is a platform where you can find starter templates and
_Modules_ for various cloud services and platforms.

Templates help create self-service development environments using
Terraform-defined infrastructure, while _Modules_ simplify template creation by
providing common features like workspace applications, third-party integrations,
or helper scripts.

Please note that the Registry is a hosted service and isn't available for
offline use.

## Scale-testing methodology
## Scale Testing

Scaling Coder involves planning and testing to ensure it can handle more load
without compromising service. This process encompasses infrastructure setup,
@@ -95,7 +9,7 @@ A dedicated Kubernetes cluster for Coder is a Kubernetes cluster specifically
configured to host and manage Coder workloads. Kubernetes provides container
orchestration capabilities, allowing Coder to efficiently deploy, scale, and
manage workspaces across a distributed infrastructure. This ensures high
availability, fault tolerance, and scalability for Coder deployments. Code is
availability, fault tolerance, and scalability for Coder deployments. Coder is
deployed on this cluster using the
[Helm chart](../../install/kubernetes.md#install-coder-with-helm).

@@ -315,96 +229,3 @@ Scaling down workspace nodes to zero is not recommended, as it will result in
longer wait times for workspace provisioning by users. However, this may be
necessary for workspaces with special resource requirements (e.g. GPUs) that
incur significant cost overheads.

### Data plane: External database

While running in production, Coder requires access to an external PostgreSQL
database. Depending on the scale of the user-base, workspace activity, and High
Availability requirements, the amount of CPU and memory resources required by
Coder's database may differ.

#### Scaling formula

When determining scaling requirements, take into account the following
considerations:

- `2 vCPU x 8 GB RAM x 512 GB storage`: A baseline for database requirements for
a Coder deployment with fewer than 1000 users and a low activity level (30%
active users). This capacity should be sufficient to support 100 external
provisioners.
- Storage size depends on user activity, workspace builds, log verbosity,
overhead on database encryption, etc.
- Allocate two additional CPU cores to the database instance for every 1000
active users.
- Enable _High Availability_ mode for the database engine for large-scale
deployments.

If you enable [database encryption](../encryption.md) in Coder, consider
allocating an additional CPU core to every `coderd` replica.
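
The CPU guidance above can be turned into a rough estimate. The following is a minimal sketch, not an official sizing tool: it assumes that the 2 vCPU baseline covers deployments below 1000 active users and that two cores are added for each complete block of 1000 active users. The text does not spell out rounding, so that interpretation and the function name `database_vcpus` are assumptions.

```python
BASELINE_VCPUS = 2                  # from the `2 vCPU x 8 GB RAM x 512 GB storage` baseline
EXTRA_VCPUS_PER_1000_ACTIVE = 2     # "two additional CPU cores ... for every 1000 active users"


def database_vcpus(active_users: int) -> int:
    """Estimate database vCPUs for a given number of active users.

    Assumption: extra cores are added per complete block of 1000 active users.
    """
    if active_users < 0:
        raise ValueError("active_users must be non-negative")
    return BASELINE_VCPUS + EXTRA_VCPUS_PER_1000_ACTIVE * (active_users // 1000)


for users in (300, 1000, 2500):
    print(users, "active users ->", database_vcpus(users), "vCPU")
```

Memory and storage are deliberately left out, since the text gives only a baseline for them and notes that storage depends on activity, builds, and log verbosity.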

#### Performance optimization guidelines

We provide the following general recommendations for PostgreSQL settings:

- Increase the number of vCPUs if CPU utilization or database latency is high.
- Allocate extra memory if database performance is poor, CPU utilization is low,
and memory utilization is high.
- Use faster disk options (higher IOPS), such as SSDs or NVMe drives, to
improve performance and possibly reduce database load.

## Operational readiness

Operational readiness in Coder is about ensuring that everything is set up
correctly before launching a platform into production. It involves making sure
that the service is reliable, secure, and scales easily according to user-base
needs. Operational readiness is crucial because it helps prevent issues that
could affect the workspace users' experience once the platform is live.

Learn about Coder design principles and architectural best practices described
in the
[Well-Architected Framework](https://coder.com/blog/coder-well-architected-framework).

### Configuration

1. Identify the required Helm values for configuration.
1. Create `values.yaml` and add it to a version control system. _Note:_ it is
highly recommended that you create a custom `values.yaml` as opposed to
copying the entire default values.
1. Determine the necessary environment variables.

### Template configuration

1. Establish a dedicated user account for the _Template Administrator_.
1. Maintain Coder templates using version control.
1. Consider implementing a GitOps workflow to automatically push new templates.
For example, on GitHub, you can use the
[Update Coder Template](https://github.com/marketplace/actions/update-coder-template)
action.
1. Evaluate enabling automatic template updates upon workspace startup.

### Deployment

1. Leverage automation tooling to automate deployment and upgrades of Coder.

### Observability

1. Enable the Prometheus endpoint (environment variable:
`CODER_PROMETHEUS_ENABLE`).
1. Deploy a visual monitoring system such as Grafana for metrics visualization.
1. Deploy a centralized log aggregation solution to collect and monitor
application logs.
1. Review the [Prometheus response](../prometheus.md) and set up alarms on
selected metrics.

### Database backups

1. Prepare internal scripts for dumping and restoring databases.
1. Schedule regular database backups, especially before release upgrades.
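
As a starting point for the internal scripts mentioned above, here is a minimal sketch that only builds the `pg_dump` and `pg_restore` command lines and leaves running them to the caller. The helper names, the `coder-<timestamp>.dump` file naming, and the example connection URL are hypothetical; the flags used (`--format=custom`, `--file`, `--dbname`, `--clean`, `--if-exists`) are standard PostgreSQL client options.

```python
from datetime import datetime, timezone
from typing import List, Optional


def pg_dump_command(database_url: str, backup_dir: str,
                    now: Optional[datetime] = None) -> List[str]:
    """Build a pg_dump command that writes a timestamped custom-format archive."""
    now = now or datetime.now(timezone.utc)
    stamp = now.strftime("%Y%m%dT%H%M%SZ")
    outfile = f"{backup_dir.rstrip('/')}/coder-{stamp}.dump"  # hypothetical naming scheme
    return ["pg_dump", "--format=custom", f"--file={outfile}", f"--dbname={database_url}"]


def pg_restore_command(database_url: str, dump_file: str) -> List[str]:
    """Build a pg_restore command that drops existing objects before recreating them."""
    return ["pg_restore", "--clean", "--if-exists", f"--dbname={database_url}", dump_file]


# Example with a placeholder connection string; pass the list to subprocess.run to execute.
print(pg_dump_command("postgres://coder@db.example.internal/coder", "/backups"))
```

Keeping command construction separate from execution makes the script easy to test and to schedule, for example before each release upgrade.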

### User support

1. Incorporate [support links](../appearance.md#support-links) into internal
documentation accessible from the user context menu. Ensure that hyperlinks
are valid and lead to up-to-date materials.
1. Encourage the use of `coder support bundle` to allow workspace users to
generate and provide network-related diagnostic data.