From 199e83d4d57c880d433c9e48e2fa90398a730192 Mon Sep 17 00:00:00 2001 From: Eric Date: Wed, 12 Jun 2024 20:10:50 +0000 Subject: [PATCH 1/6] docs: add validated architecture --- .../architectures}/architecture.md | 0 .../{index.md => scale-testing.md} | 183 +--------- docs/admin/architectures/validated-arch.md | 334 ++++++++++++++++++ docs/manifest.json | 24 +- 4 files changed, 348 insertions(+), 193 deletions(-) rename docs/{about => admin/architectures}/architecture.md (100%) rename docs/admin/architectures/{index.md => scale-testing.md} (55%) create mode 100644 docs/admin/architectures/validated-arch.md diff --git a/docs/about/architecture.md b/docs/admin/architectures/architecture.md similarity index 100% rename from docs/about/architecture.md rename to docs/admin/architectures/architecture.md diff --git a/docs/admin/architectures/index.md b/docs/admin/architectures/scale-testing.md similarity index 55% rename from docs/admin/architectures/index.md rename to docs/admin/architectures/scale-testing.md index 85c06a650dee9..38e27b63be1ca 100644 --- a/docs/admin/architectures/index.md +++ b/docs/admin/architectures/scale-testing.md @@ -1,90 +1,4 @@ -# Reference Architectures - -This document provides prescriptive solutions and reference architectures to -support successful deployments of up to 3000 users and outlines at a high-level -the methodology currently used to scale-test Coder. - -## General concepts - -This section outlines core concepts and terminology essential for understanding -Coder's architecture and deployment strategies. - -### Administrator - -An administrator is a user role within the Coder platform with elevated -privileges. Admins have access to administrative functions such as user -management, template definitions, insights, and deployment configuration. - -### Coder - -Coder, also known as _coderd_, is the main service recommended for deployment -with multiple replicas to ensure high availability. 
It provides an API for -managing workspaces and templates. Each _coderd_ replica has the capability to -host multiple [provisioners](#provisioner). - -### User - -A user is an individual who utilizes the Coder platform to develop, test, and -deploy applications using workspaces. Users can select available templates to -provision workspaces. They interact with Coder using the web interface, the CLI -tool, or directly calling API methods. - -### Workspace - -A workspace refers to an isolated development environment where users can write, -build, and run code. Workspaces are fully configurable and can be tailored to -specific project requirements, providing developers with a consistent and -efficient development environment. Workspaces can be autostarted and -autostopped, enabling efficient resource management. - -Users can connect to workspaces using SSH or via workspace applications like -`code-server`, facilitating collaboration and remote access. Additionally, -workspaces can be parameterized, allowing users to customize settings and -configurations based on their unique needs. Workspaces are instantiated using -Coder templates and deployed on resources created by provisioners. - -### Template - -A template in Coder is a predefined configuration for creating workspaces. -Templates streamline the process of workspace creation by providing -pre-configured settings, tooling, and dependencies. They are built by template -administrators on top of Terraform, allowing for efficient management of -infrastructure resources. Additionally, templates can utilize Coder modules to -leverage existing features shared with other templates, enhancing flexibility -and consistency across deployments. Templates describe provisioning rules for -infrastructure resources offered by Terraform providers. - -### Workspace Proxy - -A workspace proxy serves as a relay connection option for developers connecting -to their workspace over SSH, a workspace app, or through port forwarding. 
It -helps reduce network latency for geo-distributed teams by minimizing the -distance network traffic needs to travel. Notably, workspace proxies do not -handle dashboard connections or API calls. - -### Provisioner - -Provisioners in Coder execute Terraform during workspace and template builds. -While the platform includes built-in provisioner daemons by default, there are -advantages to employing external provisioners. These external daemons provide -secure build environments and reduce server load, improving performance and -scalability. Each provisioner can handle a single concurrent workspace build, -allowing for efficient resource allocation and workload management. - -### Registry - -The Coder Registry is a platform where you can find starter templates and -_Modules_ for various cloud services and platforms. - -Templates help create self-service development environments using -Terraform-defined infrastructure, while _Modules_ simplify template creation by -providing common features like workspace applications, third-party integrations, -or helper scripts. - -Please note that the Registry is a hosted service and isn't available for -offline use. - -## Scale-testing methodology +## Scale Testing Scaling Coder involves planning and testing to ensure it can handle more load without compromising service. This process encompasses infrastructure setup, @@ -95,7 +9,7 @@ A dedicated Kubernetes cluster for Coder is Kubernetes cluster specifically configured to host and manage Coder workloads. Kubernetes provides container orchestration capabilities, allowing Coder to efficiently deploy, scale, and manage workspaces across a distributed infrastructure. This ensures high -availability, fault tolerance, and scalability for Coder deployments. Code is +availability, fault tolerance, and scalability for Coder deployments. Coder is deployed on this cluster using the [Helm chart](../../install/kubernetes.md#install-coder-with-helm). 
@@ -315,96 +229,3 @@ Scaling down workspace nodes to zero is not recommended, as it will result in longer wait times for workspace provisioning by users. However, this may be necessary for workspaces with special resource requirements (e.g. GPUs) that incur significant cost overheads. - -### Data plane: External database - -While running in production, Coder requires a access to an external PostgreSQL -database. Depending on the scale of the user-base, workspace activity, and High -Availability requirements, the amount of CPU and memory resources required by -Coder's database may differ. - -#### Scaling formula - -When determining scaling requirements, take into account the following -considerations: - -- `2 vCPU x 8 GB RAM x 512 GB storage`: A baseline for database requirements for - Coder deployment with less than 1000 users, and low activity level (30% active - users). This capacity should be sufficient to support 100 external - provisioners. -- Storage size depends on user activity, workspace builds, log verbosity, - overhead on database encryption, etc. -- Allocate two additional CPU core to the database instance for every 1000 - active users. -- Enable _High Availability_ mode for database engine for large scale - deployments. - -If you enable [database encryption](../encryption.md) in Coder, consider -allocating an additional CPU core to every `coderd` replica. - -#### Performance optimization guidelines - -We provide the following general recommendations for PostgreSQL settings: - -- Increase number of vCPU if CPU utilization or database latency is high. -- Allocate extra memory if database performance is poor, CPU utilization is low, - and memory utilization is high. -- Utilize faster disk options (higher IOPS) such as SSDs or NVMe drives for - optimal performance enhancement and possibly reduce database load. 
- -## Operational readiness - -Operational readiness in Coder is about ensuring that everything is set up -correctly before launching a platform into production. It involves making sure -that the service is reliable, secure, and easily scales accordingly to user-base -needs. Operational readiness is crucial because it helps prevent issues that -could affect workspace users experience once the platform is live. - -Learn about Coder design principles and architectural best practices described -in the -[Well-Architected Framework](https://coder.com/blog/coder-well-architected-framework). - -### Configuration - -1. Identify the required Helm values for configuration. -1. Create `values.yaml` and add it to a version control system. _Note:_ it is - highly recommended that you create a custom `values.yaml` as opposed to - copying the entire default values. -1. Determine the necessary environment variables. - -### Template configuration - -1. Establish a dedicated user account for the _Template Administrator_. -1. Maintain Coder templates using version control. -1. Consider implementing a GitOps workflow to automatically push new template. - For example, on Github, you can use the - [Update Coder Template](https://github.com/marketplace/actions/update-coder-template) - action. -1. Evaluate enabling automatic template updates upon workspace startup. - -### Deployment - -1. Leverage automation tooling to automate deployment and upgrades of Coder. - -### Observability - -1. Enable the Prometheus endpoint (environment variable: - `CODER_PROMETHEUS_ENABLE`). -1. Deploy a visual monitoring system such as Grafana for metrics visualization. -1. Deploy a centralized logs aggregation solution to collect and monitor - application logs. -1. Review the [Prometheus response](../prometheus.md) and set up alarms on - selected metrics. - -### Database backups - -1. Prepare internal scripts for dumping and restoring databases. -1. 
Schedule regular database backups, especially before release upgrades. - -### User support - -1. Incorporate [support links](../appearance.md#support-links) into internal - documentation accessible from the user context menu. Ensure that hyperlinks - are valid and lead to up-to-date materials. -1. Encourage the use of `coder support bundle` to allow workspace users to - generate and provide network-related diagnostic data. diff --git a/docs/admin/architectures/validated-arch.md b/docs/admin/architectures/validated-arch.md new file mode 100644 index 0000000000000..8c6658b711b5f --- /dev/null +++ b/docs/admin/architectures/validated-arch.md @@ -0,0 +1,334 @@ +# Coder Validated Architecture + +Many customers operate Coder in complex organizational environments, consisting +of multiple business units, agencies, and/or subsidiaries. This can lead to +numerous Coder deployments, caused by discrepancies in regulatory compliance, +data sovereignty, and level of funding across groups. The Coder Validated Architecture +(CVA) prescribes a Kubernetes-based deployment approach, enabling your organization +to deploy a stable Coder instance that is easier to maintain and troubleshoot. + +The following sections will detail the components of the Coder Validated +Architecture, provide guidance on how to configure and deploy these components, +and offer insights into how to maintain and troubleshoot your Coder environment. + +- [General concepts](#general-concepts) +- [Kubernetes Infrastructure](#kubernetes-infrastructure) +- [PostgreSQL Database](#postgresql-database) +- [Operational readiness](#operational-readiness) + +## Who is this document for? + +This guide targets the following personas. It assumes a basic understanding of +cloud/on-premise computing, containerization, and the Coder platform. 
+ +| Role | Description | +| ------------------------- | ------------------------------------------------------------------------------ | +| Platform Engineers | Responsible for deploying, operating the Coder deployment and infrastructure | +| Enterprise Architects | Responsible for architecting Coder deployments to meet enterprise requirements | +| Managed Service Providers | Entities that deploy and run Coder software as a service for customers | + +## CVA Guidance + +| CVA provides: | CVA does not provide: | +| ---------------------------------------------- | ---------------------------------------------------------------------------------------- | +| Single and multi-region K8s deployment options | Prescribing OS, or cloud vs. on-premise | +| Reference architectures for up to 3,000 users | An approval of your architecture; the CVA solely provides recommendations and guidelines | +| Best practices for building a Coder deployment | Recommendations for every possible deployment scenario | + +> For higher level design principles and architectural best practices, see Coder's [Well-Architected Framework](https://coder.com/blog/coder-well-architected-framework). + +## General concepts + +This section outlines core concepts and terminology essential for understanding +Coder's architecture and deployment strategies. + +### Administrator + +An administrator is a user role within the Coder platform with elevated +privileges. Admins have access to administrative functions such as user +management, template definitions, insights, and deployment configuration. + +### Coder control plane + +Coder's control plane, also known as _coderd_, is the main service recommended for deployment +with multiple replicas to ensure high availability. It provides an API for +managing workspaces and templates, and serves the dashboard UI. In addition, +each _coderd_ replica hosts 3 Terraform [provisioners](#provisioner) by default. 
+ +### User + +A [user](../users.md) is an individual who utilizes the Coder platform to +develop, test, and deploy applications using workspaces. Users can select +available templates to provision workspaces. They interact with Coder using the +web interface, the CLI tool, or directly calling API methods. + +### Workspace + +A [workspace](../../workspaces.md) refers to an isolated development environment +where users can write, build, and run code. Workspaces are fully configurable +and can be tailored to specific project requirements, providing developers with +a consistent and efficient development environment. Workspaces can be +autostarted and autostopped, enabling efficient resource management. + +Users can connect to workspaces using SSH or via workspace applications like +`code-server`, facilitating collaboration and remote access. Additionally, +workspaces can be parameterized, allowing users to customize settings and +configurations based on their unique needs. Workspaces are instantiated using +Coder templates and deployed on resources created by provisioners. + +### Template + +A [template](../../templates/index.md) in Coder is a predefined configuration +for creating workspaces. Templates streamline the process of workspace creation +by providing pre-configured settings, tooling, and dependencies. They are built +by template administrators on top of Terraform, allowing for efficient +management of infrastructure resources. Additionally, templates can utilize +Coder modules to leverage existing features shared with other templates, +enhancing flexibility and consistency across deployments. Templates describe +provisioning rules for infrastructure resources offered by Terraform providers. + +### Workspace Proxy + +A [workspace proxy](../workspace-proxies.md) serves as a relay connection option +for developers connecting to their workspace over SSH, a workspace app, or +through port forwarding. 
It helps reduce network latency for geo-distributed +teams by minimizing the distance network traffic needs to travel. Notably, +workspace proxies do not handle dashboard connections or API calls. + +### Provisioner + +Provisioners in Coder execute Terraform during workspace and template builds. +While the platform includes built-in provisioner daemons by default, there are +advantages to employing external provisioners. These external daemons provide +secure build environments and reduce server load, improving performance and +scalability. Each provisioner can handle a single concurrent workspace build, +allowing for efficient resource allocation and workload management. + +### Registry + +The [Coder Registry](https://registry.coder.com) is a platform where you can +find starter templates and _Modules_ for various cloud services and platforms. + +Templates help create self-service development environments using +Terraform-defined infrastructure, while _Modules_ simplify template creation by +providing common features like workspace applications, third-party integrations, +or helper scripts. + +Please note that the Registry is a hosted service and isn't available for +offline use. + +## Kubernetes Infrastructure + +Kubernetes is the recommended, and supported platform for deploying Coder in the enterprise. It +is the hosting platform of choice for a large majority of Coder's Fortune 500 customers, +and it is the platform in which we build and test against here at Coder. + +### General recommendations + +In general, it is recommended to deploy Coder into its own respective cluster, separate +from production applications. Keep in mind that Coder runs development workloads, +so the cluster should be deployed as such, without production-level configurations. + +### Compute + +Deploy your Kubernetes cluster with two node groups, one for Coder's control plane, +and another for user workspaces (if you intend on leveraging K8s for end-user compute). 
+ +#### Control plane nodes + +The Coder control plane node group must be static, to prevent scale down events from +dropping pods, and thus dropping user connections to the dashboard UI and their workspaces. + +Coder's Helm Chart supports [defining nodeSelectors, affinities, and tolerations](https://github.com/coder/coder/blob/e96652ebbcdd7554977594286b32015115c3f5b6/helm/coder/values.yaml#L221-L249) +to schedule the control plane pods on the appropriate node group. + +#### Workspace nodes + +Coder workspaces can be deployed either as Pods or Deployments in Kubernetes. Configure +the workspace node group to be auto-scaling, to dynamically allocate compute as users +start/stop workspaces at the beginning and end of their day. Set nodeSelectors, affinities, +and tolerations in Coder templates to assign workspaces to the given node group: + +```hcl +resource "kubernetes_deployment" "coder" { + spec { + template { + metadata { + labels = { + app = "coder-workspace" + } + } + + spec { + affinity { + pod_anti_affinity { + preferred_during_scheduling_ignored_during_execution { + weight = 1 + pod_affinity_term { + label_selector { + match_expressions { + key = "app.kubernetes.io/instance" + operator = "In" + values = ["coder-workspace"] + } + } + topology_key = "<your-node-group-label>" # add your node group label here + } + } + } + } + + toleration { + # Add your tolerations here + } + + node_selector = { + # Add your node selectors here + } + + container { + image = "coder-workspace:latest" + name = "dev" + } + } + } + } +} +``` + +#### Node sizing + +For sizing recommendations, see the below reference architectures: + +[Up to 1,000 users](1k-users.md) + +[Up to 2,000 users](2k-users.md) + +[Up to 3,000 users](3k-users.md) + +### Networking + +It is likely your enterprise deploys Kubernetes clusters with various networking +restrictions.
With this in mind, Coder requires the following connectivity: + +- Egress from workspace pods to the Coder control plane pods +- Egress from control plane pods to your PostgreSQL database +- Egress from control plane pods to your version control and artifact repositories +- Ingress from user endpoints to the control plane Load Balancer or Ingress controller + +We recommend configuring your network policies in accordance with the above. +Note that Coder workspaces do not require any ports to be open. + +### Storage + +If running Coder workspaces as Kubernetes Pods or Deployments, you will need to +assign persistent storage. We recommend leveraging a [supported Container Storage +Interface (CSI) driver](https://kubernetes-csi.github.io/docs/drivers.html) in your cluster, +with Dynamic Provisioning and read/write, to provide on-demand storage to end-user workspaces. + +The following Kubernetes volume types have been validated by Coder internally, and/or by our customers: + +- [PersistentVolumeClaim](https://kubernetes.io/docs/concepts/storage/volumes/#persistentvolumeclaim) +- [NFS](https://kubernetes.io/docs/concepts/storage/volumes/#nfs) +- [subPath](https://kubernetes.io/docs/concepts/storage/volumes/#using-subpath) +- [cephfs](https://kubernetes.io/docs/concepts/storage/volumes/#cephfs) + +Our [example Kubernetes workspace template](https://github.com/coder/coder/blob/5b9a65e5c137232351381fc337d9784bc9aeecfc/examples/templates/kubernetes/main.tf#L191-L219) +provisions a PersistentVolumeClaim block storage device, attached to the Deployment. + +It is not recommended to mount volumes from the host node(s) into workspaces, +for security and reliability purposes. 
The below volume types are _not_ recommended for use +with Coder: + +- [Local](https://kubernetes.io/docs/concepts/storage/volumes/#local) +- [hostPath](https://kubernetes.io/docs/concepts/storage/volumes/#hostpath) + +Note that Coder's control plane filesystem is ephemeral, so no persistent storage +is required. + +## PostgreSQL database + +Coder requires access to an external PostgreSQL database to store user data, +workspace state, template files, and more. Depending on the scale of the +user-base, workspace activity, and High Availability requirements, the amount of +CPU and memory resources required by Coder's database may differ. + +### Disaster recovery + +Prepare internal scripts for dumping and restoring your database. We recommend scheduling +regular database backups, especially before upgrading Coder to a new release. Coder +does not support downgrades without initially restoring the database to the prior version. + +### Performance efficiency + +We highly recommend deploying the PostgreSQL instance in the same region (and if +possible, same availability zone) as the Coder server to optimize for low +latency connections. We recommend keeping latency under 10ms between the Coder +server and database. + +When determining scaling requirements, take into account the following +considerations: + +- `2 vCPU x 8 GB RAM x 512 GB storage`: A baseline for database requirements for + Coder deployment with fewer than 1000 users, and low activity level (30% active + users). This capacity should be sufficient to support 100 external + provisioners. +- Storage size depends on user activity, workspace builds, log verbosity, + overhead on database encryption, etc. +- Allocate two additional CPU cores to the database instance for every 1000 + active users. +- Enable High Availability mode for database engine for large scale deployments. + +If you enable [database encryption](../encryption.md) in Coder, consider +allocating an additional CPU core to every `coderd` replica.
+ +#### Resource utilization guidelines + +Below are general recommendations for sizing your PostgreSQL instance: + +- Increase the number of vCPUs if CPU utilization or database latency is high. +- Allocate extra memory if database performance is poor, CPU utilization is low, + and memory utilization is high. +- Utilize faster disk options (higher IOPS) such as SSDs or NVMe drives for + optimal performance enhancement and possibly reduce database load. + +## Operational readiness + +Operational readiness in Coder is about ensuring that everything is set up +correctly before launching a platform into production. It involves making sure +that the service is reliable, secure, and scales easily according to user-base +needs. Operational readiness is crucial because it helps prevent issues that +could affect workspace users' experience once the platform is live. + +### Helm Chart Configuration + +1. Reference our [Helm chart values file](../../../helm/coder/values.yaml) and identify the required values for deployment. +1. Create a `values.yaml` and add it to your version control system. +1. Determine the necessary environment variables. Here is the [full list of supported server environment variables](../../cli/server.md). +1. Follow our documented [steps for installing Coder via Helm](../../install/kubernetes.md). + +### Template configuration + +1. Establish dedicated accounts for users with the _Template Administrator_ role. +1. Maintain Coder templates using [version control](../../templates/change-management.md). +1. Consider implementing a GitOps workflow to automatically push new template versions into Coder from Git. + For example, on GitHub, you can use the + [Update Coder Template](https://github.com/marketplace/actions/update-coder-template) + action. +1. Evaluate enabling [automatic template updates](../../templates/general-settings.md#require-automatic-updates-enterprise) upon workspace startup. + +### Observability + +1.
Enable the Prometheus endpoint (environment variable: + `CODER_PROMETHEUS_ENABLE`). +1. Deploy the [Coder Observability bundle](https://github.com/coder/observability) to leverage pre-configured dashboards, alerts, and runbooks for monitoring Coder. This includes integrations between Prometheus, Grafana, Loki, and Alertmanager. +1. Review the [Prometheus response](../prometheus.md) and set up alarms on + selected metrics. + +### User support + +1. Incorporate [support links](../appearance.md#support-links) into internal + documentation accessible from the user context menu. Ensure that hyperlinks + are valid and lead to up-to-date materials. +1. Encourage the use of `coder support bundle` to allow workspace users to + generate and provide network-related diagnostic data. diff --git a/docs/manifest.json b/docs/manifest.json index 067aecac8e69c..8acd4ad517313 100644 --- a/docs/manifest.json +++ b/docs/manifest.json @@ -5,15 +5,7 @@ "title": "About", "description": "About Coder", "path": "./README.md", - "icon_path": "./images/icons/home.svg", - "children": [ - { - "title": "Architecture", - "description": "Learn how Coder works", - "path": "./about/architecture.md", - "icon_path": "./images/icons/protractor.svg" - } - ] + "icon_path": "./images/icons/home.svg" }, { "title": "Installation", @@ -401,11 +393,19 @@ "icon_path": "./images/icons/scale.svg" }, { - "title": "Reference Architectures", - "description": "Learn about reference architectures for Coder", - "path": "./admin/architectures/index.md", + "title": "Architecture", + "description": "Learn about validated and reference architectures for Coder", + "path": "./admin/architectures/architecture.md", "icon_path": "./images/icons/scale.svg", "children": [ + { + "title": "Validated Architecture", + "path": "./admin/architectures/validated-arch.md" + }, + { + "title": "Scale Testing", + "path": "./admin/architectures/scale-testing.md" + }, { "title": "Up to 1,000 users", "path": "./admin/architectures/1k-users.md" 
From 5669d1b396782f42ec9a10b6c41ca0c191082ed3 Mon Sep 17 00:00:00 2001 From: Eric Date: Thu, 13 Jun 2024 15:39:47 +0000 Subject: [PATCH 2/6] make: fmt --- docs/admin/architectures/validated-arch.md | 124 +++++++++++++-------- 1 file changed, 77 insertions(+), 47 deletions(-) diff --git a/docs/admin/architectures/validated-arch.md b/docs/admin/architectures/validated-arch.md index 8c6658b711b5f..fc1b7b57f842d 100644 --- a/docs/admin/architectures/validated-arch.md +++ b/docs/admin/architectures/validated-arch.md @@ -3,9 +3,10 @@ Many customers operate Coder in complex organizational environments, consisting of multiple business units, agencies, and/or subsidiaries. This can lead to numerous Coder deployments, caused by discrepancies in regulatory compliance, -data sovereignty, and level of funding across groups. The Coder Validated Architecture -(CVA) prescribes a Kubernetes-based deployment approach, enabling your organization -to deploy a stable Coder instance that is easier to maintain and troubleshoot. +data sovereignty, and level of funding across groups. The Coder Validated +Architecture (CVA) prescribes a Kubernetes-based deployment approach, enabling +your organization to deploy a stable Coder instance that is easier to maintain +and troubleshoot. The following sections will detail the components of the Coder Validated Architecture, provide guidance on how to configure and deploy these components, @@ -29,13 +30,15 @@ cloud/on-premise computing, containerization, and the Coder platform. ## CVA Guidance -| CVA provides: | CVA does not provide: | +| CVA provides: | CVA does not provide: | | ---------------------------------------------- | ---------------------------------------------------------------------------------------- | | Single and multi-region K8s deployment options | Prescribing OS, or cloud vs. 
on-premise | | Reference architectures for up to 3,000 users | An approval of your architecture; the CVA solely provides recommendations and guidelines | | Best practices for building a Coder deployment | Recommendations for every possible deployment scenario | -> For higher level design principles and architectural best practices, see Coder's [Well-Architected Framework](https://coder.com/blog/coder-well-architected-framework). +> For higher level design principles and architectural best practices, see +> Coder's +> [Well-Architected Framework](https://coder.com/blog/coder-well-architected-framework). ## General concepts @@ -50,10 +53,11 @@ management, template definitions, insights, and deployment configuration. ### Coder control plane -Coder's control plane, also known as _coderd_, is the main service recommended for deployment -with multiple replicas to ensure high availability. It provides an API for -managing workspaces and templates, and serves the dashboard UI. In addition, -each _coderd_ replica hosts 3 Terraform [provisioners](#provisioner) by default. +Coder's control plane, also known as _coderd_, is the main service recommended +for deployment with multiple replicas to ensure high availability. It provides +an API for managing workspaces and templates, and serves the dashboard UI. In +addition, each _coderd_ replica hosts 3 Terraform [provisioners](#provisioner) +by default. ### User @@ -119,35 +123,43 @@ offline use. ## Kubernetes Infrastructure -Kubernetes is the recommended, and supported platform for deploying Coder in the enterprise. It -is the hosting platform of choice for a large majority of Coder's Fortune 500 customers, -and it is the platform in which we build and test against here at Coder. +Kubernetes is the recommended, and supported platform for deploying Coder in the +enterprise. 
It is the hosting platform of choice for a large majority of Coder's +Fortune 500 customers, and it is the platform in which we build and test against +here at Coder. ### General recommendations -In general, it is recommended to deploy Coder into its own respective cluster, separate -from production applications. Keep in mind that Coder runs development workloads, -so the cluster should be deployed as such, without production-level configurations. +In general, it is recommended to deploy Coder into its own respective cluster, +separate from production applications. Keep in mind that Coder runs development +workloads, so the cluster should be deployed as such, without production-level +configurations. ### Compute -Deploy your Kubernetes cluster with two node groups, one for Coder's control plane, -and another for user workspaces (if you intend on leveraging K8s for end-user compute). +Deploy your Kubernetes cluster with two node groups, one for Coder's control +plane, and another for user workspaces (if you intend on leveraging K8s for +end-user compute). #### Control plane nodes -The Coder control plane node group must be static, to prevent scale down events from -dropping pods, and thus dropping user connections to the dashboard UI and their workspaces. +The Coder control plane node group must be static, to prevent scale down events +from dropping pods, and thus dropping user connections to the dashboard UI and +their workspaces. -Coder's Helm Chart supports [defining nodeSelectors, affinities, and tolerations](https://github.com/coder/coder/blob/e96652ebbcdd7554977594286b32015115c3f5b6/helm/coder/values.yaml#L221-L249) +Coder's Helm Chart supports +[defining nodeSelectors, affinities, and tolerations](https://github.com/coder/coder/blob/e96652ebbcdd7554977594286b32015115c3f5b6/helm/coder/values.yaml#L221-L249) to schedule the control plane pods on the appropriate node group. 
#### Workspace nodes -Coder workspaces can be deployed either as Pods or Deployments in Kubernetes. Configure -the workspace node group to be auto-scaling, to dynamically allocate compute as users -start/stop workspaces at the beginning and end of their day. Set nodeSelectors, affinities, -and tolerations in Coder templates to assign workspaces to the given node group: +Coder workspaces can be deployed either as Pods or Deployments in Kubernetes. +See our +[example Kubernetes workspace template](https://github.com/coder/coder/tree/main/examples/templates/kubernetes). +Configure the workspace node group to be auto-scaling, to dynamically allocate +compute as users start/stop workspaces at the beginning and end of their day. +Set nodeSelectors, affinities, and tolerations in Coder templates to assign +workspaces to the given node group: ```hcl resource "kubernetes_deployment" "coder" { @@ -211,10 +223,12 @@ For sizing recommendations, see the below reference architectures: It is likely your enterprise deploys Kubernetes clusters with various networking restrictions. With this in mind, Coder requires the following connectivity: -- Egress from workspace pods to the Coder control plane pods +- Egress from workspace compute to the Coder control plane pods - Egress from control plane pods to your PostgreSQL database -- Egress from control plane pods to your version control and artifact repositories -- Ingress from user endpoints to the control plane Load Balancer or Ingress controller +- Egress from control plane pods to your version control and artifact + repositories +- Ingress from user endpoints to the control plane Load Balancer or Ingress + controller We recommend configuring your network policies in accordance with the above. Note that Coder workspaces do not require any ports to be open. @@ -222,23 +236,27 @@ Note that Coder workspaces do not require any ports to be open. 
### Storage If running Coder workspaces as Kubernetes Pods or Deployments, you will need to -assign persistent storage. We recommend leveraging a [supported Container Storage -Interface (CSI) driver](https://kubernetes-csi.github.io/docs/drivers.html) in your cluster, -with Dynamic Provisioning and read/write, to provide on-demand storage to end-user workspaces. +assign persistent storage. We recommend leveraging a +[supported Container Storage Interface (CSI) driver](https://kubernetes-csi.github.io/docs/drivers.html) +in your cluster, with Dynamic Provisioning and read/write, to provide on-demand +storage to end-user workspaces. -The following Kubernetes volume types have been validated by Coder internally, and/or by our customers: +The following Kubernetes volume types have been validated by Coder internally, +and/or by our customers: - [PersistentVolumeClaim](https://kubernetes.io/docs/concepts/storage/volumes/#persistentvolumeclaim) - [NFS](https://kubernetes.io/docs/concepts/storage/volumes/#nfs) - [subPath](https://kubernetes.io/docs/concepts/storage/volumes/#using-subpath) - [cephfs](https://kubernetes.io/docs/concepts/storage/volumes/#cephfs) -Our [example Kubernetes workspace template](https://github.com/coder/coder/blob/5b9a65e5c137232351381fc337d9784bc9aeecfc/examples/templates/kubernetes/main.tf#L191-L219) -provisions a PersistentVolumeClaim block storage device, attached to the Deployment. +Our +[example Kubernetes workspace template](https://github.com/coder/coder/blob/5b9a65e5c137232351381fc337d9784bc9aeecfc/examples/templates/kubernetes/main.tf#L191-L219) +provisions a PersistentVolumeClaim block storage device, attached to the +Deployment. It is not recommended to mount volumes from the host node(s) into workspaces, -for security and reliability purposes. The below volume types are _not_ recommended for use -with Coder: +for security and reliability purposes. 
The below volume types are _not_ +recommended for use with Coder: - [Local](https://kubernetes.io/docs/concepts/storage/volumes/#local) - [hostPath](https://kubernetes.io/docs/concepts/storage/volumes/#hostpath) @@ -255,9 +273,10 @@ CPU and memory resources required by Coder's database may differ. ### Disaster recovery -Prepare internal scripts for dumping and restoring your database. We recommend scheduling -regular database backups, especially before upgrading Coder to a new release. Coder -does not support downgrades without initially restoring the database to the prior version. +Prepare internal scripts for dumping and restoring your database. We recommend +scheduling regular database backups, especially before upgrading Coder to a new +release. Coder does not support downgrades without initially restoring the +database to the prior version. ### Performance efficiency @@ -302,26 +321,37 @@ could affect workspace users experience once the platform is live. ### Helm Chart Configuration -1. Reference our [Helm chart values file](../../../helm/coder/values.yaml) and identify the required values for deployment. +1. Reference our [Helm chart values file](../../../helm/coder/values.yaml) and + identify the required values for deployment. 1. Create a `values.yaml` and add it to your version control system. -1. Determine the necessary environment variables. Here is the [full list of supported server environment variables](../../cli/server.md). -1. Follow our documented [steps for installing Coder via Helm](../../install/kubernetes.md). +1. Determine the necessary environment variables. Here is the + [full list of supported server environment variables](../../cli/server.md). +1. Follow our documented + [steps for installing Coder via Helm](../../install/kubernetes.md). ### Template configuration -1. Establish dedicated accounts for users with the _Template Administrator_ role. -1. Maintain Coder templates using [version control](../../templates/change-management.md). -1. 
Consider implementing a GitOps workflow to automatically push new template versions into Coder from git. - For example, on Github, you can use the +1. Establish dedicated accounts for users with the _Template Administrator_ + role. +1. Maintain Coder templates using + [version control](../../templates/change-management.md). +1. Consider implementing a GitOps workflow to automatically push new template + versions into Coder from git. For example, on Github, you can use the [Update Coder Template](https://github.com/marketplace/actions/update-coder-template) action. -1. Evaluate enabling [automatic template updates](../../templates/general-settings.md#require-automatic-updates-enterprise) upon workspace startup. +1. Evaluate enabling + [automatic template updates](../../templates/general-settings.md#require-automatic-updates-enterprise) + upon workspace startup. ### Observability 1. Enable the Prometheus endpoint (environment variable: `CODER_PROMETHEUS_ENABLE`). -1. Deploy the [Coder Observability bundle](https://github.com/coder/observability) to leverage pre-configured dashboards, alerts, and runbooks for monitoring Coder. This includes integrations between Prometheus, Grafana, Loki, and Alertmanager. +1. Deploy the + [Coder Observability bundle](https://github.com/coder/observability) to + leverage pre-configured dashboards, alerts, and runbooks for monitoring + Coder. This includes integrations between Prometheus, Grafana, Loki, and + Alertmanager. 1. Review the [Prometheus response](../prometheus.md) and set up alarms on selected metrics. 
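Reviewer note on the Helm chart and Observability steps in this patch: one way to keep the chart values in version control is to manage the release from Terraform. The following is a minimal sketch, not the documented install path. It assumes the Terraform Helm provider is already configured against the cluster, and the release name, namespace, and repository URL are illustrative and should be checked against the Helm install docs linked above. Only `CODER_PROMETHEUS_ENABLE` comes from the patch itself.

```hcl
# Sketch only: installs the Coder chart and enables the Prometheus endpoint.
# Assumes a configured "helm" provider; name and namespace are hypothetical.
resource "helm_release" "coder" {
  name       = "coder"
  namespace  = "coder"                     # assumed namespace
  repository = "https://helm.coder.com/v2" # verify against the install docs
  chart      = "coder"

  # Equivalent to the `coder.env` list in a hand-written values.yaml.
  values = [
    yamlencode({
      coder = {
        env = [
          {
            name  = "CODER_PROMETHEUS_ENABLE"
            value = "true"
          }
        ]
      }
    })
  ]
}
```

Keeping the environment variables in a single `values` document mirrors the `values.yaml` workflow described in the steps, so either approach can be reviewed the same way in version control.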
From da97e600dd6675eaa37bbbd22b228c3b813bf187 Mon Sep 17 00:00:00 2001 From: Eric Date: Thu, 13 Jun 2024 15:44:47 +0000 Subject: [PATCH 3/6] formatting --- docs/admin/architectures/validated-arch.md | 13 ++++++------- 1 file changed, 6 insertions(+), 7 deletions(-) diff --git a/docs/admin/architectures/validated-arch.md b/docs/admin/architectures/validated-arch.md index fc1b7b57f842d..0595b29cf3019 100644 --- a/docs/admin/architectures/validated-arch.md +++ b/docs/admin/architectures/validated-arch.md @@ -212,11 +212,11 @@ resource "kubernetes_deployment" "coder" { For sizing recommendations, see the below reference architectures: -[Up to 1,000 users](1k-users.md) +- [Up to 1,000 users](1k-users.md) -[Up to 2,000 users](2k-users.md) +- [Up to 2,000 users](2k-users.md) -[Up to 3,000 users](3k-users.md) +- [Up to 3,000 users](3k-users.md) ### Networking @@ -224,10 +224,9 @@ It is likely your enterprise deploys Kubernetes clusters with various networking restrictions. With this in mind, Coder requires the following connectivity: - Egress from workspace compute to the Coder control plane pods -- Egress from control plane pods to your PostgreSQL database -- Egress from control plane pods to your version control and artifact - repositories -- Ingress from user endpoints to the control plane Load Balancer or Ingress +- Egress from control plane pods to Coder's PostgreSQL database +- Egress from control plane pods to git and package repositories +- Ingress from user devices to the control plane Load Balancer or Ingress controller We recommend configuring your network policies in accordance with the above. 
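Reviewer note on the Networking list in this patch: the validated architecture states that Coder workspaces do not require any ports to be open, which could be expressed as a default-deny ingress policy on workspace pods. The following is a sketch under stated assumptions: the namespace and the `app.kubernetes.io/name` label are hypothetical and must match whatever your workspace template actually sets, and the cluster's CNI must enforce NetworkPolicy.

```hcl
# Sketch only: deny all inbound traffic to workspace pods.
# Namespace and label values are assumptions, not taken from the patch.
resource "kubernetes_network_policy" "workspace_deny_ingress" {
  metadata {
    name      = "coder-workspace-deny-ingress"
    namespace = "coder-workspaces"
  }

  spec {
    # Selects the workspace pods; adjust to the labels in your template.
    pod_selector {
      match_labels = {
        "app.kubernetes.io/name" = "coder-workspace"
      }
    }

    # "Ingress" is listed with no ingress rules, so all inbound
    # connections to the selected pods are denied. Egress is not
    # restricted by this policy.
    policy_types = ["Ingress"]
  }
}
```

Egress is deliberately left open here, because the list above requires workspace compute to reach the control plane, and restricting it safely depends on site-specific DNS, git, and package repository endpoints. It is worth testing workspace connectivity in a non-production cluster before adopting a policy like this.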
From eb50dc1d12aee044cfb1e944490a9eb2359f71ae Mon Sep 17 00:00:00 2001 From: Eric Date: Thu, 13 Jun 2024 16:13:48 +0000 Subject: [PATCH 4/6] fix 404s --- docs/admin/scale.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/docs/admin/scale.md b/docs/admin/scale.md index 883516d9146f7..6c13fcd4b18e6 100644 --- a/docs/admin/scale.md +++ b/docs/admin/scale.md @@ -4,14 +4,14 @@ infrastructure. For scale-testing Kubernetes clusters we recommend to install and use the dedicated Coder template, [scaletest-runner](https://github.com/coder/coder/tree/main/scaletest/templates/scaletest-runner). -Learn more about [Coder’s architecture](../about/architecture.md) and our -[scale-testing methodology](architectures/index.md#scale-testing-methodology). +Learn more about [Coder’s architecture](architectures/architecture.md) and our +[scale-testing methodology](architectures/scale-testing.md). ## Recent scale tests > Note: the below information is for reference purposes only, and are not > intended to be used as guidelines for infrastructure sizing. Review the -> [Reference Architectures](architectures/index.md) for hardware sizing +> [Reference Architectures](architectures/validated-arch.md#node-sizing) for hardware sizing > recommendations. 
| Environment | Coder CPU | Coder RAM | Coder Replicas | Database | Users | Concurrent builds | Concurrent connections (Terminal/SSH) | Coder Version | Last tested | From f8b8fb6f5ba92d721047e92774e7e788bb301efb Mon Sep 17 00:00:00 2001 From: Eric Date: Thu, 13 Jun 2024 16:27:40 +0000 Subject: [PATCH 5/6] fix 404s pt 2 --- docs/admin/architectures/architecture.md | 25 +++++++++++------------- docs/admin/scale.md | 4 ++-- 2 files changed, 13 insertions(+), 16 deletions(-) diff --git a/docs/admin/architectures/architecture.md b/docs/admin/architectures/architecture.md index af826ef784145..d96a20d8813db 100644 --- a/docs/admin/architectures/architecture.md +++ b/docs/admin/architectures/architecture.md @@ -4,9 +4,6 @@ The Coder deployment model is flexible and offers various components that platform administrators can deploy and scale depending on their use case. This page describes possible deployments, challenges, and risks associated with them. -Learn more about our [Reference Architectures](../admin/architectures/index.md) -and platform scaling capabilities. - ## Primary components ### coderd @@ -29,7 +26,7 @@ _provisionerd_ is the execution context for infrastructure modifying providers. At the moment, the only provider is Terraform (running `terraform`). By default, the Coder server runs multiple provisioner daemons. -[External provisioners](../admin/provisioners.md) can be added for security or +[External provisioners](../provisioners.md) can be added for security or scalability purposes. ### Agents @@ -46,7 +43,7 @@ It offers the following services along with much more: - `startup_script` automation Templates are responsible for -[creating and running agents](../templates/index.md#coder-agent) within +[creating and running agents](../../templates/index.md#coder-agent) within workspaces. ### Service Bundling @@ -76,7 +73,7 @@ they're destroyed on workspace stop. 
### Single region architecture -![Architecture Diagram](../images/architecture-single-region.png) +![Architecture Diagram](../../images/architecture-single-region.png) #### Components @@ -125,7 +122,7 @@ and _Coder workspaces_ deployed in the same region. ### Multi-region architecture -![Architecture Diagram](../images/architecture-multi-region.png) +![Architecture Diagram](../../images/architecture-multi-region.png) #### Components @@ -171,7 +168,7 @@ disruptions. Additionally, multi-cloud deployment enables organizations to leverage the unique features and capabilities offered by each cloud provider, such as region availability and pricing models. -![Architecture Diagram](../images/architecture-multi-cloud.png) +![Architecture Diagram](../../images/architecture-multi-cloud.png) #### Components @@ -205,7 +202,7 @@ nearest region and technical specifications provided by the cloud providers. **Workspace proxy** - _Security recommendation_: Use `coder` CLI to create - [authentication tokens for every workspace proxy](../admin/workspace-proxies.md#requirements), + [authentication tokens for every workspace proxy](../workspace-proxies.md#requirements), and keep them in regional secret stores. Remember to distribute them using safe, encrypted communication channel. @@ -226,8 +223,8 @@ nearest region and technical specifications provided by the cloud providers. See how to deploy [Coder on Azure Kubernetes Service](https://github.com/ericpaulsen/coder-aks). -Learn more about [security requirements](../install/kubernetes.md) for deploying -Coder on Kubernetes. +Learn more about [security requirements](../../install/kubernetes.md) for +deploying Coder on Kubernetes. **Load balancer** @@ -286,9 +283,9 @@ The key features of the air-gapped architecture include: - _Secure data transfer_: Enable encrypted communication channels and robust access controls to safeguard sensitive information. -Learn more about [offline deployments](../install/offline.md) of Coder. 
+Learn more about [offline deployments](../../install/offline.md) of Coder. -![Architecture Diagram](../images/architecture-air-gapped.png) +![Architecture Diagram](../../images/architecture-air-gapped.png) #### Components @@ -363,7 +360,7 @@ Learn more about [Dev containers support](https://coder.com/docs/v2/latest/templates/dev-containers) in Coder. -![Architecture Diagram](../images/architecture-devcontainers.png) +![Architecture Diagram](../../images/architecture-devcontainers.png) #### Components diff --git a/docs/admin/scale.md b/docs/admin/scale.md index 6c13fcd4b18e6..d8569fb8dffef 100644 --- a/docs/admin/scale.md +++ b/docs/admin/scale.md @@ -11,8 +11,8 @@ Learn more about [Coder’s architecture](architectures/architecture.md) and our > Note: the below information is for reference purposes only, and are not > intended to be used as guidelines for infrastructure sizing. Review the -> [Reference Architectures](architectures/validated-arch.md#node-sizing) for hardware sizing -> recommendations. +> [Reference Architectures](architectures/validated-arch.md#node-sizing) for +> hardware sizing recommendations. 
| Environment | Coder CPU | Coder RAM | Coder Replicas | Database | Users | Concurrent builds | Concurrent connections (Terminal/SSH) | Coder Version | Last tested | | ---------------- | --------- | --------- | -------------- | ----------------- | ----- | ----------------- | ------------------------------------- | ------------- | ------------ | From 1f3ea07d23f84eca644db49e623c952f188077f0 Mon Sep 17 00:00:00 2001 From: Eric Date: Thu, 13 Jun 2024 16:48:53 +0000 Subject: [PATCH 6/6] fix 404s pt 3 --- docs/admin/architectures/architecture.md | 6 +++--- docs/platforms/other.md | 3 ++- 2 files changed, 5 insertions(+), 4 deletions(-) diff --git a/docs/admin/architectures/architecture.md b/docs/admin/architectures/architecture.md index d96a20d8813db..318e8e7d5356a 100644 --- a/docs/admin/architectures/architecture.md +++ b/docs/admin/architectures/architecture.md @@ -118,7 +118,7 @@ and _Coder workspaces_ deployed in the same region. - Integrate with existing Single Sign-On (SSO) solutions used within the organization via the supported OAuth 2.0 or OpenID Connect standards. -- Learn more about [Authentication in Coder](../admin/auth.md). +- Learn more about [Authentication in Coder](../auth.md). ### Multi-region architecture @@ -327,8 +327,8 @@ across multiple regions and diverse cloud platforms. - Since the _Registry_ is isolated from the internet, platform engineers are responsible for maintaining Workspace container images and conducting periodic updates of base Docker images. -- It is recommended to keep [Dev Containers](../templates/dev-containers.md) up - to date with the latest released +- It is recommended to keep [Dev Containers](../../templates/dev-containers.md) + up to date with the latest released [Envbuilder](https://github.com/coder/envbuilder) runtime. 
**Mirror of Terraform Registry** diff --git a/docs/platforms/other.md b/docs/platforms/other.md index d2f08ebd2d357..474efe56a46e2 100644 --- a/docs/platforms/other.md +++ b/docs/platforms/other.md @@ -3,7 +3,8 @@ Coder is highly extensible and is not limited to the platforms outlined in these docs. The control plane can be provisioned on any VM or container compute, and workspaces can include any Terraform resource. See our -[architecture diagram](../about/architecture.md) for more details. +[architecture documentation](../admin/architectures/architecture.md) for more +details. The following resources may help as you're deploying Coder.