chore: add terraform for spinning up load test cluster #7504

Merged: 26 commits, May 15, 2023

Commits
29016dc
Port over initial version of v1 loadtest infra tf code
johnstcn May 5, 2023
8759e23
Fix VPC peering for CloudSQL
johnstcn May 8, 2023
bef9887
create clusters in vpc native networking mode
johnstcn May 8, 2023
ea42b44
gitignore tfstate
johnstcn May 8, 2023
9875f84
create pg database and user, fix db url
johnstcn May 8, 2023
8fb3511
ensure db is destroyed properly with terraform destroy
johnstcn May 8, 2023
13ca9e4
enable prometheus, add podmonitor spec
johnstcn May 10, 2023
3a3509b
add inline kubernetes template
johnstcn May 10, 2023
41ab251
add script to init coder instance and import template
johnstcn May 11, 2023
0bbf206
modify tls cert def
johnstcn May 11, 2023
d4b1fe6
fixup template
johnstcn May 11, 2023
2b5a15b
multiple fixes
johnstcn May 11, 2023
dbcfc64
rebuild docker image with certs
johnstcn May 11, 2023
69bdfd1
remove self-signed https for now
johnstcn May 12, 2023
99c0f3c
move monitoring manifest out of helm chart
johnstcn May 12, 2023
ccda05d
move generated files into .coderv2, create shim script
johnstcn May 12, 2023
6ace619
adjust template limits
johnstcn May 12, 2023
01c6d39
make fmt
johnstcn May 12, 2023
9f7c165
make lint
johnstcn May 12, 2023
660959c
make gen
johnstcn May 12, 2023
34f8b02
update README
johnstcn May 12, 2023
75d1746
fixup! update README
johnstcn May 12, 2023
5a3c801
update cluster monitoring and workload identity
johnstcn May 12, 2023
caa04d4
fix coder deployment node affinity
johnstcn May 12, 2023
9419701
address PR comments
johnstcn May 15, 2023
435e74d
make fmt
johnstcn May 15, 2023
7 changes: 6 additions & 1 deletion .gitignore
@@ -48,9 +48,14 @@ site/stats/
*.lock.hcl
.terraform/

/.coderv2/*
**/.coderv2/*
**/__debug_bin

# direnv
.envrc
*.test

# Loadtesting
/scaletest/terraform/.terraform
/scaletest/terraform/.terraform.lock.hcl
terraform.tfstate.*
7 changes: 6 additions & 1 deletion .prettierignore
@@ -51,12 +51,17 @@ site/stats/
*.lock.hcl
.terraform/

/.coderv2/*
**/.coderv2/*
**/__debug_bin

# direnv
.envrc
*.test

# Loadtesting
/scaletest/terraform/.terraform
/scaletest/terraform/.terraform.lock.hcl
terraform.tfstate.*
# .prettierignore.include:
# Helm templates contain variables that are invalid YAML and can't be formatted
# by Prettier.
40 changes: 40 additions & 0 deletions scaletest/terraform/README.md
@@ -0,0 +1,40 @@
# Load Test Terraform

This folder contains Terraform code and scripts to aid in performing load tests of Coder.
It does the following:

- Creates a GCP VPC.
- Creates a CloudSQL instance with a global peering rule so it's accessible inside the VPC.
- Creates a GKE cluster inside the VPC with separate nodegroups for Coder and workspaces.
- Installs Coder in a new namespace, using the CloudSQL instance.

## Usage

> You must have an existing Google Cloud project available.

1. Create a file named `override.tfvars` with the following content, modifying as appropriate:

```terraform
name = "some_unique_identifier"
project_id = "some_google_project_id"
```

1. Inspect `vars.tf` and override any other variables you deem necessary.

1. Run `terraform init`.

1. Run `terraform plan -var-file=override.tfvars` and inspect the output.
If you are not satisfied, modify `override.tfvars` until you are.

1. Run `terraform apply -var-file=override.tfvars`. This will spin up a pre-configured environment
and emit the Coder URL as an output.

1. Run `coder_init.sh <coder_url>` to set up an initial user and a pre-configured Kubernetes
template. It also downloads the Coder CLI from the Coder instance into the local `.coderv2` directory.

1. Do whatever you need to do with the Coder instance.

> To run Coder commands against the instance, you can use `coder_shim.sh <command>`.
> You don't need to run `coder login` yourself.

1. When you are finished, you can run `terraform destroy -var-file=override.tfvars`.
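
The steps above can be strung together in a small wrapper. The following is a minimal sketch and not part of this PR: the function names `write_tfvars`, `run`, `loadtest_up` and `loadtest_down`, and the `DRY_RUN` switch, are hypothetical. It assumes `terraform` is installed and that it is run from this folder.

```shell
# Hypothetical helpers wrapping the README steps; all names are illustrative.

# Print the contents of override.tfvars for a given name and project ID.
write_tfvars() {
  printf 'name = "%s"\n' "$1"
  printf 'project_id = "%s"\n' "$2"
}

# Run a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Bring the environment up: init, plan, apply.
loadtest_up() {
  run terraform init
  run terraform plan -var-file=override.tfvars
  run terraform apply -var-file=override.tfvars
}

# Tear the environment down again.
loadtest_down() {
  run terraform destroy -var-file=override.tfvars
}
```

For example, `write_tfvars demo my-project > override.tfvars` writes the variables file, and `DRY_RUN=1 loadtest_up` prints the three `terraform` commands with a leading `+` instead of executing them.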
250 changes: 250 additions & 0 deletions scaletest/terraform/coder.tf
@@ -0,0 +1,250 @@
data "google_client_config" "default" {}

locals {
coder_helm_repo = "https://helm.coder.com/v2"
coder_helm_chart = "coder"
coder_release_name = var.name
coder_namespace = "coder-${var.name}"
coder_admin_email = "admin@coder.com"
coder_admin_user = "coder"
coder_address = google_compute_address.coder.address
coder_url = "http://${google_compute_address.coder.address}"
}

provider "kubernetes" {
host = "https://${google_container_cluster.primary.endpoint}"
cluster_ca_certificate = base64decode(google_container_cluster.primary.master_auth.0.cluster_ca_certificate)
token = data.google_client_config.default.access_token
}

provider "helm" {
kubernetes {
host = "https://${google_container_cluster.primary.endpoint}"
cluster_ca_certificate = base64decode(google_container_cluster.primary.master_auth.0.cluster_ca_certificate)
token = data.google_client_config.default.access_token
}
}

resource "kubernetes_namespace" "coder_namespace" {
metadata {
name = local.coder_namespace
}
depends_on = [
google_container_node_pool.coder
]
}

resource "random_password" "postgres-admin-password" {
length = 12
}

resource "random_password" "coder-postgres-password" {
length = 12
}

resource "kubernetes_secret" "coder-db" {
type = "" # Opaque
metadata {
name = "coder-db-url"
namespace = kubernetes_namespace.coder_namespace.metadata.0.name
}
data = {
url = "postgres://${google_sql_user.coder.name}:${urlencode(random_password.coder-postgres-password.result)}@${google_sql_database_instance.db.private_ip_address}/${google_sql_database.coder.name}?sslmode=disable"
# Review comment (Contributor): fine for now, but I think TLS would be better here & more realistic
}
}
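
The password in the connection URL above is passed through `urlencode` because generated passwords can contain characters such as `@`, `:` or `?` that would otherwise be read as URL delimiters. Below is a minimal shell sketch of the same idea, useful for building or checking such a URL by hand. `urlencode` and `db_url` are hypothetical helpers, not part of this PR, and the encoding is not guaranteed to match Terraform's `urlencode` byte for byte (for example, this sketch encodes a space as `%20`).

```shell
# Percent-encode every character outside the RFC 3986 "unreserved" set.
# Assumes ASCII input. Runs in a subshell so its variables do not leak.
urlencode() (
  LC_ALL=C
  s="$1"
  out=""
  while [ -n "$s" ]; do
    rest="${s#?}"      # everything after the first character
    c="${s%"$rest"}"   # the first character itself
    case "$c" in
      [a-zA-Z0-9.~_-]) out="$out$c" ;;
      *) out="$out$(printf '%%%02X' "'$c")" ;;
    esac
    s="$rest"
  done
  printf '%s\n' "$out"
)

# Assemble a connection URL in the same shape as the secret above.
# $1=user $2=password $3=host $4=database
db_url() {
  printf 'postgres://%s:%s@%s/%s?sslmode=disable\n' \
    "$1" "$(urlencode "$2")" "$3" "$4"
}
```

Without the encoding step, a password containing `@` would split the URL at the wrong place and the host part would be misread.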

resource "helm_release" "coder-chart" {
repository = local.coder_helm_repo
chart = local.coder_helm_chart
name = local.coder_release_name
version = var.coder_chart_version
namespace = kubernetes_namespace.coder_namespace.metadata.0.name
depends_on = [
google_container_node_pool.coder,
]
values = [<<EOF
coder:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: "cloud.google.com/gke-nodepool"
            operator: "In"
            values: ["${google_container_node_pool.coder.name}"]
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 1
        podAffinityTerm:
          topologyKey: "kubernetes.io/hostname"
          labelSelector:
            matchExpressions:
            - key: "app.kubernetes.io/instance"
              operator: "In"
              values: ["${local.coder_release_name}"]
  env:
    - name: "CODER_CACHE_DIRECTORY"
      value: "/tmp/coder"
    - name: "CODER_ENABLE_TELEMETRY"
      value: "false"
    - name: "CODER_LOGGING_HUMAN"
      value: "/dev/null"
    - name: "CODER_LOGGING_STACKDRIVER"
      value: "/dev/stderr"
    - name: "CODER_PG_CONNECTION_URL"
      valueFrom:
        secretKeyRef:
          name: "${kubernetes_secret.coder-db.metadata.0.name}"
          key: url
    - name: "CODER_PROMETHEUS_ENABLE"
      value: "true"
    - name: "CODER_VERBOSE"
      value: "true"
  image:
    repo: ${var.coder_image_repo}
    tag: ${var.coder_image_tag}
  replicaCount: "${var.coder_replicas}"
  resources:
    requests:
      cpu: "${var.coder_cpu}"
      memory: "${var.coder_mem}"
    limits:
      cpu: "${var.coder_cpu}"
      memory: "${var.coder_mem}"
  securityContext:
    readOnlyRootFilesystem: true
  service:
    enable: true
    loadBalancerIP: "${local.coder_address}"
  volumeMounts:
  - mountPath: "/tmp"
    name: cache
    readOnly: false
  volumes:
  - emptyDir:
      sizeLimit: 1024Mi
    name: cache
EOF
]
}

resource "local_file" "coder-monitoring-manifest" {
filename = "${path.module}/.coderv2/coder-monitoring.yaml"
content = <<EOF
apiVersion: monitoring.googleapis.com/v1
kind: PodMonitoring
metadata:
  namespace: ${kubernetes_namespace.coder_namespace.metadata.0.name}
  name: coder-monitoring
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: coder
  endpoints:
  - port: prometheus-http
    interval: 30s
EOF
}

resource "null_resource" "coder-monitoring-manifest_apply" {
provisioner "local-exec" {
working_dir = "${abspath(path.module)}/.coderv2"
command = <<EOF
KUBECONFIG=${var.name}-cluster.kubeconfig gcloud container clusters get-credentials ${var.name}-cluster --project=${var.project_id} --zone=${var.zone} && \
KUBECONFIG=${var.name}-cluster.kubeconfig kubectl apply -f ${abspath(local_file.coder-monitoring-manifest.filename)}
EOF
}
}

resource "local_file" "kubernetes_template" {
filename = "${path.module}/.coderv2/templates/kubernetes/main.tf"
content = <<EOF
terraform {
required_providers {
coder = {
source = "coder/coder"
version = "~> 0.7.0"
}
kubernetes = {
source = "hashicorp/kubernetes"
version = "~> 2.18"
}
}
}

provider "coder" {}

provider "kubernetes" {
config_path = null # always use host
}

data "coder_workspace" "me" {}

resource "coder_agent" "main" {
os = "linux"
arch = "amd64"
startup_script_timeout = 180
startup_script = ""
}

resource "kubernetes_pod" "main" {
count = data.coder_workspace.me.start_count
metadata {
name = "coder-$${lower(data.coder_workspace.me.owner)}-$${lower(data.coder_workspace.me.name)}"
namespace = "${kubernetes_namespace.coder_namespace.metadata.0.name}"
labels = {
"app.kubernetes.io/name" = "coder-workspace"
"app.kubernetes.io/instance" = "coder-workspace-$${lower(data.coder_workspace.me.owner)}-$${lower(data.coder_workspace.me.name)}"
}
}
spec {
security_context {
run_as_user = "1000"
fs_group = "1000"
}
container {
name = "dev"
image = "${var.workspace_image}"
image_pull_policy = "Always"
command = ["sh", "-c", coder_agent.main.init_script]
security_context {
run_as_user = "1000"
}
env {
name = "CODER_AGENT_TOKEN"
value = coder_agent.main.token
}
resources {
requests = {
"cpu" = "0.1"
"memory" = "128Mi"
# Review comment (Contributor): is this what we observe the agent using in practice?
# Reply (Member Author): Close enough at idle, yes.
}
limits = {
"cpu" = "1"
"memory" = "1Gi"
}
}
}

affinity {
node_affinity {
required_during_scheduling_ignored_during_execution {
node_selector_term {
match_expressions {
key = "cloud.google.com/gke-nodepool"
operator = "In"
values = ["${google_container_node_pool.workspaces.name}"]
}
}
}
}
}
}
}
EOF
}

output "coder_url" {
description = "URL of the Coder deployment"
value = local.coder_url
}
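
The `coder_url` output is what `coder_init.sh` expects as its argument. Because the URL points at a freshly created load balancer address, the deployment may need some time before it responds. The following is a minimal sketch of polling the URL before continuing. `wait_for_coder` is a hypothetical helper, not part of this PR, and the retry count and delay defaults are arbitrary assumptions.

```shell
# Poll a URL until it answers or the attempts run out.
# $1=url, $2=max attempts (default 30), $3=seconds between attempts (default 10)
wait_for_coder() {
  local url="$1" attempts="${2:-30}" delay="${3:-10}" i=1
  while [ "$i" -le "$attempts" ]; do
    # -f makes curl fail on HTTP errors; the response body is discarded.
    if curl -fsS -o /dev/null --max-time 5 "$url"; then
      echo "reachable after $i attempt(s)"
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "gave up after $attempts attempt(s)" >&2
  return 1
}

# Possible usage after `terraform apply`:
#   url="$(terraform output -raw coder_url)"
#   wait_for_coder "$url" && ./coder_init.sh "$url"
```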
51 changes: 51 additions & 0 deletions scaletest/terraform/coder_init.sh
@@ -0,0 +1,51 @@
#!/usr/bin/env bash

set -euo pipefail

if [[ $# -lt 1 ]]; then
echo "Usage: $0 <coder URL>"
exit 1
fi

# Allow toggling verbose output
[[ -n ${VERBOSE:-} ]] && set -x

CODER_URL=$1
CONFIG_DIR="${PWD}/.coderv2"
ARCH="$(arch)"
if [[ "$ARCH" == "x86_64" ]]; then
ARCH="amd64"
fi
PLATFORM="$(uname | tr '[:upper:]' '[:lower:]')"

mkdir -p "${CONFIG_DIR}"
echo "Fetching Coder CLI for first-time setup!"
curl -fsSLk "${CODER_URL}/bin/coder-${PLATFORM}-${ARCH}" -o "${CONFIG_DIR}/coder"
chmod +x "${CONFIG_DIR}/coder"

set +o pipefail
RANDOM_ADMIN_PASSWORD=$(tr </dev/urandom -dc _A-Z-a-z-0-9 | head -c16)
set -o pipefail
CODER_FIRST_USER_EMAIL="admin@coder.com"
CODER_FIRST_USER_USERNAME="coder"
CODER_FIRST_USER_PASSWORD="${RANDOM_ADMIN_PASSWORD}"
CODER_FIRST_USER_TRIAL="false"
echo "Running login command!"
"${CONFIG_DIR}/coder" login "${CODER_URL}" \
  --global-config="${CONFIG_DIR}" \
  --first-user-username="${CODER_FIRST_USER_USERNAME}" \
  --first-user-email="${CODER_FIRST_USER_EMAIL}" \
  --first-user-password="${CODER_FIRST_USER_PASSWORD}" \
  --first-user-trial="${CODER_FIRST_USER_TRIAL}"

# Persist the generated credentials so they can be looked up later.
echo "Writing credentials to ${CONFIG_DIR}/coder.env"
cat <<EOF >"${CONFIG_DIR}/coder.env"
CODER_FIRST_USER_EMAIL=${CODER_FIRST_USER_EMAIL}
CODER_FIRST_USER_USERNAME=${CODER_FIRST_USER_USERNAME}
CODER_FIRST_USER_PASSWORD="${CODER_FIRST_USER_PASSWORD}"
CODER_FIRST_USER_TRIAL="${CODER_FIRST_USER_TRIAL}"
EOF

echo "Importing kubernetes template"
"${CONFIG_DIR}/coder" templates create --global-config="${CONFIG_DIR}" \
--directory "${CONFIG_DIR}/templates/kubernetes" --yes kubernetes
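
Two details of the script above can be isolated as small helpers. The script maps `x86_64` to `amd64` before building the download path, and it briefly disables `pipefail` while generating the password, because `tr` reading from `/dev/urandom` is terminated by SIGPIPE once `head` exits. The following is a minimal sketch of both ideas. `normalize_arch` and `random_password` are hypothetical names, and the `aarch64` to `arm64` mapping assumes an arm64 CLI build is served under the same naming scheme, which this PR does not confirm.

```shell
# Map `uname -m` style names to the Go-style names used in the download path.
normalize_arch() {
  case "$1" in
    x86_64 | amd64) echo "amd64" ;;
    aarch64 | arm64) echo "arm64" ;; # assumption: an arm64 build is published
    *) echo "$1" ;;
  esac
}

# Generate an alphanumeric password of $1 characters (default 16).
# Only a bounded amount of random data is read and every stage consumes all
# of its input, so nothing is killed by SIGPIPE and pipefail can stay enabled.
# 256 random bytes yield well over 300 usable characters, so keep $1 small.
random_password() {
  local len="${1:-16}" raw
  raw="$(head -c 256 /dev/urandom | base64 | tr -dc 'A-Za-z0-9')"
  printf '%s\n' "$raw" | cut -c "1-$len"
}
```

For example, `normalize_arch "$(uname -m)"` could replace the `arch` check, and `random_password 16` could stand in for the `tr | head` pipeline. Unlike the original, this variant does not include `_` or `-` in the character set.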
8 changes: 8 additions & 0 deletions scaletest/terraform/coder_shim.sh
@@ -0,0 +1,8 @@
#!/usr/bin/env bash

# This is a shim for easily executing Coder commands against a loadtest cluster
# without having to overwrite your own session/URL
SCRIPT_DIR=$(dirname "${BASH_SOURCE[0]}")
CONFIG_DIR="${SCRIPT_DIR}/.coderv2"
CODER_BIN="${CONFIG_DIR}/coder"
exec "${CODER_BIN}" --global-config "${CONFIG_DIR}" "$@"