
Commit efbb558

chore: add scaletest convenience script (#7819)
- Adds a convenience script `scaletest.sh` to automate the process of running scale tests.
- Enables the pprof endpoint by default, and captures pprof traces before tearing down infra.
- Improves idempotency of `coder_init.sh`.
- Removes the `promtest.Float64` invocations in the workspacetraffic runner; these metrics will be in Prometheus.
- Increases default workspace traffic output to 40KB/s/workspace.
1 parent 9ec1fcf commit efbb558

18 files changed (+347, -84 lines)

.gitignore

+2-2
@@ -58,5 +58,5 @@ site/stats/
5858
# Loadtesting
5959
./scaletest/terraform/.terraform
6060
./scaletest/terraform/.terraform.lock.hcl
61-
terraform.tfstate.*
62-
**/*.tfvars
61+
scaletest/terraform/secrets.tfvars
62+
.terraform.tfstate.*

.prettierignore

+2-2
@@ -61,8 +61,8 @@ site/stats/
6161
# Loadtesting
6262
./scaletest/terraform/.terraform
6363
./scaletest/terraform/.terraform.lock.hcl
64-
terraform.tfstate.*
65-
**/*.tfvars
64+
scaletest/terraform/secrets.tfvars
65+
.terraform.tfstate.*
6666
# .prettierignore.include:
6767
# Helm templates contain variables that are invalid YAML and can't be formatted
6868
# by Prettier.

scaletest/README.md

+83
@@ -0,0 +1,83 @@
1+
# Scale Testing
2+
3+
This folder contains CLI commands, Terraform code, and scripts to aid in performing load tests of Coder.
4+
At a high level, it performs the following steps:
5+
6+
- Using the Terraform code in `./terraform`, stands up a preconfigured Google Cloud environment
7+
consisting of a VPC, GKE Cluster, and CloudSQL instance.
8+
> **Note: You must have an existing Google Cloud project available.**
9+
- Creates a dedicated namespace for Coder and installs Coder using the Helm chart in this namespace.
10+
- Configures the Coder deployment with random credentials and a predefined Kubernetes template.
11+
> **Note:** These credentials are stored in `${PROJECT_ROOT}/scaletest/.coderv2/coder.env`.
12+
- Creates a number of workspaces and waits for them to all start successfully. These workspaces
13+
are ephemeral and do not contain any persistent resources.
14+
- Waits for 10 minutes to allow things to settle and establish a baseline.
15+
- Generates web terminal traffic to all workspaces for 30 minutes.
16+
- Directly after traffic generation, captures goroutine and heap snapshots of the Coder deployment.
17+
- Tears down all resources (unless `--skip-cleanup` is specified).
18+
19+
## Usage
20+
21+
The main entrypoint is the `scaletest.sh` script.
22+
23+
```console
24+
$ scaletest.sh --help
25+
Usage: scaletest.sh --name <name> --project <project> --num-workspaces <num-workspaces> --scenario <scenario> [--dry-run] [--skip-cleanup]
26+
```
27+
28+
### Required arguments:
29+
30+
- `--name`: Name for the loadtest. This is added as a prefix to resources created by Terraform (e.g. `joe-big-loadtest`).
31+
- `--project`: Google Cloud project in which to create the resources (example: `my-loadtest-project`).
32+
- `--num-workspaces`: Number of workspaces to create (example: `10`).
33+
- `--scenario`: Deployment scenario to use (example: `small`). See `terraform/scenario-*.tfvars`.
34+
35+
> **Note:** In order to capture Prometheus metrics, you must define the environment variables
36+
> `SCALETEST_PROMETHEUS_REMOTE_WRITE_USER` and `SCALETEST_PROMETHEUS_REMOTE_WRITE_PASSWORD`.
37+
38+
### Optional arguments:
39+
40+
- `--dry-run`: Do not perform any action and instead print what would be executed.
41+
- `--skip-cleanup`: Do not perform any cleanup. You will be responsible for deleting any resources this creates.
42+
43+
### Environment Variables
44+
45+
All of the above arguments may be specified as environment variables. Consult the script for details.
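As a rough illustration of how a flag can override an environment default, here is a minimal sketch. It assumes the same `${VAR:-}` defaulting that `scaletest.sh` applies to `SCALETEST_NAME`, `SCALETEST_PROJECT`, `SCALETEST_NUM_WORKSPACES`, `SCALETEST_SCENARIO`, and `SCALETEST_SKIP_CLEANUP`. The function `resolve_name` is hypothetical and exists only for this example; the real script parses its flags with `getopt`.

```shell
#!/usr/bin/env bash
# Sketch only: resolve_name is a hypothetical helper, not part of scaletest.sh.
# The environment variable supplies the default; a --name flag overrides it.
resolve_name() {
  local name="${SCALETEST_NAME:-}" # empty when the variable is unset
  while [[ $# -gt 0 ]]; do
    case "$1" in
    --name)
      name="${2:-}"
      # Shift past the flag and its value (or just the flag if no value was given).
      shift "$(($# >= 2 ? 2 : 1))"
      ;;
    *)
      shift
      ;;
    esac
  done
  echo "$name"
}

SCALETEST_NAME=from-env resolve_name                  # prints: from-env
SCALETEST_NAME=from-env resolve_name --name from-flag # prints: from-flag
```

The flag wins because it is applied after the environment default has been read, which matches the order of operations in the script.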
46+
47+
### Prometheus Metrics
48+
49+
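To capture Prometheus metrics from the loadtest, two environment variables must be defined: `SCALETEST_PROMETHEUS_REMOTE_WRITE_USER` and `SCALETEST_PROMETHEUS_REMOTE_WRITE_PASSWORD`.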
50+
51+
## Scenarios
52+
53+
A scenario defines a number of variables that override the default Terraform variables.
54+
A number of existing scenarios are provided in `scaletest/terraform/scenario-*.tfvars`.
55+
56+
For example, `scenario-small.tfvars` includes the following variable definitions:
57+
58+
```
59+
nodepool_machine_type_coder = "t2d-standard-2"
60+
nodepool_machine_type_workspaces = "t2d-standard-2"
61+
coder_cpu = "1000m" # Leaving 1 CPU for system workloads
62+
coder_mem = "4Gi" # Leaving 4GB for system workloads
63+
```
64+
65+
To create your own scenario, simply add a new file `terraform/scenario-$SCENARIO_NAME.tfvars`.
66+
In this file, override variables as required, consulting `vars.tf` as needed.
67+
You can then use this scenario by specifying `--scenario $SCENARIO_NAME`.
68+
For example, if your scenario file were named `scenario-big-whopper2x.tfvars`, you would specify
69+
`--scenario=big-whopper2x`.
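The mapping from a scenario name to its `.tfvars` file can be sketched as follows. This mirrors the lookup `scaletest.sh` performs before running Terraform, but `scenario_file` is a hypothetical helper written for this example, and the demo uses a throwaway directory instead of `scaletest/terraform`.

```shell
#!/usr/bin/env bash
# Sketch only: scenario_file is a hypothetical helper, not part of the repository.
# It resolves a scenario name to scenario-<name>.tfvars and fails if the file is missing.
scenario_file() {
  local terraform_dir="$1" scenario="$2"
  local path="${terraform_dir}/scenario-${scenario}.tfvars"
  if [[ ! -f "$path" ]]; then
    echo "Scenario ${path} not found." >&2
    return 1
  fi
  echo "$path"
}

# Demo against a temporary directory standing in for scaletest/terraform.
demo_dir="$(mktemp -d)"
touch "${demo_dir}/scenario-big-whopper2x.tfvars"
scenario_file "$demo_dir" big-whopper2x
scenario_file "$demo_dir" missing || echo "lookup failed as expected"
rm -rf "$demo_dir"
```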
70+
71+
## Utility scripts
72+
73+
A number of utility scripts are provided in `lib`, and are used by `scaletest.sh`:
74+
75+
- `coder_shim.sh`: a convenience script to run the `coder` binary with a predefined config root.
76+
This is intended to allow running Coder CLI commands against the loadtest cluster without
77+
modifying a user's existing Coder CLI configuration.
78+
- `coder_init.sh`: Performs first-time user setup of an existing Coder instance, generating
79+
a random password for the admin user. The admin user is named `admin@coder.com` by default.
80+
Credentials are written to `scaletest/.coderv2/coder.env`.
81+
- `coder_workspacetraffic.sh`: Runs traffic generation against the loadtest cluster and creates
82+
a monitoring manifest for the traffic generation pod. This pod will restart automatically
83+
after the traffic generation has completed.

scaletest/terraform/coder_init.sh renamed to scaletest/lib/coder_init.sh

+20-8
@@ -11,17 +11,27 @@ fi
1111
[[ -n ${VERBOSE:-} ]] && set -x
1212

1313
CODER_URL=$1
14-
CONFIG_DIR="${PWD}/.coderv2"
14+
DRY_RUN="${DRY_RUN:-0}"
15+
PROJECT_ROOT="$(git rev-parse --show-toplevel)"
16+
# shellcheck source=scripts/lib.sh
17+
source "${PROJECT_ROOT}/scripts/lib.sh"
18+
CONFIG_DIR="${PROJECT_ROOT}/scaletest/.coderv2"
1519
ARCH="$(arch)"
1620
if [[ "$ARCH" == "x86_64" ]]; then
1721
ARCH="amd64"
1822
fi
1923
PLATFORM="$(uname | tr '[:upper:]' '[:lower:]')"
2024

21-
mkdir -p "${CONFIG_DIR}"
25+
if [[ -f "${CONFIG_DIR}/coder.env" ]]; then
26+
echo "Found existing coder.env in ${CONFIG_DIR}!"
27+
echo "Nothing to do, exiting."
28+
exit 0
29+
fi
30+
31+
maybedryrun "$DRY_RUN" mkdir -p "${CONFIG_DIR}"
2232
echo "Fetching Coder CLI for first-time setup!"
23-
curl -fsSLk "${CODER_URL}/bin/coder-${PLATFORM}-${ARCH}" -o "${CONFIG_DIR}/coder"
24-
chmod +x "${CONFIG_DIR}/coder"
33+
maybedryrun "$DRY_RUN" curl -fsSLk "${CODER_URL}/bin/coder-${PLATFORM}-${ARCH}" -o "${CONFIG_DIR}/coder"
34+
maybedryrun "$DRY_RUN" chmod +x "${CONFIG_DIR}/coder"
2535

2636
set +o pipefail
2737
RANDOM_ADMIN_PASSWORD=$(tr </dev/urandom -dc _A-Z-a-z-0-9 | head -c16)
@@ -31,21 +41,23 @@ CODER_FIRST_USER_USERNAME="coder"
3141
CODER_FIRST_USER_PASSWORD="${RANDOM_ADMIN_PASSWORD}"
3242
CODER_FIRST_USER_TRIAL="false"
3343
echo "Running login command!"
34-
"${CONFIG_DIR}/coder" login "${CODER_URL}" \
44+
DRY_RUN="$DRY_RUN" "${PROJECT_ROOT}/scaletest/lib/coder_shim.sh" login "${CODER_URL}" \
3545
--global-config="${CONFIG_DIR}" \
3646
--first-user-username="${CODER_FIRST_USER_USERNAME}" \
3747
--first-user-email="${CODER_FIRST_USER_EMAIL}" \
3848
--first-user-password="${CODER_FIRST_USER_PASSWORD}" \
3949
--first-user-trial=false
4050

4151
echo "Writing credentials to ${CONFIG_DIR}/coder.env"
42-
cat <<EOF >"${CONFIG_DIR}/coder.env"
52+
maybedryrun "$DRY_RUN" cat <<EOF >"${CONFIG_DIR}/coder.env"
4353
CODER_FIRST_USER_EMAIL=admin@coder.com
4454
CODER_FIRST_USER_USERNAME=coder
4555
CODER_FIRST_USER_PASSWORD="${RANDOM_ADMIN_PASSWORD}"
4656
CODER_FIRST_USER_TRIAL="${CODER_FIRST_USER_TRIAL}"
4757
EOF
4858

4959
echo "Importing kubernetes template"
50-
"${CONFIG_DIR}/coder" templates create --global-config="${CONFIG_DIR}" \
51-
--directory "${CONFIG_DIR}/templates/kubernetes" --yes kubernetes
60+
DRY_RUN="$DRY_RUN" "$PROJECT_ROOT/scaletest/lib/coder_shim.sh" templates create \
61+
--global-config="${CONFIG_DIR}" \
62+
--directory "${CONFIG_DIR}/templates/kubernetes" \
63+
--yes kubernetes
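The password pipeline in `coder_init.sh` runs with `pipefail` disabled. A likely reason is that `head` exits after reading 16 bytes, which terminates `tr` with SIGPIPE, and under `pipefail` that would count as a failure. The sketch below wraps the same idea in a hypothetical `random_password` function; the function name, the `LC_ALL=C` setting, and the subshell scoping are assumptions added for illustration.

```shell
#!/usr/bin/env bash
# Sketch only: random_password is a hypothetical wrapper around the same pipeline.
random_password() {
  local length="${1:-16}"
  # Run in a subshell so that disabling pipefail does not leak to the caller.
  # head stops after $length bytes and tr then dies from SIGPIPE, which is expected.
  (
    set +o pipefail
    LC_ALL=C tr -dc '_A-Za-z0-9-' </dev/urandom | head -c "$length"
  )
}

password="$(random_password 16)"
echo "generated ${#password} characters"
```

Scoping `set +o pipefail` to a subshell keeps the rest of a script strict, whereas the original script turns it off for everything that follows.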

scaletest/lib/coder_shim.sh

+11
@@ -0,0 +1,11 @@
1+
#!/usr/bin/env bash
2+
3+
# This is a shim for easily executing Coder commands against a loadtest cluster
4+
# without having to overwrite your own session/URL
5+
PROJECT_ROOT="$(git rev-parse --show-toplevel)"
6+
# shellcheck source=scripts/lib.sh
7+
source "${PROJECT_ROOT}/scripts/lib.sh"
8+
CONFIG_DIR="${PROJECT_ROOT}/scaletest/.coderv2"
9+
CODER_BIN="${CONFIG_DIR}/coder"
10+
DRY_RUN="${DRY_RUN:-0}"
11+
maybedryrun "$DRY_RUN" exec "${CODER_BIN}" --global-config "${CONFIG_DIR}" "$@"
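The shim relies on `maybedryrun` from `scripts/lib.sh`, which is not part of this diff. As a guess at its general shape, the stand-in below prints the command when the first argument is `1` and executes it otherwise. The name `maybedryrun_sketch` and the `DRYRUN:` prefix are assumptions; the real helper may log differently.

```shell
#!/usr/bin/env bash
# Sketch only: a hypothetical stand-in for maybedryrun from scripts/lib.sh.
# Usage: maybedryrun_sketch <0|1> <command> [args...]
maybedryrun_sketch() {
  local dry_run="$1"
  shift
  if [[ "$dry_run" == 1 ]]; then
    # Report the command on stderr instead of running it.
    echo "DRYRUN: $*" >&2
  else
    "$@"
  fi
}

maybedryrun_sketch 1 mkdir -p /tmp/scaletest-example # only prints the command
maybedryrun_sketch 0 echo "this command really runs"
```

Note that a wrapper of this kind cannot intercept shell redirections on the calling line, so a command such as `maybedryrun "$DRY_RUN" cat <<EOF >file` would still create the file in dry-run mode.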

scaletest/terraform/coder_workspacetraffic.sh renamed to scaletest/lib/coder_workspacetraffic.sh

+7-3
@@ -11,9 +11,13 @@ fi
1111
[[ -n ${VERBOSE:-} ]] && set -x
1212

1313
LOADTEST_NAME="$1"
14-
CODER_TOKEN=$(./coder_shim.sh tokens create)
14+
PROJECT_ROOT="$(git rev-parse --show-toplevel)"
15+
CODER_TOKEN=$("${PROJECT_ROOT}/scaletest/lib/coder_shim.sh" tokens create)
1516
CODER_URL="http://coder.coder-${LOADTEST_NAME}.svc.cluster.local"
16-
export KUBECONFIG="${PWD}/.coderv2/${LOADTEST_NAME}-cluster.kubeconfig"
17+
export KUBECONFIG="${PROJECT_ROOT}/scaletest/.coderv2/${LOADTEST_NAME}-cluster.kubeconfig"
18+
19+
# Clean up any pre-existing pods
20+
kubectl -n "coder-${LOADTEST_NAME}" delete pod coder-scaletest-workspace-traffic --force || true
1721

1822
cat <<EOF | kubectl apply -f -
1923
apiVersion: v1
@@ -37,7 +41,7 @@ spec:
3741
- command:
3842
- sh
3943
- -c
40-
- "curl -fsSL $CODER_URL/bin/coder-linux-amd64 -o /tmp/coder && chmod +x /tmp/coder && /tmp/coder --url=$CODER_URL --token=$CODER_TOKEN scaletest workspace-traffic"
44+
- "curl -fsSL $CODER_URL/bin/coder-linux-amd64 -o /tmp/coder && chmod +x /tmp/coder && /tmp/coder --verbose --url=$CODER_URL --token=$CODER_TOKEN scaletest workspace-traffic --concurrency=0 --bytes-per-tick=4096 --tick-interval=100ms"
4145
env:
4246
- name: CODER_URL
4347
value: $CODER_URL
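The 40KB/s/workspace figure in the commit message follows from the new flags: 4096 bytes per tick at one tick every 100ms is ten ticks per second. The arithmetic can be checked with a small sketch; `traffic_bytes_per_second` is a hypothetical helper, and whether the rate applies per direction is not stated in this diff.

```shell
#!/usr/bin/env bash
# Sketch only: traffic_bytes_per_second is a hypothetical helper for the arithmetic.
# Arguments mirror --bytes-per-tick and --tick-interval (in milliseconds).
traffic_bytes_per_second() {
  local bytes_per_tick="$1" tick_interval_ms="$2"
  echo $((bytes_per_tick * 1000 / tick_interval_ms))
}

rate="$(traffic_bytes_per_second 4096 100)"
echo "${rate} B/s = $((rate / 1024)) KiB/s per workspace" # prints: 40960 B/s = 40 KiB/s per workspace
```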

scaletest/scaletest.sh

+190
@@ -0,0 +1,190 @@
1+
#!/usr/bin/env bash
2+
3+
[[ -n ${VERBOSE:-} ]] && set -x
4+
set -euo pipefail
5+
6+
PROJECT_ROOT="$(git rev-parse --show-toplevel)"
7+
# shellcheck source=scripts/lib.sh
8+
source "${PROJECT_ROOT}/scripts/lib.sh"
9+
10+
DRY_RUN="${DRY_RUN:-0}"
11+
SCALETEST_NAME="${SCALETEST_NAME:-}"
12+
SCALETEST_NUM_WORKSPACES="${SCALETEST_NUM_WORKSPACES:-}"
13+
SCALETEST_SCENARIO="${SCALETEST_SCENARIO:-}"
14+
SCALETEST_PROJECT="${SCALETEST_PROJECT:-}"
15+
SCALETEST_PROMETHEUS_REMOTE_WRITE_USER="${SCALETEST_PROMETHEUS_REMOTE_WRITE_USER:-}"
16+
SCALETEST_PROMETHEUS_REMOTE_WRITE_PASSWORD="${SCALETEST_PROMETHEUS_REMOTE_WRITE_PASSWORD:-}"
17+
SCALETEST_SKIP_CLEANUP="${SCALETEST_SKIP_CLEANUP:-0}"
18+
19+
script_name=$(basename "$0")
20+
args="$(getopt -o "" -l dry-run,help,name:,num-workspaces:,project:,scenario:,skip-cleanup -- "$@")"
21+
eval set -- "$args"
22+
while true; do
23+
case "$1" in
24+
--dry-run)
25+
DRY_RUN=1
26+
shift
27+
;;
28+
--help)
29+
echo "Usage: $script_name --name <name> --project <project> --num-workspaces <num-workspaces> --scenario <scenario> [--dry-run] [--skip-cleanup]"
30+
exit 0
31+
;;
32+
--name)
33+
SCALETEST_NAME="$2"
34+
shift 2
35+
;;
36+
--num-workspaces)
37+
SCALETEST_NUM_WORKSPACES="$2"
38+
shift 2
39+
;;
40+
--project)
41+
SCALETEST_PROJECT="$2"
42+
shift 2
43+
;;
44+
--scenario)
45+
SCALETEST_SCENARIO="$2"
46+
shift 2
47+
;;
48+
--skip-cleanup)
49+
SCALETEST_SKIP_CLEANUP=1
50+
shift
51+
;;
52+
--)
53+
shift
54+
break
55+
;;
56+
*)
57+
error "Unrecognized option: $1"
58+
;;
59+
esac
60+
done
61+
62+
dependencies gcloud kubectl terraform
63+
64+
if [[ -z "${SCALETEST_NAME}" ]]; then
65+
echo "Must specify --name"
66+
exit 1
67+
fi
68+
69+
if [[ -z "${SCALETEST_PROJECT}" ]]; then
70+
echo "Must specify --project"
71+
exit 1
72+
fi
73+
74+
if [[ -z "${SCALETEST_NUM_WORKSPACES}" ]]; then
75+
echo "Must specify --num-workspaces"
76+
exit 1
77+
fi
78+
79+
if [[ -z "${SCALETEST_SCENARIO}" ]]; then
80+
echo "Must specify --scenario"
81+
exit 1
82+
fi
83+
84+
if [[ -z "${SCALETEST_PROMETHEUS_REMOTE_WRITE_USER}" ]] || [[ -z "${SCALETEST_PROMETHEUS_REMOTE_WRITE_PASSWORD}" ]]; then
85+
echo "SCALETEST_PROMETHEUS_REMOTE_WRITE_USER or SCALETEST_PROMETHEUS_REMOTE_WRITE_PASSWORD not specified."
86+
echo "No prometheus metrics will be collected!"
87+
read -r -p "Continue (y/N)? " choice
88+
case "$choice" in
89+
y | Y | yes | YES) ;;
90+
*) exit 1 ;;
91+
esac
92+
fi
93+
94+
SCALETEST_SCENARIO_VARS="${PROJECT_ROOT}/scaletest/terraform/scenario-${SCALETEST_SCENARIO}.tfvars"
95+
if [[ ! -f "${SCALETEST_SCENARIO_VARS}" ]]; then
96+
echo "Scenario ${SCALETEST_SCENARIO_VARS} not found."
97+
echo "Please create it or choose another scenario:"
98+
find "${PROJECT_ROOT}/scaletest/terraform" -type f -name 'scenario-*.tfvars'
99+
exit 1
100+
fi
101+
102+
if [[ "${SCALETEST_SKIP_CLEANUP}" == 1 ]]; then
103+
log "WARNING: you told me to not clean up after myself, so this is now your job!"
104+
fi
105+
106+
CONFIG_DIR="${PROJECT_ROOT}/scaletest/.coderv2"
107+
if [[ -d "${CONFIG_DIR}" ]] && files=$(ls -qAH -- "${CONFIG_DIR}") && [[ -z "$files" ]]; then
108+
echo "Cleaning previous configuration"
109+
maybedryrun "$DRY_RUN" rm -fv "${CONFIG_DIR}/*"
110+
fi
111+
maybedryrun "$DRY_RUN" mkdir -p "${CONFIG_DIR}"
112+
113+
SCALETEST_SCENARIO_VARS="${PROJECT_ROOT}/scaletest/terraform/scenario-${SCALETEST_SCENARIO}.tfvars"
114+
SCALETEST_SECRETS="${PROJECT_ROOT}/scaletest/terraform/secrets.tfvars"
115+
SCALETEST_SECRETS_TEMPLATE="${PROJECT_ROOT}/scaletest/terraform/secrets.tfvars.tpl"
116+
117+
log "Writing scaletest secrets to file."
118+
SCALETEST_NAME="${SCALETEST_NAME}" \
119+
SCALETEST_PROJECT="${SCALETEST_PROJECT}" \
120+
SCALETEST_PROMETHEUS_REMOTE_WRITE_USER="${SCALETEST_PROMETHEUS_REMOTE_WRITE_USER}" \
121+
SCALETEST_PROMETHEUS_REMOTE_WRITE_PASSWORD="${SCALETEST_PROMETHEUS_REMOTE_WRITE_PASSWORD}" \
122+
envsubst <"${SCALETEST_SECRETS_TEMPLATE}" >"${SCALETEST_SECRETS}"
123+
124+
pushd "${PROJECT_ROOT}/scaletest/terraform"
125+
126+
echo "Initializing terraform."
127+
maybedryrun "$DRY_RUN" terraform init
128+
129+
echo "Setting up infrastructure."
130+
maybedryrun "$DRY_RUN" terraform apply --var-file="${SCALETEST_SCENARIO_VARS}" --var-file="${SCALETEST_SECRETS}" --auto-approve
131+
132+
if [[ "${DRY_RUN}" != 1 ]]; then
133+
SCALETEST_CODER_URL=$(<"${CONFIG_DIR}/url")
134+
else
135+
SCALETEST_CODER_URL="http://coder.dryrun.local:3000"
136+
fi
137+
KUBECONFIG="${PROJECT_ROOT}/scaletest/.coderv2/${SCALETEST_NAME}-cluster.kubeconfig"
138+
echo "Waiting for Coder deployment at ${SCALETEST_CODER_URL} to become ready"
139+
maybedryrun "$DRY_RUN" kubectl --kubeconfig="${KUBECONFIG}" -n "coder-${SCALETEST_NAME}" rollout status deployment/coder
140+
141+
echo "Initializing Coder deployment."
142+
DRY_RUN="$DRY_RUN" "${PROJECT_ROOT}/scaletest/lib/coder_init.sh" "${SCALETEST_CODER_URL}"
143+
144+
echo "Creating ${SCALETEST_NUM_WORKSPACES} workspaces."
145+
DRY_RUN="$DRY_RUN" "${PROJECT_ROOT}/scaletest/lib/coder_shim.sh" scaletest create-workspaces \
146+
--count "${SCALETEST_NUM_WORKSPACES}" \
147+
--template=kubernetes \
148+
--concurrency 10 \
149+
--no-cleanup
150+
151+
echo "Sleeping 10 minutes to establish a baseline measurement."
152+
maybedryrun "$DRY_RUN" sleep 600
153+
154+
echo "Sending traffic to workspaces"
155+
maybedryrun "$DRY_RUN" "${PROJECT_ROOT}/scaletest/lib/coder_workspacetraffic.sh" "${SCALETEST_NAME}"
156+
maybedryrun "$DRY_RUN" kubectl --kubeconfig="${KUBECONFIG}" -n "coder-${SCALETEST_NAME}" wait pods coder-scaletest-workspace-traffic --for condition=Ready
157+
maybedryrun "$DRY_RUN" kubectl --kubeconfig="${KUBECONFIG}" -n "coder-${SCALETEST_NAME}" logs -f pod/coder-scaletest-workspace-traffic
158+
159+
echo "Starting pprof"
160+
maybedryrun "$DRY_RUN" kubectl --kubeconfig="${KUBECONFIG}" -n "coder-${SCALETEST_NAME}" port-forward deployment/coder 6061:6060 &
161+
pfpid=$!
162+
maybedryrun "$DRY_RUN" trap "kill $pfpid" EXIT
163+
164+
echo "Waiting for pprof endpoint to become available"
165+
pprof_attempt_counter=0
166+
while ! maybedryrun "$DRY_RUN" timeout 1 bash -c "echo > /dev/tcp/localhost/6061"; do
167+
if [[ $((pprof_attempt_counter++)) -eq 10 ]]; then
168+
echo
169+
echo "pprof failed to become ready in time!"
170+
exit 1
171+
fi
172+
maybedryrun "$DRY_RUN" sleep 3
173+
done
174+
175+
echo "Taking pprof snapshots"
176+
maybedryrun "$DRY_RUN" curl --silent --fail --output "${SCALETEST_NAME}-heap.pprof.gz" http://localhost:6061/debug/pprof/heap
177+
maybedryrun "$DRY_RUN" curl --silent --fail --output "${SCALETEST_NAME}-goroutine.pprof.gz" http://localhost:6061/debug/pprof/goroutine
178+
# No longer need to port-forward
179+
maybedryrun "$DRY_RUN" kill "$pfpid"
180+
maybedryrun "$DRY_RUN" trap - EXIT
181+
182+
if [[ "${SCALETEST_SKIP_CLEANUP}" == 1 ]]; then
183+
echo "Leaving resources up for you to inspect."
184+
echo "Please don't forget to clean up afterwards:"
185+
echo "cd terraform && terraform destroy --var-file=${SCALETEST_SCENARIO_VARS} --var-file=${SCALETEST_SECRETS} --auto-approve"
186+
exit 0
187+
fi
188+
189+
echo "Cleaning up"
190+
maybedryrun "$DRY_RUN" terraform destroy --var-file="${SCALETEST_SCENARIO_VARS}" --var-file="${SCALETEST_SECRETS}" --auto-approve
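The wait for the pprof endpoint is an instance of polling with a bounded number of attempts. A generic version of that pattern might look like the sketch below. `wait_for` is a hypothetical helper that is not part of `scaletest.sh`, and the attempt count and delay are illustrative.

```shell
#!/usr/bin/env bash
# Sketch only: wait_for is a hypothetical helper, not part of scaletest.sh.
# Usage: wait_for <max_attempts> <delay_seconds> <command> [args...]
# Runs the command until it succeeds, giving up after max_attempts tries.
wait_for() {
  local max_attempts="$1" delay="$2"
  shift 2
  local attempt=1
  until "$@"; do
    if ((attempt >= max_attempts)); then
      echo "gave up after ${attempt} attempts: $*" >&2
      return 1
    fi
    attempt=$((attempt + 1))
    sleep "$delay"
  done
}

# Polling a forwarded port could then be written as (not executed here):
#   wait_for 10 3 timeout 1 bash -c 'echo > /dev/tcp/localhost/6061'
wait_for 3 0 true && echo "ready"
```

Keeping the counter increment inside the helper makes it harder to write a loop that never terminates when the endpoint stays unavailable.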
