feat: scaletest: scale down nodegroups by default #8276
Conversation
Why do I have a feeling that it performs more than just starting with `nodecount = 0` :)
There were some... interesting gotchas. The main thing I found is that when you try to delete a namespace in a GKE cluster with no active nodepools, the namespace hangs in the Terminating state. I spent a while chasing the 'right' way to ignore the resource deletion in Terraform before deciding to just use a null_resource for creating the namespaces and moving on with my life.
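A minimal sketch of that workaround, assuming a hypothetical `coder` namespace name and the `local.cluster_kubeconfig_path` value used elsewhere in this PR; the namespace is created by a `local-exec` provisioner instead of a `kubernetes_namespace` resource, so Terraform never tries (and fails) to delete it on destroy:

```hcl
# Sketch: create the namespace imperatively so Terraform never manages its
# deletion (which would hang once the cluster has no active nodepools).
resource "null_resource" "coder_namespace" {
  triggers = {
    namespace  = "coder"                       # hypothetical namespace name
    kubeconfig = local.cluster_kubeconfig_path # assumed to exist, as above
  }

  provisioner "local-exec" {
    command = <<EOF
KUBECONFIG=${self.triggers.kubeconfig} kubectl create namespace ${self.triggers.namespace} \
  --dry-run=client -o yaml | KUBECONFIG=${self.triggers.kubeconfig} kubectl apply -f -
EOF
  }
  # Intentionally no destroy-time provisioner: the namespace goes away with
  # the cluster, so there is nothing for Terraform to wait on.
}
```

The `--dry-run=client -o yaml | kubectl apply -f -` pipeline just keeps the create idempotent across re-applies.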
```bash
max_attempts=10
for attempt in $(seq 1 $max_attempts); do
	maybedryrun "$DRY_RUN" curl --silent --fail --output /dev/null "${SCALETEST_CODER_URL}/api/v2/buildinfo"
	curl_status=$?
	if [[ $curl_status -eq 0 ]]; then
		break
	fi
	if [[ $attempt -eq $max_attempts ]]; then
		echo
		echo "Coder deployment failed to become ready in time!"
		exit 1
	fi
	echo "Coder deployment not ready yet (${attempt}/${max_attempts}), sleeping 3 seconds"
	maybedryrun "$DRY_RUN" sleep 3
done
```
review: there is a race condition between the rollout status returning true and the service actually becoming ready; so I'm just going back to curl :-)
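As a rough illustration of the race (the namespace and deployment names below are assumptions, not taken from this PR): `kubectl rollout status` can report success as soon as the new pods are Ready, which can still be slightly before the URL being probed actually answers, so the loop above polls the endpoint directly instead.

```bash
# The raced sequence (sketch): rollout status can succeed slightly before the
# service behind SCALETEST_CODER_URL actually responds.
kubectl --namespace coder rollout status deployment/coder --timeout=300s

# What the loop above does instead: keep probing the API until it answers.
curl --silent --fail --output /dev/null "${SCALETEST_CODER_URL}/api/v2/buildinfo"
```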
resource "null_resource" "cluster_kubeconfig" { | ||
depends_on = [google_container_cluster.primary] | ||
triggers = { | ||
path = local.cluster_kubeconfig_path | ||
name = google_container_cluster.primary.name | ||
project_id = var.project_id | ||
zone = var.zone | ||
} | ||
provisioner "local-exec" { | ||
command = <<EOF | ||
KUBECONFIG=${self.triggers.path} gcloud container clusters get-credentials ${self.triggers.name} --project=${self.triggers.project_id} --zone=${self.triggers.zone} | ||
EOF | ||
} | ||
|
||
provisioner "local-exec" { | ||
when = destroy | ||
command = <<EOF | ||
rm -f ${self.triggers.path} | ||
EOF | ||
} | ||
} |
review: this was previously being created when applying prom monitoring manifests; moved it to its own resource declaration here.
This PR modifies the scaletest terraform to allow scaling down the cluster by setting `-var state=stopped`. This should hopefully save some time performing scaletests as we no longer have to wait for the cloudsql database etc. to be created. By default, `scaletest.sh` will scale down the nodepools unless the `--destroy` argument is passed.
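A usage sketch based on the description above (the script path and terraform working directory are assumptions; only the `-var state=stopped` setting and the `--destroy` flag come from this PR):

```bash
# Scale the nodepools down to zero while keeping the cloudsql database and the
# rest of the infrastructure in place:
terraform apply -var state=stopped

# scaletest.sh scales the nodepools down at the end of a run by default;
# pass --destroy to tear the whole deployment down instead:
./scaletest.sh --destroy
```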