feat: scaletest: scale down nodegroups by default #8276


Merged
johnstcn merged 4 commits into main from cj/scaletest-scaledown on Jun 30, 2023

Conversation

@johnstcn (Member) commented on Jun 30, 2023

This PR modifies the scaletest Terraform to allow scaling down the cluster by setting -var state=stopped. This should save some time when performing scaletests, as we no longer have to wait for the cloudsql database and other resources to be created.

By default, scaletest.sh will scale down the nodepools unless the --destroy argument is passed.
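
For illustration, a minimal sketch of how a state variable can gate the node pool size; the node pool resource, its name, and the "started" node count below are assumptions for this sketch, not the PR's actual code:

variable "state" {
  description = "Scaletest cluster state."
  type        = string
  default     = "started"

  validation {
    condition     = contains(["started", "stopped"], var.state)
    error_message = "state must be either \"started\" or \"stopped\"."
  }
}

# Hypothetical node pool: scale to zero when stopped, so the cluster and
# cloudsql database are kept but no nodes (and no node cost) remain.
resource "google_container_node_pool" "workspaces" {
  name       = "workspaces"
  cluster    = google_container_cluster.primary.name
  node_count = var.state == "stopped" ? 0 : 3
}

With something like this in place, scaletest.sh can apply -var state=stopped to scale down, rather than destroying everything, whenever --destroy is not passed.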

@johnstcn self-assigned this on Jun 30, 2023
@mtojek (Member) left a comment

Why do I have a feeling that it performs more than just starting with nodecount = 0 :)

@johnstcn (Member, Author) replied:

> Why do I have a feeling that it performs more than just starting with nodecount = 0 :)

There were some... interesting gotchas. The main thing I found is that when you try to delete a namespace in GKE with no active nodepools, the namespace will hang in the Deleting state, probably due to some finalizers that haven't run.

I spent a while chasing the 'right' way to ignore the resource deletion in Terraform before deciding to just use a null_resource for creating the namespaces and moving on with my life.
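
A minimal sketch of that null_resource approach; the resource label, namespace name, and kubeconfig wiring are assumptions for illustration, not the PR's exact code:

# Create the namespace imperatively so Terraform never owns its deletion;
# with no active nodepools, a provider-driven delete would hang in the
# Deleting state waiting on finalizers.
resource "null_resource" "coder_namespace" {
  depends_on = [null_resource.cluster_kubeconfig]

  triggers = {
    namespace  = "coder"
    kubeconfig = local.cluster_kubeconfig_path
  }

  provisioner "local-exec" {
    command = <<EOF
KUBECONFIG=${self.triggers.kubeconfig} kubectl create namespace ${self.triggers.namespace} --dry-run=client -o yaml | KUBECONFIG=${self.triggers.kubeconfig} kubectl apply -f -
EOF
  }

  # Deliberately no destroy-time provisioner: the namespace disappears with
  # the nodepools/cluster, so Terraform never tries (and fails) to delete it.
}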

Comment on lines +159 to +173
max_attempts=10
for attempt in $(seq 1 $max_attempts); do
  maybedryrun "$DRY_RUN" curl --silent --fail --output /dev/null "${SCALETEST_CODER_URL}/api/v2/buildinfo"
  curl_status=$?
  if [[ $curl_status -eq 0 ]]; then
    break
  fi
  if [[ $attempt -eq $max_attempts ]]; then
    echo
    echo "Coder deployment failed to become ready in time!"
    exit 1
  fi
  echo "Coder deployment not ready yet (${attempt}/${max_attempts}), sleeping 3 seconds"
  maybedryrun "$DRY_RUN" sleep 3
done
@johnstcn (Member, Author)

review: there is a race condition between the rollout status returning true and the service actually becoming ready; so I'm just going back to curl :-)

Comment on lines +132 to +152
resource "null_resource" "cluster_kubeconfig" {
depends_on = [google_container_cluster.primary]
triggers = {
path = local.cluster_kubeconfig_path
name = google_container_cluster.primary.name
project_id = var.project_id
zone = var.zone
}
provisioner "local-exec" {
command = <<EOF
KUBECONFIG=${self.triggers.path} gcloud container clusters get-credentials ${self.triggers.name} --project=${self.triggers.project_id} --zone=${self.triggers.zone}
EOF
}

provisioner "local-exec" {
when = destroy
command = <<EOF
rm -f ${self.triggers.path}
EOF
}
}
@johnstcn (Member, Author)

review: this was previously being created when applying prom monitoring manifests; moved it to its own resource declaration here.

@johnstcn merged commit 1e8cc2c into main on Jun 30, 2023
@johnstcn deleted the cj/scaletest-scaledown branch on June 30, 2023 15:07
@github-actions bot locked and limited conversation to collaborators on Jun 30, 2023