Description
To ensure workspaces (and agents) are stopped gracefully, we want to make sure Terraform providers are configured in a way that tries (best-effort) to respect the agent shutdown procedure.
What does graceful shutdown mean? It means the provider issues a kill -INT [coder_agent_pid]
(one of HUP
, INT
, TERM
) when the resource is stopped. After a specified timeout, it may issue kill -KILL [coder_agent_pid]
to ensure de-provision progresses.
Note: This will not be the primary form of ensuring an agent is stopped gracefully. In the ideal scenario, this happens in a pre-provision step and this acts as a safety net (#6175).
We need to figure out how to perform graceful shutdown with (or verify that it works):
- kreuzwerker/terraform-provider-docker (see below)
- hashicorp/terraform-provider-aws
- Is there a better option than trying to gracefully interrupt in user_data script?
- Container should work out-of-the-box, timeout can be controlled via
stopTimeout
/ECS_CONTAINER_STOP_TIMEOUT
- Related: Can the Coder agent immediately realize it's disconnected? #5901
- hashicorp/terraform-provider-azurerm
- digitalocean/terraform-provider-digitalocean
- Droplets
- K8S
- hashicorp/terraform-provider-google
- hashicorp/terraform-provider-kubernetes
Eventually:
- Update documentation and example templates
Docker:
resource "docker_container" "workspace" {
destroy_grace_seconds = 30
stop_timeout = 30
stop_signal = "SIGINT"
}
EDIT: It looks like destroy_grace_seconds
is required to trigger a graceful stop, stop_{timeout,signal}
is insufficient alone. It may be that stop_{timeout,signal}
can be omitted in normal use-cases though (needs verification).