Description
In the ideal scenario where a workspace agent is connected to the coder server, we should initiate agent shutdown before we initiate Terraform provisioning to destroy/re-create the resource(s).
Why?
- Providers behave differently: some may not initiate graceful shutdowns, and we might not be able to control their timeouts
- Template authors may use the agent `shutdown_script` to perform a critical task that must complete successfully (e.g. backing up filesystem)
- The task/script may take a long time
- We can leave the workspace running and allow debugging an agent that didn't successfully execute its `shutdown_script`
To consider:
- At the end we must not exit the agent process
- Only applies when graceful shutdown is initiated by coder server!
- We should de-register signal handlers and wait "indefinitely"
- Let the next signal terminate the process
- Why? This prevents e.g. a systemd service from restarting the agent
- Should behavior differ if a workspace is started/stopped?
- What happens when an agent is disconnected and we can't tell it to shut down? -> Block or allow de-provision? Require the use of "force"?