Graceful shutdowns for coder agents and shutdown scripts #6175

@mafredri

In the ideal scenario where a workspace agent is connected to the coder server, we should initiate agent shutdown before we initiate Terraform provisioning to destroy/re-create the resource(s).

Why?

  • Providers behave differently: some may not initiate graceful shutdowns, and we might not be able to control their timeouts
  • Template authors may use the agent shutdown_script to perform a critical task that must complete successfully (e.g. backing up the filesystem)
    • The task/script may take a long time
  • We can leave the workspace running and allow debugging an agent that didn't successfully execute its shutdown_script

To consider:

  • At the end we must not exit the agent process
    • Only applies when graceful shutdown is initiated by coder server!
    • We should de-register signal handlers and wait "indefinitely"
    • Let the next signal terminate the process
    • Why? This prevents e.g. a systemd service from restarting the agent
  • Should behavior differ if a workspace is started/stopped?
  • What happens when an agent is disconnected and we can't tell it to shut down? Do we block de-provisioning or allow it? Do we require the use of "force"?

Related: #4677, #5914, #6139

Labels: api (Area: HTTP API), cli (Area: CLI)
