bug: template upgrade fails on claimed prebuilt workspace #17840

@dannykopping

Problem

If a prebuilt workspace's template uses ignore_changes as recommended in the docs, its agent may not reconnect after a workspace template upgrade.

e.g.

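resource "docker_container" "workspace" {
  # start builds set start_count to 1
  count = data.coder_workspace.me.start_count

  # Recommended for prebuilds: ignore diffs on every attribute, including env
  lifecycle {
    ignore_changes = all
  }

  entrypoint = ["sh", "-c", coder_agent.main.init_script]
  # The agent token is written into env when the container is created
  env        = ["CODER_AGENT_TOKEN=${coder_agent.main.token}"]
  ...
}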

Details

A template upgrade kicks off a start build. Start builds set data.coder_workspace.me.start_count to 1, which is used in the count attribute of compute resources (see the example above). If the workspace is already started, Terraform will attempt to update in place any resources that already have count = 1.

A start build causes the coder_agent to be recreated, which generates a new auth token. Normally, without ignore_changes, the env attribute above would be modified, since the token value changes. env is immutable (i.e. defined as ForceNew), so Terraform sees any change to this attribute as drift from the original and forces a replacement. The docker_container is then recreated, and the agent starts afresh and connects to the control plane.

With ignore_changes, however, changes to these attributes are ignored so that prebuilds can work. The template update therefore has no real effect on the workspace's compute resource, yet the coder_agent's token still changes. The running agent keeps using the previous token while the control plane only accepts the new one, so the agent can no longer connect to the control plane on behalf of the workspace.
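The mismatch described above can be illustrated with a small toy model. This is only a sketch of the reasoning, not Coder or Terraform code: Container, start_build and agent_can_connect are hypothetical names, and the "plan" is reduced to a single decision about whether the env diff forces a replacement.

```python
import secrets
from dataclasses import dataclass


@dataclass
class Container:
    """Toy stand-in for the docker_container resource (hypothetical)."""
    env_token: str  # agent token written into env when the container was created


def start_build(container: Container, ignore_changes: bool):
    """Simulate one start build against an already-running container.

    Returns (container_after_build, token_accepted_by_control_plane).
    """
    # The coder_agent is recreated on every start build, so the token rotates.
    new_token = secrets.token_hex(8)
    if ignore_changes:
        # The env diff is ignored: no replacement, the old token stays in env.
        return container, new_token
    # env is ForceNew, so the diff forces a replacement with the new token.
    return Container(env_token=new_token), new_token


def agent_can_connect(container: Container, accepted_token: str) -> bool:
    """The agent authenticates with whatever token is in its container's env."""
    return container.env_token == accepted_token


running = Container(env_token="token-from-first-build")

replaced, accepted = start_build(running, ignore_changes=False)
print(agent_can_connect(replaced, accepted))  # True: container was recreated

kept, accepted = start_build(running, ignore_changes=True)
print(agent_can_connect(kept, accepted))  # False: stale token in the old container
```

In this model the only difference between the two runs is whether the env diff is allowed to trigger a replacement, which is the effect of ignore_changes that the issue describes.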

Workaround

Manually restarting the workspace will allow the agent to reconnect successfully.

Proposed Solution

Template updates should not be start builds, but rather a logical restart (i.e. successive stop and start builds), in order to guarantee the behaviour customers expect. This should apply to claimed prebuilt workspaces and regular workspaces alike, to guarantee that the compute resource is created anew. I fear the current start-only mechanism is working by accident, because Terraform drift takes care of destroying and recreating the resource like a stop + start would.
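A rough sketch of why a stop followed by a start would avoid the problem even with ignore_changes = all. The model below is hypothetical and rests on two assumptions that the issue does not state: that a stop build sets start_count to 0 (so count = 0 destroys the container), and that the agent token rotates on every build. apply_build and the two upgrade_* helpers are illustrative names only.

```python
import secrets
from typing import Optional, Tuple


def apply_build(transition: str, container_token: Optional[str]) -> Tuple[Optional[str], str]:
    """Apply one build with ignore_changes = all set on the container.

    container_token is the token in the existing container's env,
    or None if no container exists.
    Returns (container_token_after_build, token_accepted_by_control_plane).
    """
    agent_token = secrets.token_hex(8)  # assumption: rotates on every build
    start_count = 1 if transition == "start" else 0  # assumption: stop sets 0
    if start_count == 0:
        return None, agent_token  # count = 0: the container is destroyed
    if container_token is None:
        # Fresh creation uses the current config; ignore_changes only
        # suppresses diffs on resources that already exist.
        return agent_token, agent_token
    return container_token, agent_token  # in-place update, env diff ignored


def upgrade_with_start_only(container_token: str):
    """Current behaviour: a template upgrade is a single start build."""
    return apply_build("start", container_token)


def upgrade_with_restart(container_token: str):
    """Proposed behaviour: a stop build followed by a start build."""
    container_token, _ = apply_build("stop", container_token)
    return apply_build("start", container_token)


in_env, accepted = upgrade_with_start_only("old-token")
print(in_env == accepted)  # False: the container still carries the old token

in_env, accepted = upgrade_with_restart("old-token")
print(in_env == accepted)  # True: the container was created anew
```

The restart path does not depend on Terraform noticing a diff at all, which is the guarantee the proposal is after.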
