Coder agent should exit when out-of-date #3485

Closed
mattlqx opened this issue Aug 12, 2022 · 4 comments · Fixed by #4715
Labels
api Area: HTTP API s0 Major regression, all-hands-on-deck to fix

Comments

@mattlqx

mattlqx commented Aug 12, 2022

With the Coder agent configured as a systemd service, when a workspace is updated, the agent is no longer shown as connected in the UI, and the running agent logs a warning-level message with "Error: build is outdated". The agent does not exit, however. If it did, systemd would just restart it and everything would be happy again. I could restart it as part of a Terraform null_resource, but I think the agent's behavior should just be improved.

A couple of points:

  1. Why can't the coder agent handle this situation automatically, either by reinitializing the outdated parts internally or by forking a new copy of itself?
  2. In this situation, if it's just going to perpetually fail to connect and log errors, why doesn't it just exit?
  3. If changes to this behavior aren't desired for some reason, could a flag at least be added to make exiting an option?
● coder.service - Coder.com Agent
     Loaded: loaded (/etc/systemd/system/coder.service; enabled; vendor preset: enabled)
     Active: active (running) since Fri 2022-08-12 13:54:35 UTC; 1h 4min ago
       Docs: https://github.com/coder/coder
   Main PID: 505 (coder)
      Tasks: 48 (limit: 9403)
     Memory: 444.0M
     CGroup: /system.slice/coder.service
             ├─ 505 ./coder agent
             ├─1251 /usr/lib/code-server/lib/node /usr/lib/code-server --auth none --port 13337
             ├─1270 /usr/lib/code-server/lib/node /usr/lib/code-server --auth none --port 13337
             ├─1284 /usr/lib/code-server/lib/node /usr/lib/code-server/lib/vscode/out/bootstrap-fork --type=ptyHost
             └─3996 /bin/bash -l

Aug 12 14:58:24 mkulka-coder-test.local coder_bootstrap_linux.sh[505]:                    Error: build is outdated
Aug 12 14:58:34 mkulka-coder-test.local coder_bootstrap_linux.sh[505]: 2022-08-12 14:58:34.565 [WARN]        <./agent/agent.go:140>        (*agent).run        failed to dial ...
Aug 12 14:58:34 mkulka-coder-test.local coder_bootstrap_linux.sh[505]:   "error": GET https://mkulka-coder.local:8443/api/v2/workspaceagents/me/listen: unexpected status code 403>
Aug 12 14:58:34 mkulka-coder-test.local coder_bootstrap_linux.sh[505]:                    Error: build is outdated
Aug 12 14:58:44 mkulka-coder-test.local coder_bootstrap_linux.sh[505]: 2022-08-12 14:58:44.568 [WARN]        <./agent/agent.go:140>        (*agent).run        failed to dial ...
Aug 12 14:58:44 mkulka-coder-test.local coder_bootstrap_linux.sh[505]:   "error": GET https://mkulka-coder.local:8443/api/v2/workspaceagents/me/listen: unexpected status code 403>
Aug 12 14:58:44 mkulka-coder-test.local coder_bootstrap_linux.sh[505]:                    Error: build is outdated
Aug 12 14:58:54 mkulka-coder-test.local coder_bootstrap_linux.sh[505]: 2022-08-12 14:58:54.571 [WARN]        <./agent/agent.go:140>        (*agent).run        failed to dial ...
Aug 12 14:58:54 mkulka-coder-test.local coder_bootstrap_linux.sh[505]:   "error": GET https://mkulka-coder.local:8443/api/v2/workspaceagents/me/listen: unexpected status code 403>
Aug 12 14:58:54 mkulka-coder-test.local coder_bootstrap_linux.sh[505]:                    Error: build is outdated
@mattlqx
Author

mattlqx commented Aug 12, 2022

Basically a dupe of #2970

@kylecarbs
Member

We'll have to move the exchanging of a token from instance identity deeper in the agent, which isn't a big deal anyways.

The problem is that we exchange a token from instance identity at the beginning of the agent's lifecycle, so when a new build comes up, the agent attempts to use the old token instead of refreshing it.

@mafredri
Member

@kylecarbs Is this issue provider dependent? I've mostly used the Docker templates and there I've only ever seen the workspace get re-created on updates. So this issue hasn't surfaced. I kind of think it would be OK for update to be a stop -> update -> start kind of process, but I guess there could be use-cases for not interrupting work unless necessary?

So, I guess another way to put this is: depending on how a template is configured, could an update sometimes restart the workspace and sometimes leave it running (currently)? If the behavior varies, I think it'd be important for the user to know this (e.g. via a warning), or for updates to always be done in the same "interruptive" way.

@kylecarbs
Member

It's not; this is purely in the agent.

If a token comes back as invalid, we should exchange with the instance identity again to get a new one, not continually poll with the invalid token.

@kylecarbs kylecarbs added the api Area: HTTP API label Aug 24, 2022
@bpmct bpmct added the s0 Major regression, all-hands-on-deck to fix label Oct 20, 2022
@kylecarbs kylecarbs assigned kylecarbs and unassigned kylecarbs Oct 21, 2022
kylecarbs added a commit that referenced this issue Oct 24, 2022
This simplifies a lot of code by creating an interface for
the codersdk client into the agent. It also moves agent
authentication code so instance identity will work between
restarts.

Fixes #3485 and #4082.
kylecarbs added a commit that referenced this issue Oct 24, 2022
* fix: Refactor agent to consume API client

This simplifies a lot of code by creating an interface for
the codersdk client into the agent. It also moves agent
authentication code so instance identity will work between
restarts.

Fixes #3485 and #4082.

* Fix client reconnections
@matifali matifali added the bug label May 5, 2023
5 participants