workspace proxy fails to proxy; error "ensure agent: subscribe agent" #11401

Closed
johnstcn opened this issue Jan 4, 2024 · 1 comment · Fixed by #11414
Labels
s2 Broken use cases or features (with a workaround). Only humans may set this.

Comments

@johnstcn
Member

johnstcn commented Jan 4, 2024

(TODO: write a better issue title)

After #11366 was merged, I observed 502 errors when attempting to open vscode-web on my workspace:

Failed to proxy request to application: acquire agent conn: ensure agent: subscribe agent: write message: failed to write msg: WebSocket closed: failed to read frame header: EOF

and alternately, upon refresh:

Failed to proxy request to application: acquire agent conn: ensure agent: subscribe agent: write message: failed to write msg: failed to acquire lock: context canceled

[Image: Screenshot 2024-01-04 at 09 30 08]

I observed this behaviour with:

But not with:

After restarting paris.fly.dev.coder.com, the issue was apparently resolved.

Current theory is that there is a bug in the retry logic.

This appears to be supported by the following timeline of events:

  • 2024-01-04T05:33:34.9677772Z 'paris-coder' fly.io wsproxy restarted
  • 2024-01-04T05:34:08.0138014Z 'sydney-coder' fly.io wsproxy restarted
  • 2024-01-04T05:34:17.3245364Z deployment.apps/coder restarted
  • 2024-01-04T05:34:25.7908454Z 'sydney' GCP wsproxy restarted
  • 2024-01-04T05:34:29.7072117Z 'sao-paulo-coder' fly.io wsproxy restarted
  • 2024-01-04T05:34:33.5160577Z deployment "coder" successfully rolled out
  • 2024-01-04T05:34:44.6466916Z 'europe' GCP wsproxy restarted
  • 2024-01-04T05:35:02.7787815Z 'brazil' GCP wsproxy restarted

The Paris and Sydney wsproxies would have been connected to coderd when the rollout restart happened. The restart would have interrupted the persistent websocket connection for those wsproxies, while the others were most likely connected to the new coderd replicas.

Curiously, the workspace proxy healthcheck reported no issues:

curl https://sydney.fly.dev.coder.com/healthz-report
{"errors":null,"warnings":null}

The logs on the Paris fly.io wsproxy had already rotated, but we observed the following in the Sydney fly.io wsproxy's log output:

2024-01-04T05:34:06Z app[918577d4bd5538] syd [info]Started HTTP listener at http://0.0.0.0:3000
2024-01-04T05:34:06Z app[918577d4bd5538] syd [info]View the Web UI: https://sydney.fly.dev.coder.com
2024-01-04T05:34:08Z app[918577d4bd5538] syd [info]==> Logs will stream in below (press ctrl+c to gracefully exit):
2024-01-04T05:34:35Z app[918577d4bd5538] syd [info]2024-01-04 05:34:35.000 [warn]  net.workspace-proxy.servertailnet: broadcast server node to agents ...
2024-01-04T05:34:35Z app[918577d4bd5538] syd [info]    error= write message:
2024-01-04T05:34:35Z app[918577d4bd5538] syd [info]               github.com/coder/coder/v2/enterprise/wsproxy/wsproxysdk.(*remoteMultiAgentHandler).writeJSON
2024-01-04T05:34:35Z app[918577d4bd5538] syd [info]                   /home/runner/actions-runner/_work/coder/coder/enterprise/wsproxy/wsproxysdk/wsproxysdk.go:524
2024-01-04T05:34:35Z app[918577d4bd5538] syd [info]             - failed to write msg: WebSocket closed: failed to read frame header: EOF
cdr-bot (bot) added the bug label on Jan 4, 2024
spikecurtis changed the title from 'workspace proxy retry logic potentially bugged' to 'workspace proxy fails to proxy; error "ensure agent: subscribe agent"' on Jan 4, 2024
spikecurtis added the s2 label ("Broken use cases or features (with a workaround). Only humans may set this.") on Jan 4, 2024
coadler self-assigned this on Jan 4, 2024
@matifali
Member

matifali commented Jan 6, 2024

This has been happening very frequently on the paris-coder fly.io proxy. A manual redeploy fixed it, but only temporarily.
