fix: ensure wsproxy MultiAgent is closed when websocket dies #11414

Merged
fix: ensure wsproxy MultiAgent is closed when websocket dies
The `SingleTailnet` behavior only checked whether the `MultiAgent` was
closed, but the websocket error was not being propagated into the
`MultiAgent`, so it was never swapped for a new working one.
coadler committed Jan 11, 2024
commit 129b16df19cc8fad152cc32ef569825a8bc93a4c
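
The failure mode described in the commit message can be illustrated with a minimal, self-contained sketch. It is not the project's code: `multiAgent`, `dial`, and `closedAfterFailure` are hypothetical stand-ins, and the returned cancel function plays the role of whatever notices that the websocket has died. The point is only that cancelling the connection context must reach `Close`, so that a later `IsClosed` check can trigger a swap.

```go
package main

import (
	"context"
	"fmt"
	"sync"
	"time"
)

// multiAgent is a hypothetical stand-in for the real MultiAgent type; it only
// tracks whether Close has been called.
type multiAgent struct {
	mu     sync.Mutex
	closed bool
	done   chan struct{}
}

func newMultiAgent() *multiAgent {
	return &multiAgent{done: make(chan struct{})}
}

// Close marks the multi-agent closed. It is safe to call more than once.
func (m *multiAgent) Close() {
	m.mu.Lock()
	defer m.mu.Unlock()
	if !m.closed {
		m.closed = true
		close(m.done)
	}
}

func (m *multiAgent) IsClosed() bool {
	m.mu.Lock()
	defer m.mu.Unlock()
	return m.closed
}

// dial simulates dialing a coordinator. The returned cancel function stands
// in for the websocket reader noticing a dead connection.
func dial(parent context.Context) (*multiAgent, context.CancelFunc) {
	ctx, cancel := context.WithCancel(parent)
	ma := newMultiAgent()
	// Propagate connection death into the multi-agent.
	go func() {
		<-ctx.Done()
		ma.Close()
	}()
	return ma, cancel
}

// closedAfterFailure reports whether the multi-agent ends up closed once the
// simulated connection fails.
func closedAfterFailure() bool {
	ma, fail := dial(context.Background())
	fail() // simulate the websocket dying
	select {
	case <-ma.done:
	case <-time.After(time.Second):
	}
	return ma.IsClosed()
}

func main() {
	fmt.Println("closed after failure:", closedAfterFailure())
}
```

Without the goroutine in `dial`, the context would be cancelled but `IsClosed` would keep returning false, which matches the symptom the commit message describes.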
6 changes: 3 additions & 3 deletions coderd/httpapi/websocket.go
@@ -26,10 +26,10 @@ func Heartbeat(ctx context.Context, conn *websocket.Conn) {
 	}
 }

-// Heartbeat loops to ping a WebSocket to keep it alive. It kills the connection
-// on ping failure.
+// Heartbeat loops to ping a WebSocket to keep it alive. It calls `exit` on ping
+// failure.
 func HeartbeatClose(ctx context.Context, exit func(), conn *websocket.Conn) {
-	ticker := time.NewTicker(30 * time.Second)
+	ticker := time.NewTicker(15 * time.Second)
 	defer ticker.Stop()

 	for {
Contributor

[Re: lines 43 to 43]

Drop an INFO log here
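
One possible shape for that suggestion, as a hedged sketch rather than the code that was merged: the `pinger` interface, the `interval` parameter, and the use of the standard library `log` package are all assumptions made so the example runs on its own. The real function takes a `*websocket.Conn` and the project uses its own structured logger.

```go
package main

import (
	"context"
	"errors"
	"log"
	"time"
)

// pinger is a hypothetical abstraction over the websocket connection; only
// the Ping method matters for this sketch.
type pinger interface {
	Ping(ctx context.Context) error
}

// heartbeatClose pings p every interval and calls exit on the first failure.
// It logs the error first so the teardown is visible to operators.
func heartbeatClose(ctx context.Context, interval time.Duration, exit func(), p pinger) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()

	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
		}
		if err := p.Ping(ctx); err != nil {
			// Assumed log format; stands in for an INFO-level structured log.
			log.Printf("INFO: heartbeat ping failed, closing connection: %v", err)
			exit()
			return
		}
	}
}

// failingPinger fails every ping, simulating a dead websocket.
type failingPinger struct{}

func (failingPinger) Ping(context.Context) error {
	return errors.New("connection reset")
}

// exitCalledOnPingFailure runs the heartbeat against a failing pinger and
// reports whether exit was invoked.
func exitCalledOnPingFailure() bool {
	called := false
	heartbeatClose(context.Background(), 10*time.Millisecond, func() { called = true }, failingPinger{})
	return called
}

func main() {
	log.Println("exit called:", exitCalledOnPingFailure())
}
```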

1 change: 1 addition & 0 deletions coderd/tailnet.go
@@ -224,6 +224,7 @@ func (s *ServerTailnet) watchAgentUpdates() {
 		nodes, ok := conn.NextUpdate(s.ctx)
 		if !ok {
 			if conn.IsClosed() && s.ctx.Err() == nil {
+				s.logger.Warn(s.ctx, "multiagent closed, reinitializing")
 				s.reinitCoordinator()
 				continue
 			}
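
The hunk above shows only a few lines of the watch loop. A rough sketch of the overall pattern, with hypothetical names throughout (`agentConn`, `fakeConn`, `watch`, `redial`, and `demo` are not from the repository), might look like the following: when the connection reports closed while the context is still live, the loop swaps in a replacement and keeps reading.

```go
package main

import (
	"context"
	"fmt"
)

// agentConn is a hypothetical subset of the multi-agent connection interface.
type agentConn interface {
	NextUpdate(ctx context.Context) (string, bool)
	IsClosed() bool
}

// fakeConn yields its queued updates and then reports itself closed.
type fakeConn struct {
	updates []string
	closed  bool
}

func (c *fakeConn) NextUpdate(context.Context) (string, bool) {
	if len(c.updates) == 0 {
		c.closed = true
		return "", false
	}
	u := c.updates[0]
	c.updates = c.updates[1:]
	return u, true
}

func (c *fakeConn) IsClosed() bool { return c.closed }

// watch drains updates from conn. When the connection reports closed while
// ctx is still live, it asks redial for a replacement and keeps going. It
// returns everything it saw once redial has nothing left to offer.
func watch(ctx context.Context, conn agentConn, redial func() agentConn) []string {
	var seen []string
	for {
		update, ok := conn.NextUpdate(ctx)
		if !ok {
			if conn.IsClosed() && ctx.Err() == nil {
				fmt.Println("multiagent closed, reinitializing")
				if next := redial(); next != nil {
					conn = next
					continue
				}
			}
			return seen
		}
		seen = append(seen, update)
	}
}

// demo runs watch across exactly one reconnect and returns the updates seen.
func demo() []string {
	replacements := []agentConn{&fakeConn{updates: []string{"c"}}}
	redial := func() agentConn {
		if len(replacements) == 0 {
			return nil
		}
		next := replacements[0]
		replacements = replacements[1:]
		return next
	}
	return watch(context.Background(), &fakeConn{updates: []string{"a", "b"}}, redial)
}

func main() {
	fmt.Println(demo())
}
```

The swap only happens if `IsClosed` actually becomes true, which is why propagating the websocket failure into the multi-agent matters for this loop.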
5 changes: 5 additions & 0 deletions enterprise/wsproxy/wsproxysdk/wsproxysdk.go
@@ -472,6 +472,11 @@ func (c *Client) DialCoordinator(ctx context.Context) (agpl.MultiAgentConn, erro
 		OnRemove: func(agpl.Queue) { conn.Close(websocket.StatusGoingAway, "closed") },
 	}).Init()

+	go func() {
+		<-ctx.Done()
Contributor

If I'm understanding this correctly, we're depending on the fact that the reader goroutine below cancels the context on a failed read.

I think we should also tear down the multi-agent on a failed write of subscription messages. It's unlikely that we'd have a failure that leaves the connection half-open (e.g. for reads but not writes), but such things are possible and you don't want the proxy limping on unable to subscribe to new agents.

+		ma.Close()
+	}()
+
 	go func() {
 		defer cancel()
 		dec := json.NewDecoder(nc)
Contributor

[Re: lines 488 to 488]

I think it's worth dropping an INFO log here.
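
Returning to the earlier review suggestion about tearing down the multi-agent on a failed write of subscription messages: a minimal sketch of that idea is below. It assumes the write path has access to the same `cancel` function the reader goroutine uses, so a failed write takes the same teardown route as a failed read. `subscribeMsg`, `writerFunc`, `sendOrTeardown`, and `cancelledOnWriteError` are hypothetical names, and the real message type is not shown in the diff.

```go
package main

import (
	"context"
	"encoding/json"
	"errors"
	"fmt"
	"io"
)

// subscribeMsg is a hypothetical wire message used only for this sketch.
type subscribeMsg struct {
	AgentID string `json:"agent_id"`
}

// writerFunc adapts a function to io.Writer so a failure can be injected.
type writerFunc func([]byte) (int, error)

func (f writerFunc) Write(p []byte) (int, error) { return f(p) }

// sendOrTeardown encodes msg to w. If the write fails it calls cancel, so
// whatever is waiting on the context (for example a goroutine that closes
// the multi-agent) runs for write failures as well as read failures.
func sendOrTeardown(w io.Writer, cancel context.CancelFunc, msg subscribeMsg) error {
	if err := json.NewEncoder(w).Encode(msg); err != nil {
		cancel()
		return fmt.Errorf("write subscription: %w", err)
	}
	return nil
}

// cancelledOnWriteError reports whether a failed write cancels the context.
func cancelledOnWriteError() bool {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()
	broken := writerFunc(func([]byte) (int, error) {
		return 0, errors.New("broken pipe")
	})
	err := sendOrTeardown(broken, cancel, subscribeMsg{AgentID: "example"})
	return err != nil && ctx.Err() != nil
}

func main() {
	fmt.Println("context cancelled on write error:", cancelledOnWriteError())
}
```

Routing both failure directions through one cancel function keeps a single teardown path, which covers the half-open connection case the reviewer raised.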
