
fix: fix goroutine leak in log streaming over websocket #15709


Merged: 1 commit, Dec 3, 2024


@spikecurtis (Contributor) commented Dec 2, 2024

fixes #14881

Our handlers for streaming logs don't read from the websocket. We don't allow the client to send us any data, but the websocket library we use requires reading from the websocket to properly handle pings and closing. Not doing so [can cause the websocket to hang on write](coder/websocket#405), leaking the goroutines noticed in #14881.

This fixes the issue, and in the process refactors our log streaming into an encoder/decoder package that provides generic types for sending JSON over a websocket.

I'd also like us to upgrade to the latest https://github.com/coder/websocket, but we should upgrade our tailscale fork first so that we don't end up including two copies of the websocket library.

Contributor Author

This stack of pull requests is managed by Graphite.

@spikecurtis requested review from Emyrk and mafredri December 2, 2024 06:54
@spikecurtis marked this pull request as ready for review December 2, 2024 06:55
@spikecurtis force-pushed the spike/14881-read-from-json-websockets branch from e379c84 to 2ae0a74 December 2, 2024 08:54
@mafredri (Member) left a comment

Small suggestions but otherwise LGTM 👍🏻

```diff
-ctx, wsNetConn := codersdk.WebsocketNetConn(ctx, conn, websocket.MessageText)
-defer wsNetConn.Close() // Also closes conn.
+encoder := wsjson.NewEncoder[[]codersdk.WorkspaceAgentLog](conn, websocket.MessageText)
+defer encoder.Close(websocket.StatusGoingAway)
```
Member

Why do we use going away instead of normal closure? It's kind of an error state, and a divergence from before (i.e. the status sent by calling wsNetConn.Close()).

Contributor Author

I chose GoingAway because one likely reason for the server closing the connection is that it's shutting down. We could also be closing because there is a new build and this agent is no longer current.

If we actually used status codes for anything we'd want to send different codes in these cases, but we don't. I can change it to match the old wsNetConn for consistency.
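If the status codes did carry meaning, the choice could be made per reason rather than fixed. The sketch below is hypothetical: `closeReason` and `statusFor` are invented names, and treating a superseded build as a normal closure is an assumption. Only the numeric codes (1000 normal closure, 1001 going away) come from RFC 6455.

```go
package main

import "fmt"

// StatusCode mirrors the websocket close codes defined in RFC 6455,
// section 7.4.1. Only the two codes discussed in review are listed.
type StatusCode int

const (
	StatusNormalClosure StatusCode = 1000 // the purpose of the connection was fulfilled
	StatusGoingAway     StatusCode = 1001 // an endpoint is going away, e.g. server shutdown
)

// closeReason is a hypothetical enumeration of why the server ends a
// log stream.
type closeReason int

const (
	reasonStreamFinished closeReason = iota
	reasonServerShutdown
	reasonBuildSuperseded
)

// statusFor picks a close code per reason. Only shutdown maps to
// GoingAway; every other reason is treated as a normal closure.
func statusFor(r closeReason) StatusCode {
	switch r {
	case reasonServerShutdown:
		return StatusGoingAway
	default:
		return StatusNormalClosure
	}
}

func main() {
	fmt.Println(statusFor(reasonServerShutdown), statusFor(reasonBuildSuperseded))
}
```

Running it prints `1001 1000`. Since nothing in the codebase inspects the code today, a single fixed status is the simpler choice, which is where the thread lands.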

```diff
@@ -767,7 +758,7 @@ func (api *API) derpMapUpdates(rw http.ResponseWriter, r *http.Request) {
 err := ws.Ping(ctx)
 cancel()
 if err != nil {
-_ = nconn.Close()
+_ = ws.Close(websocket.StatusGoingAway, "ping failed")
```
Member

👍🏻


```go
// nolint: revive // complains that Encoder has the same function name
func (d *Decoder[T]) Close() error {
	err := d.conn.Close(websocket.StatusGoingAway, "")
```
Member

Same as before: why not use normal closure? Also, why not take a status argument like Encoder does, for consistency?

Contributor Author

I've switched to StatusNormalClosure.

Here I don't want to take a websocket status like Encoder does, because it's useful for Decoder to implement io.Closer so that it fits more easily into existing code.

@spikecurtis force-pushed the spike/14881-read-from-json-websockets branch from 2ae0a74 to 76911bb December 3, 2024 06:04
@spikecurtis merged commit 148a5a3 into main Dec 3, 2024
28 checks passed

Contributor Author

Merge activity

• Dec 3, 1:12 AM EST: A user merged this pull request with Graphite.

@spikecurtis deleted the spike/14881-read-from-json-websockets branch December 3, 2024 06:12
stirby pushed a commit that referenced this pull request Dec 3, 2024 (cherry picked from commit 148a5a3)
stirby pushed a commit that referenced this pull request Dec 11, 2024 (cherry picked from commit 148a5a3)
Development

Successfully merging this pull request may close these issues.

Coder pods running out of memory
2 participants