-
Notifications
You must be signed in to change notification settings - Fork 896
fix(codersdk): keep workspace agent connection open after dial context #10863
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
codersdk/workspaceagents.go
Outdated
@@ -369,6 +373,7 @@ func (c *Client) DialWorkspaceAgent(ctx context.Context, agentID uuid.UUID, opti | |||
firstDerpMap := make(chan error) | |||
go func() { | |||
defer close(closedDerpMap) | |||
firstDerpMap := firstDerpMap // Shadowed so it can be reassigned outside goroutine. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we want to modify this from a separate goroutine? Is this not a potential data race?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I put this here to safe-guard against data-race since we assign firstDerpMap = nil
in the parent goroutine after we receive the first value (error or closed chan).
But I just realized, this is not needed. We only ever send one value (error or close, not both), so there is no race.
*Iff we had not shadowed and done both: send error and close the chan, then the close(firstDerpMap)
would've been racy because we're reassigning in the other routine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From my side, this change makes sense.
Closing the connection if the dial context is cancelled screams 'spooky action at a distance' to me.
ccc096b
to
f32848e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I needed some time to analyze this test, so sorry for the late review. Left one nit-pick related to Go logic.
coderd/workspaceagents_test.go
Outdated
ctx, cancel := context.WithTimeout(context.Background(), testutil.WaitLong) | ||
defer cancel() | ||
conn, err := client.DialWorkspaceAgent(ctx, resources[0].Agents[0].ID, &codersdk.DialWorkspaceAgentOptions{ | ||
Logger: slogtest.Make(t, nil).Named("client").Leveled(slog.LevelDebug), | ||
}) | ||
require.NoError(t, err) | ||
defer conn.Close() | ||
|
||
// Connection should remain open even if the dial context is canceled. | ||
cancel() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I'm wondering if you can extract this to a separate function to make sure that cancel()
is called only once in defer, instead of adding an extra cancel()
call.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It doesn't really matter if it's called multiple times (it's no-op for context cancel funcs). I typically always defer
as a precaution to ensure it's cleaned up, then call it manually when it's time accepting that there will be a double cancel (this is especially important when using require
that can short-circuit a test).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It doesn't really matter if it's called multiple times
I know, I'm looking at this cancel()
juggling more from the clean code perspective.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alright, refactored 👍🏻
This PR fixes the issue where dial and lifetime contexts for agent connections returned by DialWorkspaceAgent were intertwined. This is unexpected behavior for a dial function and had the side-effect of closing both the coordinator and derp map updates whilst leaving the tailscale connection open. To close the connection, a user should call
.Close()
.This PR simply separates the contexts, it takes no stance on whether or not exiting the coordinator or derp map routine should also close the tailscale connection.