Skip to content

fix(codersdk): keep workspace agent connection open after dial context #10863

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Nov 27, 2023

Conversation

mafredri
Copy link
Member

@mafredri mafredri commented Nov 24, 2023

This PR fixes the issue where dial and lifetime contexts for agent connections returned by DialWorkspaceAgent were intertwined. This is unexpected behavior for a dial function and had the side-effect of closing both the coordinator and derp map updates whilst leaving the tailscale connection open. To close the connection, a user should call .Close().

This PR simply separates the contexts, it takes no stance on whether or not exiting the coordinator or derp map routine should also close the tailscale connection.

@mafredri mafredri self-assigned this Nov 24, 2023
@mafredri mafredri requested review from johnstcn and mtojek November 24, 2023 17:34
@@ -369,6 +373,7 @@ func (c *Client) DialWorkspaceAgent(ctx context.Context, agentID uuid.UUID, opti
firstDerpMap := make(chan error)
go func() {
defer close(closedDerpMap)
firstDerpMap := firstDerpMap // Shadowed so it can be reassigned outside goroutine.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we want to modify this from a separate goroutine? Is this not a potential data race?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I put this here to safe-guard against data-race since we assign firstDerpMap = nil in the parent goroutine after we receive the first value (error or closed chan).

But I just realized, this is not needed. We only ever send one value (error or close, not both), so there is no race.

*Iff we had not shadowed and done both: send error and close the chan, then the close(firstDerpMap) would've been racy because we're reassigning in the other routine.

Copy link
Member

@johnstcn johnstcn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From my side, this change makes sense.
Closing the connection if the dial context is cancelled screams 'spooky action at a distance' to me.

@mafredri mafredri force-pushed the mafredri/fix-codersdk-dialworkspaceagent branch from ccc096b to f32848e Compare November 27, 2023 10:20
Copy link
Member

@mtojek mtojek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I needed some time to analyze this test, so sorry for the late review. Left one nit-pick related to Go logic.

Comment on lines 448 to 457
ctx, cancel := context.WithTimeout(context.Background(), testutil.WaitLong)
defer cancel()
conn, err := client.DialWorkspaceAgent(ctx, resources[0].Agents[0].ID, &codersdk.DialWorkspaceAgentOptions{
Logger: slogtest.Make(t, nil).Named("client").Leveled(slog.LevelDebug),
})
require.NoError(t, err)
defer conn.Close()

// Connection should remain open even if the dial context is canceled.
cancel()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I'm wondering if you can extract this to a separate function to make sure that cancel() is called only once in defer, instead of adding an extra cancel() call.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't really matter if it's called multiple times (it's no-op for context cancel funcs). I typically always defer as a precaution to ensure it's cleaned up, then call it manually when it's time accepting that there will be a double cancel (this is especially important when using require that can short-circuit a test).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't really matter if it's called multiple times

I know, I'm looking at this cancel() juggling more from the clean code perspective.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright, refactored 👍🏻

@mafredri mafredri merged commit f441ad6 into main Nov 27, 2023
@mafredri mafredri deleted the mafredri/fix-codersdk-dialworkspaceagent branch November 27, 2023 12:29
@github-actions github-actions bot locked and limited conversation to collaborators Nov 27, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants