chore: refactor agent connection updates #11301

spikecurtis · 2023-12-21T07:35:44Z

Refactors the code that handles monitoring an agent websocket with pings and updating the connection times in the DB.

Consolidates v1 and v2 agent APIs under the same code for this.

One substantive change (not just a refactor) is that I've made it so that we actually disconnect if the agent fails to respond to our pings, rather than the old behavior where we would update the database, but not actually tear down the websocket.

spikecurtis · 2023-12-21T07:35:56Z

Current dependencies on/for this PR:

main
- PR chore: refactor agent connection updates #11301 👈

This stack of pull requests is managed by Graphite.

coderd/workspaceagentsrpc.go

coderd/workspaceagentsrpc_internal_test.go

mafredri · 2023-12-21T14:36:32Z

coderd/workspaceagentsrpc_internal_test.go

+		AnyTimes().
+		Return(database.WorkspaceBuild{ID: build.ID}, nil)
+
+	go uut.mind(ctx)


Relying on context to shut down this method is racy, in that we may trigger goleak sporadically. It might be best to synchronize this with the test.

goleak waits for goroutines to complete up to a timeout. I don't believe we are in danger of triggering it unless the routine deadlocks, in which case goleak will have found a legit bug.

By default it only waits 2 seconds, I believe, and I don't think that's configurable. That leaves plenty of room for a slow environment, like a Windows runner, to trigger the edge case, I'd say.

Even on a slow Windows runner, 2 seconds is absolutely ages. I've never seen goleak give a false positive on a goroutine that was unblocked and just waiting to be scheduled by the runtime and to finish its work. Have you?

I have seen such failures in the past, which is why I raised the concern. In my experience it can and has resulted in rare flakes. I'd like to concretely say which scenarios, but truth be told I don't remember.

I also can't say whether or not that's the case for the current Windows runners. The only thing I can clearly say is that for the old GH Actions Windows runner, 2 seconds was definitely not a long time, and such expectations often resulted in a flake. 😄 We run so many things in parallel that it's unclear how much time is needed; the more things exit leaving other things behind, the closer we get to filling up that 2-second gap, I suppose.

coderd/workspaceagentsrpc.go

mafredri

One last thing, but other than that this looks good 👍🏻.

PS. Now that I know about the uut pattern, I kinda like it. ☺️

mafredri · 2024-01-02T09:27:01Z

coderd/workspaceagentsrpc.go

-				return
-			}
-			lastPing.Store(ptr.Ref(time.Now()))
+func (api *API) startAgentWebsocketMonitor(ctx context.Context,


Since we now have a close method, I don't think we need to take a context here anymore, wdyt? For start(), we could simply create the context/cancel func from the API context here and pass it along.

PS. Thanks for the naming change, this now feels very obvious what it does when reading the code!

No, the API context is used when sending the final update that the agent has been disconnected.

The context we accept here, and want for start(), is tied to the connection.

coderd/workspaceagentsrpc_internal_test.go

spikecurtis · 2024-01-02T12:04:39Z

Merge activity

Jan 2, 7:04 AM: @spikecurtis merged this pull request with Graphite.

github-actions bot assigned spikecurtis Dec 21, 2023

spikecurtis requested a review from mafredri December 21, 2023 07:36

mafredri reviewed Dec 21, 2023


spikecurtis force-pushed the spike/refactor-agent-update branch from e1f4d89 to 1f31780 January 2, 2024 07:12

spikecurtis requested a review from mafredri January 2, 2024 07:43

mafredri approved these changes Jan 2, 2024


chore: refactor agent connection updates

61f5d26

spikecurtis force-pushed the spike/refactor-agent-update branch from 1f31780 to 61f5d26 January 2, 2024 11:54

spikecurtis merged commit c9b7d61 into main Jan 2, 2024

spikecurtis deleted the spike/refactor-agent-update branch January 2, 2024 12:04

github-actions bot locked and limited conversation to collaborators Jan 2, 2024
