fix: fix graceful disconnect in DialWorkspaceAgent #11993

spikecurtis · 2024-02-02T09:15:51Z

I noticed in testing that the CLI wasn't correctly sending the disconnect message when it shut down, so agents were seeing it as a "lost" peer rather than a "disconnected" one.

We used a single context for everything from the netconn to the RPCs, so once that context was canceled, sending the disconnect message failed because its own context was already canceled.

So, this PR splits things into two contexts, with a graceful one set to last up to 1 second longer than the main one.

spikecurtis · 2024-02-02T09:15:55Z

main
- chore: rename FakeCoordinator for export #11991
  - chore: move FakeCoordinator to tailnettest #11992
    - fix: fix graceful disconnect in DialWorkspaceAgent #11993 👈

This stack of pull requests is managed by Graphite.

johnstcn

Nice find! Some comments below.

johnstcn · 2024-02-02T09:46:33Z

codersdk/workspaceagents_internal_test.go

+type fakeTailnetConn struct{}
+
+func (*fakeTailnetConn) UpdatePeers([]*proto.CoordinateResponse_PeerUpdate) error {
+	// TODO implement me
+	panic("implement me")
+}
+
+func (*fakeTailnetConn) SetAllPeersLost() {}
+
+func (*fakeTailnetConn) SetNodeCallback(func(*tailnet.Node)) {}
+
+func (*fakeTailnetConn) SetDERPMap(*tailcfg.DERPMap) {}
+
+func newFakeTailnetConn() *fakeTailnetConn {
+	return &fakeTailnetConn{}
+}


Should this live in tailnettest as well?

I don't think so, because the interface we are faking lives in codersdk, even though the "real" object we are faking lives in tailnet.

johnstcn · 2024-02-02T09:46:53Z

codersdk/workspaceagents_internal_test.go

+func (*fakeTailnetConn) UpdatePeers([]*proto.CoordinateResponse_PeerUpdate) error {
+	// TODO implement me
+	panic("implement me")
+}


Can we call t.Fail() instead of just panicking?

panic is nice because it gives you a stack trace.

Changing to t.Fail() won't give a stack trace, so you'd have to manually chase down how your test ended up calling the function.

johnstcn · 2024-02-02T09:51:39Z

codersdk/workspaceagents.go

+	<-tac.ctx.Done()
+	select {
+	case <-tac.closed:
+	case <-time.After(time.Second):


(Non-blocking) I figure this is a best-effort situation, but will 1 second be enough? Does this need to be a configurable knob?

I think it should be plenty, even on a slow connection, because we're not waiting for a reply. I definitely don't want to plumb configuration through.

It is best effort, as you say. The consequence of not doing this is that the agent on the other side will see the peer as "lost" and possibly keep trying to handshake with it for up to 15 minutes.

spikecurtis · 2024-02-05T09:33:10Z

Merge activity

Feb 5, 4:33 AM EST: @spikecurtis started a stack merge that includes this pull request via Graphite.
Feb 5, 4:50 AM EST: Graphite rebased this pull request as part of a merge.
Feb 5, 5:01 AM EST: @spikecurtis merged this pull request with Graphite.

This was referenced Feb 2, 2024

chore: rename FakeCoordinator for export #11991

Merged

chore: move FakeCoordinator to tailnettest #11992

Merged

github-actions bot assigned spikecurtis Feb 2, 2024

spikecurtis requested review from coadler and johnstcn February 2, 2024 09:17

spikecurtis marked this pull request as ready for review February 2, 2024 09:21

spikecurtis force-pushed the spike/tailnet-graceful-disconnect branch from 6371570 to 308bad6 Compare February 2, 2024 09:33

johnstcn reviewed Feb 2, 2024

spikecurtis force-pushed the spike/fake-coordinator-move branch from 02f29b5 to f39414c Compare February 5, 2024 06:45

spikecurtis force-pushed the spike/tailnet-graceful-disconnect branch from 308bad6 to 73cfe1a Compare February 5, 2024 06:46

spikecurtis requested a review from johnstcn February 5, 2024 06:46

johnstcn approved these changes Feb 5, 2024

spikecurtis force-pushed the spike/fake-coordinator-move branch from f39414c to 19806f6 Compare February 5, 2024 09:33

Base automatically changed from spike/fake-coordinator-move to main February 5, 2024 09:49

fix: fix graceful disconnect in DialWorkspaceAgent

33e84aa

spikecurtis force-pushed the spike/tailnet-graceful-disconnect branch from 73cfe1a to 33e84aa Compare February 5, 2024 09:49

spikecurtis merged commit e5ba586 into main Feb 5, 2024

spikecurtis deleted the spike/tailnet-graceful-disconnect branch February 5, 2024 10:01

github-actions bot locked and limited conversation to collaborators Feb 5, 2024
