-
Notifications
You must be signed in to change notification settings - Fork 881
fix(tailnet): Skip nodes without DERP, avoid use of RemoveAllPeers #6320
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
go func() { | ||
err := (*api.TailnetCoordinator.Load()).ServeClient(serverConn, uuid.New(), agentID) | ||
if err != nil { | ||
api.Logger.Warn(r.Context(), "tailnet coordinator client error", slog.Error(err)) | ||
_ = conn.Close() | ||
_ = agentConn.Close() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this change OK? And should this always happen if ServeClient exits?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think so...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thinking more about this, it seems good!
go func() { | ||
err := (*api.TailnetCoordinator.Load()).ServeClient(serverConn, uuid.New(), agentID) | ||
if err != nil { | ||
api.Logger.Warn(r.Context(), "tailnet coordinator client error", slog.Error(err)) | ||
_ = conn.Close() | ||
_ = agentConn.Close() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thinking more about this, it seems good!
e63df40
to
a3f3776
Compare
This PR targets #6292 and has two main changes:
conn.RemoveAllPeers
(in favor ofconn.UpdateNodes(nodes, replace=true)
AwaitReacahble
first)The reason for 2. is that on initial connections, there's a race between receiving a valid node, removing it, and then trying to add it back, this leads to 30+ seconds delays e.g. in scaletests and sometimes timeouts.
We might be able to replace many uses of AwaitReachable with something akin to "wait until first peer node with derp". This might speed up connections by not having to do a ping.