Skip to content

fix(tailnet): Skip nodes without DERP, avoid use of RemoveAllPeers #6320

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Feb 24, 2023

Conversation

mafredri
Copy link
Member

This PR targets #6292 and has two main changes:

  1. Skip nodes that have no preferred DERP server so that we don't try to contact them too early (this often leads to a 5+ seconds timeout due to WireGuard retries)
  2. Avoid the use of conn.RemoveAllPeers (in favor of conn.UpdateNodes(nodes, replace=true)
  3. (A few additional agentConn fixes where we AwaitReacahble first)

The reason for 2. is that on initial connections, there's a race between receiving a valid node, removing it, and then trying to add it back, this leads to 30+ seconds delays e.g. in scaletests and sometimes timeouts.

We might be able to replace many uses of AwaitReachable with something akin to "wait until first peer node with derp". This might speed up connections by not having to do a ping.

@mafredri mafredri self-assigned this Feb 23, 2023
go func() {
err := (*api.TailnetCoordinator.Load()).ServeClient(serverConn, uuid.New(), agentID)
if err != nil {
api.Logger.Warn(r.Context(), "tailnet coordinator client error", slog.Error(err))
_ = conn.Close()
_ = agentConn.Close()
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this change OK? And should this always happen if ServeClient exits?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think so...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking more about this, it seems good!

go func() {
err := (*api.TailnetCoordinator.Load()).ServeClient(serverConn, uuid.New(), agentID)
if err != nil {
api.Logger.Warn(r.Context(), "tailnet coordinator client error", slog.Error(err))
_ = conn.Close()
_ = agentConn.Close()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking more about this, it seems good!

Base automatically changed from mafredri/fix-agentconn-races to main February 24, 2023 11:11
@mafredri mafredri force-pushed the mafredri/fix-agentconn-races-part2 branch from e63df40 to a3f3776 Compare February 24, 2023 11:23
@mafredri mafredri merged commit 677721e into main Feb 24, 2023
@mafredri mafredri deleted the mafredri/fix-agentconn-races-part2 branch February 24, 2023 16:16
@github-actions github-actions bot locked and limited conversation to collaborators Feb 24, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants