-
Notifications
You must be signed in to change notification settings - Fork 887
fix: use TSMP for pings and checking reachability #11306
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Current dependencies on/for this PR: This stack of pull requests is managed by Graphite. |
5933e85
to
0576b65
Compare
0576b65
to
b43cf33
Compare
Signed-off-by: Spike Curtis <spike@coder.com>
@@ -186,7 +186,7 @@ func TestAgent_Stats_Magic(t *testing.T) { | |||
// If it isn't, it's set to -1. | |||
s.ConnectionMedianLatencyMS >= 0 | |||
}, testutil.WaitLong, testutil.IntervalFast, | |||
"never saw stats: %+v", s, | |||
"never saw stats", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This gives misleading information since the pointer is passed on the initial call to Eventually
. It is not updated since <-stats
returns a new pointer, rather than updating the data at the old pointer. Changed in favor of logging the relevant info as we get it.
s, ok := <-stats | ||
t.Logf("got stats after disconnect %t, %d", | ||
ok, s.SessionCountJetBrains) | ||
return ok && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TSMP ping counts as a "connection" in the agent stats so it never goes to zero. I'm not sure why we were testing it going to zero in the first place @code-asher
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we wanted to make sure ConnectionCount
was also updating correctly, but did not realize there would be other connections going on.
Merge activity
|
We're seeing some flaky tests related to agent connectivity - https://github.com/coder/coder/actions/runs/7286675441/job/19856270998
I'm pretty sure what happened in this one is that the client opened a connection while the wgengine was in the process of reconfiguring the wireguard device, so the fact that the peer became "active" as a result of traffic being sent was not noticed.
The test calls
AwaitReachable()
but this only tests the disco layer, so it doesn't wait for wireguard to come up.I think we should be using TSMP for pinging and reachability, since this operates at the IP layer, and therefore requires that wireguard comes up before being successful.
This should also help with the problems we have seen where a TCP connection starts before wireguard is up and the initial round trip has to wait for the 5 second wireguard handshake retry.
fixes: #11294