test(tailnet): Add TestTransmitHang test with packet capture and logging #7414
Conversation
Force-pushed f674ac1 to 46052b4
Force-pushed 46052b4 to 17bca06
Good repro, Mathias! It's fairly easy to show it on dogfood, but I managed to reproduce it on an M1 as well.
I'm going to research this further.
I spent a day investigating this issue. In most cases, my research ended at gVisor (the userspace netstack that Tailscale's tsnet, and hence Coder's tailnet, is built on):
I tried a few more things:
I think that @mafredri's test proves that this doesn't look like a bug in Coder's source code, but rather in Tailscale.
I'm able to reproduce this in Tailscale's `tsnet` tests:

```diff
diff --git a/tsnet/tsnet_test.go b/tsnet/tsnet_test.go
index 5ab27099..30b5c07e 100644
--- a/tsnet/tsnet_test.go
+++ b/tsnet/tsnet_test.go
@@ -261,25 +261,25 @@ func TestConn(t *testing.T) {
 	if err != nil {
 		t.Fatal(err)
 	}
-
-	want := "hello"
-	if _, err := io.WriteString(w, want); err != nil {
-		t.Fatal(err)
+	defer r.Close()
+	go io.Copy(io.Discard, r)
+
+	t.Log("writing")
+	payload := []byte(strings.Repeat("hello world\n", 65536/12))
+	size := 0
+	for i := 0; i < 1024*2; i++ {
+		t.Logf("write payload %d: %d bytes", i, size/1024)
+		n, err := w.Write(payload)
+		if err != nil {
+			t.Fatal(err)
+		}
+		size += n
 	}
-	got := make([]byte, len(want))
-	if _, err := io.ReadAtLeast(r, got, len(got)); err != nil {
+	err = w.Close()
+	if err != nil {
 		t.Fatal(err)
 	}
-	t.Logf("got: %q", got)
-	if string(got) != want {
-		t.Errorf("got %q, want %q", got, want)
-	}
-
-	_, err = s2.Dial(ctx, "tcp", fmt.Sprintf("%s:8082", s1ip)) // some random port
-	if err == nil {
-		t.Fatalf("unexpected success; should have seen a connection refused error")
-	}
 }

 func TestLoopbackLocalAPI(t *testing.T) {
```
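For scale: the payload is `"hello world\n"` repeated 65536/12 = 5461 times (65,532 bytes) and is written 2048 times, so roughly 128 MiB is pushed through the connection while the remote side discards everything it reads. Assuming a tailscale.com checkout with this patch applied, the modified test should be runnable via `go test ./tsnet -run=TestConn -v -count=1` (command inferred from the test name; the thread doesn't state it).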
This Pull Request is becoming stale. In order to minimize WIP, prevent merge conflicts, and keep the tracker readable, I'm going to close this PR in 3 days if there isn't more activity.
Force-pushed 17bca06 to 2573049
This commit adds a test that occasionally triggers the issue in #7262. When triggered, the test takes e.g. ~30s to complete instead of 2s. The test also logs from both `tailnet.Conn`s into a log file and produces pcap files from each side.

Trigger by simply running `go test ./tailnet -run=TestTransmitHang -v -count=1` until it takes longer than a few seconds. NOTE: I've been reproducing this in a Docker container.

To trigger this in dogfood, it may help to set `export GOMAXPROCS=2`. (I've also been able to trigger the issue on an M1 MacBook with this setting.)

To analyze the pcap files, you'll need Wireshark and `ts-dissector.lua`; start via `wireshark -X lua_script:ts-dissector.lua tailnet-test-3304800396/send.pcap`.
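Aside (not part of the PR): the classic pcap format is simple enough that files like `send.pcap` can be produced without libpcap. A minimal stdlib-only sketch, where `pcapWriter`, `newPcapWriter`, and `WritePacket` are illustrative names rather than the helpers this test actually adds:

```go
// Sketch: a minimal, stdlib-only pcap writer. It assumes LINKTYPE_RAW (101)
// because a capture inside the tunnel sees raw IP packets; thiszone and
// sigfigs (header offsets 8 and 12) are left zero.
package main

import (
	"encoding/binary"
	"io"
	"os"
	"time"
)

type pcapWriter struct{ w io.Writer }

// newPcapWriter emits the 24-byte pcap global header.
func newPcapWriter(w io.Writer) (*pcapWriter, error) {
	hdr := make([]byte, 24)
	binary.LittleEndian.PutUint32(hdr[0:], 0xa1b2c3d4) // magic: microsecond timestamps
	binary.LittleEndian.PutUint16(hdr[4:], 2)          // major version
	binary.LittleEndian.PutUint16(hdr[6:], 4)          // minor version
	binary.LittleEndian.PutUint32(hdr[16:], 65535)     // snaplen
	binary.LittleEndian.PutUint32(hdr[20:], 101)       // LINKTYPE_RAW
	_, err := w.Write(hdr)
	return &pcapWriter{w: w}, err
}

// WritePacket emits one 16-byte record header followed by the packet bytes.
func (p *pcapWriter) WritePacket(ts time.Time, data []byte) error {
	rec := make([]byte, 16)
	binary.LittleEndian.PutUint32(rec[0:], uint32(ts.Unix()))
	binary.LittleEndian.PutUint32(rec[4:], uint32(ts.Nanosecond()/1000))
	binary.LittleEndian.PutUint32(rec[8:], uint32(len(data)))  // captured length
	binary.LittleEndian.PutUint32(rec[12:], uint32(len(data))) // original length
	if _, err := p.w.Write(rec); err != nil {
		return err
	}
	_, err := p.w.Write(data)
	return err
}

func main() {
	f, err := os.Create("send.pcap")
	if err != nil {
		panic(err)
	}
	defer f.Close()
	pw, err := newPcapWriter(f)
	if err != nil {
		panic(err)
	}
	// In the real test, each raw IP packet crossing the conn would be fed here.
	if err := pw.WritePacket(time.Now(), nil); err != nil {
		panic(err)
	}
}
```

In Wireshark, the stalls described below can then be isolated with the display filters `tcp.analysis.duplicate_ack` and `tcp.analysis.retransmission`.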
To further increase log verbosity, enable Tailscale debug flags before running the test, e.g. `export TS_DEBUG_DISCO=true TS_DEBUG_DERP=true TS_DEBUG_NETSTACK=true`.
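These flags can also be pinned from inside a test; a sketch under the assumption that per-test scoping is wanted (some knobs may be read once per process, so exporting them before `go test` remains the safer route):

```go
package tailnet_test

import "testing"

// Sketch: enable verbose Tailscale debug logging for a single test.
// t.Setenv restores the previous value when the test finishes and
// cannot be used in parallel tests.
func TestTransmitHangVerbose(t *testing.T) {
	t.Setenv("TS_DEBUG_DISCO", "true")
	t.Setenv("TS_DEBUG_DERP", "true")
	t.Setenv("TS_DEBUG_NETSTACK", "true")
	// ... set up both tailnet.Conns and run the transfer as usual ...
}
```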
To me there are no obvious indicators as to what's going wrong. First we see a bunch of TCP Dup ACKs, after which we're spinning our wheels in TCP Retransmission for ~30 seconds.

This issue can be reproduced using the currently bundled `coder/tailscale#ed9307cf1b22` as well as `coder/tailscale#main` (at the time of writing).

We also have a few changes in our Tailscale fork which I've tried to revert, like:
However, even with all those changes reverted, we still run into this issue.