Skip to content

flake: TestTelemetry/IDPOrgSync #645

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
dannykopping opened this issue May 16, 2025 · 1 comment
Open

flake: TestTelemetry/IDPOrgSync #645

dannykopping opened this issue May 16, 2025 · 1 comment
Assignees
Labels

Comments

@dannykopping
Copy link
Collaborator

https://github.com/coder/coder/actions/runs/15061235780/job/42336654525

=== FAIL: coderd/telemetry  (0.00s)
panic: test timed out after 10m0s
	running tests:
		TestTelemetry/IDPOrgSync (10m0s)

goroutine 354 [running]:
testing.(*M).startAlarm.func1()
	/Users/runner/work/_tool/go/1.24.2/arm64/src/testing/testing.go:2484 +0x308
created by time.goFunc
	/Users/runner/work/_tool/go/1.24.2/arm64/src/time/sleep.go:215 +0x38

goroutine 1 [chan receive, 10 minutes]:
testing.tRunner.func1()
	/Users/runner/work/_tool/go/1.24.2/arm64/src/testing/testing.go:1753 +0x43c
testing.tRunner(0x1400021e8c0, 0x140005cfba8)
	/Users/runner/work/_tool/go/1.24.2/arm64/src/testing/testing.go:1798 +0x120
testing.runTests(0x140002a7248, {0x1050e0160, 0x5, 0x5}, {0x140005cfc18?, 0x1024c556c?, 0x105201ca0?})
	/Users/runner/work/_tool/go/1.24.2/arm64/src/testing/testing.go:2277 +0x3ec
testing.(*M).Run(0x140003794a0)
	/Users/runner/work/_tool/go/1.24.2/arm64/src/testing/testing.go:2142 +0x588
go.uber.org/goleak.VerifyTestMain({0x1042afd98?, 0x140003794a0?}, {0x105201540, 0x4, 0x4})
	/Users/runner/go/pkg/mod/go.uber.org/goleak@v1.3.1-0.20240429205332-517bace7cc29/testmain.go:53 +0x44
github.com/coder/coder/v2/coderd/telemetry_test.TestMain(...)
	/Users/runner/work/coder/coder/coderd/telemetry/telemetry_test.go:33
main.main()
	_testmain.go:57 +0xc0

goroutine 9 [sync.Cond.Wait, 10 minutes]:
sync.runtime_notifyListWait(0x14000512950, 0x0)
	/Users/runner/work/_tool/go/1.24.2/arm64/src/runtime/sema.go:597 +0x150
sync.(*Cond).Wait(0x14000512940)
	/Users/runner/work/_tool/go/1.24.2/arm64/src/sync/cond.go:71 +0xc4
github.com/cihub/seelog.(*asyncLoopLogger).processItem(0x140003cc7e0)
	/Users/runner/go/pkg/mod/github.com/cihub/seelog@v0.0.0-20170130134532-f561c5e57575/behavior_asynclooplogger.go:50 +0x80
github.com/cihub/seelog.(*asyncLoopLogger).processQueue(0x140003cc7e0)
	/Users/runner/go/pkg/mod/github.com/cihub/seelog@v0.0.0-20170130134532-f561c5e57575/behavior_asynclooplogger.go:63 +0x38
created by github.com/cihub/seelog.NewAsyncLoopLogger in goroutine 1
	/Users/runner/go/pkg/mod/github.com/cihub/seelog@v0.0.0-20170130134532-f561c5e57575/behavior_asynclooplogger.go:40 +0xc8

goroutine 10 [sync.Cond.Wait, 10 minutes]:
sync.runtime_notifyListWait(0x14000512ad0, 0x0)
	/Users/runner/work/_tool/go/1.24.2/arm64/src/runtime/sema.go:597 +0x150
sync.(*Cond).Wait(0x14000512ac0)
	/Users/runner/work/_tool/go/1.24.2/arm64/src/sync/cond.go:71 +0xc4
github.com/cihub/seelog.(*asyncLoopLogger).processItem(0x140003cc900)
	/Users/runner/go/pkg/mod/github.com/cihub/seelog@v0.0.0-20170130134532-f561c5e57575/behavior_asynclooplogger.go:50 +0x80
github.com/cihub/seelog.(*asyncLoopLogger).processQueue(0x140003cc900)
	/Users/runner/go/pkg/mod/github.com/cihub/seelog@v0.0.0-20170130134532-f561c5e57575/behavior_asynclooplogger.go:63 +0x38
created by github.com/cihub/seelog.NewAsyncLoopLogger in goroutine 1
	/Users/runner/go/pkg/mod/github.com/cihub/seelog@v0.0.0-20170130134532-f561c5e57575/behavior_asynclooplogger.go:40 +0xc8

goroutine 12 [chan receive, 10 minutes]:
testing.tRunner.func1()
	/Users/runner/work/_tool/go/1.24.2/arm64/src/testing/testing.go:1753 +0x43c
testing.tRunner(0x1400021ea80, 0x1042a0458)
	/Users/runner/work/_tool/go/1.24.2/arm64/src/testing/testing.go:1798 +0x120
created by testing.(*T).Run in goroutine 1
	/Users/runner/work/_tool/go/1.24.2/arm64/src/testing/testing.go:1851 +0x374

goroutine 105 [IO wait, 10 minutes]:
internal/poll.runtime_pollWait(0x1055b0a98, 0x72)
	/Users/runner/work/_tool/go/1.24.2/arm64/src/runtime/netpoll.go:351 +0xa0
internal/poll.(*pollDesc).wait(0x140007fa100?, 0x1024b3f78?, 0x0)
	/Users/runner/work/_tool/go/1.24.2/arm64/src/internal/poll/fd_poll_runtime.go:84 +0x28
internal/poll.(*pollDesc).waitRead(...)
	/Users/runner/work/_tool/go/1.24.2/arm64/src/internal/poll/fd_poll_runtime.go:89
internal/poll.(*FD).Accept(0x140007fa100)
	/Users/runner/work/_tool/go/1.24.2/arm64/src/internal/poll/fd_unix.go:620 +0x24c
net.(*netFD).accept(0x140007fa100)
	/Users/runner/work/_tool/go/1.24.2/arm64/src/net/fd_unix.go:172 +0x28
net.(*TCPListener).accept(0x14000796bc0)
	/Users/runner/work/_tool/go/1.24.2/arm64/src/net/tcpsock_posix.go:159 +0x24
net.(*TCPListener).Accept(0x14000796bc0)
	/Users/runner/work/_tool/go/1.24.2/arm64/src/net/tcpsock.go:380 +0x2c
net/http.(*Server).Serve(0x1400080a000, {0x1042c4258, 0x14000796bc0})
	/Users/runner/work/_tool/go/1.24.2/arm64/src/net/http/server.go:3424 +0x290
net/http/httptest.(*Server).goServe.func1()
	/Users/runner/work/_tool/go/1.24.2/arm64/src/net/http/httptest/server.go:311 +0x5c
created by net/http/httptest.(*Server).goServe in goroutine 135
	/Users/runner/work/_tool/go/1.24.2/arm64/src/net/http/httptest/server.go:309 +0x6c

goroutine 236 [select, 10 minutes]:
github.com/coder/coder/v2/coderd/telemetry.(*remoteReporter).runSnapshotter(0x14000814b40)
	/Users/runner/work/coder/coder/coderd/telemetry/telemetry.go:257 +0x2b4
created by github.com/coder/coder/v2/coderd/telemetry.New in goroutine 135
	/Users/runner/work/coder/coder/coderd/telemetry/telemetry.go:88 +0x278

goroutine 106 [select, 10 minutes]:
github.com/coder/coder/v2/coderd/telemetry.(*remoteReporter).runSnapshotter(0x140000c4a20)
	/Users/runner/work/coder/coder/coderd/telemetry/telemetry.go:257 +0x2b4
created by github.com/coder/coder/v2/coderd/telemetry.New in goroutine 135
	/Users/runner/work/coder/coder/coderd/telemetry/telemetry.go:88 +0x278

goroutine 135 [chan receive, 10 minutes]:
github.com/coder/coder/v2/coderd/telemetry_test.collectSnapshot(0x140006c7180, {0x1043079c0, 0x14000798d98}, 0x1042a05d0)
	/Users/runner/work/coder/coder/coderd/telemetry/telemetry_test.go:488 +0x2d0
github.com/coder/coder/v2/coderd/telemetry_test.TestTelemetry.func5(0x140006c7180)
	/Users/runner/work/coder/coder/coderd/telemetry/telemetry_test.go:293 +0xb8
testing.tRunner(0x140006c7180, 0x1042a05c8)
	/Users/runner/work/_tool/go/1.24.2/arm64/src/testing/testing.go:1792 +0xe4
created by testing.(*T).Run in goroutine 12
	/Users/runner/work/_tool/go/1.24.2/arm64/src/testing/testing.go:1851 +0x374

goroutine 235 [IO wait, 10 minutes]:
internal/poll.runtime_pollWait(0x1055b01d8, 0x72)
	/Users/runner/work/_tool/go/1.24.2/arm64/src/runtime/netpoll.go:351 +0xa0
internal/poll.(*pollDesc).wait(0x140009d0000?, 0x1024b3f78?, 0x0)
	/Users/runner/work/_tool/go/1.24.2/arm64/src/internal/poll/fd_poll_runtime.go:84 +0x28
internal/poll.(*pollDesc).waitRead(...)
	/Users/runner/work/_tool/go/1.24.2/arm64/src/internal/poll/fd_poll_runtime.go:89
internal/poll.(*FD).Accept(0x140009d0000)
	/Users/runner/work/_tool/go/1.24.2/arm64/src/internal/poll/fd_unix.go:620 +0x24c
net.(*netFD).accept(0x140009d0000)
	/Users/runner/work/_tool/go/1.24.2/arm64/src/net/fd_unix.go:172 +0x28
net.(*TCPListener).accept(0x14000797f40)
	/Users/runner/work/_tool/go/1.24.2/arm64/src/net/tcpsock_posix.go:159 +0x24
net.(*TCPListener).Accept(0x14000797f40)
	/Users/runner/work/_tool/go/1.24.2/arm64/src/net/tcpsock.go:380 +0x2c
net/http.(*Server).Serve(0x1400080a500, {0x1042c4258, 0x14000797f40})
	/Users/runner/work/_tool/go/1.24.2/arm64/src/net/http/server.go:3424 +0x290
net/http/httptest.(*Server).goServe.func1()
	/Users/runner/work/_tool/go/1.24.2/arm64/src/net/http/httptest/server.go:311 +0x5c
created by net/http/httptest.(*Server).goServe in goroutine 135
	/Users/runner/work/_tool/go/1.24.2/arm64/src/net/http/httptest/server.go:309 +0x6c
FAIL	github.com/coder/coder/v2/coderd/telemetry	600.099s
@hugodutka
Copy link

The source of the issue seems to be this error:

 error= perform request:
                   github.com/coder/coder/v2/coderd/telemetry.(*remoteReporter).deployment
                       /Users/runner/work/coder/coder/coderd/telemetry/telemetry.go:355
                 - Post "http://127.0.0.1:56934/deployment": net/http: HTTP/1.x transport connection broken: http: CloseIdleConnections called

The telemetry report never gets sent and the test waits indefinitely on a channel, eventually triggering the timeout panic.

The telemetry reporter uses the default, global HTTP client to send reports

resp, err := http.DefaultClient.Do(req)
if err != nil {
	return xerrors.Errorf("perform request: %w", err)
}

It seems that some other test running in parallel in the package calls CloseIdleConnections on the same client, which causes the error.

The default HTTP client is used in many places in Coder and the flake we're seeing here manifests in other tests too, like this one: #582.

I don't have a generic solution in mind, so for now I'm going to ensure waiting on the telemetry report channel times out without a panic so our CI can retry the test gracefully: coder/coder#17879

hugodutka added a commit to coder/coder that referenced this issue May 22, 2025
This PR ensures that waits on channels will time out according to the
test context, rather than waiting indefinitely. This should alleviate
the panic seen in coder/internal#645 and, if
the deadlock recurs, allow the test to be retried automatically in CI.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants