Skip to content

flake: TestWorkspaceActivityBump/Dial #102

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
spikecurtis opened this issue Oct 15, 2024 · 4 comments
Closed

flake: TestWorkspaceActivityBump/Dial #102

spikecurtis opened this issue Oct 15, 2024 · 4 comments
Assignees
Labels

Comments

@spikecurtis
Copy link

    activitybump_test.go:182: [Zone=UTC 0] originDeadline: 2024-10-15 07:50:47.610686 +0000 UTC, deadline: 2024-10-15 08:49:49.593297 +0000 UTC, now 2024-10-15 07:49:59.891499 +0000 UTC, (now-deadline)=-59m49.701798s
    activitybump_test.go:182: 
        	Error Trace:	/home/runner/work/coder/coder/coderd/activitybump_test.go:159
        	            				/home/runner/work/coder/coder/coderd/activitybump_test.go:182
        	Error:      	Max difference between 2024-10-15 08:49:59.891618 +0000 UTC and 2024-10-15 08:49:49.593297 +0000 UTC allowed is 10s, but difference was 10.298321s
        	Test:       	TestWorkspaceActivityBump/Dial

seen here: https://github.com/coder/coder/actions/runs/11341558999/job/31540089468

@Emyrk
Copy link
Member

Emyrk commented Oct 15, 2024

This test feels like it's testing too much. We could probably split it into a workspace agent test, and a coderd test.

Some notes debugging the log outputs:

In the failed test, we see bumped deadline from activity 4 times, all with connection_count:1

  • 07:49:49.598
  • 07:49:53.736
  • 07:49:57.511
  • 07:49:59.719

Generally, we only see this happen once in a unit test.

ActivityBumpWorkspace will only bump if more than 5% of the deadline elapsed.

From what I can tell:

  1. The deadline was bumped at 07:49:49.598 to 08:49:49.593297. Subsequent bumps were ignored.
  2. The assertion took 10s to finally detect the bump, and do the assertion. So now was now 07:49:59.891499, which was 10s late.

I created this commit to add some logs which should help confirm my suspicion.
coder/coder@b0bc897

@spikecurtis
Copy link
Author

We log all API calls, so it seems the eventually loop you pointed out only runs once. It just takes ages because our CI is underpowered and we are doing to many things in parallel in that test. The API call itself returns in 1.3s according to Coderd

    t.go:99: 2024-10-15 07:49:59.841 [debu]  coderd: GET  host=localhost:39391  path=/api/v2/workspaces/22fc04a4-8aa6-42e6-954c-53c486183a95  proto=HTTP/1.1  remote_addr=127.0.0.1  start="2024-10-15T07:49:58.529362644Z"  took=1.312592225s  status_code=200  latency_ms=1312  request_id=d393a351-743c-4e06-a595-16b508e00995

So the majority of the time is just waiting for the client to actually dial up the server.

@Emyrk
Copy link
Member

Emyrk commented Oct 16, 2024

@spikecurtis good look on the api call. I was trying to verify my hunch. My added logs are a bit more explicit for next time.

That is crazy slow

@matifali matifali removed the tests label Oct 16, 2024
@Emyrk
Copy link
Member

Emyrk commented Oct 16, 2024

Closing from coder/coder#15089

Will reopen if it flakes again

@Emyrk Emyrk closed this as completed Oct 16, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants