Skip to content

fix(scaletest/createworkspaces): address race condition between closer and cleanup #10210

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Oct 11, 2023

Conversation

johnstcn
Copy link
Member

@johnstcn johnstcn commented Oct 11, 2023

Fixes #10202.

In goEventuallyStartFakeAgent we were returning a closer func and returning it to be called in test cleanup.
This had the side effect of not waiting for the agent to actually start AwaitWorkspaceAgents before stopping the test, e.g.

2023-10-10T17:53:23.5541190Z     run_test.go:552: 
2023-10-10T17:53:23.5541560Z         	Error Trace:	/Users/runner/work/coder/coder/coderd/coderdtest/coderdtest.go:862
2023-10-10T17:53:23.5542060Z         	            				/Users/runner/work/coder/coder/scaletest/createworkspaces/run_test.go:552
2023-10-10T17:53:23.5542570Z         	            				/Users/runner/hostedtoolcache/go/1.20.10/arm64/src/runtime/asm_arm64.s:1172
2023-10-10T17:53:23.5542720Z         	Error:      	Condition never satisfied
2023-10-10T17:53:23.5542830Z         	Test:       	Test_Runner/OK

Instead, just returning the closer channel and waiting for the agent to start AwaitWorkspaceAgents to finish before performing cleanup.

Copy link
Contributor

@spikecurtis spikecurtis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This had the side effect of not waiting for the agent to actually start before stopping the test

If we weren't actually waiting for the agent to start before stopping the test, why is it necessary to start the agent at all?

I'm guessing the answer is that the above phrasing is subtly misleading: the test requires the agent to actually start, but it doesn't require AwaitWorkspaceAgents to complete, and if we race shutting down and AwaitWorkspaceAgents, this can fail.

But, it would be good to make sure my reading is correct.

@johnstcn
Copy link
Member Author

This had the side effect of not waiting for the agent to actually start before stopping the test

If we weren't actually waiting for the agent to start before stopping the test, why is it necessary to start the agent at all?

I'm guessing the answer is that the above phrasing is subtly misleading: the test requires the agent to actually start, but it doesn't require AwaitWorkspaceAgents to complete, and if we race shutting down and AwaitWorkspaceAgents, this can fail.

But, it would be good to make sure my reading is correct.

Those were probably the words I wanted to write.

Copy link
Contributor

@spikecurtis spikecurtis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@johnstcn johnstcn merged commit ed8092c into main Oct 11, 2023
@johnstcn johnstcn deleted the cj/scaletest-createworkspaces-flake branch October 11, 2023 11:10
@github-actions github-actions bot locked and limited conversation to collaborators Oct 11, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

test flake: scaletest/createworkspaces Test_Runner/OK
2 participants