test: fix cleanup order on provisioner daemon work dir #9668
Attempting to fix some of the errors reported in #9379. I think other categories of this test flake exist; I only investigated the provisioner daemon route.
In unit tests our provisioner daemon creates a working directory. For some unit tests (like one of the ones in the issue), we do not wait for the template version job to complete before ending the test, so a provisioner job can still be running when the test shuts down.
The previous cleanup order deleted the working directory first and then closed the provisioner. That can race with a job that is still writing files when the delete happens.
This PR fixes the t.Cleanup order. However, I think there is still a race condition, because the job runner does not check the context when extracting an archive: coder/provisionersdk/session.go, line 201 in c7d66d3.
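For reference, a minimal sketch of why the registration order matters: testing.T runs cleanup functions in last-added, first-called order, so the work dir removal has to be registered before the daemon close. The names fakeT, cleanupRegistrar, and setupDaemon are hypothetical and only stand in for the real test helpers; fakeT mimics the ordering of testing.T so the sketch runs outside `go test`.

```go
package main

import "fmt"

// cleanupRegistrar is the subset of testing.TB used here; *testing.T satisfies it.
type cleanupRegistrar interface {
	Cleanup(func())
}

// fakeT is a hypothetical stand-in that mimics the last-in, first-out
// order in which testing.T runs its cleanup functions.
type fakeT struct{ fns []func() }

func (f *fakeT) Cleanup(fn func()) { f.fns = append(f.fns, fn) }

func (f *fakeT) runCleanups() {
	for i := len(f.fns) - 1; i >= 0; i-- {
		f.fns[i]()
	}
}

// setupDaemon (hypothetical) registers cleanups so that the daemon is
// closed before its working directory is removed.
func setupDaemon(t cleanupRegistrar, log *[]string) {
	// Registered first, so it runs last: nothing is writing by then.
	t.Cleanup(func() { *log = append(*log, "remove work dir") })
	// Registered last, so it runs first: stops jobs from writing files.
	t.Cleanup(func() { *log = append(*log, "close daemon") })
}

func main() {
	var log []string
	t := &fakeT{}
	setupDaemon(t, &log)
	t.runCleanups()
	fmt.Println(log)
}
```

Swapping the two t.Cleanup calls reproduces the old, racy order, where the directory is removed while the daemon may still be running a job.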
A small race condition still exists. The for loop iterates over the tar file, writing each file to disk. I added a check for context cancellation that aborts the loop early and stops writing files once the context is cancelled, but this does not remove the race condition entirely (see comment).
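A rough sketch of the kind of check described above, under stated assumptions: extractTar, makeTar, and countExtracted are hypothetical helpers written for illustration, not the code in provisionersdk/session.go. The loop checks ctx.Err() before each entry, which narrows the window but does not close it, since cancellation can still arrive between the check and the write.

```go
package main

import (
	"archive/tar"
	"bytes"
	"context"
	"errors"
	"fmt"
	"io"
	"os"
	"path/filepath"
)

// extractTar (hypothetical) writes regular files from r into dir and
// stops early once ctx is cancelled.
func extractTar(ctx context.Context, r io.Reader, dir string) error {
	tr := tar.NewReader(r)
	for {
		// Abort before touching the next entry if the caller cancelled.
		if err := ctx.Err(); err != nil {
			return err
		}
		hdr, err := tr.Next()
		if errors.Is(err, io.EOF) {
			return nil
		}
		if err != nil {
			return err
		}
		if hdr.Typeflag != tar.TypeReg {
			continue // this sketch only handles regular files
		}
		data, err := io.ReadAll(tr)
		if err != nil {
			return err
		}
		// Cancellation can still land between the check above and this
		// write, so a small race remains. filepath.Base flattens paths
		// to keep the sketch from writing outside dir.
		path := filepath.Join(dir, filepath.Base(hdr.Name))
		if err := os.WriteFile(path, data, 0o600); err != nil {
			return err
		}
	}
}

// makeTar builds a small in-memory archive with one file per name.
func makeTar(names ...string) ([]byte, error) {
	var buf bytes.Buffer
	tw := tar.NewWriter(&buf)
	body := []byte("hello")
	for _, name := range names {
		hdr := &tar.Header{Name: name, Typeflag: tar.TypeReg, Mode: 0o600, Size: int64(len(body))}
		if err := tw.WriteHeader(hdr); err != nil {
			return nil, err
		}
		if _, err := tw.Write(body); err != nil {
			return nil, err
		}
	}
	if err := tw.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// countExtracted extracts a two-file archive into a temp dir, optionally
// with an already-cancelled context, and reports how many files were written.
func countExtracted(cancelFirst bool) (int, error) {
	dir, err := os.MkdirTemp("", "extract-demo")
	if err != nil {
		return 0, err
	}
	defer os.RemoveAll(dir)
	archive, err := makeTar("a.txt", "b.txt")
	if err != nil {
		return 0, err
	}
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()
	if cancelFirst {
		cancel()
	}
	extractErr := extractTar(ctx, bytes.NewReader(archive), dir)
	entries, err := os.ReadDir(dir)
	if err != nil {
		return 0, err
	}
	return len(entries), extractErr
}

func main() {
	n, err := countExtracted(true)
	fmt.Println("cancelled:", n, err)
	n, err = countExtracted(false)
	fmt.Println("not cancelled:", n, err)
}
```

With the context cancelled up front, no files are written and the context error is returned; with a live context, both files are extracted.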