Skip to content

chore: fix race in cron close behavior (TestAgent_WriteVSCodeConfigs) #11243

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Dec 18, 2023

Conversation

Emyrk
Copy link
Member

@Emyrk Emyrk commented Dec 15, 2023

Closes #11226

Problem

The problem is cmdCloseWait never closes, so there is an open agent go routine that will not close. In the unit test in Question, cron.Run() is the only routine started. So I figured it is likely the routine left open. You can exercise the test locally with go test -run TestAgent_WriteVSCodeConfigs --timeout=8s -count=20. Or I wrote a new test that always exercises what I think the bug is.

What is happening

The cron package we are using allows Run() to be called after Close(). This essentially allows restarting the cron daemon, which is interesting.

Run() is called on a go routine. So in our unit tests, it is possible the Close() gets called first, then Run() is called. There exists a time period between these 2 lines of code:

<-r.cron.Stop().Done()
r.cmdCloseWait.Wait()

If Run() is called between there, the cmdCloseWait will never terminate.

A fix

If the cronCtx is cancelled, we just never call cron.Run(). The ctx is always cancelled before the stop, so this should be pretty good. Ideally we implement this inside the cron package and hook on their running flag. However this should suffice.

This fix still has a race condition. I do not see a clean solution
without modifying the cron libary. The cron library uses a boolean
to indicate running, and that boolean needs to be set to "true"
before we call "Close()". Or "Close()" should prevent "Run()"
from doing anything.

In either case, this solves the issue for a niche unit test bug
in which the test finishes, calling Close(), before there was
an oppertunity to start the go routine. It probably isn't worth
a lot of time investment, and this fix will suffice
@Emyrk Emyrk changed the title chore: add unit test to excercise flake chore: fix(ish) flake calling cron.Close() before cron.Run() Dec 15, 2023
@Emyrk Emyrk changed the title chore: fix(ish) flake calling cron.Close() before cron.Run() chore: fix race in cron close behavior (TestAgent_WriteVSCodeConfigs) Dec 15, 2023
Comment on lines +56 to +64
// TestCronClose exists because cron.Run() can happen after cron.Close().
// If this happens, there used to be a deadlock.
func TestCronClose(t *testing.T) {
t.Parallel()
runner := agentscripts.New(agentscripts.Options{})
runner.StartCron()
require.NoError(t, runner.Close(), "close runner")
}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hard to know for sure if this was the bug... I was able to reproduce this way, still feels like it might be more rare intertwined into some setup code that has more chances for the order to be "correct".

@Emyrk Emyrk marked this pull request as ready for review December 15, 2023 21:34
Copy link
Member

@johnstcn johnstcn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the use of TDD to pin this one down!

@Emyrk Emyrk merged commit a6901ae into main Dec 18, 2023
@Emyrk Emyrk deleted the stevenmasley/flake_TestAgent_WriteVSCodeConfigs branch December 18, 2023 15:26
@github-actions github-actions bot locked and limited conversation to collaborators Dec 18, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

flake: TestAgent_WriteVSCodeConfigs
2 participants