Skip to content

chore: fix race in cron close behavior (TestAgent_WriteVSCodeConfigs) #11243

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Dec 18, 2023
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
14 changes: 13 additions & 1 deletion agent/agentscripts/agentscripts.go
Original file line number Diff line number Diff line change
Expand Up @@ -129,7 +129,18 @@ func (r *Runner) StartCron() {
// has exited by the time the `cron.Stop()` context returns, so we need to
// track it manually.
err := r.trackCommandGoroutine(func() {
r.cron.Run()
// Since this is run async, in quick unit tests, it is possible the
// Close() function gets called before we even start the cron.
// In these cases, the Run() will never end.
// So if we are closed, we just return, and skip the Run() entirely.
select {
case <-r.cronCtx.Done():
// The cronCtx is cancelled before cron.Close() happens. So if the ctx is
// cancelled, then Close() will be called, or it is about to be called.
// So do nothing!
default:
r.cron.Run()
}
})
if err != nil {
r.Logger.Warn(context.Background(), "start cron failed", slog.Error(err))
Expand Down Expand Up @@ -315,6 +326,7 @@ func (r *Runner) Close() error {
return nil
}
close(r.closed)
// Must cancel the cron ctx BEFORE stopping the cron.
r.cronCtxCancel()
<-r.cron.Stop().Done()
r.cmdCloseWait.Wait()
Expand Down
9 changes: 9 additions & 0 deletions agent/agentscripts/agentscripts_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -53,6 +53,15 @@ func TestTimeout(t *testing.T) {
require.ErrorIs(t, runner.Execute(context.Background(), nil), agentscripts.ErrTimeout)
}

// TestCronClose exists because cron.Run() can happen after cron.Close().
// If this happens, there used to be a deadlock.
func TestCronClose(t *testing.T) {
t.Parallel()
runner := agentscripts.New(agentscripts.Options{})
runner.StartCron()
require.NoError(t, runner.Close(), "close runner")
}

Comment on lines +56 to +64
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hard to know for sure if this was the bug... I was able to reproduce this way, still feels like it might be more rare intertwined into some setup code that has more chances for the order to be "correct".

func setup(t *testing.T, patchLogs func(ctx context.Context, req agentsdk.PatchLogs) error) *agentscripts.Runner {
t.Helper()
if patchLogs == nil {
Expand Down