fix: prevent notifier test flakiness #14467

Merged
merged 1 commit into from
Aug 28, 2024

@dannykopping (Contributor) commented Aug 28, 2024

Fixes coder/internal#281

There was a TOCTOU (time-of-check to time-of-use) race between checking whether the notifier is paused and processing pending messages:

// run is the main loop of the notifier.
func (n *notifier) run(ctx context.Context, success chan<- dispatchResult, failure chan<- dispatchResult) error {
	...
	for {
		...
		// Check if notifier is not paused.
		ok, err := n.ensureRunning(ctx)
		if err != nil {
			n.log.Warn(ctx, "failed to check notifier state", slog.Error(err))
		}

		if ok {
			// <---- if the notifier is paused at this point (after the check above),
			// any pending messages will still be processed.

			// Call process() immediately (i.e. don't wait for an initial tick).
			err = n.process(ctx, success, failure)
			if err != nil {
				n.log.Error(ctx, "failed to process messages", slog.Error(err))
			}
		}
		...
	}
...
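The check-then-act interleaving in the loop above can be reproduced deterministically in a toy model. This is a minimal sketch under stated assumptions, not the real notifier: `toyNotifier`, its `afterCheck` hook, and `racyTick` are hypothetical names, and the hook simply stands in for a test pausing the notifier at the worst possible moment (after the check, before processing).

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// toyNotifier is a hypothetical, heavily simplified stand-in for the real
// notifier: a pause flag plus an in-memory queue of pending messages.
type toyNotifier struct {
	paused    atomic.Bool
	queue     []string
	processed []string

	// afterCheck (hypothetical) runs between the pause check and process(),
	// which lets us force the check-then-act interleaving deterministically.
	afterCheck func()
}

// ensureRunning mirrors the check in the run loop: true means "not paused".
func (n *toyNotifier) ensureRunning() bool { return !n.paused.Load() }

// process dispatches every pending message; note that it does not re-check
// the pause flag, just like the loop body it models.
func (n *toyNotifier) process() {
	n.processed = append(n.processed, n.queue...)
	n.queue = nil
}

// tick is one iteration of the loop: check the pause state, then act on it.
func (n *toyNotifier) tick() {
	if n.ensureRunning() {
		if n.afterCheck != nil {
			n.afterCheck()
		}
		n.process()
	}
}

// racyTick enqueues one message and pauses the notifier only after the
// check has already passed, returning how many messages were processed.
func racyTick() int {
	n := &toyNotifier{queue: []string{"msg-1"}}
	n.afterCheck = func() { n.paused.Store(true) } // the pause lands too late
	n.tick()
	return len(n.processed)
}

func main() {
	fmt.Println("processed while paused:", racyTick()) // prints "processed while paused: 1"
}
```

Because nothing re-checks the flag between `ensureRunning` and `process`, the message is dispatched even though the notifier is paused by the time it runs.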

I don't think we need to fix this in the notifier code itself (it's not critical that pausing the notifier takes effect immediately); it was only a problem for the tests.

Previously, in the tests:

  1. The manager was started first.
  2. A message was enqueued.
  3. The notifier ticked, to process notifications.
  4. The notifier was determined to be paused, but only once it was already in the process method <-- TOCTOU
  5. The notification was processed.

Now:

  1. The message is enqueued.
  2. The manager is started.
  3. The notifier ticks, to process notifications.
  4. The notifier is determined to be paused, and the process method exits.
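The new ordering can be modelled the same way. This is a hypothetical sketch, not the actual test: `toyNotifier` and `reorderedTest` are illustrative names, and it assumes (as the step list implies) that the notifier's paused state is already set before the manager starts, so the very first check observes it and the queued message stays pending.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// toyNotifier is a hypothetical, simplified model of the notifier:
// a pause flag plus an in-memory queue of pending messages.
type toyNotifier struct {
	paused    atomic.Bool
	queue     []string
	processed []string
}

// tick is one loop iteration: it dispatches pending messages only when the
// notifier is not paused; otherwise it returns without touching the queue.
func (n *toyNotifier) tick() {
	if n.paused.Load() {
		return
	}
	n.processed = append(n.processed, n.queue...)
	n.queue = nil
}

// reorderedTest follows the new ordering. Assumption: the notifier is
// already paused before the manager starts, so the first check sees it.
func reorderedTest() (processed, pending int) {
	n := &toyNotifier{}
	n.paused.Store(true)

	// 1. The message is enqueued.
	n.queue = append(n.queue, "msg-1")

	// 2-4. The manager starts, the notifier ticks, sees that it is paused,
	// and returns without processing anything.
	n.tick()

	return len(n.processed), len(n.queue)
}

func main() {
	processed, pending := reorderedTest()
	fmt.Printf("processed=%d pending=%d\n", processed, pending) // prints "processed=0 pending=1"
}
```

Since the pause is in place before the loop ever runs, there is no window between the check and the processing for the test to race against.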

Signed-off-by: Danny Kopping <danny@coder.com>
@dannykopping dannykopping marked this pull request as ready for review August 28, 2024 14:19
@johnstcn (Member) left a comment

👍 I'm fine with this approach if it stops the flakes for now, but Quartz is a better solution overall.

@dannykopping (Contributor, Author) commented:

Indeed. We still have coder/internal#6 which could be picked up; it'll take some doing to plumb it all through.
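Why a controllable clock removes this kind of flake can be shown without depending on Quartz itself. In this hypothetical sketch (not Quartz's API), the loop receives its ticks from a channel the test owns and signals on `done` after each iteration, so the test only changes the pause state while the loop is idle between ticks and the TOCTOU window can never be hit. The names `loop` and `controlledTicks` and the channel-based design are assumptions for illustration.

```go
package main

import (
	"context"
	"fmt"
	"sync/atomic"
)

// loop is a hypothetical run loop that runs one iteration per value received
// on ticks. In production, ticks could be fed from a time.Ticker; a test can
// supply its own channel and decide exactly when each iteration happens.
func loop(ctx context.Context, ticks <-chan struct{}, paused *atomic.Bool, processed *atomic.Int64, done chan<- struct{}) {
	for {
		select {
		case <-ctx.Done():
			return
		case <-ticks:
			if !paused.Load() {
				processed.Add(1)
			}
			// Signal that this iteration has fully finished.
			done <- struct{}{}
		}
	}
}

// controlledTicks drives two iterations by hand, pausing between them,
// and returns how many iterations actually processed messages.
func controlledTicks() int64 {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()

	ticks := make(chan struct{})
	done := make(chan struct{})
	var paused atomic.Bool
	var processed atomic.Int64
	go loop(ctx, ticks, &paused, &processed, done)

	// First tick: not paused, so this iteration processes.
	ticks <- struct{}{}
	<-done

	// Safe: the loop is idle, waiting for the next tick.
	paused.Store(true)

	// Second tick: paused, so this iteration does nothing.
	ticks <- struct{}{}
	<-done

	return processed.Load()
}

func main() {
	fmt.Println("iterations that processed:", controlledTicks()) // prints "iterations that processed: 1"
}
```

The test never races the loop because every state change happens strictly between iterations; a mock clock such as Quartz generalizes the same idea to the timers and tickers the real notifier uses.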

@dannykopping dannykopping merged commit f24cb5c into main Aug 28, 2024
39 checks passed
@dannykopping dannykopping deleted the dk/notifier-paused branch August 28, 2024 14:33
@github-actions github-actions bot locked and limited conversation to collaborators Aug 28, 2024
Successfully merging this pull request may close these issues.

flake: TestNotifierPaused
2 participants