fix: fix TestPendingUpdatesMetric flaky assertion #14534

Merged
spikecurtis merged 1 commit into main from spike/test-pending-updates-metric-flake
Sep 3, 2024

@spikecurtis commented Sep 3, 2024

Fixes flake seen here:

https://github.com/coder/coder/actions/runs/10677025332/job/29591518118

The original test waits for the DB calls that record notification successes and failures, and blocks on a pause channel. However, it uses a single pause channel for both DB calls, and so implicitly assumes that the success update and the failure update both occur during the same update sync.

However, the update sync timer isn't synchronized to anything, so there is a race in which an update sync picks up only the success result or only the failure result, not both. The test, which waits for both, then blocks and deadlocks before unpausing.

It appears the unpause mechanism is there to test that the PendingUpdates metric updates accordingly.

This fixes the flake by:

  1. using a Quartz clock to control when the Manager syncs updates;
  2. waiting for the PendingUpdates metric to reach 2, so that we know that both the success and the failure have been queued up;
  3. triggering the update sync via the Quartz mock clock.

This has the nice property that a "pause" function is no longer required: we know the manager is in our desired state before we trigger the update, and can assert on the Metrics before and after.
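The three steps above can be sketched without any external dependency. The following is a minimal illustration, not the code from this PR: `manualTicker`, `manager`, `pendingUpdates`, and `runScenario` are hypothetical names, and the hand-fired ticker only stands in for the role the Quartz mock clock plays in the real test.

```go
package main

import (
	"sync"
	"time"
)

// manualTicker is a hypothetical stand-in for a mock clock's ticker:
// it never fires on its own, only when the test calls Fire.
type manualTicker struct{ C chan struct{} }

func newManualTicker() *manualTicker { return &manualTicker{C: make(chan struct{})} }

// Fire delivers one tick and blocks until the sync loop has received it.
func (t *manualTicker) Fire() { t.C <- struct{}{} }

// manager is a hypothetical, simplified model of a component that queues
// dispatch results and writes them out on every tick.
type manager struct {
	mu      sync.Mutex
	queued  []string
	flushed chan int // reports how many results each sync wrote
}

func (m *manager) enqueue(result string) {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.queued = append(m.queued, result)
}

// pendingUpdates plays the role of the PendingUpdates gauge.
func (m *manager) pendingUpdates() int {
	m.mu.Lock()
	defer m.mu.Unlock()
	return len(m.queued)
}

// run flushes the queue once per tick until done is closed.
func (m *manager) run(t *manualTicker, done <-chan struct{}) {
	for {
		select {
		case <-t.C:
			m.mu.Lock()
			n := len(m.queued)
			m.queued = nil
			m.mu.Unlock()
			select {
			case m.flushed <- n:
			case <-done:
				return
			}
		case <-done:
			return
		}
	}
}

// runScenario returns the pending count before the sync, the number of
// results the sync wrote, and the pending count after the sync.
func runScenario() (before, flushed, after int) {
	tick := newManualTicker() // step 1: the test owns the clock
	m := &manager{flushed: make(chan int)}
	done := make(chan struct{})
	defer close(done)
	go m.run(tick, done)

	// Results arrive asynchronously and in either order.
	go m.enqueue("success")
	go m.enqueue("failure")

	// Step 2: wait until both results are queued before syncing.
	for m.pendingUpdates() != 2 {
		time.Sleep(time.Millisecond)
	}
	before = m.pendingUpdates()

	// Step 3: trigger exactly one sync and observe its effect.
	tick.Fire()
	flushed = <-m.flushed
	after = m.pendingUpdates()
	return before, flushed, after
}
```

Because the sync cannot run until `Fire` is called, the sketch needs no pause channel: the queue is known to hold both results before the sync, and the count can be checked on either side of it.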

@spikecurtis marked this pull request as ready for review September 3, 2024 08:54
@dannykopping left a comment

LGTM, thanks for fixing this @spikecurtis!

There's a minor change regarding dbtime to be made before merging.

func newNotifier(cfg codersdk.NotificationsConfig, id uuid.UUID, log slog.Logger, db Store,
	hr map[database.NotificationMethod]Handler, metrics *Metrics, clock quartz.Clock,
) *notifier {
	tick := clock.NewTicker(cfg.FetchInterval.Value(), "notifier", "fetchInterval")

I love these ticker tags ❤️

@spikecurtis force-pushed the spike/test-pending-updates-metric-flake branch from 0747bb8 to 3ae3fbb September 3, 2024 09:38
@spikecurtis merged commit 0eca1fc into main Sep 3, 2024
28 checks passed
@spikecurtis deleted the spike/test-pending-updates-metric-flake branch September 3, 2024 09:47