Skip to content

chore: refactor notifier to use quartz.TickerFunc #15134

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Oct 21, 2024

Conversation

spikecurtis
Copy link
Contributor

@spikecurtis spikecurtis commented Oct 18, 2024

In investigating coder/internal#109 I noticed many of the notification tests are still using time.Sleep and require.Eventually. This is an initial effort to start converting these to Quartz.

One product change is to switch the notifier to use a TickerFunc instead of a normal Ticker, since it allows the test to assert that a batch process is complete via the Quartz Mock clock. This does introduce one slight behavioral change in that the notifier waits the fetch interval before processing its first batch. In practice, this is inconsequential: no one will notice if we send notifications immediately on startup, or just a little later.

But, it does make a difference to some tests, which are fixed up here.

Copy link
Contributor Author

spikecurtis commented Oct 18, 2024

This stack of pull requests is managed by Graphite. Learn more about stacking.

Join @spikecurtis and the rest of your teammates on Graphite Graphite

// Stop() waits for the in-progress flush to complete, meaning we have to advance the time such that sync triggers
// a total of (batchSize/StoreSyncBufferSize)-1 times. The -1 is because once we run the penultimate sync, it
// clears space in the buffer for the last dispatches of the batch, which allows graceful shutdown to continue
// immediately, without waiting for the last trigger.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dannykopping this behavior of the manager surprised me: it doesn't immediately flush when you call Stop(), it waits until the notifier exits. In this sense, it was never really the Stop() that relieved the backpressure in this test, it was that we wait around until the sync triggers enough times to relieve the backpressure. Calling Stop() early prevents another fetch from starting tho.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding additional clarification in the comment here.
This looks great!

this behavior of the manager surprised me

This is a smell for me; is there a way we could make this more explicit?

I worry about requiring a sequence of calls (i.e. Drain() + Stop()) to properly shutdown the manager (which would make this more clear but would introduce some potential problems if not called correctly or at all).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do worry that we can possibly wait around a long time for Graceful shutdown. It's up to 2s by default for a Sync, then the sync itself times out after 30s. The delivery of notifications times out after 60s.

Normally we expect Graceful shutdown to take 5-15 seconds or less, otherwise humans or cluster managers are starting to get likely to just kill us anyway. Maybe the initial, high level context can serve this purpose, but right now it's just tied to the CLI command, and doesn't get canceled in practice. If you moved the Manager to within coderd, then I think the API context does eventually get canceled.

@@ -88,7 +88,7 @@ require (
github.com/cli/safeexec v1.0.1
github.com/coder/flog v1.1.0
github.com/coder/pretty v0.0.0-20230908205945-e89ba86370e0
github.com/coder/quartz v0.1.0
github.com/coder/quartz v0.1.2
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interestingly, the backpressure test surfaced some bugs in Quartz that I had to fix to get the mock to behave like the real TickerFunc.

@spikecurtis spikecurtis force-pushed the spike/internal-109-notifier-ticker-func branch from 9103601 to 4ac5e9b Compare October 18, 2024 10:31
Copy link
Contributor

@dannykopping dannykopping left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks a mil for looking into this

// Stop() waits for the in-progress flush to complete, meaning we have to advance the time such that sync triggers
// a total of (batchSize/StoreSyncBufferSize)-1 times. The -1 is because once we run the penultimate sync, it
// clears space in the buffer for the last dispatches of the batch, which allows graceful shutdown to continue
// immediately, without waiting for the last trigger.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding additional clarification in the comment here.
This looks great!

this behavior of the manager surprised me

This is a smell for me; is there a way we could make this more explicit?

I worry about requiring a sequence of calls (i.e. Drain() + Stop()) to properly shutdown the manager (which would make this more clear but would introduce some potential problems if not called correctly or at all).

@spikecurtis spikecurtis changed the base branch from spike/internal109-coderd to graphite-base/15134 October 21, 2024 06:39
@spikecurtis spikecurtis force-pushed the spike/internal-109-notifier-ticker-func branch from 4ac5e9b to 0ca6cf1 Compare October 21, 2024 06:39
@spikecurtis spikecurtis changed the base branch from graphite-base/15134 to main October 21, 2024 06:40
@spikecurtis spikecurtis force-pushed the spike/internal-109-notifier-ticker-func branch from 0ca6cf1 to a2a1743 Compare October 21, 2024 06:40
@spikecurtis spikecurtis merged commit 29099d4 into main Oct 21, 2024
27 checks passed
@spikecurtis spikecurtis deleted the spike/internal-109-notifier-ticker-func branch October 21, 2024 08:07
@github-actions github-actions bot locked and limited conversation to collaborators Oct 21, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants