chore: refactor notifier to use quartz.TickerFunc #15134
Conversation
// Stop() waits for the in-progress flush to complete, meaning we have to advance the time such that sync triggers
// a total of (batchSize/StoreSyncBufferSize)-1 times. The -1 is because once we run the penultimate sync, it
// clears space in the buffer for the last dispatches of the batch, which allows graceful shutdown to continue
// immediately, without waiting for the last trigger.
@dannykopping this behavior of the manager surprised me: it doesn't immediately flush when you call Stop(); it waits until the notifier exits. In that sense, it was never really Stop() that relieved the backpressure in this test; we just wait around until the sync triggers enough times to relieve it. Calling Stop() early does prevent another fetch from starting, though.
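For readers following along, here is a minimal sketch of the timing logic being discussed. It is not the actual backpressure test: newTestManager, batchSize, storeSyncBufferSize, and storeSyncInterval are assumptions made for illustration, and only the clock-advancing loop mirrors the comment above.

```go
package notifications_test

import (
	"context"
	"testing"
	"time"

	"github.com/stretchr/testify/require"

	"github.com/coder/quartz"
)

// TestBackpressureSketch illustrates the clock-advancing logic described
// above; it is not the real backpressure test.
func TestBackpressureSketch(t *testing.T) {
	t.Parallel()

	const (
		batchSize           = 50              // hypothetical number of enqueued notifications
		storeSyncBufferSize = 10              // hypothetical size of the results buffer
		storeSyncInterval   = 2 * time.Second // hypothetical sync interval
	)

	ctx := context.Background()
	mClock := quartz.NewMock(t)
	manager := newTestManager(t, mClock) // hypothetical helper that wires in the mock clock

	// ... enqueue batchSize notifications and let the notifier dispatch them,
	// filling the results buffer and creating backpressure ...

	// Each advance past the sync interval triggers one sync, draining up to
	// storeSyncBufferSize results and freeing space in the buffer. Only
	// (batchSize/storeSyncBufferSize)-1 triggers are needed: the penultimate
	// sync already makes room for the final dispatches, so Stop() can finish
	// without waiting for a last tick.
	for i := 0; i < batchSize/storeSyncBufferSize-1; i++ {
		mClock.Advance(storeSyncInterval).MustWait(ctx)
	}
	require.NoError(t, manager.Stop(ctx))
}
```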
Thanks for adding additional clarification in the comment here.
This looks great!

"this behavior of the manager surprised me"

This is a smell for me; is there a way we could make this more explicit? I worry about requiring a sequence of calls (i.e. Drain() + Stop()) to properly shut down the manager: that would make the behavior clearer, but it would introduce some potential problems if not called correctly, or at all.
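To make the concern concrete, a hypothetical two-phase shutdown might look like the sketch below. Drain() does not exist in the current Manager API; the interface and function here are purely illustrative.

```go
package main

import (
	"context"
	"fmt"
)

// drainStopper is a hypothetical shape for the two-phase shutdown being
// discussed; the real notifications Manager only exposes Stop() today.
type drainStopper interface {
	Drain(ctx context.Context) error // would flush buffered dispatch results to the store
	Stop(ctx context.Context) error  // would halt the notifier without waiting on the buffer
}

// shutdown shows the calling sequence the comment is worried about:
// correctness would depend on Drain always being called before Stop.
func shutdown(ctx context.Context, m drainStopper) error {
	if err := m.Drain(ctx); err != nil {
		return fmt.Errorf("drain notification manager: %w", err)
	}
	return m.Stop(ctx)
}
```

Nothing in this shape forces callers to invoke Drain first, or at all, which is exactly the risk named above.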
I do worry that we can end up waiting a long time for graceful shutdown. It's up to 2s by default for a sync to trigger, then the sync itself times out after 30s, and the delivery of notifications times out after 60s.

Normally we expect graceful shutdown to take 5-15 seconds or less; beyond that, humans or cluster managers become likely to just kill us anyway. Maybe the initial, high-level context can serve this purpose, but right now it's just tied to the CLI command and doesn't get canceled in practice. If you moved the Manager to within coderd, then I think the API context does eventually get canceled.
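As a rough illustration of bounding graceful shutdown, a caller could derive a deadline-limited context before stopping the manager. This is a sketch only: stopWithBudget is a hypothetical helper, the 15-second budget is an illustrative number rather than anything specified in this PR, and it assumes Manager.Stop honors context cancellation.

```go
package shutdown

import (
	"context"
	"time"

	"github.com/coder/coder/v2/coderd/notifications"
)

// stopWithBudget is a hypothetical helper: it caps graceful shutdown of the
// notifications Manager at a fixed budget so slow syncs or dispatches cannot
// hold up process exit indefinitely (assuming Stop honors ctx cancellation).
func stopWithBudget(ctx context.Context, m *notifications.Manager) error {
	shutdownCtx, cancel := context.WithTimeout(ctx, 15*time.Second)
	defer cancel()
	return m.Stop(shutdownCtx)
}
```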
@@ -88,7 +88,7 @@ require (
 	github.com/cli/safeexec v1.0.1
 	github.com/coder/flog v1.1.0
 	github.com/coder/pretty v0.0.0-20230908205945-e89ba86370e0
-	github.com/coder/quartz v0.1.0
+	github.com/coder/quartz v0.1.2
Interestingly, the backpressure test surfaced some bugs in Quartz that I had to fix to get the mock to behave like the real TickerFunc.
LGTM, thanks a mil for looking into this
In investigating coder/internal#109 I noticed many of the notification tests are still using time.Sleep and require.Eventually. This is an initial effort to start converting these to Quartz.

One product change is to switch the notifier to use a TickerFunc instead of a normal Ticker, since it allows the test to assert that a batch process is complete via the Quartz Mock clock. This does introduce one slight behavioral change in that the notifier waits the fetch interval before processing its first batch. In practice, this is inconsequential: no one will notice if we send notifications immediately on startup, or just a little later. But it does make a difference to some tests, which are fixed up here.
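For context, a rough sketch of the shape of this change is below. It is not the actual notifier code: the pared-down notifier struct and its field names are assumptions. The relevant property of quartz's TickerFunc is that each tick runs a callback and the mock clock can wait for that callback to return, which is what lets tests assert that a batch has been fully processed instead of sleeping or polling.

```go
package notifier

import (
	"context"
	"time"

	"github.com/coder/quartz"
)

// notifier is a pared-down stand-in for the real type; only the fields the
// sketch needs are shown, and their names are assumptions.
type notifier struct {
	clock         quartz.Clock
	fetchInterval time.Duration
}

func (n *notifier) process(ctx context.Context) error {
	// Fetch and dispatch one batch of pending notifications (elided).
	return nil
}

// run shows the shape of a loop built on TickerFunc: the callback runs every
// fetchInterval (so the first batch happens one interval after startup), and
// the ticker stops when ctx is canceled.
func (n *notifier) run(ctx context.Context) error {
	tick := n.clock.TickerFunc(ctx, n.fetchInterval, func() error {
		return n.process(ctx)
	}, "notifier")

	// Wait returns once the ticker has stopped, i.e. when ctx is canceled.
	return tick.Wait()
}
```

In a test, a quartz mock clock can then advance by fetchInterval and wait for that callback to complete, replacing the time.Sleep / require.Eventually patterns mentioned above.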