-
Notifications
You must be signed in to change notification settings - Fork 881
feat: implement thin vertical slice of system-generated notifications #13537
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This stack of pull requests is managed by Graphite. Learn more about stacking. Join @dannykopping and the rest of your teammates on |
a1b4e2b
to
d2a947a
Compare
1cb776a
to
25636ec
Compare
dc359de
to
2e16d76
Compare
d8f51ce
to
284b781
Compare
1c4046c
to
538bc42
Compare
284b781
to
9fd8527
Compare
538bc42
to
438bcba
Compare
438bcba
to
c1a3010
Compare
9fd8527
to
8b7fa18
Compare
Signed-off-by: Danny Kopping <danny@coder.com>
Signed-off-by: Danny Kopping <danny@coder.com>
Signed-off-by: Danny Kopping <danny@coder.com>
Signed-off-by: Danny Kopping <danny@coder.com>
Signed-off-by: Danny Kopping <danny@coder.com>
Signed-off-by: Danny Kopping <danny@coder.com>
Signed-off-by: Danny Kopping <danny@coder.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some comments below but I don't need to review again. 👍
require.Eventually(t, func() bool { return santa.naughty.Load() == 1 && santa.nice.Load() == 2 }, testutil.WaitMedium, testutil.IntervalFast) | ||
|
||
// Stop the manager which forces an update of buffered updates. | ||
require.NoError(t, mgr.Stop(ctx)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there is a race condition here, which is that above we wait for notifications to be sent to santa, but sending to santa happens before sending on the success
and failure
channels. So, triggering stop and the final bulk update might miss one or more success/failure updates and cause this test to flake.
Quartz can help here. The manager's bulk update and the notifier's processing loop depend on Tickers. If you convert them to use a quartz.TickerFunc
, then you can control the mock clock and trigger the processing loop and bulk update, and wait for them to finish.
This lets us get rid of the Eventually
that checks for dispatched notifications (Quartz allows you to wait until the tick processing is done, so there is no race in just asserting santa got what we expected).
It also means we can get rid of the Eventually
below for the same reason --- we can wait for bulk processing to complete before asserting the interceptor
's state.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've got a follow-up item to implement Quartz in a subsequent PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed in 613e074
Signed-off-by: Danny Kopping <danny@coder.com>
Signed-off-by: Danny Kopping <danny@coder.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome work! It's a pity we couldn't get this PR in smaller chunks.
I left a few comments, please forgive if some of them were already discussed. I didn't track the whole thread.
|
||
// WebhookPayload describes the JSON payload to be delivered to the configured webhook endpoint. | ||
type WebhookPayload struct { | ||
Version string `json:"_version"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: that needs to be published in our Coder docs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good call, will add as a follow-up.
return ctx.Err() | ||
default: | ||
logger.Warn(ctx, "message dispatch failed", slog.Error(err)) | ||
failure <- newFailedDispatch(n.id, msg.ID, err, retryable) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is still racy. What you want is
select {
case <-ctx.Done():
return ctx.Err()
case failure <- newFailedDispatch(n.id, msg.ID, err, retryable):
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My understanding is that default
will only be called if there are no cases which can proceed.
https://go.dev/ref/spec#Select_statements
Execution of a "select" statement proceeds in several steps:
...
2. If one or more of the communications can proceed, a single one that can proceed is chosen via a uniform pseudo-random selection. Otherwise, if there is a default case, that case is chosen. ...
Am I misunderstanding something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's true that default
only executes if it is the only case that can succeed, but then writing to failure
is a TOCTOU race.
Signed-off-by: Danny Kopping <danny@coder.com>
Signed-off-by: Danny Kopping <danny@coder.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I smoke-tested enabling the experiment with no further configuration via docker-compose, and didn't see anything immediately break.
@spikecurtis since you appear to be on PTO this week, and since I have made a reasonable effort to address all of your comments, I'm going to merge this. I've resolved all of the trivial comments and kept the substantive ones open to prompt further discussion when you return. This is behind an experiment and is reasonably self-contained, so I feel merging this is quite safe. Thanks very much for your thoughtful reviews! |
Spike is on PTO, see #13537 (comment)
NOTE: this is a stacked PR, please review #13536 as well.
This introduces a new subsystem into Coder to enable notifications (see milestone). In this PR, I have implemented a very thin vertical slice of this functionality:
This functionality is also hidden behind an experiment.
The main entities in this package are:
Notifier
is responsible for:notification_messages
entriesManager
of successful/failed deliveriesManager
is a controller which is responsible for:Notifier
s)Notifier
s' lifecycles (run/shutdown)The queue has been designed to be safe for concurrent use. This can either mean multiple
coderd
instances running side-by-side and/or eachcoderd
having multiple queue consumers.