fix: reduce excessive logging when database is unreachable #17363
Signed-off-by: Danny Kopping <dannykopping@gmail.com>
provisionerd/provisionerd.go
- p.acquireAndRunOne(client)
+ err := p.acquireAndRunOne(client)
+ if err != nil && ctx.Err() == nil { // Only log if context is not done.
+ 	p.opts.Logger.Debug(ctx, "retrying to acquire job", slog.F("retry_in_ms", retrier.Delay.Milliseconds()), slog.Error(err))
Self-review: `acquireAndRunOne` already logs its own warning (specifically, the `provisionerd was unable to acquire job` one is logged when the db is unreachable), so `Debug` is what felt most appropriate to me.
…ailnet control protocol dialer Signed-off-by: Danny Kopping <dannykopping@gmail.com>
coderd/workspaceagents_test.go
// This needs to be done *after* the server "starts" otherwise it'll fail straight away when trying to initialize.
pdb.MarkUnhealthy()

// Then: the tailnet controller will continually try to dial the coordination endpoint, exceeding its context timeout.
This comment is wrong: we don't continually retry, because `DialAgent` only waits until we hit a dial error. Once the first error is returned, the test is complete and we tear down the context.
Furthermore, I don't think the SDK `DialAgent` is really the thing that you care about testing here. It doesn't handle the retries anyway; `tailnet` does. Maybe simplify this and just use the `WebsocketDialer` and ensure it returns an error.
This has a downside of losing the details of the received error, but in this case it seems justified since we need to conditionalize responses based on `codersdk.ErrDatabaseNotReachable`.
Signed-off-by: Danny Kopping <dannykopping@gmail.com>
/cherry-pick release/2.21
/cherry-pick release/2.20
Fixes #17045