fix: reduce excessive logging when database is unreachable #17363

dannykopping · 2025-04-11T14:11:05Z

Signed-off-by: Danny Kopping <dannykopping@gmail.com>

coderd/coderd.go

coderd/tailnet_test.go

codersdk/database.go

dannykopping · 2025-04-11T14:23:42Z

provisionerd/provisionerd.go

-		p.acquireAndRunOne(client)
+		err := p.acquireAndRunOne(client)
+		if err != nil && ctx.Err() == nil { // Only log if context is not done.
+			p.opts.Logger.Debug(ctx, "retrying to acquire job", slog.F("retry_in_ms", retrier.Delay.Milliseconds()), slog.Error(err))


Self-review: acquireAndRunOne already logs its own warning (specifically, the "provisionerd was unable to acquire job" warning is logged when the database is unreachable), so Debug felt like the most appropriate level here.

tailnet/controllers.go

coderd/tailnet.go

provisionerd/provisionerd.go

coderd/tailnet.go

coderd/coderd.go

coderd/tailnet_test.go

spikecurtis · 2025-04-14T08:26:28Z

coderd/workspaceagents_test.go

+	// This needs to be done *after* the server "starts" otherwise it'll fail straight away when trying to initialize.
+	pdb.MarkUnhealthy()
+
+	// Then: the tailnet controller will continually try to dial the coordination endpoint, exceeding its context timeout.


This comment is wrong: we don't continually retry, because DialAgent only waits until we hit a dial error. Once the first error is returned, the test is complete and we tear down the context.

Furthermore, I don't think the SDK DialAgent is really the thing you care about testing here. It doesn't handle the retries anyway; tailnet does. Maybe simplify this by using the WebsocketDialer directly and ensuring it returns an error.

provisionerd/provisionerd.go

This has the downside of losing the details of the received error, but in this case it seems justified since we need to make responses conditional on codersdk.ErrDatabaseNotReachable.

Signed-off-by: Danny Kopping <dannykopping@gmail.com>

coderd/coderd.go

coderd/tailnet.go

coderd/tailnet_test.go

dannykopping · 2025-04-16T10:39:20Z

/cherry-pick release/2.21

dannykopping · 2025-04-16T13:31:19Z

/cherry-pick release/2.20

github-actions bot assigned dannykopping Apr 11, 2025

Backoff acquiring provisioner jobs when the database is unreachable

3f95841

Signed-off-by: Danny Kopping <dannykopping@gmail.com>

dannykopping force-pushed the dk/17045 branch from 15588ef to cb302b6 Compare April 11, 2025 14:13

dannykopping changed the title ~~Reduce excessive logging when database is unreachable~~ fix: reduce excessive logging when database is unreachable Apr 11, 2025

dannykopping force-pushed the dk/17045 branch from cb302b6 to cf6af33 Compare April 11, 2025 14:20

dannykopping commented Apr 11, 2025

Checking for, and specifically handling, database unreachability in tailnet control protocol dialer

0448a74

Signed-off-by: Danny Kopping <dannykopping@gmail.com>

dannykopping force-pushed the dk/17045 branch from cf6af33 to 0448a74 Compare April 11, 2025 14:29

Add len checks for returned resources

0136b70

Signed-off-by: Danny Kopping <dannykopping@gmail.com>

dannykopping marked this pull request as ready for review April 11, 2025 14:50

dannykopping requested a review from spikecurtis April 11, 2025 14:50

johnstcn reviewed Apr 11, 2025

tailnet/controllers.go (outdated, resolved)

coderd/tailnet.go (outdated, resolved)

Merge branch 'main' of github.com:/coder/coder into dk/17045

8d94c3c

dannykopping commented Apr 14, 2025

provisionerd/provisionerd.go (resolved)

Reset retrier after each successful job acquisition

f92e852

Signed-off-by: Danny Kopping <dannykopping@gmail.com>

spikecurtis reviewed Apr 14, 2025

dannykopping added 3 commits April 14, 2025 09:35

Replace DatabaseHealthcheckFn with interface

68867da

Signed-off-by: Danny Kopping <dannykopping@gmail.com>

Review suggestions

6f60cbc

Signed-off-by: Danny Kopping <dannykopping@gmail.com>

dannykopping requested review from spikecurtis and johnstcn April 14, 2025 13:19

johnstcn reviewed Apr 14, 2025

coderd/coderd.go (resolved)

johnstcn reviewed Apr 14, 2025

coderd/tailnet.go (resolved)

spikecurtis approved these changes Apr 15, 2025

coderd/tailnet_test.go (resolved)

johnstcn approved these changes Apr 15, 2025

dannykopping merged commit 0b18e45 into main Apr 15, 2025
32 checks passed

dannykopping deleted the dk/17045 branch April 15, 2025 08:55

github-actions bot locked and limited conversation to collaborators Apr 15, 2025
