-
Notifications
You must be signed in to change notification settings - Fork 889
fix: prevent db deadlock when workspaces go dormant #10618
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
workspaces.template_id = templates.id | ||
AND |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
review: already in the WHERE cond so 👍
log.Error(e.ctx, "get workspace autostart failed", slog.Error(err)) | ||
return nil | ||
return xerrors.Errorf("get workspace by id: %w", err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
review: IIRC the reason this was written this way was to avoid a single error causing an entire iteration to fail. Did the previous logic negatively impact the issue here? If not, I'd advise keeping the scope of the change small.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This only returns errors for the transaction not for the errgroup
. Seems like we were trying to commit the transaction regardless of whether we encountered an error or not which is bizarre. That shouldn't cause an entire evaluation iteration to abort early.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah gotcha, I think I was operating on some stale cached assumptions about this code that are no longer true.
@@ -309,6 +312,84 @@ func TestWorkspaceAutobuild(t *testing.T) { | |||
require.True(t, ws.LastUsedAt.After(lastUsedAt)) | |||
}) | |||
|
|||
// This test serves as a regression prevention for generating |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice 👍 We could rewrite using the new shiny dbfake but this can be a follow-up.
autobuild
to 10 so 10 connections could potentially be consumed leaving none free to handle the audit logging, effectively deadlocking the DB.The solution is obviously to not do non-transaction DB operations from within a transaction. This was not an intentional implementation decision.
We could potentially add a timeout to the context we use in the transactions we open in
autobuild
but if a similar bug were to occur it would just provide temporary relief.I'm a bit hesitant on adding the test for this since it's sort of easy for it to produce a false-positive. If we want to retain it I'll rewrite it to use our new
dbfake
implementation.fixes https://github.com/coder/customers/issues/314