-
Notifications
You must be signed in to change notification settings - Fork 887
fix: fix workspace status filter returning more statuses that requested #7732
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
var genCtx = dbauthz.As(context.Background(), rbac.Subject{ | ||
ID: "owner", | ||
Roles: rbac.Roles(must(rbac.RoleNames{rbac.RoleOwner()}.Expand())), | ||
Groups: []string{}, | ||
Scope: rbac.ExpandableScope(rbac.ScopeAll), | ||
}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is so dbgen can work on dbauthz wrapped databases for unit testing.
latest_build.transition = 'delete'::workspace_transition | ||
latest_build.transition = 'delete'::workspace_transition AND | ||
-- If the error field is null, the status is 'failed' | ||
latest_build.error IS NULL | ||
|
||
WHEN @status = 'deleting' THEN | ||
latest_build.completed_at IS NOT NULL AND | ||
latest_build.completed_at IS NULL AND |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was the only fix needed for prod. All other fixes were just for the dbfake
|
||
WHEN @status = 'deleting' THEN | ||
latest_build.completed_at IS NOT NULL AND | ||
latest_build.completed_at IS NULL AND |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so, it's really difficult to read this query and convince yourself that these WHERE clauses are
a) non-overlapping
b) match the golang implementation of the status computation
What do you think about the strategy of introducing a stored procedure that computes the status? Then the filter logic in this query becomes a trivial WHERE clause, and we can use the DB-computed status in prod, rather than computing the status in the API layer.
Annoyingly, we'd still have to keep the golang implementation around for the dbfake, but they could both be written in an imperative style and so it will be much easier to verify the algorithm matches.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh it was a headache. Are we using stored procedures anywhere else, or would this be the first case? I'm open to anything that makes this easier.
Ideally though, it would be in the next PR so this bug is fixed in main. The major contribution from this PR is the test imo.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have triggers for deletion of tokens stored in the DB. This would be the first computed column, though.
Fine to follow up in another PR.
}, database.ProvisionerJob{ | ||
StartedAt: sql.NullTime{Time: time.Now().Add(time.Second * -2), Valid: true}, | ||
CompletedAt: sql.NullTime{Time: time.Now(), Valid: true}, | ||
}, database.WorkspaceTransitionDelete) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these tests are great and all, but they are not combinatorially exhaustive on null vs non-null for all fields. This is another reason I think a stored procedure for the status is better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is definitely not exhaustive. The annoying thing too is when we prepulate this much data (which isn't even that much), the postgres test gets so slow. So I skipped it in CI, meaning this bug could easily pop up again silently 😢.
I was debating about putting in a batch insert SQL query outside our typical db interface and see if it is faster. If we go the stored procedure route, might be worth investigating how to make the postgres test run in a reasonable amount of time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if the slowness is related to how we have to create the job, then acquire it, then fail/complete it. Acquiring in particular might be slow because of the locking we do.
In testing we could have some queries that directly insert the job in the state we want it. Not exposed on the main Store
interface, but available if you type assert to a test type.
@Emyrk I tried this out locally and failed workspaces do not show up with the deleted status filter - awesome. |
@Kira-Pilot could it be "deleting"? |
Ah, so that is a different bug actually.
coder/coderd/database/queries/workspaces.sql Lines 108 to 109 in cf35412
What I fixed was the list of statuses. So if you search So that is expected behavior. |
Fixes the postgres condition to not make failed workspaces show up with the deleted status filter.
Also fixed the dbfake to match the postgres implementation. It was acting a lot differently.
See #7685 (comment)
Fixes: #7685
Testing
Made a unit test to test this logic. Doing it on a real DB is slow, so going to skip in CI 😢. Took a bit of work to get the test setup.