-
Notifications
You must be signed in to change notification settings - Fork 875
fix: make GetWorkspacesEligibleForTransition return even less false positives #15594
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
7e5652b
to
a982168
Compare
0172c5b
to
9725ec7
Compare
Removed myself as reviewer; Cian & Mathias have much more context here. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Obligatory reminder to double-check migration number before merge 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this approach looks fine on the surface level, but I'm a bit worried that I'm not seeing any invalidation for "next start at". I.e. changes that mean we should disregard or recompute it.
For instance, it could be set in the future, but a change is done to start it sooner. Will it propagate?
Just a few thoughts that pop into my mind:
- What happens if a template scheduling options are changed?
- What happens if workspace settings are updated (e.g. autostart schedule)?
- What happens if a workspace is updated to a new template version
- What happens if a user is disabled after the nextStartAt has been set (this might be OK, just mentioning for completeness)
coderd/database/migrations/000277_workspace_next_start_at.up.sql
Outdated
Show resolved
Hide resolved
It appears I completely forgot to consider the template being updated, my bad, will resolve this! Thanks for spotting 👍
I've handled this (as far as I can tell), the
The query has |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work! Seeing it, I'm not sure how I feel about the db trigger approach, but I like the enforcement and think it's worth giving it a shot. I think we still have some possibilities for data inconsistency without a trigger on the templates table, though.
// We do not know if workspace with a zero next start is eligible | ||
// for autostart, so we accept this false-positive. This can occur | ||
// when a coder version is upgraded and next_start_at has yet to | ||
// be set. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have a new known case, reset by db trigger. Should we implement the trigger in memory db as well? (I know it's probably going away soon, but still, might cause flakes?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it'd cause flakes, but instead cause dbmem tests to fail when postgres tests do not. Happy to add it though
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On second thought, I've checked dbmem.go
for how we emulate triggers and I'm not sure there is a good way to do it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is no good way to do it, just ad hoc code in various functions 😢
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah that's unfortunate.
I think attempting to emulate this trigger isn't worth it then. If someone writes a test that is dependent on the trigger (which they ideally shouldn't) then they'll have to disable it when running under dbmem.go
.
CREATE TRIGGER trigger_update_workspaces_schedule | ||
AFTER UPDATE ON workspaces | ||
FOR EACH ROW | ||
EXECUTE PROCEDURE nullify_workspace_next_start_at(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm thinking we also need a trigger for changes to a template time-til-dormant? Or do I misunderstand that interaction?
My thought process: if the time-til-dormant is reduced, we might've previously calculated a next start for the workspace which might no longer be valid because dormancy was sped up.
It's probably a fringe edge-case, but might be hard to debug customer reports if they run into it.
I'm thinking this applies to template allowed days changing as well. Tomorrow might've been blocked but is now unblocked yet we computed the autostart for the day after.
I'm wondering if there are other such interactions between template and workspace we should cover as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I agree with you here, I'll try and think if there are any other similar interactions 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm thinking we also need a trigger for changes to a template time-til-dormant? Or do I misunderstand that interaction?
My thought process: if the time-til-dormant is reduced, we might've previously calculated a next start for the workspace which might no longer be valid because dormancy was sped up.
After digging into this a little bit more, I don't think this is an issue.
I had forgotten to include dormant_at IS NOT NULL
in autostart
part of the GetWorkspacesEligibleForTransition
SQL query, so I've added that now (it already existed in the coderd lifecycle executor side).
Instead of handling changes to time_til_dormancy
, I've instead modified the existing trigger to check if the workspace is dormant. When a workspace is marked as dormant, the workspace's next_start_at
is made NULL.
I'm thinking this applies to template allowed days changing as well. Tomorrow might've been blocked but is now unblocked yet we computed the autostart for the day after.
This edge case has been handled in the EnterpriseTemplateScheduleStore
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approving to unblock.
Relates to #15082
Further to #15429, this reduces the amount of false-positives returned by the 'is eligible for autostart' part of the query. We achieve this by calculating the 'next start at' time of the workspace, storing it in the database, and using it in our
GetWorkspacesEligibleForTransition
query.The prior implementation of the 'is eligible for autostart' query would return all workspaces that at some point in the future might be eligible for autostart. This now ensures we only return workspaces that should be eligible for autostart.
We also now pass
currentTick
instead oft
to theGetWorkspacesEligibleForTransition
query as otherwise we'll have one round of workspaces that are skipped byisEligibleForTransition
due tocurrentTick
being a truncated version oft
.