Skip to content

fix(coderd): mark sub agent deletion via boolean instead of delete #18411

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 6 commits into
base: main
Choose a base branch
from

Conversation

mafredri
Copy link
Member

@mafredri mafredri commented Jun 17, 2025

Since deletion of data is uncommon in our database, the introduction of sub agents and their deletion introduced issues with foreign key assumptions in coder/internal#685. We could have only addressed the specific case by allowing cascade deletion of stats as well as handling in the stats collector, but it's unclear how many more such edge-cases we would've run into.

For this reason, we decided to mark the rows as deleted via boolean instead, and filter them out in all queries that seem relevant so that coderd doesn't see the deleted agents.

Testing is covered by updating the relevant agentapi tests as well as adding an antagonist deleted agent to dbgen which automatically gets applied to all tests.

Fixes coder/internal#685

Note that there are some queries that read workspace_agents that were not updated with deletion handling, but each one was analyzed and deemed unnecessary because of how they are used, doing so may introduce more harm than good. On that note, many of these queries can return results for non-latest builds which essentially means, "deleted" agents.

@mafredri mafredri force-pushed the mafredri/fix-subagent-deletion branch from a473f80 to db5700a Compare June 17, 2025 15:38
@mafredri mafredri force-pushed the mafredri/fix-subagent-deletion branch from db5700a to 7d73585 Compare June 18, 2025 10:29

// Add a test antagonist. For every agent we add a deleted sub agent
// to discover cases where deletion should be handled.
subAgt, err := db.InsertWorkspaceAgent(genCtx, database.InsertWorkspaceAgentParams{
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review: This may be a bit unusual, but it already helped me catch an issue with the workspace_prebuilds VIEW.

Ideally this potential cause would somehow be propagated to the user in case of test failure, but not sure how that could be done. I'll add a log entry to help with discovery.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this a lot! I do agree that it could get confusing to troubleshoot, but the log message in combination with dbtestutil.DumpOnFailure should help. You could also potentially set some relevant fields of the sub-agent so that its purpose is clear.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about this? 😆 bc370a8 (#18411)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SATISFACTORY

@mafredri mafredri marked this pull request as ready for review June 18, 2025 12:06
Copy link
Member

@johnstcn johnstcn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I spot-checked the migrations and the updated views/triggers look OK to me.

Would like a second pair of eyes on this for approval in case there's something I missed.


// Add a test antagonist. For every agent we add a deleted sub agent
// to discover cases where deletion should be handled.
subAgt, err := db.InsertWorkspaceAgent(genCtx, database.InsertWorkspaceAgentParams{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this a lot! I do agree that it could get confusing to troubleshoot, but the log message in combination with dbtestutil.DumpOnFailure should help. You could also potentially set some relevant fields of the sub-agent so that its purpose is clear.

Comment on lines +360 to +367
UPDATE
workspace_agents
SET
deleted = TRUE
WHERE
id = $1
AND parent_id IS NOT NULL
AND deleted = FALSE;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 literally impossible to delete a 'top-level' agent.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

obligatory reminder to check migration number!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Devcontainers: As a side-effect of deleting agents, it's possible for the stats collector to become sad
2 participants