feat(coderd/database): keep only 1 day of workspace_agent_stats after rollup #12674

Merged
merged 2 commits into from
Apr 22, 2024
delete in batches
mafredri committed Apr 9, 2024
commit 18b45c23c04d289091817c49a138c4abbf1dcd20
24 changes: 21 additions & 3 deletions coderd/database/dbmem/dbmem.go
@@ -1526,6 +1526,15 @@ func (q *FakeQuerier) DeleteOldWorkspaceAgentStats(_ context.Context) error {
 			)
 		FROM
 			template_usage_stats
+	)
+	AND created_at < (
+		-- Delete at most in batches of 3 days (with a batch size of 3 days, we
+		-- can clear out the previous 6 months of data in ~60 iterations) whilst
+		-- keeping the DB load relatively low.
+		SELECT
+			COALESCE(MIN(created_at) + '3 days'::interval, NOW())
+		FROM
+			workspace_agent_stats
 	);
 */

@@ -1543,10 +1552,19 @@ func (q *FakeQuerier) DeleteOldWorkspaceAgentStats(_ context.Context) error {
 	}

 	var validStats []database.WorkspaceAgentStat
+	var batchLimit time.Time
+	for _, stat := range q.workspaceAgentStats {
+		if batchLimit.IsZero() || stat.CreatedAt.Before(batchLimit) {
+			batchLimit = stat.CreatedAt
+		}
+	}
+	if batchLimit.IsZero() {
+		batchLimit = time.Now()
+	} else {
+		batchLimit = batchLimit.AddDate(0, 0, 3)
+	}
 	for _, stat := range q.workspaceAgentStats {
-		fmt.Println(stat.CreatedAt, limit)
-		if stat.CreatedAt.Before(limit) {
-			fmt.Println("delete")
+		if stat.CreatedAt.Before(limit) && stat.CreatedAt.Before(batchLimit) {
 			continue
 		}
 		validStats = append(validStats, stat)
9 changes: 9 additions & 0 deletions coderd/database/queries.sql.go

Some generated files are not rendered by default.

9 changes: 9 additions & 0 deletions coderd/database/queries/workspaceagentstats.sql
Member

Is there a maximum number of rows that could be deleted here?
Would it make sense to add a limit on the number of rows deleted?
My thinking is that we could choose a reasonably large upper limit and hopefully converge after a few iterations.

Member Author

Not a bad idea. Limiting would only be relevant the first time(s) this query runs; after that there's no point. But if we want to avoid the initial performance penalty of doing a huge delete, it would be sensible. There's no LIMIT for deletes, though, so we could instead select min(created_at) and delete only 1 day (or N days) of data at a time.

Member

I think deleting the oldest day of data every interval will probably work fine here. The magnitude of that data should scale with the size of the deployment. We'll have to assume that the database is correctly sized to handle that level of deletes; if it isn't, that is likely a sizing issue with the database itself in relation to the rest of the deployment.

@@ -109,6 +109,15 @@ WHERE
 			)
 		FROM
 			template_usage_stats
+	)
+	AND created_at < (
+		-- Delete at most in batches of 3 days (with a batch size of 3 days, we
+		-- can clear out the previous 6 months of data in ~60 iterations) whilst
+		-- keeping the DB load relatively low.
+		SELECT
+			COALESCE(MIN(created_at) + '3 days'::interval, NOW())
+		FROM
+			workspace_agent_stats
 	);

 -- name: GetDeploymentWorkspaceAgentStats :one