Skip to content

bug: GetManagedAgentCount is called > 100k times per day and creates lots of DB load #943

@cstyan

Description

@cstyan

AFAICT this is now one of our most expensive queries, at least over the last 7d.

This query is called via the license entitlements path. As it's currently written, the underlying SQL query has to do a sequential scan of all workspace builds for a given time range (depends on how we apply license constraints), which in the case of a 1y time range is ~70k rows. It then filters out any builds which do not have has_ai_task = true and transition = start.

IIUC this query is used to determine which workspaces are really being used for AI tasks as opposed to being used by individual users. I imagine they're attributed somewhat differently for license usage in that case. Anyways, we can see with the following query that the AI task usage internally is growing:

SELECT 
    DATE_TRUNC('month', created_at) as month_start,
    COUNT(*) as total_builds,
    COUNT(*) FILTER (WHERE has_ai_task = true) as ai_task_builds,
    ROUND(
        COUNT(*) FILTER (WHERE has_ai_task = true) * 100.0 / COUNT(*), 
        2
    ) as ai_task_percentage,
    -- First and last AI task build in the month
    MIN(created_at) FILTER (WHERE has_ai_task = true) as first_ai_task_build,
    MAX(created_at) FILTER (WHERE has_ai_task = true) as last_ai_task_build
FROM workspace_builds 
WHERE created_at >= NOW() - INTERVAL '6 months'
GROUP BY DATE_TRUNC('month', created_at)
HAVING COUNT(*) > 0
ORDER BY month_start DESC;

      month_start       | total_builds | ai_task_builds | ai_task_percentage |      first_ai_task_build      |      last_ai_task_build  
     
------------------------+--------------+----------------+--------------------+-------------------------------+--------------------------
-----
 2025-08-01 00:00:00+00 |         2959 |            360 |              12.17 | 2025-08-01 12:49:48.319719+00 | 2025-08-26 20:57:42.13888
2+00
 2025-07-01 00:00:00+00 |         9245 |            357 |               3.86 | 2025-07-01 06:33:05.881854+00 | 2025-07-31 17:52:52.95627
3+00
 2025-06-01 00:00:00+00 |         5188 |            226 |               4.36 | 2025-06-24 14:16:47.489334+00 | 2025-06-30 20:32:10.38735
5+00
 2025-05-01 00:00:00+00 |         3932 |              0 |               0.00 |                               | 
 2025-04-01 00:00:00+00 |         2906 |              0 |               0.00 |                               | 
 2025-03-01 00:00:00+00 |         2059 |              0 |               0.00 |                               | 
 2025-02-01 00:00:00+00 |          272 |              0 |               0.00 |                               | 

For this reason, it may not make sense to create a partial index where we include the same filters on has_ai_task and transition as we do for the actual query, as it would likely grow quite a bit (for the last 1+ years of workspace builds in dogfood ~1% have AI tasks vs so far in august > 12% do). That indexing would also be less reusable by other queries.

Instead, we could create an index on transition, has_ai_task, created_at, all of which are going to speed up this query. transition and created_at are commonly used for other existing queries, we have at least 25 queries that filter on transition=<something> and double (or more) that order/filter on created_at.

So I propose we add the following index:

CREATE INDEX CONCURRENTLY idx_workspace_builds_transition_ai_task_created_at
ON workspace_builds (transition, has_ai_task, created_at);

One alternative may be to turn the current query into a materialized view that we update on the same cadence as the current license loop (10 minutes afaict) and offload the processing expense to that one query, and then all queries executed by the license entitlements Go code simply hit the view instead. One thing to keep in mind is that (though the index will do this as well), the materialized view would need to be run against the entire valid time range of workspace builds since I think we have a range of license validities, such as 30d trials and 1y licenses.

Metadata

Metadata

Assignees

Labels

No labels
No labels

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions