Roll up workspace and app stats into a separate table and use it for querying insights and deployment DAUs #12122

Closed
mafredri opened this issue Feb 13, 2024 · 14 comments
@mafredri
Member

mafredri commented Feb 13, 2024

Problem

Currently we store raw agent stats for 6 months and app stats indefinitely. On dev.coder.com, the agent stats table takes up 6.3 GB of storage even though it's a fairly small deployment. We've managed to optimize our queries fairly well, but the raw data is still large and slow to query. App stats are not as bad since they're only written on an as-needed basis, compressing time ranges into a single row where possible.

To make insights more performant, and to allow us to store them for extended periods of time (years), we need to store them in a format that can be queried efficiently and doesn't use excessive amounts of storage.

Limitations

Template parameter usage is also a part of insights, but is not covered in this proposal.

Requirements

  • Generate reports on usage at a certain granularity (e.g. daily, weekly, monthly)
  • Store usage data in a database at a granularity that allows for timezone adjustments (e.g. 15m, 30m, 1h)
    • 15m covers most timezones, but has 4x the storage requirements of 1h
  • Store app usage by template and user (user is also needed for unique counts across templates)
  • Compute total usage by user
  • Store latency for user by template

Use cases

  • Deployment DAUs
  • Template insights

Design

We will roll up the data into a format that is queryable and more efficient to store. The rollup will happen on an hourly or daily schedule. If we require live data, we can utilize the same rollup queries to compute it from the raw data. Once the data is rolled up, we are free to delete data from either workspace_agent_stats or workspace_app_stats; however, there is no evidence that we need to delete from workspace_app_stats at this time.

We will store the data in a table called template_insights. This table will have the following columns:

  • id (bigserial)
  • start_time (timestamptz)
  • end_time (timestamptz)
  • template_id (uuid)
  • user_id (uuid)
  • latency_ms (smallint)
  • usage_mins (smallint)
  • ssh_mins (smallint)
  • vscode_mins (smallint)
  • jetbrains_mins (smallint)
  • webterminal_mins (smallint)
  • app_usage_mins (jsonb)
    • object {name1: int, name2: int, ...}

Note that we're storing the minutes so that we can properly account for usage that is smaller than the configured rollup range (be it 15m or 1h). The usage_mins field stores the actual total usage within the timeframe (for example, if you use ssh for 10 mins, then disconnect and start using code-server for 5 mins, usage_mins will reflect this and show 15 mins).
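
As a rough sketch of the above in DDL form (nullability, defaults, and the unique constraint mentioned later in the discussion are illustrative, not final):

create table template_insights (
    id bigserial primary key,
    start_time timestamptz not null,
    end_time timestamptz not null,
    template_id uuid not null,
    user_id uuid not null,
    latency_ms smallint not null,
    usage_mins smallint not null,
    ssh_mins smallint not null,
    vscode_mins smallint not null,
    jetbrains_mins smallint not null,
    webterminal_mins smallint not null,
    app_usage_mins jsonb not null default '{}'::jsonb,
    -- one row per (bucket, template, user); the bucket is [start_time, end_time)
    unique (start_time, template_id, user_id)
);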

With this format, if we assume that a user uses 1 template at a time, and that we have 1000 users who each use coder 8 hours/day for a year, we would have 1000 * 8 * 365 = 2,920,000 rows at 1h granularity. If we store this at 30m granularity, we would have 2,920,000 * 2 = 5,840,000 rows. This is a manageable amount of data to store and query.

Data type storage sizes are:

  • bigserial: 8 bytes
  • timestamptz: 8 bytes
  • uuid: 16 bytes
  • smallint: 2 bytes
  • jsonb: ~35 bytes/app (depending on length of name)
select pg_column_size('{"code-server": 60}'::jsonb);
=> 36

There are ways to optimize app_usage storage requirements further, but this is a good starting point. Furthermore, if we don't need to query the JSON, or don't mind the compute overhead of live conversion, we can simply store it as json, greatly reducing the storage requirement.

select pg_column_size('{"code-server":60}'::json);
=> 22

With this design, storage size will approximate: 8 + 8 + 8 + 16 + 16 + 2 + 2 + 2 + 2 + 2 + 2 + 36 = 104 bytes/row (the id, two timestamps, two UUIDs, six smallints, and the jsonb value)

If we store 5,840,000 rows, we would need 5,840,000 * 104 = ~0.6 GB of storage. This seems acceptable for one year of data for 1000 users. If a customer feels this is too much, we can make the retention configurable.

Note: These calculations don't take into account the storage space used by whatever indexes we end up creating.

Related: https://github.com/coder/customers/issues/384, #9301

@spikecurtis
Contributor

I don't think the bigserial id is necessary. The combination of (start, end, template ID, user ID) is a unique primary key. We will only ever query or delete based on one or more of these.
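
Sketched against the proposed table, that would look roughly like this (illustrative only):

-- drop the surrogate id and promote the identifying columns to the primary key
alter table template_insights
    drop column id,
    add primary key (start_time, end_time, template_id, user_id);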

@mtojek
Member

mtojek commented Feb 22, 2024

Thank you for describing the details.

ssh_mins (smallint)
vscode_mins (smallint)
jetbrains_mins (smallint)
webterminal_mins (smallint)

Should we keep separate columns for these *_mins fields, or is it cleaner to have a generic mins + type field? My main concern with what you proposed is the necessity of modifying the schema to add a column if we decide to support the next native application.
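
Purely to illustrate the alternative shape (the table and column names here are made up for the example):

-- one row per usage type instead of one column per type
create table template_insights_usage_by_type (
    start_time timestamptz not null,
    template_id uuid not null,
    user_id uuid not null,
    usage_type text not null, -- e.g. 'ssh', 'vscode', 'jetbrains', 'webterminal'
    usage_mins smallint not null,
    primary key (start_time, template_id, user_id, usage_type)
);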

If a customer feels this is too much, we can make the retention configurable.

I would consider it to be an MVP requirement. Different customers may have varying requirements and preferences regarding data retention.

start_time (timestamptz)
end_time (timestamptz)
template_id (uuid)
user_id (uuid)

Feature idea: would it be possible to build detailed monitoring based on this? Query all users that played with the application "Foobar" on Thursday afternoon.

Note: These calculations don't take into account the storage space used by whatever indexes we end up creating

Yeah, you may want to flesh out the indexing strategy a bit more

@spikecurtis
Contributor

I vote for 30m increments. This allows our days to exactly align with India's timezone. The only timezone at a quarter-hour offset from UTC with a significant population is Nepal (there are two others, one in Australia and one in New Zealand, with fewer than 1000 inhabitants each). DAUs that roll over at 11:45pm instead of midnight are probably not a big deal compared with being able to double the lookback time for the same DB space requirements.
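
For illustration, with 30m rows a day boundary in e.g. India (UTC+5:30) lands exactly on a bucket boundary, so a timezone-adjusted DAU query only needs to shift and truncate (a sketch against the proposed table):

select (start_time at time zone 'Asia/Kolkata')::date as day,
       count(distinct user_id) as daus
from template_insights
group by 1
order by 1;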

@spikecurtis
Contributor

We will roll up the data into a format that is queryable and more efficient to store. The rollup will happen on an hourly or daily schedule.

We should presumably roll up the data on the same periodicity as the rows. That is, as soon as we are past the end_time of the next row we could write, we start the rollup.

How will we deconflict between multiple Coderd instances trying to do the rollup? E.g. by using an advisory lock?

@mafredri
Member Author

@spikecurtis can you also provide a motivation? I don't think a lack of necessity is a good enough reason to diverge from a common database pattern. I wasn't going for ultimate compression here, but by the same token, if this was sparked by the 8-byte saving, then we could just as well change the end time to a smallint to save another 6 bytes. I decided against that so the data is more readable, and it may help with query simplicity as well.

(I also find IDs to be a good safety net in the event that something goes wrong.)

@spikecurtis
Contributor

@spikecurtis can you also provide a motivation? I don't think a lack of necessity is a good enough reason to diverge from a common database pattern. I wasn't going for ultimate compression here, but by the same token, if this was sparked by the 8-byte saving, then we could just as well change the end time to a smallint to save another 6 bytes. I decided against that so the data is more readable, and it may help with query simplicity as well.

The reason we have id columns in other parts of the database is that the other identifying columns might be mutable (i.e. a user could change their name). Even if we, say, don't allow people to change names, we might lift that requirement in future, and reworking the primary key is a big pain. Having an id column is common because these types of rows are common. It's not a pattern that we should just follow because it's a pattern.

In the rollups case, it simply does not make sense for (start, end, template ID, user ID) to change, since that would change the meaning of the row (template ID and user ID by definition won't be modified). Thus, we have a set that is unique, identifying, and won't be modified: exactly what you want for a primary key.

Adding an autoincrementing id column doesn't serve any purpose, and in fact, muddies the waters because it makes any insert have a unique primary key even if that same logical piece of data already exists.

(I also find IDs to be a good safety net in the event that something goes wrong.)

I'll flip this one back to you and ask: how do you think it will help in the event that something goes wrong?

@mafredri
Member Author

@mtojek

Should we keep separate columns for these *_mins fields, or is it cleaner to have a generic mins + type field? My main concern with what you proposed is the necessity of modifying the schema to add a column if we decide to support the next native application.

It's a valid concern for huge tables since adding a column requires rewriting the table (doubling the storage space requirement temporarily). However, if we set the default retention to a conservative number (6m-1y?) and document that a large number there may eventually affect migration speed, that could be an OK approach.

When splitting these out into a separate table, I'd be slightly worried about the increased storage space requirement and the effect on query speed.

Initially I considered keeping all of these "mins" in the jsonb field, but that would've increased the storage requirement quite a bit.

If a customer feels this is too much, we can make the retention configurable.

I would consider it to be an MVP requirement. Different customers may have varying requirements and preferences regarding data retention.

With MVP, do you mean it should be part of the initial version? I'd think we have time to add it after-the-fact. It's basically just a periodic query to delete data.
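
For example, something along these lines (the retention period is just a placeholder):

delete from template_insights
where end_time < now() - interval '1 year';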

Feature idea: would it be possible to build detailed monitoring based on this? Query all users that played with the application "Foobar" on Thursday afternoon.

That should be doable.

Note: These calculations don't take into account the storage space used by whatever indexes we end up creating

Yeah, you may want to flesh out the indexing strategy a bit more

I could guess, but at the end of the day we'll create the indexes to target performance. My best guesstimate is that they would add between 1.5x and 2.5x to the storage requirement, but I can spend some time prototyping this if we want harder numbers.
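
To give a rough idea of the kind of indexes I would expect (a guess, not a commitment):

-- filter template insights by template and time range
create index on template_insights (template_id, start_time);
-- per-user totals and unique counts across templates
create index on template_insights (user_id, start_time);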


@spikecurtis

We should presumably roll up the data on the same periodicity as the rows. That is, as soon as we are past the end_time of the next row we could write, we start the rollup.

I think it's better to delay it a bit to ensure all the data we're interested in is present/written. We could even just do it once a day and join it with the live data when showing "today".

How will we deconflict between multiple Coderd instances trying to do the rollup? E.g. by using an advisory lock?

Yeah, an advisory lock is what I was thinking.
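
Roughly along these lines (a sketch; the lock key is arbitrary and the rollup statement itself is elided):

begin;
-- returns true for exactly one coderd; the others see false and skip this rollup cycle
select pg_try_advisory_xact_lock(1236);
-- ... run the rollup insert here if the lock was acquired ...
commit;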

@mafredri
Member Author

The reason we have id columns in other parts of the database is that the other identifying columns might be mutable (i.e. a user could change their name). Even if we, say, don't allow people to change names, we might lift that requirement in future, and reworking the primary key is a big pain. Having an id column is common because these types of rows are common. It's not a pattern that we should just follow because it's a pattern.

In the rollups case, it simply does not make sense for (start, end, template ID, user ID) to change, since that would change the meaning of the row (template ID and user ID by definition won't be modified). Thus, we have a set that is unique, identifying, and won't be modified: exactly what you want for a primary key.

Adding an autoincrementing id column doesn't serve any purpose, and in fact, muddies the waters because it makes any insert have a unique primary key even if that same logical piece of data already exists.

I get what you're saying, and I agree that each row can be uniquely identified (by even fewer columns: (start, template, user)). This unique constraint is actually part of the design (even if omitted in the issue description).

As I understand it, your reasoning is still mainly that it's not used so we should remove it(?). My POV, on the other hand, is that of future extensibility: if we need to reference back to this table, I'd rather have the ID in place than need to add it after the fact. I also think the argument for "muddied waters" is somewhat moot given the extensibility and the unique constraint. It's of course possible to create foreign keys against multiple columns, but then all referenced columns must be duplicated in the other table.

I'll flip this one back to you and ask: how do you think it will help in the event that something goes wrong?

I'd like to say "you'll know it when you need it", but one contrived example would be deploying a version of Coder that writes bad data to the table: after updating to a fix, it would be possible to clean up the bad data between IDs X and Y just by figuring out the starting point and the end point. For argument's sake, we'll say the data is good at masking itself as legit data too, haha.
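
That is, the cleanup could be as simple as (with X and Y taken from the incident timeline):

delete from template_insights
where id between $1 and $2; -- $1 = first bad id (X), $2 = last bad id (Y)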

It can also tell you the ~order of inserts, in case you spot something amiss.

@mtojek
Member

mtojek commented Feb 22, 2024

However, if we set the default retention to a conservative number (6m-1y?) and document that a large number there may eventually affect migration speed, that could be an OK approach.

That should work 👍 Thanks for the answer!

With MVP, do you mean it should be part of the initial version? I'd think we have time to add it after-the-fact. It's basically just a periodic query to delete data.

We don't know where we'll land with our userbase limits, so I wanted to improve on the hardcoded retention limit we have now. I'm OK with doing this post-MVP too; I just assumed that it might be super easy to change.

But I can spend some time prototyping this if we want harder numbers.

I suspect that we will need indices similar to what we have now, but wanted to confirm. This issue thread is a ~mini RFC, so listing them might be useful to present the full scope. I'm fine with introducing them afterward too; let's just document them somewhere.

@spikecurtis
Contributor

spikecurtis commented Feb 22, 2024

As I understand it, your reasoning is still mainly that it's not used so we should remove it(?). My POV, on the other hand, is that of future extensibility: if we need to reference back to this table, I'd rather have the ID in place than need to add it after the fact. I also think the argument for "muddied waters" is somewhat moot given the extensibility and the unique constraint. It's of course possible to create foreign keys against multiple columns, but then all referenced columns must be duplicated in the other table.

Yeah, it's not used so we shouldn't have it. I'm not sure what sounds unreasonable about that. It's not a big deal to add later if we find an actual use for it.

@mafredri
Member Author

Yeah, it's not used so we shouldn't have it. I'm not sure what sounds unreasonable about that. It's not a big deal to add later if we find an actual use for it.

As per the discussion above, we probably shouldn't make changes to this table lightly as it can be quite big (and rewriting takes time/space). But we can leave it out for now 👍🏻.

@mtojek
Member

mtojek commented Mar 26, 2024

Hey @mafredri! It would be awesome to share any updates on what you have accomplished and what is still missing. I don't see a checklist that we could use to ensure transparency regarding what is happening right now.

@mafredri
Member Author

mafredri commented Mar 26, 2024

@mtojek absolutely!

With the following three PRs, we are now producing rolled-up template usage data (i.e. insights) in the template_usage_stats table:

Every 5 minutes, we refresh the latest 1-2 hours of data so that we take into consideration any data arriving late, but after that the stats are not touched. This also means that we have a 0-5 minute delay on "real-time" data for insights.

With the following query rewrites, we are using the rolled up data (template_usage_stats):

Our Prometheus exporter also produces insights; however, these can't use the rolled-up data because they use a 5 minute window. As such, the queries were simply updated to match any changes introduced in the above feature/refactor work:

There are two tasks remaining:

As a final piece of the puzzle, we will delete old workspace agent stats. Given that the data is rolled up, we will only keep 1 day of data. (It's the largest table in the database and this will release 7 GB of storage in our dogfood environment; for reference, the next largest table is 472 MB.)
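
In other words, roughly (a sketch; the exact column and cutoff may differ):

delete from workspace_agent_stats
where created_at < now() - interval '1 day';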

PS. The new template_usage_stats is 10 MB at the time of writing, after rolling up all the data (in dogfood).

@mtojek
Member

mtojek commented Apr 2, 2024

Implementation done 👍 Credits to @mafredri.

We'd like to keep #12674 open for a while, but don't need to keep the meta issue anymore.
