Skip to content

feat(coderd/database): track user status changes over time #16019

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 32 commits into from
Jan 13, 2025

Conversation

SasSwart
Copy link
Contributor

@SasSwart SasSwart commented Jan 3, 2025

RE: #15740, #15297

In order to add a graph to the coder frontend to show user status over time as an indicator of license usage, this PR adds the following:

  • a new api.insightsUserStatusCountsOverTime endpoint to the API
  • which calls a new GetUserStatusCountsOverTime query from postgres
  • which relies on two new tables user_status_changes and user_deleted
  • which are populated by a new trigger and function that tracks updates to the users table

The graph itself will be added in a subsequent PR

Copy link
Member

@Emyrk Emyrk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small nit, can we use the word Chart instead of Graph in various comments when referring to the frontend display?

For a moment I was wondering why we needed a graph data structure.


On deleting a user more than once. We should enforce this at the db_schema level to not allow it if it's going to break our logic. An index could be made on the user_deleted(user_id).

Comment on lines 24 to 28
CREATE TABLE user_deleted (
id uuid PRIMARY KEY DEFAULT gen_random_uuid(),
user_id uuid NOT NULL REFERENCES users(id),
deleted_at timestamptz NOT NULL DEFAULT CURRENT_TIMESTAMP
);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will we ever be able to "un-delete" a user? Would is be easy to implement this to handle that case if we ever support it?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why we don't implement this as a user status change? Perhaps as an additional field on the table or modify the enum to include deleted. Wrt to changing the enum, it doesn't make much sense to me to differentiate between e.g. deleted/active or deleted/dormant users (or whatever the possibilities are), so changing it makes sense IMO.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is true. I assume we can still make an index with a conditional expression on status == deleted. We might want to have some trigger or something to prevent "un-deletion" if we want to prevent that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I considered these options, but avoided them simply because they expand the scope beyond a reasonable time cost for this feature. We could relatively easy migrate from this to a space where deleted is just another user status.

I'm trying to remain agnostic of the massive rabbit-hole shaped question of what the business rule should be around un/re-deletion.

The trigger and function right now would handle un/re-deletion correctly, but the query would break as identified by @Emyrk in another comment below.

I would prefer that we stick to the current implementation because it is good enough and I can't find anywhere that we support un/re-deletion right now. If y'all would like to explicitly request a specific change I'd be happy to consider it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@SasSwart can we open an issue to address un-deletion and mention this query if we decide to support it?

Just to track it

Comment on lines +798 to +799
WHERE changed_at > @start_time::timestamptz
AND changed_at < @end_time::timestamptz
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it is inclusive, should it be?

Suggested change
WHERE changed_at > @start_time::timestamptz
AND changed_at < @end_time::timestamptz
WHERE changed_at >= @start_time::timestamptz
AND changed_at <= @end_time::timestamptz

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that's covered by the initial and final union.

Personally I prefer start_time inclusive, end_time exclusive though, as most of the time that mirrors the intuitive understanding of data. Say we want to view Monday to Sunday (1 week), we ultimately want Monday 00:00 -> Sunday 23:59 but it's simplest to request this as Monday 00:00 -> Monday 00:00 instead of subtracting an arbitrary unit of time or using an end-of-day function. Alternatively, if we include Monday 00:00 at the end, we include data from the beyond the range.

Copy link
Contributor Author

@SasSwart SasSwart Jan 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that's covered by the initial and final union.

Correct.

Monday 00:00 -> Monday 00:00 instead of subtracting an arbitrary unit of time or using an end-of-day function

I think the inclusive approach is good enough. There are higher impact changes to make. If you'd like me to change it, please ask me directly to do so and I will. My own opinion is that for how this chart is going to be displayed inclusive works well enough.

usc.new_status,
usc.changed_at
FROM user_status_changes usc
LEFT JOIN user_deleted ud ON usc.user_id = ud.user_id
Copy link
Member

@Emyrk Emyrk Jan 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If a user is deleted twice in it's history (should not be possible, but would be nice to be durable against), then this join will duplicate the status changed rows for each deletion event.

We could make the table a subquery that returns 1 row, or we should enforce a user can only be deleted once.

Minimal example: https://www.db-fiddle.com/f/4jyoMCicNSZpjMt4jFYoz5/15847

Copy link
Member

@mafredri mafredri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work and good job adding all that test coverage. A few suggestions and questions, but nothing major.

Comment on lines 24 to 28
CREATE TABLE user_deleted (
id uuid PRIMARY KEY DEFAULT gen_random_uuid(),
user_id uuid NOT NULL REFERENCES users(id),
deleted_at timestamptz NOT NULL DEFAULT CURRENT_TIMESTAMP
);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why we don't implement this as a user status change? Perhaps as an additional field on the table or modify the enum to include deleted. Wrt to changing the enum, it doesn't make much sense to me to differentiate between e.g. deleted/active or deleted/dormant users (or whatever the possibilities are), so changing it makes sense IMO.

Comment on lines +798 to +799
WHERE changed_at > @start_time::timestamptz
AND changed_at < @end_time::timestamptz
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that's covered by the initial and final union.

Personally I prefer start_time inclusive, end_time exclusive though, as most of the time that mirrors the intuitive understanding of data. Say we want to view Monday to Sunday (1 week), we ultimately want Monday 00:00 -> Sunday 23:59 but it's simplest to request this as Monday 00:00 -> Monday 00:00 instead of subtracting an arbitrary unit of time or using an end-of-day function. Alternatively, if we include Monday 00:00 at the end, we include data from the beyond the range.

rsc1.new_status
FROM dates_of_interest d
LEFT JOIN relevant_status_changes rsc1 ON rsc1.changed_at <= d.date
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit worried about the number of CTEs and performance, but I think this is fine for now and let's not prematurely optimize. Just raising this so you're aware that in some cases a CTE performs worse than a pure query with joins and subqueries. That's mainly because the result of a CTE lacks indexes.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was wondering about performance too, but user count and user status changes is going to be way less data than say workspace builds. Like on the order of 1000s. 🤷

Might be worth a benchmark, but we don't have a good way to populate "large" datasets atm.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did it this way for legibility. Trying to write the optimal solution here made my own query inscrutable even to myself. I don't consider this to be on a particularly hot path. We do have metrics for this query. If it needs to be optimised, we can do so.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just raising this so you're aware that in some cases a CTE performs worse than a pure query with joins and subqueries

Also these CTEs were designed to reduce the amount of data being handled as early as possible. We use the existing indices while we have them to identify exactly the data we need ASAP and then once we need to join it all the hope is that we've brought down the volume low enough that it's performant regardless of the lack of indices on CTEs.

@SasSwart SasSwart force-pushed the jjs/dau-history-backend branch from 699ee8a to 8fca0c5 Compare January 9, 2025 10:48
@SasSwart SasSwart merged commit 4543b21 into main Jan 13, 2025
36 checks passed
@SasSwart SasSwart deleted the jjs/dau-history-backend branch January 13, 2025 11:08
@github-actions github-actions bot locked and limited conversation to collaborators Jan 13, 2025
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants