feat(coderd): add user latency and template insights endpoints #8519
Conversation
Force-pushed 8458e66 to c728857
Force-pushed c728857 to aaadc6a
Status update: base logic is mostly done; however, some timeframe edge cases are not yet handled and there are a few open questions. Tests also need to be added.
Questions:
Issues:
Current response samples:
{
"report": {
"start_time": "2023-07-18T00:00:00Z",
"end_time": "2023-07-20T00:00:00Z",
"template_ids": [
"bb20c5b7-3b3c-4031-868b-3aee8b80e79b"
],
"users": [
{
"template_ids": [
"bb20c5b7-3b3c-4031-868b-3aee8b80e79b"
],
"user_id": "a6d24688-2c59-423a-a8b0-41d56331d4a1",
"username": "admin",
"latency_ms": {
"p50": 0.327,
"p95": 0.5352
}
},
{
"template_ids": [],
"user_id": "e7217d35-9ce1-4c98-ae09-c6c9ad6f5686",
"username": "member",
"latency_ms": null
},
{
"template_ids": [
"bb20c5b7-3b3c-4031-868b-3aee8b80e79b"
],
"user_id": "d9fb2aa8-fedb-4c76-a759-27e4e4c5d550",
"username": "woot",
"latency_ms": {
"p50": 0.59,
"p95": 0.59
}
}
]
}
}
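A consumer of the user latency report needs to handle the null latency_ms case shown above (the "member" user had no measured connections). A minimal sketch of parsing the sample payload, abridged to the fields that matter here:

```python
import json

# Abridged version of the user latency sample above.
payload = json.loads("""
{
  "report": {
    "users": [
      {"username": "admin", "latency_ms": {"p50": 0.327, "p95": 0.5352}},
      {"username": "member", "latency_ms": null},
      {"username": "woot", "latency_ms": {"p50": 0.59, "p95": 0.59}}
    ]
  }
}
""")

# latency_ms is null for users with no measured connections, so guard for it.
measured = {
    u["username"]: u["latency_ms"]["p95"]
    for u in payload["report"]["users"]
    if u["latency_ms"] is not None
}
print(measured)  # admin and woot only; member has no data
```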
{
"report": {
"start_time": "2023-07-18T00:00:00Z",
"end_time": "2023-07-20T00:00:00Z",
"template_ids": [
"bb20c5b7-3b3c-4031-868b-3aee8b80e79b"
],
"active_users": 2,
"apps_usage": [
{
"template_ids": [
"bb20c5b7-3b3c-4031-868b-3aee8b80e79b"
],
"type": "builtin",
"display_name": "Visual Studio Code",
"slug": "vscode",
"icon": "/icons/code.svg",
"seconds": 0
},
{
"template_ids": [
"bb20c5b7-3b3c-4031-868b-3aee8b80e79b"
],
"type": "builtin",
"display_name": "JetBrains",
"slug": "jetbrains",
"icon": "/icons/intellij.svg",
"seconds": 0
},
{
"template_ids": [
"bb20c5b7-3b3c-4031-868b-3aee8b80e79b"
],
"type": "builtin",
"display_name": "Web Terminal",
"slug": "reconnecting-pty",
"icon": "/icons/terminal.svg",
"seconds": 4800
},
{
"template_ids": [
"bb20c5b7-3b3c-4031-868b-3aee8b80e79b"
],
"type": "builtin",
"display_name": "SSH",
"slug": "ssh",
"icon": "/icons/terminal.svg",
"seconds": 1800
}
]
},
"interval_reports": [
{
"start_time": "2023-07-18T00:00:00Z",
"end_time": "2023-07-19T00:00:00Z",
"template_ids": [
"bb20c5b7-3b3c-4031-868b-3aee8b80e79b"
],
"interval": "day",
"active_users": 2
},
{
"start_time": "2023-07-19T00:00:00Z",
"end_time": "2023-07-20T00:00:00Z",
"template_ids": [
"bb20c5b7-3b3c-4031-868b-3aee8b80e79b"
],
"interval": "day",
"active_users": 1
}
]
}
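The apps_usage entries in the template insights sample report seconds per app; a UI would typically aggregate these into hours. A small sketch against an abridged copy of the sample above:

```python
import json

# Abridged apps_usage from the template insights sample above.
payload = json.loads("""
{
  "report": {
    "active_users": 2,
    "apps_usage": [
      {"slug": "vscode", "seconds": 0},
      {"slug": "jetbrains", "seconds": 0},
      {"slug": "reconnecting-pty", "seconds": 4800},
      {"slug": "ssh", "seconds": 1800}
    ]
  }
}
""")

# Total app usage for the requested window, converted to hours.
total_seconds = sum(app["seconds"] for app in payload["report"]["apps_usage"])
hours = total_seconds / 3600
print(f"{hours:.2f}h across {payload['report']['active_users']} active users")
```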
Thinking in terms of MVP, it is something we can iterate later. IMHO we should start with whole days, and exceptionally allow for the current day to be counted in hours (not 24h yet).
Alternatively, you can depend on the deployment time+TZ, and just share it over Insights API, so that UI can consider it.
DevOps people are probably aware of this issue, so they will be fine with any option :) Do you find any of them easier to implement?
cumulative/per user 👍 BUT... what will the API response look like if we run scale tests with 1000 users? AFAIR I asked about this before.
I would park it for later 👍
Agreed, I'll add the "today" exception so that it's possible for the site to show stable numbers for today.
Sure, we can do that. We'll have to remove TZ from the start_time/end_time request format as well.
Since we don't have knowledge of user TZ, let's go with the simple approach for now. 😄
I guess the scaletest numbers would be included, unless there's special handling to avoid gathering stats in that case? It's unclear to me whether we'd want to filter them out; will scale testing be a common/recurring practice on live deployments?
IMHO we have to look at scaletesting from a different angle. Running load tests with ~1000 users' worth of traffic means that any customer could create a deployment of that size. The Coder API must sustain this load too, and in this case I'm mostly concerned about the response size of the Insights endpoint.
The principle of least surprise would be that the FE specifies the timezone to be used, auto-detecting from the browser.
Cumulative across all users makes the most sense here, I think. Per-user averages are too widely skewed.
The scaletest command is hidden, so we should not expect it to be used regularly in the field. That having been said, all scaletest users and workspaces are prefixed with the string
coderd/insights.go
Outdated
// Example:
// - I want data from Monday - Friday
// - I'm UTC+3 and the deployment is UTC+0
// - Do we select Monday - Friday in UTC+0 or UTC+3?
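To make the comment's example concrete: the same wall-clock "Monday" denotes different absolute instants depending on the requester's offset, so the two choices select rows shifted by a 3-hour band. A quick sketch (the dates are illustrative):

```python
from datetime import datetime, timezone, timedelta

# "Monday 2023-07-17 00:00" as interpreted in UTC+0 vs. UTC+3.
monday_utc0 = datetime(2023, 7, 17, tzinfo=timezone.utc)
monday_utc3 = datetime(2023, 7, 17, tzinfo=timezone(timedelta(hours=3)))

# The UTC+3 user's Monday starts 3 hours earlier in absolute time,
# so selecting "Monday - Friday" in either zone shifts the window by 3h.
offset = monday_utc0 - monday_utc3
print(offset)  # 3:00:00
```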
Friday UTC+3, using timezone from the incoming request. Coder admins won't necessarily know or care about the localtime of the server on which the deployment is running.
One factor here is that two people looking at "the same data" may see different numbers. They may share screen captures and wonder why the data doesn't add up. An anchor like deployment TZ would help in that case.
Yeah, showing the user's relative difference from the deployment TZ (or even just showing the GMT offset in the graph) would help indicate this.
Ah right, this mainly affects the
I'd be fine with this, it's easier to handle on the backend too (as-is currently) 😄, thoughts @mtojek?
Ah, I believe I worded that too open-endedly. It's currently cumulative across all users; by per-user I meant grouping by user. This comes into play in the following scenario: User A has two workspaces and a VSCode session open to each from 9am to 10am. User A then contributes 1h of VSCode usage to the cumulative stats (vs 2h). The same applies to two VSCode sessions open in one workspace, etc. It's perhaps a weird line to draw, though, since a user can still contribute 1h vscode + 1h jetbrains + 1h terminal usage. The idea is that a user can only focus on one app at a time.
I agree. 👍🏻
I wouldn't necessarily make that assumption ;-)
I think it's a matter of making the data useful. For instance, take a deployment with one user and 8 workspaces based on one template. At the start of the day they launch VSCode for each workspace, and during the day they keep hopping between editor windows. Given the current implementation, we will see 8h of VSCode usage (logical: one employee working a full day), whereas without this assumption we'd see 64h of VSCode usage for one day (confusing). I think this is a question for @BrunoQuaresma and @bpmct, though: which matches their vision better?
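The per-user deduplication discussed above amounts to taking the union of each user's session intervals so overlaps count once. A hypothetical sketch of that idea (not the actual implementation, which lives in SQL on the backend):

```python
def merged_usage_seconds(intervals):
    """Union of (start, end) second offsets; overlapping time counts once."""
    total = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            # Disjoint from the current run: flush it and start a new one.
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlapping or adjacent: extend the current run.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start
    return total

# Two VSCode sessions both open 9:00-10:00 (e.g. in two workspaces)
# count as 1h of usage, not 2h.
nine, ten = 9 * 3600, 10 * 3600
print(merged_usage_seconds([(nine, ten), (nine, ten)]) / 3600)  # 1.0
```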
Force-pushed 00a9b6a to 167719c
Force-pushed 167719c to 2e51056
Decisions made:
Things we may want to improve in the future:
Force-pushed 4ffc5f1 to 088620e
Unless @bpmct or @BrunoQuaresma has some objections, I think that this is in good shape to get merged. You can iterate on improvements in follow-ups.
Agreed, I think this is in good shape to start iterating.
Force-pushed 4398e9f to 7fd13df
This pull request implements the user latency and template insights endpoints, as per #8514.
Data available:
Missing (left for follow-up PRs):
Updates #8514
Refs #8109