
feat(coderd): add user latency and template insights endpoints #8519

Merged

Conversation

@mafredri (Member) commented Jul 14, 2023

This pull request implements the user latency and template insights endpoints, as per #8514.

Data available:

  • User latency for the specified time range and templates
  • Template users, both for the whole interval and daily
  • Template built-in app usage (vscode, jetbrains, web terminal, and ssh) for the whole interval

Missing (left for follow-up PRs):

  • Template parameter value/usage
  • Template custom app usage

Updates #8514
Refs #8109

@mafredri mafredri linked an issue Jul 14, 2023 that may be closed by this pull request
@mafredri mafredri force-pushed the mafredri/feat-coderd-add-user-latency-and-insights-endpoints branch from 8458e66 to c728857 Compare July 14, 2023 15:59
@mafredri mafredri force-pushed the mafredri/feat-coderd-add-user-latency-and-insights-endpoints branch from c728857 to aaadc6a Compare July 19, 2023 13:14
@mafredri (Member Author) commented Jul 19, 2023

Status update: the base logic is mostly done; however, some timeframe edge cases are not yet handled and there are a few open questions. Tests also need to be added.

  • We currently accept input times with date+clock; we should perhaps either remove the clock component or enforce that it's 00:00:00.
  • Allowing a clock component skews the interval report since we're not locked to whole days

Questions:

  • What timezone should we use? We could use the deployment, user, or viewer timezone
    • For simplicity/flexibility, we can allow the dashboard to request whatever timezone it feels is best (as it is now)
    • For the most accurate data (IMO), we'd use the user timezone so that user A's Monday and user B's Monday both appear as Monday in the data, even though they're on opposite sides of the globe
  • App usage: should this be cumulative or an average across all users? (The current implementation is cumulative/per user)

Issues:

  • We're relying heavily on workspace_agent_stats; however, user latency based on this table is skewed since it includes agent<->coderd latency
    • Also, if an admin e.g. does ssh otheruser/workspace, the admin's session will be counted towards user stats as well (both latency and app usage); that said, fixing this is not a high priority

Current response samples:

GET http://localhost:3000/api/v2/insights/user-latency?start_time=2023-07-18T00:00:00Z&end_time=2023-07-20T00:00:00Z
{
  "report": {
    "start_time": "2023-07-18T00:00:00Z",
    "end_time": "2023-07-20T00:00:00Z",
    "template_ids": [
      "bb20c5b7-3b3c-4031-868b-3aee8b80e79b"
    ],
    "users": [
      {
        "template_ids": [
          "bb20c5b7-3b3c-4031-868b-3aee8b80e79b"
        ],
        "user_id": "a6d24688-2c59-423a-a8b0-41d56331d4a1",
        "username": "admin",
        "latency_ms": {
          "p50": 0.327,
          "p95": 0.5352
        }
      },
      {
        "template_ids": [],
        "user_id": "e7217d35-9ce1-4c98-ae09-c6c9ad6f5686",
        "username": "member",
        "latency_ms": null
      },
      {
        "template_ids": [
          "bb20c5b7-3b3c-4031-868b-3aee8b80e79b"
        ],
        "user_id": "d9fb2aa8-fedb-4c76-a759-27e4e4c5d550",
        "username": "woot",
        "latency_ms": {
          "p50": 0.59,
          "p95": 0.59
        }
      }
    ]
  }
}
GET http://localhost:3000/api/v2/insights/templates?start_time=2023-07-18T00:00:00Z&end_time=2023-07-20T00:00:00Z&interval=day
{
  "report": {
    "start_time": "2023-07-18T00:00:00Z",
    "end_time": "2023-07-20T00:00:00Z",
    "template_ids": [
      "bb20c5b7-3b3c-4031-868b-3aee8b80e79b"
    ],
    "active_users": 2,
    "apps_usage": [
      {
        "template_ids": [
          "bb20c5b7-3b3c-4031-868b-3aee8b80e79b"
        ],
        "type": "builtin",
        "display_name": "Visual Studio Code",
        "slug": "vscode",
        "icon": "/icons/code.svg",
        "seconds": 0
      },
      {
        "template_ids": [
          "bb20c5b7-3b3c-4031-868b-3aee8b80e79b"
        ],
        "type": "builtin",
        "display_name": "JetBrains",
        "slug": "jetbrains",
        "icon": "/icons/intellij.svg",
        "seconds": 0
      },
      {
        "template_ids": [
          "bb20c5b7-3b3c-4031-868b-3aee8b80e79b"
        ],
        "type": "builtin",
        "display_name": "Web Terminal",
        "slug": "reconnecting-pty",
        "icon": "/icons/terminal.svg",
        "seconds": 4800
      },
      {
        "template_ids": [
          "bb20c5b7-3b3c-4031-868b-3aee8b80e79b"
        ],
        "type": "builtin",
        "display_name": "SSH",
        "slug": "ssh",
        "icon": "/icons/terminal.svg",
        "seconds": 1800
      }
    ]
  },
  "interval_reports": [
    {
      "start_time": "2023-07-18T00:00:00Z",
      "end_time": "2023-07-19T00:00:00Z",
      "template_ids": [
        "bb20c5b7-3b3c-4031-868b-3aee8b80e79b"
      ],
      "interval": "day",
      "active_users": 2
    },
    {
      "start_time": "2023-07-19T00:00:00Z",
      "end_time": "2023-07-20T00:00:00Z",
      "template_ids": [
        "bb20c5b7-3b3c-4031-868b-3aee8b80e79b"
      ],
      "interval": "day",
      "active_users": 1
    }
  ]
}
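
For context on the latency_ms fields: p50/p95 are percentiles over per-connection latency samples. Below is a minimal Go sketch of such a percentile calculation, with linear interpolation between the two nearest ranks as in PostgreSQL's PERCENTILE_CONT; it is illustrative only, not the actual query or data.

package main

import (
	"fmt"
	"sort"
)

// percentile returns the p'th percentile (0 <= p <= 1) of samples, with
// linear interpolation between the two nearest ranks, the same idea as
// PostgreSQL's PERCENTILE_CONT.
func percentile(samples []float64, p float64) float64 {
	if len(samples) == 0 {
		return 0
	}
	sorted := append([]float64(nil), samples...)
	sort.Float64s(sorted)
	rank := p * float64(len(sorted)-1) // fractional rank into the sorted slice
	lo := int(rank)
	if lo+1 >= len(sorted) {
		return sorted[len(sorted)-1]
	}
	frac := rank - float64(lo)
	return sorted[lo]*(1-frac) + sorted[lo+1]*frac
}

func main() {
	// Hypothetical per-connection latency samples in milliseconds.
	latencies := []float64{0.29, 0.327, 0.412, 0.5352}
	fmt.Printf("p50=%.4f p95=%.4f\n", percentile(latencies, 0.50), percentile(latencies, 0.95))
}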

@mtojek (Member) commented Jul 20, 2023

We currently accept input times with date+clock; we should perhaps either remove the clock component or enforce that it's 00:00:00.
Allowing a clock component skews the interval report since we're not locked to whole days

Thinking in terms of an MVP, this is something we can iterate on later. IMHO we should start with whole days, and exceptionally allow for the current day to be counted in hours (not 24h yet).

For simplicity/flexibility, we can allow the dashboard to request whatever timezone it feels is best (as it is now)

Alternatively, you can depend on the deployment time+TZ and just share it over the Insights API so that the UI can consider it.

For the most accurate data (IMO), we'd use the user timezone so that user A's Monday and user B's Monday both appear as Monday in the data, even though they're on opposite sides of the globe

DevOps people are probably aware of this issue, so they will be fine with any option :) Do you find any of them easier to implement?

App usage: should this be cumulative or an average across all users? (The current implementation is cumulative/per user)

cumulative/per user 👍 BUT... what will the API response look like if we are doing scale tests with 1000 users? AFAIR I asked about this before.

Also, if an admin e.g. does ssh otheruser/workspace, the admin's session will be counted towards user stats as well (both latency and app usage); that said, fixing this is not a high priority

I would park it for later 👍

@mafredri (Member Author)

Thinking in terms of an MVP, this is something we can iterate on later. IMHO we should start with whole days, and exceptionally allow for the current day to be counted in hours (not 24h yet).

Agreed, I'll add the "today" exception so that it's possible for the site to show stable numbers for today.

Alternatively, you can depend on the deployment time+TZ and just share it over the Insights API so that the UI can consider it.

Sure, we can do that. We'll have to remove TZ from the start_time/end_time request format as well.

DevOps people are probably aware of this issue, so they will be fine with any option :) Do you find any of them easier to implement?

Since we don't have knowledge of user TZ, let's go with the simple approach for now. 😄

[...] what will the API response look like if we are doing scale tests with 1000 users? AFAIR I asked about this before.

I guess the scaletest numbers would be included, unless there's some special handling to not gather stats in this case? It's unclear to me whether or not we'd want to filter them out; will scale testing be a common/recurring practice on live deployments?

@mtojek (Member) commented Jul 20, 2023

I guess the scaletest numbers would be included, unless there's some special handling to not gather stats in this case? It's unclear to me whether or not we'd want to filter them out; will scale testing be a common/recurring practice on live deployments?

IMHO we have to look at scaletesting from a different angle. Running load tests with traffic from ~1000 users means that any customer can potentially create such a large deployment. The Coder API must sustain this load too, and in this case I'm mostly concerned about the response size of the Insights endpoint.

@johnstcn (Member)

What timezone should we use? We could use the deployment, user, or viewer timezone

The principle of least surprise would be that the FE specifies the timezone to be used, auto-detected from the browser.

App usage: should this be cumulative or an average across all users? (The current implementation is cumulative/per user)

Cumulative across all users makes the most sense here, I think. Per-user averages are too widely skewed.

I guess the scaletest numbers would be included, unless there's some special handling to not gather stats in this case? It's unclear to me whether or not we'd want to filter them out; will scale testing be a common/recurring practice on live deployments?

The scaletest command is hidden, so we should not expect it to be used regularly in the field. That having been said, all scaletest users and workspaces are prefixed with the string scaletest-, so excluding them from the /insights endpoint should not be too difficult. But maybe we do want to leverage scaletests to generate data for the insights endpoint? It should be performant even with a large amount of data.

// Example:
// - I want data from Monday - Friday
// - I'm UTC+3 and the deployment is UTC+0
// - Do we select Monday - Friday in UTC+0 or UTC+3?
Member

Friday UTC+3, using timezone from the incoming request. Coder admins won't necessarily know or care about the localtime of the server on which the deployment is running.

Member Author

One factor here is that two people looking at "the same data" may see different numbers. They may share screen captures and wonder why the data doesn't add up. An anchor like deployment TZ would help in that case.

Member

Yeah, showing the user's relative difference from the deployment TZ (or even just showing the GMT offset in the graph) would help indicate this.
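
To illustrate the request-timezone option discussed in this thread: time.Parse keeps the UTC offset embedded in an RFC 3339 timestamp, so the backend can check day boundaries in the caller's zone rather than the server's. A minimal sketch (not the actual coderd code):

package main

import (
	"fmt"
	"time"
)

func main() {
	// time.Parse keeps the UTC offset embedded in an RFC 3339 string, so
	// "midnight" is checked in the requester's zone, not the server's.
	start, err := time.Parse(time.RFC3339, "2023-07-17T00:00:00+03:00")
	if err != nil {
		panic(err)
	}
	if h, m, s := start.Clock(); h != 0 || m != 0 || s != 0 {
		fmt.Println("start_time must be midnight in the requested timezone")
		return
	}
	fmt.Println("accepted, midnight in UTC+3:", start)
}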

@mafredri (Member Author)

@mtojek

IMHO we have to look at scaletesting from a different angle. Running load tests with traffic from ~1000 users means that any customer can potentially create such a large deployment. The Coder API must sustain this load too, and in this case I'm mostly concerned about the response size of the Insights endpoint.

Ah right, this mainly affects the user-latency endpoint, which we can add pagination to when/if needed. I think we can punt on pagination for now, though. All stats in the templates endpoint are aggregated, so we don't have the same scale issue there.

@johnstcn

The principle of least surprise would be that the FE specifies the timezone to be used, auto-detected from the browser.

I'd be fine with this; it's easier to handle on the backend too (as is currently the case) 😄. Thoughts, @mtojek?

Cumulative across all users makes the most sense here, I think. Per-user averages are too widely skewed.

Ah, I believe I worded that in too open-ended a way. It's currently cumulative across all users; by per-user I meant grouped by user. This comes into play in e.g. the following scenario:

User A has two workspaces and a VSCode session open to both workspaces from 9am to 10am. User A contributes 1h of VSCode usage to the cumulative stats (vs 2h). The same goes for two VSCode sessions open in one workspace, etc.

It's perhaps a weird line to draw, though, since the user can still contribute 1h of vscode + 1h of jetbrains + 1h of terminal usage. The idea, anyway, is that a user can only focus on one app at a time.
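
To make the grouping concrete, here's a rough Go sketch assuming usage is tracked in per-minute buckets; the usageKey type and its fields are hypothetical, not the actual coderd schema:

package main

import (
	"fmt"
	"time"
)

// usageKey identifies one user's use of one app within one minute bucket.
// The type and field names are hypothetical, not the actual coderd schema.
type usageKey struct {
	userID string
	app    string
	minute time.Time
}

// dedupedUsage counts each (user, app, minute) bucket once, so two
// concurrent VSCode sessions in the same minute contribute 1m, not 2m.
func dedupedUsage(sessions []usageKey) time.Duration {
	seen := make(map[usageKey]struct{})
	for _, s := range sessions {
		s.minute = s.minute.Truncate(time.Minute)
		seen[s] = struct{}{}
	}
	return time.Duration(len(seen)) * time.Minute
}

func main() {
	t := time.Date(2023, 7, 20, 9, 0, 0, 0, time.UTC)
	fmt.Println(dedupedUsage([]usageKey{
		{"userA", "vscode", t},                  // workspace 1
		{"userA", "vscode", t},                  // workspace 2, same minute: deduped
		{"userA", "vscode", t.Add(time.Minute)}, // next minute
	})) // prints 2m0s
}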

But maybe we do want to leverage scaletests to generate data for the insights endpoint? It should be performant even with a large amount of data.

I agree. 👍🏻

@johnstcn (Member)

The idea, anyway, is that a user can only focus on one app at a time.

I wouldn't necessarily make that assumption ;-)

@mafredri (Member Author) commented Jul 20, 2023

I wouldn't necessarily make that assumption ;-)

I think it's a matter of making the data useful. For instance, take a deployment with one user who has 8 workspaces based on one template. At the start of the day they launch VSCode for each workspace and keep hopping between editor windows during the day. Given the current implementation, we will see 8h of VSCode usage (logical: one employee working a full day), whereas without this assumption we'd see 64h of VSCode usage for one day (confusing).

I think this is a question for @BrunoQuaresma and @bpmct, though: which matches their vision better?

@mafredri mafredri force-pushed the mafredri/feat-coderd-add-user-latency-and-insights-endpoints branch from 00a9b6a to 167719c Compare July 20, 2023 16:38
@mafredri mafredri force-pushed the mafredri/feat-coderd-add-user-latency-and-insights-endpoints branch from 167719c to 2e51056 Compare July 20, 2023 16:44
@mafredri (Member Author)

Decisions made:

  • User latency pagination will be left for later (/ added as needed)
  • The client provides the timezone as part of start_time/end_time; it can use the deployment TZ if that's preferred
  • Time ranges can't be in the future, except "today", which is special-cased: if the clock is 15:33:00, an end time of 16:00:00 is OK (this includes all the latest data), although 15:00:00 may be preferable. Outside of "today", any clock other than 00:00:00 is not allowed (see the sketch after this list)
  • Assuming admin permissions, so all data is "fair game"
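
A minimal Go sketch of the end-time rule above; validateEndTime is hypothetical, not the actual coderd validation:

package main

import (
	"fmt"
	"time"
)

// validateEndTime sketches the rule above: end_time must be 00:00:00,
// except on "today", where any whole hour up to the next hour boundary
// is accepted. (Hypothetical; not the actual coderd validation.)
func validateEndTime(endTime, now time.Time) error {
	h, m, s := endTime.Clock()
	if m != 0 || s != 0 {
		return fmt.Errorf("end_time must be on the hour")
	}
	if h == 0 {
		return nil // midnight is always OK (future-day checks omitted here)
	}
	now = now.In(endTime.Location())
	y1, mo1, d1 := endTime.Date()
	y2, mo2, d2 := now.Date()
	if y1 == y2 && mo1 == mo2 && d1 == d2 && !endTime.After(now.Truncate(time.Hour).Add(time.Hour)) {
		return nil // "today" exception: e.g. 16:00:00 is OK when the clock is 15:33:00
	}
	return fmt.Errorf("end_time must be 00:00:00 unless it falls on today")
}

func main() {
	now := time.Date(2023, 7, 20, 15, 33, 0, 0, time.UTC)
	fmt.Println(validateEndTime(time.Date(2023, 7, 20, 16, 0, 0, 0, time.UTC), now)) // <nil>
	fmt.Println(validateEndTime(time.Date(2023, 7, 21, 16, 0, 0, 0, time.UTC), now)) // error
}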

Things we may want to improve in the future:

  • User latency: the data stored in workspace_agent_stats is suboptimal and includes agent<->coderd latency as well, not only user<->coderd latency
  • App usage calculations: right now we only know that a connection from e.g. VSCode is open, but not whether the user is actively using VSCode

@mafredri mafredri force-pushed the mafredri/feat-coderd-add-user-latency-and-insights-endpoints branch from 4ffc5f1 to 088620e Compare July 21, 2023 09:30
@mtojek mtojek self-requested a review July 21, 2023 13:09
@mtojek (Member) left a comment

Unless @bpmct or @BrunoQuaresma has some objections, I think that this is in good shape to get merged. You can iterate on improvements in follow-ups.

@johnstcn (Member) left a comment

Agreed, I think this is in good shape to start iterating. :shipit:

@mafredri mafredri enabled auto-merge (squash) July 21, 2023 14:07
@mafredri mafredri disabled auto-merge July 21, 2023 15:43
@mafredri mafredri force-pushed the mafredri/feat-coderd-add-user-latency-and-insights-endpoints branch 2 times, most recently from 4398e9f to 7fd13df Compare July 21, 2023 16:31
@mafredri mafredri enabled auto-merge (squash) July 21, 2023 16:33
@mafredri mafredri merged commit 30fe153 into main Jul 21, 2023
@mafredri mafredri deleted the mafredri/feat-coderd-add-user-latency-and-insights-endpoints branch July 21, 2023 18:00
@github-actions github-actions bot locked and limited conversation to collaborators Jul 21, 2023

Successfully merging this pull request may close these issues.

Implement API endpoint for template/deployment insights