
feat(cli): add dashboard load test command #8723


Merged · 30 commits · Jul 27, 2023
Conversation

johnstcn (Member):
This PR adds a scaletest subcommand to simulate dashboard load using codersdk.
The command defines a table of actions, each with an associated probability.
Each tick, an action is chosen from the table based on a random roll, and executed.

@johnstcn johnstcn self-assigned this Jul 25, 2023
@johnstcn johnstcn marked this pull request as ready for review July 26, 2023 12:57
@johnstcn johnstcn requested a review from mtojek July 26, 2023 12:58
)

var (
AllRBACResources = []RBACResource{
Member:
nit: It looks like something devs may forget to update in the future.

Member:

We have rbac gen:

// Code generated by rbacgen/main.go. DO NOT EDIT.

I wonder if we can repurpose it for this too?

On a more general note, though: we manually copy all the enums in the database package over to the sdk package, so this problem already exists for every enum type.

var stdout, stderr bytes.Buffer
inv.Stdout = &stdout
inv.Stderr = &stderr
err := inv.WithContext(ctx).Run()
Member:

nit: you don't need to perform any actions/waiters/or output string-checks?

Member Author:

This is me being lazy 😅

// D&D nerds will feel right at home here :-)
// Note that the order of the table is important!
// Entries must be in ascending order.
var allActions rollTable = []rollTableEntry{
Member:

I'm not sure that I buy the idea of rolling a die. Isn't it better to perform a set of separate tests for each of these cases?

Member:

Do you mean like, save all the calls when hitting the "Users" page and gather them into 1 "scenario" that gets run?

Member:

Because right now the "scenarios" are single api calls, but you could feasibly make them emulate page hits.

Member Author:

Correct - we could build out each action to include multiple fetches in parallel.
For now I'm focusing on keeping it simple so that we don't have to constantly play catchup with the frontend.

Member Author:

The "table" construct is primarily an implementation detail to make it easier to weight the probability of each scenario. I could imagine fetching HTTP logs of an existing instance, and setting the priority of each endpoint accordingly.

Member:

Ok

case <-ctx.Done():
return
case <-t.C:
rolls <- rand.Intn(allActions.max() + 1) // nolint:gosec
Member:

My main concern is that randomized tests may be hard to reproduce if every scenario is different. I'd rather model a few scenarios that cover all dashboard API calls. Feel free to challenge it though!

Member Author:

> My main concern is that randomized tests may be hard to reproduce if every scenario is different.

We're not looking for 100% reproducibility here, I don't think. I was mainly looking to generate load that corresponds to a baseline number of requests per second. When performed over a period of time, you're going to end up with data points for all of the scenarios.

Member:

I never even considered reproducibility for the scaletest. Is there a way to maybe log or use prom metrics to indicate how many of each "scenario" is run?

Not needed in this PR, but it would be cool: if one scaletest just craps the bed, we'd notice that "Scenario 5" was run an excessive number of times.

Idk how the metrics or logs are hooked up for this though
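A minimal stand-in for the per-scenario counting idea might look like the sketch below. This is a plain in-memory tally, purely illustrative; the real implementation would register a Prometheus counter labelled by action name instead:

```go
package main

import (
	"fmt"
	"sync"
)

// actionTally counts how many times each scenario ran. In production this
// role would be played by a Prometheus counter with an action label, so
// the counts show up on the metrics endpoint instead of in memory.
type actionTally struct {
	mu     sync.Mutex
	counts map[string]int
}

func newActionTally() *actionTally {
	return &actionTally{counts: make(map[string]int)}
}

// inc records one run of the named action; safe for concurrent tickers.
func (t *actionTally) inc(action string) {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.counts[action]++
}

// get returns how many times the named action has run so far.
func (t *actionTally) get(action string) int {
	t.mu.Lock()
	defer t.mu.Unlock()
	return t.counts[action]
}

func main() {
	tally := newActionTally()
	tally.inc("fetch users")
	tally.inc("fetch users")
	tally.inc("fetch workspaces")
	fmt.Println(tally.get("fetch users"))
}
```

Incrementing the tally inside the action dispatch loop would make a skewed run (one scenario dominating) visible after the fact.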

Member Author:

100% adding prom metrics, we need those.
Regarding reproducibility, we could use a seeded PRNG: seed a single test with the current time, and replay it with the same seed if needed.
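The seeded-PRNG idea is a one-liner with the standard library. A sketch (the function name is hypothetical; the point is that logging the seed makes any run replayable):

```go
package main

import (
	"fmt"
	"math/rand"
)

// newSeededRNG returns a PRNG with an explicit seed. Seed with the current
// time (e.g. time.Now().UnixNano()) and log the value at startup; feeding
// the logged seed back in reproduces the exact same roll sequence.
func newSeededRNG(seed int64) *rand.Rand {
	return rand.New(rand.NewSource(seed))
}

func main() {
	seed := int64(42) // in practice: taken from the clock and logged
	a := newSeededRNG(seed)
	b := newSeededRNG(seed)
	// Two generators with the same seed produce identical rolls.
	fmt.Println(a.Intn(100) == b.Intn(100))
}
```

This keeps the randomized spread of scenarios while making a specific problematic run reproducible on demand.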

@mtojek mtojek self-requested a review July 26, 2023 14:29
mtojek (Member) left a comment:

👍

Emyrk (Member) left a comment:

I like the organization of the scaletest package. The harness looks nice 👍

clibase.Option{
Flag: "scaletest-prometheus-wait",
Env: "CODER_SCALETEST_PROMETHEUS_WAIT",
Default: "5s",
Member:

Is 5s long enough? Isn't the default scrape interval 15s?

Member Author:

Oops, you're right!

{0, fetchWorkspaces, "fetch workspaces"},
{1, fetchUsers, "fetch users"},
{2, fetchTemplates, "fetch templates"},
{3, authCheckAsOwner, "authcheck owner"},
Member:

Might be worth getting the list of auth checks the UI does all the time. Not needed in this PR, but the FE does like 10ish checks on each load.

@johnstcn johnstcn merged commit 3282908 into main Jul 27, 2023
@johnstcn johnstcn deleted the cj/scaletest-dashboard branch July 27, 2023 08:40
@github-actions github-actions bot locked and limited conversation to collaborators Jul 27, 2023