Skip to content

feat: add metrics to workspace agent scripts #11132

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 19 commits into from
Dec 13, 2023

Conversation

Emyrk
Copy link
Member

@Emyrk Emyrk commented Dec 11, 2023

What this does

Closes #10479

Add metrics for agent scripts

agents_scripts_executed_total{"success"="true/false",...}

Metric to keep track of the total number of scripts executed for a workspace and if they succeed or fail. This includes cron executed scripts by the agent. Was easy to include and felt beneficial to track script failures.

coderd_agentstats_startup_script_s{...} 1.969900304

Metric that indicates how long the startup scrip took to execute.

Other features

Adds template_name to all agent metrics as a label. Example:

agent_scripts_executed_total{agent_name="main",success="true",template_name="docker",username="admin",workspace_name="gvf"}

This allows grouping metrics by template_name, so you can see for example the average startup script time for a given template.

@Emyrk Emyrk force-pushed the stevenmasley/agent_script_metrics branch from ebf974d to c567eab Compare December 11, 2023 16:40
@Emyrk Emyrk changed the title wip: work on agent script metrics feat: workspace agent script metrics Dec 11, 2023
@Emyrk
Copy link
Member Author

Emyrk commented Dec 11, 2023

I do not think this is a breaking change. I added a label to agent metrics, template_name, so all existing queries that collect the metrics will still work. But new workspaces will make all new metrics since they have a new label value now.

@Emyrk Emyrk changed the title feat: workspace agent script metrics feat: add metrics to workspace agent script Dec 12, 2023
@Emyrk Emyrk changed the title feat: add metrics to workspace agent script feat: add metrics to workspace agent scripts Dec 12, 2023
@Emyrk Emyrk marked this pull request as ready for review December 12, 2023 15:13
@Emyrk Emyrk requested a review from mtojek December 12, 2023 15:31
@@ -119,6 +124,10 @@ func verifyCollectedMetrics(t *testing.T, expected []agentsdk.AgentMetric, actua
}

dtoLabels := asMetricAgentLabels(d.GetLabel())
// dto labels are sorted in alphabetical order.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we should move it this routine to asMetricAgentLabels, so the function produces compliant DTO labels.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The expected labels that we pass in have to be sorted. The DTO ones are already sorted.
I figured it was fine in the verify function to sort them, otherwise maybe we would have to move it closer to where we define expected?

expected := []agentsdk.AgentMetric{
{Name: "a_counter_one", Type: agentsdk.AgentMetricTypeCounter, Value: 1, Labels: commonLabels},
{Name: "b_counter_two", Type: agentsdk.AgentMetricTypeCounter, Value: 4, Labels: commonLabels},
{Name: "c_gauge_three", Type: agentsdk.AgentMetricTypeGauge, Value: 5, Labels: commonLabels},
{Name: "c_gauge_three", Type: agentsdk.AgentMetricTypeGauge, Value: 2, Labels: []agentsdk.AgentMetricLabel{
{Name: "agent_name", Value: testAgentName},
{Name: "foobar", Value: "Foobaz"},
{Name: "hello", Value: "world"},
{Name: "username", Value: testUsername},
{Name: "workspace_name", Value: testWorkspaceName},
{Name: "template_name", Value: testTemplateName},
}},
{Name: "d_gauge_four", Type: agentsdk.AgentMetricTypeGauge, Value: 6, Labels: commonLabels},
}

But this only needs to be sorted when comparing to dto labels.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see now what you mean. I guess we can leave it as is, there won't be a huge gain if you refactor it 👍

@@ -13,6 +13,7 @@ import (

"cdr.dev/slog"
"cdr.dev/slog/sloggers/sloghuman"

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: many of these changes are sneaking in, should we adjust our linter?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just added this to my editor config, maybe that will fix it:

local-prefixes: coder.com,cdr.dev,go.coder.com,github.com/cdr,github.com/coder

The issue is that imports are not deterministic for the fmt from my experience. Both this in the PR, and grouping them is valid. There are packages like https://github.com/daixiang0/gci that make import ordering deterministic. Would prevent this, but require us to setup an extra tool.

@Emyrk Emyrk requested a review from mtojek December 13, 2023 15:19
Copy link
Member

@mtojek mtojek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Feel free to ship it if CI is happy.

@Emyrk Emyrk merged commit b7bdb17 into main Dec 13, 2023
@Emyrk Emyrk deleted the stevenmasley/agent_script_metrics branch December 13, 2023 17:45
@github-actions github-actions bot locked and limited conversation to collaborators Dec 13, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add prometheus metrics for full workspace start up time
3 participants