-
Notifications
You must be signed in to change notification settings - Fork 883
feat: add metrics to workspace agent scripts #11132
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
ebf974d
to
c567eab
Compare
I do not think this is a breaking change. I added a label to agent metrics, |
@@ -119,6 +124,10 @@ func verifyCollectedMetrics(t *testing.T, expected []agentsdk.AgentMetric, actua | |||
} | |||
|
|||
dtoLabels := asMetricAgentLabels(d.GetLabel()) | |||
// dto labels are sorted in alphabetical order. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we should move it this routine to asMetricAgentLabels
, so the function produces compliant DTO labels.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The expected labels that we pass in have to be sorted. The DTO ones are already sorted.
I figured it was fine in the verify
function to sort them, otherwise maybe we would have to move it closer to where we define expected
?
coder/coderd/prometheusmetrics/aggregator_test.go
Lines 66 to 79 in 8bfa9ad
expected := []agentsdk.AgentMetric{ | |
{Name: "a_counter_one", Type: agentsdk.AgentMetricTypeCounter, Value: 1, Labels: commonLabels}, | |
{Name: "b_counter_two", Type: agentsdk.AgentMetricTypeCounter, Value: 4, Labels: commonLabels}, | |
{Name: "c_gauge_three", Type: agentsdk.AgentMetricTypeGauge, Value: 5, Labels: commonLabels}, | |
{Name: "c_gauge_three", Type: agentsdk.AgentMetricTypeGauge, Value: 2, Labels: []agentsdk.AgentMetricLabel{ | |
{Name: "agent_name", Value: testAgentName}, | |
{Name: "foobar", Value: "Foobaz"}, | |
{Name: "hello", Value: "world"}, | |
{Name: "username", Value: testUsername}, | |
{Name: "workspace_name", Value: testWorkspaceName}, | |
{Name: "template_name", Value: testTemplateName}, | |
}}, | |
{Name: "d_gauge_four", Type: agentsdk.AgentMetricTypeGauge, Value: 6, Labels: commonLabels}, | |
} |
But this only needs to be sorted when comparing to dto labels.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see now what you mean. I guess we can leave it as is, there won't be a huge gain if you refactor it 👍
coderd/batchstats/batcher.go
Outdated
@@ -13,6 +13,7 @@ import ( | |||
|
|||
"cdr.dev/slog" | |||
"cdr.dev/slog/sloggers/sloghuman" | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: many of these changes are sneaking in, should we adjust our linter?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just added this to my editor config, maybe that will fix it:
Line 126 in 8bfa9ad
local-prefixes: coder.com,cdr.dev,go.coder.com,github.com/cdr,github.com/coder |
The issue is that imports are not deterministic for the fmt from my experience. Both this in the PR, and grouping them is valid. There are packages like https://github.com/daixiang0/gci that make import ordering deterministic. Would prevent this, but require us to setup an extra tool.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Feel free to ship it if CI is happy.
What this does
Closes #10479
Add metrics for agent scripts
agents_scripts_executed_total{"success"="true/false",...}
Metric to keep track of the total number of scripts executed for a workspace and if they succeed or fail. This includes cron executed scripts by the agent. Was easy to include and felt beneficial to track script failures.
coderd_agentstats_startup_script_s{...} 1.969900304
Metric that indicates how long the startup scrip took to execute.
Other features
Adds
template_name
to all agent metrics as a label. Example:This allows grouping metrics by template_name, so you can see for example the average startup script time for a given template.