-
Notifications
You must be signed in to change notification settings - Fork 901
fix: fix metric for hard-limited presets #18045
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 1 commit
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -1034,15 +1034,18 @@ func TestHardLimitedPresetShouldNotBlockDeletion(t *testing.T) { | |
require.Equal(t, database.WorkspaceTransitionDelete, workspaceBuilds[0].Transition) | ||
require.Equal(t, database.WorkspaceTransitionStart, workspaceBuilds[1].Transition) | ||
|
||
// Metric is deleted after preset became outdated. | ||
// The metric is still set to 1, even though the preset has become outdated. | ||
// This happens because the old value hasn't been overwritten by a newer preset yet. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. nit: In There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think the only way to make it work correctly in 100% use-cases is how I described it above.
But from user-point of view, only one preset exists:
Let's say I report a metric for Alternatively I can skip (do nothing) - but in this case there is a risk, that there is no corresponding Fundamental issue is inside ReconcilePreset - I just don't have enough information to make a correct decision. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I agree, that decision needs to happen with full context in ReconcileAll 👍 |
||
mf, err = registry.Gather() | ||
require.NoError(t, err) | ||
metric = findMetric(mf, prebuilds.MetricPresetHardLimitedGauge, map[string]string{ | ||
"template_name": template.Name, | ||
"preset_name": preset.Name, | ||
"org_name": org.Name, | ||
}) | ||
require.Nil(t, metric) | ||
require.NotNil(t, metric) | ||
require.NotNil(t, metric.GetGauge()) | ||
require.EqualValues(t, 1, metric.GetGauge().GetValue()) | ||
}) | ||
} | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if we added another label for the template version?
I don’t think it would increase the metric's cardinality, since we only have one active template version per preset at a time. When a preset switches to a new template version, the old metric time series would just become inactive and eventually be garbage collected by Prometheus.
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's how all metrics are done: https://github.com/coder/coder/blob/main/enterprise/coderd/prebuilds/metricscollector.go#L35
I think from end-user point of view - it make sense. From end-user point of view there is only one version of preset - latest and not deleted. They don't really care about some outdated/deleted version of presets.
But I'm going to reimplement it significantly in a next PR.
Main issue here that I'm trying to do it on ReconcilePreset level, but I think right way to do it is on ReconcileAll level.
Gather information about all presets and then report accordingly.
I need to create hashMap:
Then for every key in the map - I have to report either hardLimited or not.
I will report hard limited only in case if
[]preset
contains latest and not deleted preset and it's hard-limited, otherwise I just remove it from the map.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I understand correctly, https://github.com/coder/coder/blob/main/enterprise/coderd/prebuilds/metricscollector.go#L3 uses just the template name. I was referring to the template version ID, which is the only way to distinguish between multiple presets that use the same template.
So the full label set would be:
(
preset_name
,template_name
,template_version_id
,organization_name
).The downside of this approach is that Prometheus wouldn't know which version is the latest, so querying for the current status becomes harder, we would need extra logic to filter by the active version.
That trade-off might be worthwhile if we needed visibility into historical versions of a preset, but I don't think that's the case here.
So since we only care about the current status per preset, I agree that the current approach makes sense 👍