
feat: add hard-limited presets metric #18008


Merged: 11 commits merged on May 26, 2025


@evgeniy-scherbina (Contributor) commented May 23, 2025

Closes #17988

Defines a preset_hard_limited metric that, for each preset, indicates whether the preset has reached the hard failure limit (1 for hard-limited, 0 otherwise).

CLI example:

curl -X GET localhost:2118/metrics | grep preset_hard_limited
# HELP coderd_prebuilt_workspaces_preset_hard_limited Indicates whether a given preset has reached the hard failure limit (1 for hard-limited, 0 otherwise).
# TYPE coderd_prebuilt_workspaces_preset_hard_limited gauge
coderd_prebuilt_workspaces_preset_hard_limited{organization_name="coder",preset_name="GoLand: Large",template_name="Test7"} 1
coderd_prebuilt_workspaces_preset_hard_limited{organization_name="coder",preset_name="GoLand: Large",template_name="ValidTemplate"} 0
coderd_prebuilt_workspaces_preset_hard_limited{organization_name="coder",preset_name="IU: Medium",template_name="Test7"} 1
coderd_prebuilt_workspaces_preset_hard_limited{organization_name="coder",preset_name="IU: Medium",template_name="ValidTemplate"} 0
coderd_prebuilt_workspaces_preset_hard_limited{organization_name="coder",preset_name="WS: Small",template_name="Test7"} 1
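To show how each line above maps to the metric's labels, here is a minimal stdlib-only Go sketch that renders one sample of this gauge in the Prometheus text exposition format. It is not Coder's code. formatSample is a hypothetical helper, and the real metric is emitted by the Prometheus client library. The sketch sorts label names alphabetically, which matches the label order in the output above.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// formatSample is a hypothetical helper that renders one gauge sample as
// name{label="value",...} value. Label names are sorted alphabetically.
func formatSample(name string, labels map[string]string, value float64) string {
	keys := make([]string, 0, len(labels))
	for k := range labels {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	pairs := make([]string, 0, len(keys))
	for _, k := range keys {
		// %q escapes backslashes, quotes and newlines; this approximates the
		// exposition format's escaping for simple label values like these.
		pairs = append(pairs, fmt.Sprintf("%s=%q", k, labels[k]))
	}
	// %g prints 1 and 0 without a decimal point, as in the output above.
	return fmt.Sprintf("%s{%s} %g", name, strings.Join(pairs, ","), value)
}

func main() {
	fmt.Println(formatSample(
		"coderd_prebuilt_workspaces_preset_hard_limited",
		map[string]string{"template_name": "Test7", "preset_name": "GoLand: Large", "organization_name": "coder"},
		1,
	))
}
```

Running it prints the first hard-limited line from the CLI example.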

NOTE:

if !ps.Preset.Deleted && ps.Preset.UsingActiveVersion {
	c.metrics.trackHardLimitedStatus(ps.Preset.OrganizationName, ps.Preset.TemplateName, ps.Preset.Name, ps.IsHardLimited)
}

Only the active template version is tracked. If an admin creates a new template version, the metric value recorded for the previous template version is overwritten by the value for the new active version, because template_version is not one of the metric's labels:

labels = []string{"template_name", "preset_name", "organization_name"}

The implementation mirrors that of the MetricResourceReplacementsCount metric.
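The tracking rule in the NOTE can be sketched as follows. This is a simplified, hypothetical model: preset, presetSnapshot, metricsCollector and reconcile are stand-ins invented for illustration. Only trackHardLimitedStatus, the hardLimitedPresetKey fields and the filter condition come from the PR. Because the key carries no template version, the value written for a newly active version replaces the one written for the previous version.

```go
package main

import "fmt"

// Hypothetical, simplified stand-ins for the preset snapshot types.
type preset struct {
	OrganizationName, TemplateName, Name string
	Deleted, UsingActiveVersion          bool
}

type presetSnapshot struct {
	Preset        preset
	IsHardLimited bool
}

// The key has no template version, mirroring the metric's label set.
type hardLimitedPresetKey struct {
	orgName, templateName, presetName string
}

type metricsCollector struct {
	isPresetHardLimited map[hardLimitedPresetKey]bool
}

func (mc *metricsCollector) trackHardLimitedStatus(orgName, templateName, presetName string, isHardLimited bool) {
	key := hardLimitedPresetKey{orgName: orgName, templateName: templateName, presetName: presetName}
	mc.isPresetHardLimited[key] = isHardLimited
}

// reconcile applies the PR's filter: only non-deleted presets of the
// active template version update the metric.
func (mc *metricsCollector) reconcile(snapshots []presetSnapshot) {
	for _, ps := range snapshots {
		if !ps.Preset.Deleted && ps.Preset.UsingActiveVersion {
			mc.trackHardLimitedStatus(ps.Preset.OrganizationName, ps.Preset.TemplateName, ps.Preset.Name, ps.IsHardLimited)
		}
	}
}

func main() {
	mc := &metricsCollector{isPresetHardLimited: map[hardLimitedPresetKey]bool{}}
	key := hardLimitedPresetKey{"coder", "Test7", "WS: Small"}

	// Cycle 1: the old template version is active and its preset is hard-limited.
	mc.reconcile([]presetSnapshot{
		{Preset: preset{"coder", "Test7", "WS: Small", false, true}, IsHardLimited: true},
	})
	fmt.Println(mc.isPresetHardLimited[key]) // true

	// Cycle 2: a new version is promoted. The old version's preset is skipped,
	// and the new version's value overwrites the same key.
	mc.reconcile([]presetSnapshot{
		{Preset: preset{"coder", "Test7", "WS: Small", false, false}, IsHardLimited: true},
		{Preset: preset{"coder", "Test7", "WS: Small", false, true}, IsHardLimited: false},
	})
	fmt.Println(mc.isPresetHardLimited[key], len(mc.isPresetHardLimited)) // false 1
}
```

Adding a template_version label would keep one series per version instead, at the cost of higher cardinality.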

@evgeniy-scherbina force-pushed the 17988-metric-for-hard-limited-presets branch from 32f43de to 39748bd on May 23, 2025 13:21
@evgeniy-scherbina force-pushed the 17988-metric-for-hard-limited-presets branch from 39748bd to 6fb6800 on May 23, 2025 13:44
@evgeniy-scherbina force-pushed the 17988-metric-for-hard-limited-presets branch from f6ea98b to 2667684 on May 23, 2025 16:45
@evgeniy-scherbina marked this pull request as ready for review on May 23, 2025 16:48
@evgeniy-scherbina force-pushed the 17988-metric-for-hard-limited-presets branch from 8db12a2 to 3100e01 on May 23, 2025 23:37

key := hardLimitedPresetKey{orgName: orgName, templateName: templateName, presetName: presetName}

mc.isPresetHardLimited[key] = isHardLimited
@ssncferreira (Contributor) commented May 26, 2025:


nit: With this approach, we’re accumulating entries that are no longer relevant over time. This can cause the metrics to keep reporting 0 for presets that should disappear (e.g., deleted or outdated presets).

A small improvement would be to remove presets from the map when isHardLimited == false:

if isHardLimited {
	mc.isPresetHardLimited[key] = true
} else {
	delete(mc.isPresetHardLimited, key)
}

This keeps the metric data cleaner and avoids unnecessary entries in Prometheus.

But there is a catch 🤔 With this change, there's a brief window where Prometheus may still show the previous value (1) for a preset after it has been removed from the map. This is expected behavior in Prometheus: metrics are not deleted immediately but expire after some time without being reported.

Contributor Author


Good catch, fixed here: 80c89fe.

I was wrongly assuming that it would report the stale value indefinitely.

@ssncferreira (Contributor) left a comment


Overall this looks good to me. Just a note on how the metric is set: removing entries from the map when isHardLimited is false keeps the metrics clean and reduces cardinality. The trade-off is that Prometheus will briefly show stale data after removal, since metrics aren't deleted immediately but expire after some time without being reported. This behavior is expected, and I think it is acceptable in this context.

@ssncferreira (Contributor) left a comment


LGTM! 🚀 The comments in the tests are especially helpful; they make it really easy to follow the reconciliation process and understand the logic behind each condition.
Just one small suggestion regarding the metric description.

Co-authored-by: Susana Ferreira <ssncferreira@gmail.com>
@evgeniy-scherbina merged commit 2a15aa8 into main on May 26, 2025
34 checks passed
@evgeniy-scherbina deleted the 17988-metric-for-hard-limited-presets branch on May 26, 2025 15:39
@github-actions (bot) locked and limited the conversation to collaborators on May 26, 2025
Development: successfully merging this pull request may close this issue:

Expose metric for hard-limited presets

3 participants