-
Notifications
You must be signed in to change notification settings - Fork 896
feat: add hard-limited presets metric #18008
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: add hard-limited presets metric #18008
Conversation
32f43de
to
39748bd
Compare
39748bd
to
6fb6800
Compare
f6ea98b
to
2667684
Compare
8db12a2
to
3100e01
Compare
|
||
key := hardLimitedPresetKey{orgName: orgName, templateName: templateName, presetName: presetName} | ||
|
||
mc.isPresetHardLimited[key] = isHardLimited |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: With this approach, we’re accumulating entries that are no longer relevant over time. This can cause the metrics to keep reporting 0
for presets that should disappear (e.g., deleted or outdated presets).
A small improvement would be to remove presets from the map when isHardLimited == false
:
if isHardLimited {
mc.isPresetHardLimited[key] = true
} else {
delete(mc.isPresetHardLimited, key)
}
This keeps the metric data cleaner and avoids unnecessary entries in Prometheus.
But there is a catch 🤔 With this change, there’s a brief window where Prometheus may still show the previous value (1
) for a preset after it’s been removed from the map. This is expected behavior in Prometheus, metrics are not deleted immediately, but expire after some time without being reported.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch, fixed here: 80c89fe.
I was under a false assumption that it will report stale value indefinitely.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall looks good to me, just a note on the metrics setting: removing entries from the map when isHardLimited
keeps metrics clean and reduce cardinality. The trade-off is that Prometheus will show stale data briefly after removal, since metrics aren’t deleted immediately but expire after some time without being reported. This behavior is expected and I think acceptable in this context.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! 🚀 The comments in the tests are especially helpful, they make it really easy to follow the reconciliation process and understand the logic behind each condition.
Just one small suggestion regarding the metric description.
Co-authored-by: Susana Ferreira <ssncferreira@gmail.com>
Closes #17988
Define
preset_hard_limited
metric which for every preset indicates whether a given preset has reached the hard failure limit (1 for hard-limited, 0 otherwise).CLI example:
NOTE:
Only active template version is tracked. If admin creates new template version - old value of metric (for previous template version) will be overwritten with new value of metric (for active template version).
Because
template_version
is not part of metric:Implementation is similar to implementation of
MetricResourceReplacementsCount
metric