Add Prometheus Native Histogram defaults #129406

Open: SuperQ wants to merge 1 commit into master from native_histogram_defaults

Conversation

SuperQ
Contributor

@SuperQ SuperQ commented Dec 26, 2024

What type of PR is this?

/kind feature

What this PR does / why we need it:

Add some new default values to the metrics package for the Prometheus Native Histogram support.

  • Use the upstream documented recommendation for a 10% bucket factor.
  • Set an arbitrary limit of 160 buckets to avoid unbounded memory growth.
  • Set a 1-hour minimum reset interval to avoid frequent resets, which would reduce the efficiency of Prometheus chunks (120 samples per hour at a 30s scrape interval).

Which issue(s) this PR fixes:

See: #128842

Special notes for your reviewer:

Does this PR introduce a user-facing change?

NONE

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


@k8s-ci-robot
Contributor

Adding the "do-not-merge/release-note-label-needed" label because no release-note block was detected, please follow our release note process to remove it.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. kind/feature Categorizes issue or PR as related to a new feature. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Dec 26, 2024
@k8s-ci-robot
Contributor

Hi @SuperQ. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Dec 26, 2024
@k8s-ci-robot k8s-ci-robot added sig/architecture Categorizes an issue or PR as relevant to SIG Architecture. sig/instrumentation Categorizes an issue or PR as relevant to SIG Instrumentation. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Dec 26, 2024
@SuperQ
Contributor Author

SuperQ commented Dec 26, 2024

/sig instrumentation

@SuperQ SuperQ force-pushed the native_histogram_defaults branch from cb8750e to 3b96014 Compare December 27, 2024 08:11

@beorn7 beorn7 left a comment

Just a quick comment. The default values look good in general. The reason there aren't any in client_golang yet is that we still have to see in practice what good default values are. (I'll only be back from vacation on 2025-01-07. Happy to discuss things in more detail then.)

@rexagod
Member

rexagod commented Jan 23, 2025

/triage accepted
/assign @rexagod @richabanker
/cc @dgrisonnet

@k8s-ci-robot k8s-ci-robot added the triage/accepted Indicates an issue or PR is ready to be actively worked on. label Jan 23, 2025
@k8s-ci-robot k8s-ci-robot removed the needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. label Jan 23, 2025
@richabanker
Contributor

Specifying these defaults in the Kubernetes codebase means we'd be taking on the responsibility of maintaining them. Users may want their metrics to look very different from one another, so these defaults might not be appropriate for everyone, especially given what was called out here. Also, since this feature is still declared experimental by Prometheus, baking in defaults could create maintenance challenges down the line as the feature evolves. Wouldn't it be more flexible and user-friendly to leave this as a configurable option only?

Add some new default values to the metrics package for the Prometheus Native
Histogram support.
* Use the upstream documented recommendation for a 10% bucket factor.
* Set an arbitrary limit of 160 buckets to avoid unbounded memory growth.
* Set a 1 hour minimum reset interval to avoid frequent resets which would
  affect efficiency of the 120 sample per hour Prometheus chunks given a 30s
  scrape interval.

See: kubernetes#128842

Signed-off-by: SuperQ <superq@gmail.com>
@SuperQ SuperQ force-pushed the native_histogram_defaults branch from 3b96014 to 4aa7a70 Compare February 19, 2025 11:43
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: SuperQ
Once this PR has been reviewed and has the lgtm label, please ask for approval from rexagod. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@SuperQ
Contributor Author

SuperQ commented Feb 19, 2025

@richabanker We already set a number of defaults for other metrics, so we already have to maintain them.

The point of this change is to discuss and come to some consensus on these defaults in the context of Kubernetes.

Having these be configuration options would be great, but we still need to set some sensible defaults for the configurations.

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all PRs.

This bot triages PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

  • Mark this PR as fresh with /remove-lifecycle stale
  • Close this PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 1, 2025
@SuperQ
Contributor Author

SuperQ commented Jul 8, 2025

Not stale

/remove-lifecycle stale

@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all PRs.

This bot triages PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

  • Mark this PR as fresh with /remove-lifecycle rotten
  • Close this PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Aug 7, 2025
@SuperQ
Contributor Author

SuperQ commented Aug 7, 2025

This would be really nice to move forward with. Getting native histograms started in Kubernetes would help a lot of users reduce the load on their metrics collection, as it reduces the cardinality and improves the data quality of Kubernetes API metrics.

/remove-lifecycle rotten

@k8s-ci-robot k8s-ci-robot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Aug 7, 2025
@SuperQ SuperQ requested review from beorn7 and toVersus August 7, 2025 13:30
@rexagod
Member

rexagod commented Aug 7, 2025

Hello, apologies for the delay here. The SIG has triaged this effort and identified that there are certain prerequisites that need to be met before we can move forward in this direction, the first of which is a tracking KEP. The scope and details that the KEP will entail have already been discussed and will soon be driven by one of the SIG members.

Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. kind/feature Categorizes issue or PR as related to a new feature. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. sig/architecture Categorizes an issue or PR as relevant to SIG Architecture. sig/instrumentation Categorizes an issue or PR as relevant to SIG Instrumentation. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. triage/accepted Indicates an issue or PR is ready to be actively worked on.
7 participants