-
Notifications
You must be signed in to change notification settings - Fork 41.1k
address counters and histograms context race and exemplars overhaul #128575
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: rexagod The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/test pull-kubernetes-unit |
571498f
to
9390d0d
Compare
/test pull-kubernetes-e2e-kind-ipv6 |
/assign @richabanker @dgrisonnet |
c6d7404
to
fd3747f
Compare
fd3747f
to
55b1dbf
Compare
AddWithExemplar
// WithContext is a no-op. | ||
func (v *CounterVec) WithContext(_ context.Context) *CounterVec { | ||
return v |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Upstream divergence started from func (v *CounterVec) WithContext
onwards.
// Verify exemplar with different span context. | ||
exemplarGatherAndVerify(t, registry, altTraceID, altSpanID, 1, 2*toAdd+4) | ||
|
||
// Verify that all contextual counter calls are exclusive. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: this almost feels like it should be a separate test case since its following the table-test format.
} | ||
|
||
type exemplarHistogramVec struct { | ||
type HistogramVecWithContextWithObserver struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need this type now since we removed the CounterVecWithContext type ?
counter := NewCounterVec(&CounterOpts{ | ||
Name: "metricvec_exemplar_test", | ||
Help: "helpless", | ||
}, []string{"label_a"}).WithContext(ctxForSpanCtx) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am ok with having the option to specify the context only at the Counter / Histogram level (especially since a spanID/traceID is applicable to a single function call as pointed out) as long as both are consistent. But looks like currently in this PR we are supporting specifying context for HistogramVec but not for CounterVec ?
Also, we can add the support for specifying counterVec / HistogramVec level contexts if the need arises in future?
@rexagod Any plans of merging this for 1.34? (not urgent but just bumping this up) |
32ec8d4
to
cb91901
Compare
Fixes: kubernetes#128548 Signed-off-by: Pranshu Srivastava <rexagod@gmail.com>
cb91901
to
efe5a81
Compare
// WithContext is a no-op. | ||
// WithContext is a no-op for now, users should still attach this to vectors for seamless future API upgrades (which rely on the context). | ||
// This is done to keep extensions (such as exemplars) on the counter and not its derivatives. | ||
// Note that there are no actual uses for this in the codebase except for chaining it with `WithLabelValues`, which makes no use of the context. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could still be breaking for downstream users that rely on the earlier WithContext(...).ctx
in some way or the other, albeit no such use case was found in Kubernetes itself.
I'm assuming general API stability cycles apply for that API, so we may want to push this out gradually, although I really doubt anyone will read value from this context rather than doing so just one step earlier in the chain (before passing the context).
efe5a81
to
1b5b50b
Compare
@rexagod: The following tests failed, say
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
c := counterVec.WithContext(nil) | ||
|
||
c.WithLabelValues("1", "2").Inc() | ||
c.WithLabelValues("1", "2").(prometheus.ExemplarAdder).AddWithExemplar(100, prometheus.Labels{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just came across #89267, which essentially reinforces my concern that component owners, if needed, could extend vectors with exemplars like so. But, in doing this, they'll be directly interacting with the Prometheus client, which is discouraged, and makes sense, as it goes against the direction of the SIG post-Metrics Stability effort.
Do we want to diverge from upstream and allow component owners to extend vectors with exemplars (making context extensions ambiguous for us in the future as this would serve as the base for future additions on contextual vectors)? Not to mention, diverging from upstream may lead to similar divergence in the future, stacking up more maintenance debt for us?
This was talked about briefly on this PR a while ago, but IIRC we never reached a consensus, so this should make for a worthy agenda bullet point.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The thing is, even if we lean in the other direction, and try to add existing Add
, Inc
, and Observe
methods over contextual metrics, we'll not be able to utilise those since there's already a portion of Prometheus API being surfaced through component-base
, which goes against its design. Users will still need to rely on working with Prometheus' client directly, such as:
c.WithLabelValues("1", "2").(prometheus.ExemplarAdder).AddWithExemplar(100, prometheus.Labels{
"exemplar_label": "42",
})
which will, for good reason, get flagged by the script linked above, and add to technical debt.
On the other hand, I'm not entirely certain we can completely address things without breaking a couple of APIs. Of course, I'll send out a PR to refactor all uses within K8s to follow the established conventions to make us future-proof, but I'm not sure if and how much we take into account out-of-tree breakages for component-base
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As it is right now, we don't break any usages within the tree itself. Here's a quick rundown around what I deem to be the only breaking change (for out-of-tree users):
taking this as an example, all WithContext
applications in the tree as used the same way:
levelCounter.WithContext(ctx).WithLabelValues(string(level)).Inc()
however, WithLabelValues
makes no use of the wrapped contextual (counter) vector that was passed to it.
Since we do away with contexts at the vector level in this patch, WithContext(ctx)
will now essentially be a no-op, as we no use-case for it, and pass down a (counter) vector, instead of a contextual counter vector, and thus call it's WithLabelValues
, instead of the contextual vectors', which has been dropped in this patch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want to diverge from upstream and allow component owners to extend vectors with exemplars
No, we dont want to do that. If the need arises to support contextual vectors, it should be provided via component-base. The script that you linked should help bring attention to any usages of the prometheus client library.
but IIRC we never reached a consensus
I thought the consensus was to not support contexts for vectors. Only support attaching contexts for Counter / Histogram ?
My understanding so far is that we are agreeing to have the following as a part of this PR
- We apply AddWithExemplar() implicitly for a contextualized Counter's / Histogram's
Add() | Inc() | Observe()
methods. And we ensure that we are creating the contextualized Counter/Histogram object in a race-free manner - Noop-ing the
CounterVec.WithContext()
&HistogramVec.WithContext()
since we agree that it doesn't make sense to apply a context to an entire vector. Instead, each individual counter or histogram within the vector should attach its own context and record exemplars. Further, I dont like a bunch of WithContext() usages across the code for vectors, which can continue to increase if people just copy this pattern for new metrics too. Eventually we should consider getting rid of theCounterVec.WithContext() | HistogramVec.WithContext()
entirely
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cc @dgrisonnet to confirm if there's any misunderstanding in the above comment, or if there's any other concerns that need to be addressed here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought the consensus was to not support contexts for vectors. Only support attaching contexts for Counter / Histogram?
We talked about it earlier, yes, but we never got Damien's ACK to move forward with this. It'd be nice to have every stakeholder within the SIG be okay with this change.
Noop-ing the CounterVec.WithContext() & HistogramVec.WithContext() since we agree that it doesn't make sense to apply a context to an entire vector. Instead, each individual counter or histogram within the vector should attach its own context and record exemplars. Further, I dont like a bunch of WithContext() usages across the code for vectors, which can continue to increase if people just copy this pattern for new metrics too. Eventually we should consider getting rid of the CounterVec.WithContext() | HistogramVec.WithContext() entirely.
My only concern with doing this is how this will effect downstream users of WithContext
, as it's fairly plausible to assume that they may be creating a contextual vector using that, working with that a bit (say, asserting if it's a CounterVectorWithContext
, injecting custom KVs into that context, etc.), and then adding labels to it. This breaks with this patch, and essentially what all of my comments above resolved around.
Of course, being able to do that allows us to reposition the library with better Prometheus API abstraction, while following the established design pattern and reducing the technical debt that would otherwise arise in the future.
As is, we could either merge this PR if it looks good, or issue a deprecation notice for one release (since nothing breaks in the k/k tree itself, but potentially, and in a rare circumstance as I mentioned above, downstream) and then merge it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cc @dgrisonnet to confirm if there's any misunderstanding in the above comment, or if there's any other concerns that need to be addressed here.
Your comment is for the most part in line with my previous comments on the topic.
We talked about it earlier, yes, but we never got Damien's ACK to move forward with this. It'd be nice to have every stakeholder within the SIG be okay with this change.
This work is going on for quite some time now, I'd be better to finish it once and for all if all three of us are in agreement with what needs to be done.
My only concern with doing this is how this will effect downstream users of WithContext
I don't see how this change is breaking users of the library. Sure we are changing where the context needs to be attached, but functionality-wise, this is still the same as it was before. They will just have to specify label values before setting the context in order to migrate to the new version. That should for the most part be a pretty easy change to do when upgrading the library.
it's fairly plausible to assume that they may be creating a contextual vector using that, working with that a bit (say, asserting if it's a CounterVectorWithContext, injecting custom KVs into that context, etc.), and then adding labels to it.
If they abuse context to do that, it is wrong. They should use MustCurryWith
which is the supported way to do that with the Prometheus client. If we "break" that "functionnality" for them they have option to still be able to do what they were doing but using the supported method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Noop-ing the CounterVec.WithContext() & HistogramVec.WithContext() since we agree that it doesn't make sense to apply a context to an entire vector. Instead, each individual counter or histogram within the vector should attach its own context and record exemplars. Further, I dont like a bunch of WithContext() usages across the code for vectors, which can continue to increase if people just copy this pattern for new metrics too. Eventually we should consider getting rid of the CounterVec.WithContext() | HistogramVec.WithContext() entirely
I think we should just remove these methods entirely instead of noop-ing them to make sure that users are aware that they need to make some code change before proceeding.
Again we are not breaking any functionality, we are just changing how to use it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If they abuse context to do that, it is wrong. They should use MustCurryWith which is the supported way to do that with the Prometheus client.
I see. Just want to clarify that while MustCurryWith
will attach LVs to the MetricVec
, what I meant there was a WithValue
usage: ctx := <...>.WithContext(ctx).ctx; ctx = context.WithValue(ctx, "k", "v")
. But I'm fairly certain that will also fall into the similar category of abusing the context, so we don't need to respect any implicit guarantees for it.
This is essentially the stability guarantee for component-base
that I was unsure about, and just assumed that since we export that (WithContext
) API, we'll need to cautiously deprecate it, same as others. But I understand that this is done on a case-by case basis (depending on factors such as the ease of migration, the degree of inclusion upstream, etc.).
I think we should just remove these methods entirely instead of noop-ing them to make sure that users are aware that they need to make some code change before proceeding. Again we are not breaking any functionality, we are just changing how to use it.
CMIIW but dropping the API over the span of the same release as this patch may not be the best practise or well received. I propose a godoc
deprecation update for the same (WithContext
) API, as well as a CHANGELOG highlight, over at-least one more release before we drop it. Note that this will need a (albeit, straightforward) k/k-wide refactor to drop this usage as it would otherwise break instrumentation for them.
We can still make them aware next release by dropping the API, but by then we'll also have followed a deprecation cycle for it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, apparently MustCurryWith
is only wrapped-over for timing and weighted histograms, which means users, as you mentioned earlier, will need to directly rely on the Prometheus client in order to curry all other metrics constructs exposed by us, which, IIUC, seems like a discrepancy.
We could drop it for these histograms, but since that will be breaking, we could make sure GaugeVecOps
and WeightedObserverVec
is generalised and implemented across all MetricVec
implementations.
Something we might want to pick up in the future; mentioning here for posterity.
What type of PR is this?
/kind bug
/kind failing-test
What this PR does / why we need it:
Which issue(s) this PR fixes:
Fixes #128548
Special notes for your reviewer:
/cc @pohly
While the patch addresses the
k8s.io/component-base/metrics.(*Counter).WithContext()
race, there's still some investigation that needs to be done (patch OTEL)?Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: