client-go/tools/cache: add APIs with context parameter #126387
Conversation
/triage accepted
/test pull-kubernetes-apidiff
@@ -62,6 +62,7 @@ type NamingConditionController struct {
}

func NewNamingConditionController(
	ctx context.Context,
- ./pkg/controller/status.NewNamingConditionController: changed from func(k8s.io/apiextensions-apiserver/pkg/client/informers/externalversions/apiextensions/v1.CustomResourceDefinitionInformer, k8s.io/apiextensions-apiserver/pkg/client/clientset/clientset/typed/apiextensions/v1.CustomResourceDefinitionsGetter) *NamingConditionController to func(context.Context, k8s.io/apiextensions-apiserver/pkg/client/informers/externalversions/apiextensions/v1.CustomResourceDefinitionInformer, k8s.io/apiextensions-apiserver/pkg/client/clientset/clientset/typed/apiextensions/v1.CustomResourceDefinitionsGetter) *NamingConditionController
This seems to be unused outside of Kubernetes, so breaking the API seems acceptable.
I think all controllers can be considered internal and breaking consumers is ok.
//
// It's an error to call Run more than once.
// RunWithContext blocks; call via go.
RunWithContext(ctx context.Context)
## ./staging/src/k8s.io/client-go
Incompatible changes:
- ./tools/cache.Controller.RunWithContext: added
Adding a new method to an interface is a breaking change because out-of-tree implementations of that interface no longer work. I think this is an interface only because there is one test mocking it, not because there are supposed to be other implementations, so this API change is okay?
Would rather change the existing. This interface seems to be very low-level. Would not be surprised if it is not used externally.
Works for me - changed.
Actually, this Run method is what everyone who uses an informer calls. After changing it, lots of code no longer compiled. I think we have to keep Run, add RunWithContext, and transition to it gradually.
type WatchErrorHandlerWithContext func(ctx context.Context, r *Reflector, err error)

// DefaultWatchErrorHandler is the default implementation of WatchErrorHandlerWithContext.
func DefaultWatchErrorHandler(ctx context.Context, r *Reflector, err error) {
Incompatible changes:
- ./tools/cache.DefaultWatchErrorHandler: changed from func(*Reflector, error) to func(context.Context, *Reflector, error)
https://grep.app/search?q=DefaultWatchErrorHandler
Some copies seem to exist, but not used elsewhere?
Users of cache.Config can still set a watch handler without the extra context parameter.
Both OPA Gatekeeper and KubeVirt aren't libraries, so version skew shouldn't be an issue.
@@ -60,21 +61,23 @@ type ResourceVersionComparator interface {
// If includeAdds is true, objects in the mutation cache will be returned even if they don't exist
// in the underlying store. This is only safe if your use of the cache can handle mutation entries
// remaining in the cache for up to ttl when mutations and deletes occur very closely in time.
func NewIntegerResourceVersionMutationCache(backingCache Store, indexer Indexer, ttl time.Duration, includeAdds bool) MutationCache {
func NewIntegerResourceVersionMutationCache(ctx context.Context, backingCache Store, indexer Indexer, ttl time.Duration, includeAdds bool) MutationCache {
Incompatible changes:
- ./tools/cache.NewIntegerResourceVersionMutationCache: changed from func(Store, Indexer, time.Duration, bool) MutationCache to func(context.Context, Store, Indexer, time.Duration, bool) MutationCache
https://grep.app/search?q=NewIntegerResourceVersionMutationCache
Not used outside of Kubernetes?
Doesn't look like it.
This seems like overkill. Based on the discussion we had with Jordan during KubeCon, I believe we agreed to apply context where it makes sense. This cache seems like one that should not have it; at most I'd expect it to accept only a logger. That would also resolve my other comment about creating a controller without needing a context.
Ack. This is one of those places that I hadn't updated based on the "it's okay to put logr.Logger into our APIs" conclusion.

What could potentially throw a monkey wrench into this "no context during construction" approach is other components not following it. We've seen that with the event recorder. Yesterday I found that constructing a delaying queue also already spawns a goroutine during New*.

I still think "no goroutines during construction" is a sound principle; sometimes it just might be a bit harder to follow.
I took your ctx -> logger replacement for NewIntegerResourceVersionMutationCache from #129148 into this PR.

I did not take the other changes for controller construction. That can be a follow-up PR?
> I still think "no goroutines during construction" is a sound principle, sometimes it just might be a bit harder to follow.

I don't want to clean those up yet. Probably in a separate PR, but in the long run we should definitely ensure that.
// where additional optional parameters are passed in a struct. If no resync period
// is specified there, AddEventHandlerWithConfig behaves like AddEventHandler.
// The context is used for contextual logging.
AddEventHandlerWithConfig(ctx context.Context, handler ResourceEventHandler, config HandlerConfig) (ResourceEventHandlerRegistration, error)
This WithConfig is used instead of adding a WithContext because it enables adding other optional parameters besides the resync period without having to add another method.
Incompatible changes:
- ./tools/cache.SharedInformer.AddEventHandlerWithConfig: added
- ./tools/cache.SharedInformer.RunWithContext: added
- ./tools/cache.SharedInformer.SetWatchErrorHandlerWithContext: added
Same as with Controller: probably shouldn't be implemented outside of Kubernetes?
This is implemented outside of kube, especially for mocking. Controller-runtime for example has an instance.
Is it acceptable to break that other code? Probably not.
The alternative is to leave cache.SharedInformer as it is and add a second cache.SharedInformerCtx with the new methods. The code which returns a cache.SharedInformer can return a cache.SharedInformerCtx (assignable to variables or fields of type cache.SharedInformer). The only API break for such producers is that the function signature changes.

The downside of this approach is that code accepting a cache.SharedInformer must continue to do so. It can optionally use the new method only after a checked type cast.
Isn't such a struct usually called Options in Kube, not Config?
And in that case, I would rather not mention Config/Options in the method name, but call it AddEventHandlerWithContext.
I think that satisfies what consumers of controller-runtime want... unless those consumers import those packages:
https://grep.app/search?q=sigs.k8s.io/controller-runtime/pkg/controller/controllertest
😢
Sounds okay from my side. We intentionally state in our docs that every CR minor version is only compatible with a specific client-go minor version:
> Every minor version of controller-runtime has been tested with a specific minor version of client-go. A controller-runtime minor version may be compatible with other client-go minor versions, but this is by chance and neither supported nor tested. In general, we create one minor version of controller-runtime for each minor version of client-go and other k8s.io/* dependencies.
https://github.com/kubernetes-sigs/controller-runtime?tab=readme-ov-file#compatibility
Also happened in the past that we weren't entirely compatible with other client-go versions.
So seems okay to me to have a breaking change here.
Thanks @sbueringer for the heads up. I've pushed a single commit with the original proposal (extending SharedInformer) and with the renames suggested by @sttts above.

Please take another look; it might be ready for merging now.
Looks good to me from a CR perspective
I force-pushed to sync with #129125. That includes the
/test pull-kubernetes-apidiff
@@ -210,7 +211,7 @@ func (c completedConfig) New(delegationTarget genericapiserver.DelegationTarget)
	aggregatedDiscoveryManager = aggregatedDiscoveryManager.WithSource(aggregated.CRDSource)
}
discoveryController := NewDiscoveryController(s.Informers.Apiextensions().V1().CustomResourceDefinitions(), versionDiscoveryHandler, groupDiscoveryHandler, aggregatedDiscoveryManager)
namingController := status.NewNamingConditionController(s.Informers.Apiextensions().V1().CustomResourceDefinitions(), crdClient.ApiextensionsV1())
namingController := status.NewNamingConditionController(context.TODO() /* for contextual logging */, s.Informers.Apiextensions().V1().CustomResourceDefinitions(), crdClient.ApiextensionsV1())
This should be just the logger, like we agreed for all controllers, especially in the New* methods.
}, r.backoffManager, true, stopCh)
klog.V(3).Infof("Stopping reflector %s (%s) from %s", r.typeDescription, r.resyncPeriod, r.name)
}, r.backoffManager, true, ctx.Done())
klog.FromContext(ctx).V(3).Info("Stopping reflector", "type", r.typeDescription, "resyncPeriod", r.resyncPeriod, "reflector", r.name)
Nit: you already have logger there, so this could be

logger.V(3).Info("Stopping reflector", "type", r.typeDescription, "resyncPeriod", r.resyncPeriod, "reflector", r.name)

instead of

klog.FromContext(ctx).V(3).Info("Stopping reflector", "type", r.typeDescription, "resyncPeriod", r.resyncPeriod, "reflector", r.name)
Fixed.
@@ -1019,15 +1065,16 @@ type initialEventsEndBookmarkTicker struct {
// Note that the caller controls whether to call t.C() and t.Stop().
//
// In practice, the reflector exits the watchHandler as soon as the bookmark event is received and calls the t.C() method.
func newInitialEventsEndBookmarkTicker(name string, c clock.Clock, watchStart time.Time, exitOnWatchListBookmarkReceived bool) *initialEventsEndBookmarkTicker {
	return newInitialEventsEndBookmarkTickerInternal(name, c, watchStart, 10*time.Second, exitOnWatchListBookmarkReceived)
func newInitialEventsEndBookmarkTicker(ctx context.Context, name string, c clock.Clock, watchStart time.Time, exitOnWatchListBookmarkReceived bool) *initialEventsEndBookmarkTicker {
Context seems unnecessary here, based on my understanding. Logger should be sufficient.
Changed.
if w == nil && err == nil {
	// stopCh was closed
	return nil
}
if err != nil {
	klog.Warningf("The watchlist request ended with an error, falling back to the standard LIST/WATCH semantics because making progress is better than deadlocking, err = %v", err)
	logger.Error(err, "The watchlist request ended with an error, falling back to the standard LIST/WATCH semantics because making progress is better than deadlocking")
Below in this method we have:
if fallbackToList {
err = r.list(stopCh)
if err != nil {
return err
}
}
which seems to be the only consumer of stopCh. I tested locally, and switching that bit to context should be an easy change, especially since later in the tests you're already passing ctx.Done to the list invocation.
Looks like I already made that change in my full PR (#129125).
I've included it here instead.
if s.HasStarted() {
	klog.Warningf("The sharedIndexInformer has started, run more than once is not allowed")
	logger.Info("Warning: the sharedIndexInformer has started, run more than once is not allowed")
There's a bit of inconsistency: here you're using logger.Info("Warning: ..."), but in staging/src/k8s.io/client-go/tools/cache/listers.go#ListAllByNamespace you switch to klog.TODO().Error. I'd do both consistently, either as info with a warning prefix or as error.
This one here is correct. Warnings are not errors, just info messages that "need more attention" - whatever that means. I've changed ListAllByNamespace to do the same.
processorStopCh := make(chan struct{})
// Separate stop context because Processor should be stopped strictly after controller.
// Cancelation in the parent context is ignored.
processorStopCtx, stopProcessor := context.WithCancelCause(context.WithoutCancel(ctx))
The double context wrapping is to:
- drop the parent's cancelation
- keep the logger from the context

Do I get that correct? The former is written down, but the second point is something I had to figure out. I'd put it in the comment too, because the first thing that came to my mind was: why not just start with a fresh context.Background() rather than double wrapping?
Correct. It's not just the logger (we could add that back explicitly) but also all other values which we don't know about here. I added a comment.
	resyncPeriod = minimumResyncPeriod
}

if resyncPeriod < s.resyncCheckPeriod {
	if s.started {
		klog.Warningf("resyncPeriod %v is smaller than resyncCheckPeriod %v and the informer has already started. Changing it to %v", resyncPeriod, s.resyncCheckPeriod, s.resyncCheckPeriod)
		logger.Info("Warning: resync period is smaller than resync check period and the informer has already started. Changing it to the resync check period", "resyncPeriod", resyncPeriod, "resyncCheckPeriod", s.resyncCheckPeriod)
Another instance of info+warning. I'm leaning towards doing that also in that one place where you've used error; I don't see a reason for an error there.
Agreed, and changed there.
// delivering again.
stopCh := make(chan struct{})
wait.Until(func() {
It took me a while to grok it, but it does make sense.
The context is used for cancellation and to support contextual logging. In most cases, alternative *WithContext APIs get added, except for NewIntegerResourceVersionMutationCache where code searches indicate that the API is not used downstream.

An API break around SharedInformer couldn't be avoided because the alternative (keeping the interface unchanged and adding a second one with the new method) would have been worse. controller-runtime needs to be updated because it implements that interface in a test package. Downstream consumers of controller-runtime will work unless they use that test package.

Converting Kubernetes to use the other new alternatives will follow. In the meantime, usage of the new alternatives cannot be enforced via logcheck yet (see kubernetes#126379 for the process).

Passing context through and checking it for cancellation is tricky for event handlers. A better approach is to map the context cancellation to the normal removal of an event handler via a helper goroutine. Thanks to the new HandleErrorWithLogr and HandleCrashWithLogr, remembering the logger is sufficient for handling problems at runtime.
/lgtm
/approve
LGTM label has been added. Git tree hash: 73c2ccaf794c3ac71eb3ff0dc4c67337c3eab90b
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: pohly, soltysh, sttts.
The test relied on a 100ms sleep to ensure that the controller was done. If that race was lost, one goroutine was intentionally prevented from completing by locking a mutex permanently. A TODO was left about detecting that. Adding goroutine leak checking in kubernetes/kubernetes#126387 revealed that this race indeed sometimes is lost because the goroutine leaked (kubernetes/kubernetes#129400). Waiting for controller shutdown instead of relying on timing should fix this.

Kubernetes-commit: 8e1403563a60f3b7a258e3bbb64b5c3a7f6548fb
The change we are trying to get is: kubernetes/kubernetes#126387
What type of PR is this?
/kind cleanup
What this PR does / why we need it:
The context is used for cancellation and to support contextual logging.
In most cases, alternative *WithContext APIs get added, except for NewIntegerResourceVersionMutationCache where code searches indicate that the API is not used downstream.
An API break around SharedInformer couldn't be avoided because the
alternative (keeping the interface unchanged and adding a second one with
the new method) would have been worse. controller-runtime needs to be updated
because it implements that interface in a test package. Downstream consumers of
controller-runtime will work unless they use that test package.
Converting Kubernetes to use the other new alternatives will follow. In the meantime, usage of the new alternatives cannot be enforced via logcheck yet (see #126379 for the process).
Which issue(s) this PR fixes:
Related-to: #126379
Special notes for your reviewer:
#121989 contains this change and the related changes in the rest of k/k.
Does this PR introduce a user-facing change?