Skip to content

client-go/tools/cache: add APIs with context parameter #126387

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Dec 18, 2024

Conversation

pohly
Copy link
Contributor

@pohly pohly commented Jul 26, 2024

What type of PR is this?

/kind cleanup

What this PR does / why we need it:

The context is used for cancellation and to support contextual logging.

In most cases, alternative *WithContext APIs get added, except for NewIntegerResourceVersionMutationCache where code searches indicate that the API is not used downstream.

An API break around SharedInformer couldn't be avoided because the
alternative (keeping the interface unchanged and adding a second one with
the new method) would have been worse. controller-runtime needs to be updated
because it implements that interface in a test package. Downstream consumers of
controller-runtime will work unless they use those test package.

Converting Kubernetes to use the other new alternatives will follow. In the meantime, usage of the new alternatives cannot be enforced via logcheck yet (see #126379 for the process).

Which issue(s) this PR fixes:

Related-to: #126379

Special notes for your reviewer:

#121989 contains this change and the related changes in the rest of k/k.

Does this PR introduce a user-facing change?

NONE

@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. area/test sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. sig/apps Categorizes an issue or PR as relevant to SIG Apps. sig/auth Categorizes an issue or PR as relevant to SIG Auth. sig/node Categorizes an issue or PR as relevant to SIG Node. sig/testing Categorizes an issue or PR as relevant to SIG Testing. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Jul 26, 2024
@k8s-ci-robot k8s-ci-robot requested review from aojea and bart0sh July 26, 2024 14:14
@pohly pohly changed the title client-go/tools/cache: add APIs with context parameter WIP: client-go/tools/cache: add APIs with context parameter Jul 26, 2024
@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jul 26, 2024
@pohly pohly force-pushed the log-client-go-tools-cache-apis branch 2 times, most recently from 8e98f6b to a5ffb2c Compare July 29, 2024 07:08
@pohly pohly changed the title WIP: client-go/tools/cache: add APIs with context parameter client-go/tools/cache: add APIs with context parameter Jul 29, 2024
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jul 29, 2024
@pohly pohly force-pushed the log-client-go-tools-cache-apis branch from a5ffb2c to 215861c Compare July 30, 2024 12:12
@seans3
Copy link
Contributor

seans3 commented Jul 30, 2024

/triage accepted

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Jul 30, 2024
@pohly
Copy link
Contributor Author

pohly commented Jul 31, 2024

/test pull-kubernetes-apidiff

@@ -62,6 +62,7 @@ type NamingConditionController struct {
}

func NewNamingConditionController(
ctx context.Context,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://prow.k8s.io/view/gs/kubernetes-jenkins/pr-logs/pull/126387/pull-kubernetes-apidiff/1818538357228048384:

- ./pkg/controller/status.NewNamingConditionController: changed from func(k8s.io/apiextensions-apiserver/pkg/client/informers/externalversions/apiextensions/v1.CustomResourceDefinitionInformer, k8s.io/apiextensions-apiserver/pkg/client/clientset/clientset/typed/apiextensions/v1.CustomResourceDefinitionsGetter) *NamingConditionController to func(context.Context, k8s.io/apiextensions-apiserver/pkg/client/informers/externalversions/apiextensions/v1.CustomResourceDefinitionInformer, k8s.io/apiextensions-apiserver/pkg/client/clientset/clientset/typed/apiextensions/v1.CustomResourceDefinitionsGetter) *NamingConditionController

This seems to be unused outside of Kubernetes, so breaking the API seems acceptable.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think all controllers can be considered internal and breaking consumers is ok.

//
// It's an error to call Run more than once.
// RunWithContext blocks; call via go.
RunWithContext(ctx context.Context)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://prow.k8s.io/view/gs/kubernetes-jenkins/pr-logs/pull/126387/pull-kubernetes-apidiff/1818538357228048384:

 ## ./staging/src/k8s.io/client-go
Incompatible changes:
- ./tools/cache.Controller.RunWithContext: added 

Adding a new method to an interface is a breaking change because out-of-tree implementations of that interface no longer work. I think this is an interface only because there is one test mocking it, not because there are supposed to be other implementations, so this API change is okay?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would rather change the existing. This interface seems to be very low-level. Would not be surprised if it is not used externally.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Works for me - changed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, this Run method is what everyone calls who uses an informer. After changing it, lots of code no longer compiled. I think we have to keep RunWithContext and transition to it gradually.

type WatchErrorHandlerWithContext func(ctx context.Context, r *Reflector, err error)

// DefaultWatchErrorHandler is the default implementation of WatchErrorHandlerWithContext.
func DefaultWatchErrorHandler(ctx context.Context, r *Reflector, err error) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://prow.k8s.io/view/gs/kubernetes-jenkins/pr-logs/pull/126387/pull-kubernetes-apidiff/1818538357228048384:

Incompatible changes:
- ./tools/cache.DefaultWatchErrorHandler: changed from func(*Reflector, error) to func(context.Context, *Reflector, error)

https://grep.app/search?q=DefaultWatchErrorHandler

Some copies seem to exist, but not used elsewhere?

Users of cache.Config can still set a watch handler without the extra context parameter.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

both opa gatekeeper and kubevirt aren't libraries. So version skew shouldn't be an issue.

@@ -60,21 +61,23 @@ type ResourceVersionComparator interface {
// If includeAdds is true, objects in the mutation cache will be returned even if they don't exist
// in the underlying store. This is only safe if your use of the cache can handle mutation entries
// remaining in the cache for up to ttl when mutations and deletes occur very closely in time.
func NewIntegerResourceVersionMutationCache(backingCache Store, indexer Indexer, ttl time.Duration, includeAdds bool) MutationCache {
func NewIntegerResourceVersionMutationCache(ctx context.Context, backingCache Store, indexer Indexer, ttl time.Duration, includeAdds bool) MutationCache {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Incompatible changes:
 - ./tools/cache.NewIntegerResourceVersionMutationCache: changed from func(Store, Indexer, time.Duration, bool) MutationCache to func(context.Context, Store, Indexer, time.Duration, bool) MutationCache 

https://grep.app/search?q=NewIntegerResourceVersionMutationCache

Not used outside of Kubernetes?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

doesn't look like.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like an overkill, based on the discussion we've had with Jordan during KubeCon I believe we've agreed to apply context where it made sense. This cache seems like one that should not have, at most I'd expect it to only accept logger. This would resolve the other comment I had about creating a controller without the necessity for context.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ack. This is one of those places that I hadn't updated based on the "it's okay to put logr.Logger into our APIs" conclusion.

What could potentially throw a monkey wrench into this "no context during construction" approach is when other components don't follow it. We've seen that with event recorder. Yesterday I found that construction a delaying queue also already spawns a goroutine during New*.

I still think "no goroutines during construction" is a sound principle, sometimes it just might be a bit harder to follow.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I took your ctx -> logger replacement for NewIntegerResourceVersionMutationCache from #129148 into this PR.

I did not take the other changes for controller construction. That can be a follow-up PR?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still think "no goroutines during construction" is a sound principle, sometimes it just might be a bit harder to follow.

I don't want to clean those, yet. Probably in a separate PR, but in the long run we should definitely ensure that.

// where additional optional parameters are passed in a struct. If no resync period
// is specified there, AddEventHandlerWithConfig behaves like AddEventHandler.
// The context is used for contextual logging.
AddEventHandlerWithConfig(ctx context.Context, handler ResourceEventHandler, config HandlerConfig) (ResourceEventHandlerRegistration, error)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This WithConfig is used instead of adding a WithContext because it enables adding other optional parameters besides resync period without having to add another method.

Incompatible changes:
- ./tools/cache.SharedInformer.AddEventHandlerWithConfig: added
- ./tools/cache.SharedInformer.RunWithContext: added
- ./tools/cache.SharedInformer.SetWatchErrorHandlerWithContext: added 

Same as with Controller: probably shouldn't be implemented outside of Kubernetes?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is implemented outside of kube, especially for mocking. Controller-runtime for example has an instance.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it acceptable to break that other code? Probably not.

The alternative is to leave cache.SharedInformer as it is and add a second cache.SharedInformerCtx with the new methods. The code which returns a cache.SharedInformer can return a cache.SharedInformerCtx (assignable to variables or fields of type cache.SharedInformer). The only API break for such producers is that the function signature changes.

The downside of this approach is that code accepting a cache.SharedInformer must continue to do so. It can optionally use the new method only after a checked type cast.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't such a struct usually called Options in Kube, not Config?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And in that case, I would rather not mention Config/Options in the method name, but call it AddEventHandlerWithContext.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that satisfy what consumers of controller-runtime want

... unless those consumers import those packages:
https://grep.app/search?q=sigs.k8s.io/controller-runtime/pkg/controller/controllertest

😢

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds okay from my side. We intentionally state in our docs that every CR minor version is only compatible with a specific client-go minor version:

Every minor version of controller-runtime has been tested with a specific minor version of client-go. A controller-runtime minor version may be compatible with other client-go minor versions, but this is by chance and neither supported nor tested. In general, we create one minor version of controller-runtime for each minor version of client-go and other k8s.io/* dependencies.
https://github.com/kubernetes-sigs/controller-runtime?tab=readme-ov-file#compatibility

Also happened in the past that we weren't entirely compatible with other client-go versions.

So seems okay to me to have a breaking change here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @sbueringer for heads up. I've pushed a single commit with the original proposal (extending SharedInformer) and with the renames suggested by @sttts above.

Please take another look, it might be ready for merging now.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me from a CR perspective

@pohly
Copy link
Contributor Author

pohly commented Dec 12, 2024

I force-pushed to sync with #129125. That includes the WithLogr -> WithLogger renaming and adding of comments to suppress logcheck warnings. These suppressions will become relevant later, when the Contextual logging: HandleErrorWithContext or HandleErrorWithLogger should be used instead of HandleError in code which supports contextual logging gets turned into logcheck:context // HandleErrorWithContext ... to make it something that gets enforced in relevant code by logcheck.

@pohly pohly force-pushed the log-client-go-tools-cache-apis branch from 11fc83b to eea739a Compare December 12, 2024 11:56
@pohly
Copy link
Contributor Author

pohly commented Dec 12, 2024

/test pull-kubernetes-apidiff

@pohly pohly force-pushed the log-client-go-tools-cache-apis branch from eea739a to 1e2bb64 Compare December 12, 2024 14:16
pohly added a commit to pohly/kubernetes that referenced this pull request Dec 12, 2024
@@ -210,7 +211,7 @@ func (c completedConfig) New(delegationTarget genericapiserver.DelegationTarget)
aggregatedDiscoveryManager = aggregatedDiscoveryManager.WithSource(aggregated.CRDSource)
}
discoveryController := NewDiscoveryController(s.Informers.Apiextensions().V1().CustomResourceDefinitions(), versionDiscoveryHandler, groupDiscoveryHandler, aggregatedDiscoveryManager)
namingController := status.NewNamingConditionController(s.Informers.Apiextensions().V1().CustomResourceDefinitions(), crdClient.ApiextensionsV1())
namingController := status.NewNamingConditionController(context.TODO() /* for contextual logging */, s.Informers.Apiextensions().V1().CustomResourceDefinitions(), crdClient.ApiextensionsV1())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be just the logger, like we agreed for all controllers. Especially in the New* methods.

}, r.backoffManager, true, stopCh)
klog.V(3).Infof("Stopping reflector %s (%s) from %s", r.typeDescription, r.resyncPeriod, r.name)
}, r.backoffManager, true, ctx.Done())
klog.FromContext(ctx).V(3).Info("Stopping reflector", "type", r.typeDescription, "resyncPeriod", r.resyncPeriod, "reflector", r.name)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit:

Suggested change
klog.FromContext(ctx).V(3).Info("Stopping reflector", "type", r.typeDescription, "resyncPeriod", r.resyncPeriod, "reflector", r.name)
logger.V(3).Info("Stopping reflector", "type", r.typeDescription, "resyncPeriod", r.resyncPeriod, "reflector", r.name)

you already have there logger.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

@@ -1019,15 +1065,16 @@ type initialEventsEndBookmarkTicker struct {
// Note that the caller controls whether to call t.C() and t.Stop().
//
// In practice, the reflector exits the watchHandler as soon as the bookmark event is received and calls the t.C() method.
func newInitialEventsEndBookmarkTicker(name string, c clock.Clock, watchStart time.Time, exitOnWatchListBookmarkReceived bool) *initialEventsEndBookmarkTicker {
return newInitialEventsEndBookmarkTickerInternal(name, c, watchStart, 10*time.Second, exitOnWatchListBookmarkReceived)
func newInitialEventsEndBookmarkTicker(ctx context.Context, name string, c clock.Clock, watchStart time.Time, exitOnWatchListBookmarkReceived bool) *initialEventsEndBookmarkTicker {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Context seems unnecessary here, based on my understanding. Logger should be sufficient.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed.

if w == nil && err == nil {
// stopCh was closed
return nil
}
if err != nil {
klog.Warningf("The watchlist request ended with an error, falling back to the standard LIST/WATCH semantics because making progress is better than deadlocking, err = %v", err)
logger.Error(err, "The watchlist request ended with an error, falling back to the standard LIST/WATCH semantics because making progress is better than deadlocking")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Below in this method we have:

	if fallbackToList {
		err = r.list(stopCh)
		if err != nil {
			return err
		}
	}

which seems the only consumer of stopCh. I tested locally and switching that bit to context should be an easy change, especially that later in tests you're using ctx.Done to pass to list invocation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like I already made that change in my full PR (#129125).

I've included it here instead.


if s.HasStarted() {
klog.Warningf("The sharedIndexInformer has started, run more than once is not allowed")
logger.Info("Warning: the sharedIndexInformer has started, run more than once is not allowed")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a bit on inconsistency, here you're using logger.Info("Warning: but in staging/src/k8s.io/client-go/tools/cache/listers.go#ListAllByNamespace you switch to klog.TODO().Error. I'd either do both with info and warning prefix or error, consistently.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one here is correct. Warnings are not errors, just info message that "need more attention" - whatever that means.

I've changed ListAllByNamespace to do the same.

processorStopCh := make(chan struct{})
// Separate stop context because Processor should be stopped strictly after controller.
// Cancelation in the parent context is ignored.
processorStopCtx, stopProcessor := context.WithCancelCause(context.WithoutCancel(ctx))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The double context wrapping is to:

  1. drop parent cancel
  2. maintain the logger from context

Do I get that correct? The former is being written down, but the 2nd point is something I had to figure out. I'd just put it in the comment, because the first thing that came to my mind was why not just start with a fresh context.Background() rather than double wrapping.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct. It's not just logger (we could add that back explicitly) but also all other values which we don't know about here. I added a comment.

resyncPeriod = minimumResyncPeriod
}

if resyncPeriod < s.resyncCheckPeriod {
if s.started {
klog.Warningf("resyncPeriod %v is smaller than resyncCheckPeriod %v and the informer has already started. Changing it to %v", resyncPeriod, s.resyncCheckPeriod, s.resyncCheckPeriod)
logger.Info("Warning: resync period is smaller than resync check period and the informer has already started. Changing it to the resync check period", "resyncPeriod", resyncPeriod, "resyncCheckPeriod", s.resyncCheckPeriod)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another instance of info+warning. I'm leaning towards doing that also in that once place where you've used error. I don't see a reason why it would have that error there.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, and changed there.

// delivering again.
stopCh := make(chan struct{})
wait.Until(func() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It took me a while to grok it, but it does make sense.

@pohly pohly force-pushed the log-client-go-tools-cache-apis branch from 1e2bb64 to f8b9a14 Compare December 18, 2024 15:54
The context is used for cancellation and to support contextual logging.

In most cases, alternative *WithContext APIs get added, except for
NewIntegerResourceVersionMutationCache where code searches indicate that the
API is not used downstream.

An API break around SharedInformer couldn't be avoided because the
alternative (keeping the interface unchanged and adding a second one with
the new method) would have been worse. controller-runtime needs to be updated
because it implements that interface in a test package. Downstream consumers of
controller-runtime will work unless they use those test package.

Converting Kubernetes to use the other new alternatives will follow. In the
meantime, usage of the new alternatives cannot be enforced via logcheck
yet (see kubernetes#126379 for the
process).

Passing context through and checking it for cancellation is tricky for event
handlers. A better approach is to map the context cancellation to the normal
removal of an event handler via a helper goroutine. Thanks to the new
HandleErrorWithLogr and HandleCrashWithLogr, remembering the logger is
sufficient for handling problems at runtime.
@pohly pohly force-pushed the log-client-go-tools-cache-apis branch from f8b9a14 to 4638ba9 Compare December 18, 2024 17:46
@k8s-ci-robot
Copy link
Contributor

k8s-ci-robot commented Dec 18, 2024

@pohly: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
pull-kubernetes-apidiff eea739a link false /test pull-kubernetes-apidiff
pull-kubernetes-apidiff-client-go 4638ba9 link false /test pull-kubernetes-apidiff-client-go
pull-kubernetes-linter-hints 4638ba9 link false /test pull-kubernetes-linter-hints

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Copy link
Contributor

@soltysh soltysh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Dec 18, 2024
@k8s-ci-robot
Copy link
Contributor

LGTM label has been added.

Git tree hash: 73c2ccaf794c3ac71eb3ff0dc4c67337c3eab90b

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: pohly, soltysh, sttts

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Dec 18, 2024
@k8s-ci-robot k8s-ci-robot merged commit e305c33 into kubernetes:master Dec 18, 2024
15 of 17 checks passed
@k8s-ci-robot k8s-ci-robot added this to the v1.33 milestone Dec 18, 2024
pohly added a commit to pohly/kubernetes that referenced this pull request Dec 29, 2024
The test relied on a 100ms sleep to ensure that controller was done.  If that
race was lost, one goroutine was intentionally prevented from completing by
locking a mutex permanently. A TODO was left about detecting that.

Adding goroutine leak checking in
kubernetes#126387 revealed that this race
indeed sometimes is lost because the goroutine
leaked (kubernetes#129400).

Waiting for controller shutdown instead of relying on timing should fix this.
k8s-publishing-bot pushed a commit to kubernetes/client-go that referenced this pull request Dec 29, 2024
The test relied on a 100ms sleep to ensure that controller was done.  If that
race was lost, one goroutine was intentionally prevented from completing by
locking a mutex permanently. A TODO was left about detecting that.

Adding goroutine leak checking in
kubernetes/kubernetes#126387 revealed that this race
indeed sometimes is lost because the goroutine
leaked (kubernetes/kubernetes#129400).

Waiting for controller shutdown instead of relying on timing should fix this.

Kubernetes-commit: 8e1403563a60f3b7a258e3bbb64b5c3a7f6548fb
maelvls added a commit to jetstack/jetstack-secure that referenced this pull request Mar 14, 2025
maelvls added a commit to jetstack/jetstack-secure that referenced this pull request Mar 14, 2025
maelvls added a commit to jetstack/jetstack-secure that referenced this pull request Mar 14, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/code-generation area/dependency Issues or PRs related to dependency changes area/logging area/test cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. lgtm "Looks good to me", indicates that a PR is ready to be merged. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. release-note-none Denotes a PR that doesn't merit a release note. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. sig/apps Categorizes an issue or PR as relevant to SIG Apps. sig/auth Categorizes an issue or PR as relevant to SIG Auth. sig/node Categorizes an issue or PR as relevant to SIG Node. sig/testing Categorizes an issue or PR as relevant to SIG Testing. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. triage/accepted Indicates an issue or PR is ready to be actively worked on. wg/device-management Categorizes an issue or PR as relevant to WG Device Management. wg/structured-logging Categorizes an issue or PR as relevant to WG Structured Logging.
Projects
Archived in project
Archived in project
Archived in project
Archived in project
Development

Successfully merging this pull request may close these issues.