Fix startup probe worker termination for sidecar containers #133072
base: master
Conversation
Hi @AadiDev005. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with `/ok-to-test`. Once the patch is verified, the new status will be reflected by the `ok-to-test` label. I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
/cc
Force-pushed from 2e2983c to 29f796b (compare)
It looks mostly good at first glance 👍
Left some comments; let's see if the e2e tests pass.
```diff
@@ -2126,6 +2126,133 @@ var _ = SIGDescribe(feature.SidecarContainers, "Containers Lifecycle", func() {
 	addAfterEachForCleaningUpPods(f)
 	f.NamespacePodSecurityLevel = admissionapi.LevelPrivileged
 
+	ginkgo.When("A sidecar container with startup probe fails initially", func() {
```
nit: Can we move this test to the bottom of the "SidecarContainers Container Lifecycle" block, since this is very much an edge case?
```diff
@@ -2126,6 +2126,133 @@ var _ = SIGDescribe(feature.SidecarContainers, "Containers Lifecycle", func() {
 	addAfterEachForCleaningUpPods(f)
 	f.NamespacePodSecurityLevel = admissionapi.LevelPrivileged
 
+	ginkgo.When("A sidecar container with startup probe fails initially", func() {
+		ginkgo.It("should continue probing and allow main container to start after sidecar recovers [NodeConformance]", func(ctx context.Context) {
```
Can we remove the [NodeConformance] label here? I think we don't need it for this edge case, and it is deprecated as well.
pkg/kubelet/prober/worker.go
Outdated
```diff
@@ -253,7 +253,9 @@ func (w *worker) doProbe(ctx context.Context) (keepGoing bool) {
 	}
 	// Abort if the container will not be restarted.
 	return c.State.Terminated == nil ||
-		w.pod.Spec.RestartPolicy != v1.RestartPolicyNever
+		w.pod.Spec.RestartPolicy != v1.RestartPolicyNever ||
+		(w.container.RestartPolicy != nil && *w.container.RestartPolicy == v1.ContainerRestartPolicyAlways)
```
Can we check this only if the container is an init container?
The usage of the container restart policy for a regular container is not defined yet.
Force-pushed from fc077e0 to bc4856a (compare)
/test pull-kubernetes-node-kubelet-serial-containerd-sidecar-containers
Thank you.
LGTM label has been added. Git tree hash: f45acaaa0c792d94f849ef3ab006a26ee7d9b01b
/lgtm
/assign @SergeyKanzhelev
Could you take a look at this?
This fixes the bug where a startup probe of a restartable init container may not be executed if the restartable init container restarts.
/retest
1 similar comment
/retest
Hi @gjkim42 / @SergeyKanzhelev, the required test is failing, but my patch only touches pkg/kubelet/prober. Before I start rebasing and regenerating code, could you please confirm the expected next step? Thanks!
#133302 (FYI: We cannot merge a PR unless all required tests succeed, so we have to retest them if it looks like a flaky test issue.)
@Random-Liu @tallclair @SergeyKanzhelev
@AadiDev005: You must be a member of the kubernetes/milestone-maintainers GitHub team to set the milestone. If you believe you should be able to issue the /milestone command, please contact your Milestone Maintainers Team and have them propose you as an additional delegate for this responsibility. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull request has been approved by: AadiDev005. The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files.
Approvers can indicate their approval by writing `/approve` in a comment.
The code freeze is already in effect and this doesn't look like an urgent fix, so let's merge this in the next cycle.
pkg/kubelet/prober/worker_test.go
Outdated
```go
// newTestWorkerWithInitContainer creates a test worker with an init container setup
func newTestWorkerWithInitContainer(m *manager, probeType probeType, probe v1.Probe) *worker {
```
nit: It might be better to clarify that this is specifically about a restartable init container.

Suggested change:
```go
// newTestWorkerWithRestartableInitContainer creates a test worker with an init container setup
func newTestWorkerWithRestartableInitContainer(m *manager, probeType probeType, probe v1.Probe) *worker {
```
pkg/kubelet/prober/worker_test.go
Outdated
```go
	Image: "registry.k8s.io/pause:3.9",
}}

return &worker{
```
Can we reuse the existing newWorker() here? That way, we won't need getInitialResult() anymore. Something like this:
```diff
diff --git a/pkg/kubelet/prober/worker_test.go b/pkg/kubelet/prober/worker_test.go
index 1f6104179b1..d8ef9ae7462 100644
--- a/pkg/kubelet/prober/worker_test.go
+++ b/pkg/kubelet/prober/worker_test.go
@@ -38,7 +38,7 @@ func init() {
 }
 
 // newTestWorkerWithInitContainer creates a test worker with an init container setup
-func newTestWorkerWithInitContainer(m *manager, probeType probeType, probe v1.Probe) *worker {
+func newTestWorkerWithInitContainer(m *manager, probeType probeType) *worker {
 	pod := getTestPod()
 
 	// Set up init container with restart policy
@@ -50,19 +50,10 @@ func newTestWorkerWithInitContainer(m *manager, probeType probeType, probe v1.Pr
 	// Move container to init containers and add a regular container
 	pod.Spec.InitContainers = []v1.Container{initContainer}
 	pod.Spec.Containers = []v1.Container{{
-		Name:  "main-container",
-		Image: "registry.k8s.io/pause:3.9",
+		Name: "main-container",
 	}}
 
-	return &worker{
-		pod:            pod,
-		container:      initContainer,
-		probeType:      probeType,
-		spec:           &probe,
-		probeManager:   m,
-		resultsManager: resultsManager(m, probeType),
-		initialValue:   getInitialResult(probeType),
-	}
+	return newWorker(m, probeType, pod, initContainer)
 }
```
pkg/kubelet/prober/worker_test.go
Outdated
```go
pod.Spec.InitContainers = []v1.Container{initContainer}
pod.Spec.Containers = []v1.Container{{
	Name:  "main-container",
	Image: "registry.k8s.io/pause:3.9",
```
nit: Since the image field isn't used in the test, it can be removed.
pkg/kubelet/prober/worker.go
Outdated
```diff
@@ -253,7 +263,8 @@ func (w *worker) doProbe(ctx context.Context) (keepGoing bool) {
 	}
 	// Abort if the container will not be restarted.
 	return c.State.Terminated == nil ||
-		w.pod.Spec.RestartPolicy != v1.RestartPolicyNever
+		w.pod.Spec.RestartPolicy != v1.RestartPolicyNever ||
+		(w.isInitContainer() && w.container.RestartPolicy != nil && *w.container.RestartPolicy == v1.ContainerRestartPolicyAlways)
```
nit: I don't feel strongly about this, but for readability, it might be worth giving this condition a name.
```diff
diff --git a/pkg/kubelet/prober/worker.go b/pkg/kubelet/prober/worker.go
index e71fd1b5687..4948cdb4236 100644
--- a/pkg/kubelet/prober/worker.go
+++ b/pkg/kubelet/prober/worker.go
@@ -261,10 +261,14 @@ func (w *worker) doProbe(ctx context.Context) (keepGoing bool) {
 		if !w.containerID.IsEmpty() {
 			w.resultsManager.Set(w.containerID, results.Failure, w.pod)
 		}
+
+		isRestartableInitContainer := w.isInitContainer() &&
+			w.container.RestartPolicy != nil && *w.container.RestartPolicy == v1.ContainerRestartPolicyAlways
+
 		// Abort if the container will not be restarted.
 		return c.State.Terminated == nil ||
 			w.pod.Spec.RestartPolicy != v1.RestartPolicyNever ||
-			(w.isInitContainer() && w.container.RestartPolicy != nil && *w.container.RestartPolicy == v1.ContainerRestartPolicyAlways)
+			isRestartableInitContainer
 	}
 
 	// Graceful shutdown of the pod.
```
pkg/kubelet/prober/worker_test.go
Outdated
```diff
@@ -530,6 +572,72 @@ func TestResultRunOnStartupCheckFailure(t *testing.T) {
 	}
 }
 
+func TestDoProbe_TerminatedRestartableInitContainerWithRestartPolicyNone(t *testing.T) {
```
nit:

Suggested change:
```go
func TestDoProbe_TerminatedRestartableInitContainerWithRestartPolicyNever(t *testing.T) {
```
pkg/kubelet/prober/worker_test.go
Outdated
```go
// Test: Terminated restartable init container with restartPolicy=Always should continue probing
// even when pod has restartPolicy=Never
if !w.doProbe(ctx) {
	t.Error("Expected to continue probing for terminated restartable init container with restart policy Always")
```
nit:

Suggested change:
```go
t.Error("Expected to continue probing for terminated restartable init container with pod restart policy Never")
```
pkg/kubelet/prober/worker_test.go
Outdated
```go
}

// Verify result is set to Failure for terminated container
expectResult(t, w, results.Failure, "terminated restartable init container with restart policy Always")
```
nit:

Suggested change:
```go
expectResult(t, w, results.Failure, "restartable init container with pod restart policy Never")
```
Fixes a bug where startup probe workers terminate incorrectly for sidecar containers with restartPolicy=Always when the pod has restartPolicy=Never, causing main containers to remain stuck in the "Initializing" state.

Changes:
- Add container-level restart policy check for init containers only
- Extract complex boolean logic into a named variable for readability
- Refactor test helper to use the existing newWorker() function
- Fix terminology: RestartPolicyNone → RestartPolicyNever
- Add comprehensive unit and e2e tests for both scenarios
Force-pushed from bc4856a to 0a26f03 (compare)
New changes are detected. LGTM label has been removed.
Hey @toVersus, thanks so much for the thorough review! 🙏 I've implemented all your suggestions and squashed everything into a clean single commit:
- Renamed the function to be clearer about restartable init containers

Really appreciate you taking the time to make the code cleaner and more maintainable. The refactoring suggestions especially make a lot of sense; no need to reinvent the wheel when newWorker() already exists.

@gjkim42, can you take a look for a moment?
/test pull-kubernetes-node-e2e-containerd-features
@AadiDev005: The following test failed, say `/retest` to rerun all failed tests:
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
The Windows unit test failure is unrelated to this change. @SergeyKanzhelev @gjkim42, all code review feedback has been addressed and the Linux tests are green. Ready for approval when the freeze lifts?
What type of PR is this?
/kind bug
What this PR does / why we need it:
Fixes a bug where startup probe workers terminate incorrectly for sidecar containers with `restartPolicy=Always` when the pod has `restartPolicy=Never`. This causes main containers to remain stuck in the "Initializing" state even after sidecar containers successfully restart.

Problem Description:
- The pod has `restartPolicy: Never`
- The sidecar (restartable init) container has `restartPolicy: Always` and a startup probe

Solution:
A minimal fix adds a container-level restart policy check, for init containers only, to the probe worker's continuation logic, allowing workers to continue for sidecar containers while preserving existing behavior for regular containers.
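To make the continuation logic concrete, here is a minimal, self-contained sketch of the decision this PR changes. It models the semantics described above with stub types rather than the real kubelet types (`keepProbing`, `restartPolicy`, and the parameter names are illustrative, not kubelet API):

```go
package main

import "fmt"

// restartPolicy is a stand-in for the Kubernetes restart policy strings.
type restartPolicy string

const (
	restartPolicyNever  restartPolicy = "Never"
	restartPolicyAlways restartPolicy = "Always"
)

// keepProbing models whether the probe worker should keep running after
// observing the container state. With the fix, a restartable init container
// (sidecar) keeps its worker alive even when the pod-level policy is Never,
// because the kubelet will restart that container regardless.
func keepProbing(terminated bool, podPolicy restartPolicy, isInitContainer bool, containerPolicy *restartPolicy) bool {
	isRestartableInitContainer := isInitContainer &&
		containerPolicy != nil && *containerPolicy == restartPolicyAlways

	// Abort only if the container terminated AND nothing will restart it.
	return !terminated ||
		podPolicy != restartPolicyNever ||
		isRestartableInitContainer
}

func main() {
	always := restartPolicyAlways

	// Sidecar in a pod with restartPolicy=Never: before the fix the worker
	// stopped here; with the fix it keeps probing across the restart.
	fmt.Println(keepProbing(true, restartPolicyNever, true, &always))

	// Terminated regular container in the same pod: the worker still stops,
	// preserving the pre-existing behavior.
	fmt.Println(keepProbing(true, restartPolicyNever, false, nil))
}
```

Running this prints `true` for the sidecar case and `false` for the regular container, matching the intended before/after behavior described in the problem and solution sections.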
Which issue(s) this PR is related to:
Fixes #132826
Special notes for your reviewer:
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: