Add Kubelet stress test for pod cleanup when rejection due to VolumeAttachmentLimitExceeded
#133357
This issue is currently awaiting triage. If a SIG or subproject determines this is a relevant issue, they will accept it by applying the appropriate label. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
[APPROVALNOTIFIER] This PR is NOT APPROVED
This pull-request has been approved by: torredil
The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
/sig node
…LimitExceeded Signed-off-by: Eddie Torres <torredil@amazon.com>
fc53225 to 3e88780
/area test
This test is useful. I would also urge adding an e2e test, as e2e may catch some issues that are hard to reproduce with unit tests. I am not 100% sure we will need a unit-level stress test if we can do an e2e stress test instead.
@SergeyKanzhelev In the first iteration of this PR I was actually writing this stress test as an e2e test similar to
Important to note that the test intentionally uses a real instance of podWorkers. I think the code coverage gained from this test is valuable and worth merging. I'm happy to explore the e2e path once more if you recommend it 👍
What e2e will validate is that pod admission keeps failing at the same stage, and not on some new condition like IP exhaustion.
We will triage next week in the SIG Node CI meeting and find a reviewer, thanks!
What type of PR is this?
/kind feature
What this PR does / why we need it:
This PR adds a stress test to validate pod cleanup behavior when SyncPod hits the VolumeAttachmentLimitExceeded error path in WaitForAttachAndMount. The key assertions being validated are that all 500 pods reach Phase=Failed with reason VolumeAttachmentLimitExceeded, confirming that admission fails as intended. Additionally, the test checks that SyncTerminatedPod completes successfully for each pod (we therefore know that kubelet executes the full teardown flow for every rejected pod: volumes unmounted, cgroup destroyed, etc.). Finally, the test checks that the resources allocated to each pod (as managed by allocationManager) are cleaned up.
Which issue(s) this PR is related to:
Fixes #133188
Special notes for your reviewer: See #132933 (comment) for context.
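For readers skimming the thread, the three assertions listed under "What this PR does" can be illustrated with a toy model. This is only a rough sketch and not the test added by this PR: fakeKubelet, handlePod, runStress, verify, and the ordering of the cleanup steps are all hypothetical stand-ins, and the real test exercises an actual podWorkers instance instead of this simplified struct.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// Everything in this file is a hypothetical stand-in, not a kubelet API.

// errAttachLimit models the error returned on the attach-limit path.
var errAttachLimit = errors.New("VolumeAttachmentLimitExceeded")

type podStatus struct {
	Phase  string
	Reason string
}

// fakeKubelet tracks only the state the three assertions look at.
type fakeKubelet struct {
	mu          sync.Mutex
	statuses    map[string]podStatus
	terminated  map[string]bool
	allocations map[string]int
}

func newFakeKubelet() *fakeKubelet {
	return &fakeKubelet{
		statuses:    map[string]podStatus{},
		terminated:  map[string]bool{},
		allocations: map[string]int{},
	}
}

// syncPod always fails here, simulating a node whose attach limit is exhausted.
func (k *fakeKubelet) syncPod(name string) error { return errAttachLimit }

// handlePod walks one pod through allocate, failed sync, teardown, and release.
// The ordering is an assumption made for illustration.
func (k *fakeKubelet) handlePod(name string) {
	k.mu.Lock()
	k.allocations[name] = 1
	k.mu.Unlock()

	if err := k.syncPod(name); errors.Is(err, errAttachLimit) {
		k.mu.Lock()
		k.statuses[name] = podStatus{Phase: "Failed", Reason: err.Error()}
		k.terminated[name] = true   // stands in for SyncTerminatedPod completing
		delete(k.allocations, name) // stands in for allocation cleanup
		k.mu.Unlock()
	}
}

// runStress pushes n pods through handlePod concurrently.
func runStress(n int) *fakeKubelet {
	k := newFakeKubelet()
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			k.handlePod(fmt.Sprintf("pod-%d", i))
		}(i)
	}
	wg.Wait()
	return k
}

// verify checks the three properties: failed status, teardown ran, nothing leaked.
func verify(k *fakeKubelet, n int) error {
	k.mu.Lock()
	defer k.mu.Unlock()
	if len(k.statuses) != n {
		return fmt.Errorf("expected %d pod statuses, got %d", n, len(k.statuses))
	}
	for name, st := range k.statuses {
		if st.Phase != "Failed" || st.Reason != errAttachLimit.Error() {
			return fmt.Errorf("%s: unexpected status %+v", name, st)
		}
		if !k.terminated[name] {
			return fmt.Errorf("%s: teardown did not run", name)
		}
	}
	if len(k.allocations) != 0 {
		return fmt.Errorf("%d allocations leaked", len(k.allocations))
	}
	return nil
}

func main() {
	if err := verify(runStress(500), 500); err != nil {
		fmt.Println("FAIL:", err)
		return
	}
	fmt.Println("all pods failed and were cleaned up")
}
```

Running the pods concurrently is what makes this a stress check: a cleanup step that is skipped or raced for even one pod would show up as a missing teardown flag or a leftover allocation.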
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: