-
Notifications
You must be signed in to change notification settings - Fork 41.1k
comment out the device plugin test case: supports extended resources … #133286
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
This issue is currently awaiting triage. If a SIG or subproject determines this is a relevant issue, they will accept it by applying the The Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
/assign @pohly |
In the PR description you claim that it conflicts, but that claim is not supported by an actual explanation why or how. Let's investigate and find the root cause before making further changes. /cc @johnbelamaric |
the comment in the code provides more details on why there is such conflict |
18 minutes after the test finishes, kubelet still publishes 'example.com/resource: "0"' in node.status.Capacity and node.status.Allocatable. I tried to remove it from node.status, but kubelet keeps putting it back. still need to figure out how to cleanly remove it. while true; do date; kubectl get no -o yaml|grep example; sleep 5; done Tue Jul 29 08:24:39 PM UTC 2025 |
ok, it failed again, now i need to keep looking into it. |
/test pull-kubernetes-kind-dra-all |
@pohly @johnbelamaric found out the root cause. the feature "DRADeviceBindingConditions: true" does not work well with "DRAExtendedResources: true", i think DRADeviceBindingConditions was merged last night? I did not enable it when running the test locally. 360 2025-07-29T23:19:13.141476715Z stderr F panic: runtime error: invalid memory address or nil pointer dereference |
So this E2E failure was legitimate and #133290 fixes the problem, and was merged via #132522. There's still some learnings here: removing the extended resource from node status doesn't really work because of the kubelet. We should update the code accordingly and replace patching the node status with waiting for the count to go to zero or (in case of a future kubelet enhancement) to not be present. Let's use this PR to implement that? Or file an issue and close the PR. |
we can keep using this PR since it has some context already, what needs to figure out is why kubelet keeping publishing the extended resource after device plugin has been removed. |
Should we fix the leak/conflict instead? These tests could also be marked as serial. |
@BenTheElder we should fix the leak, but before that, it is better to comment out the other conflicting test, which was added for testing the deployDevicePlugin, which has already been tested in "must run pods with extended resource on dra nodes and device plugin nodes" test case. they are already marked as Serial, but that would not help, since the leaked example.com/resouce in the cluster's nodes still interferes with the next test run in the same cluster. |
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: yliaog The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/test pull-kubernetes-unit |
thet two test failures are not related to this PR, they can be ignored. |
Ah ... that's pretty bad, would like to get that fixed. We usually don't comment out tests, we e.g. framework.WithFlaky() might be an appropriate short-term solution? |
ok, added withFlaky() to temporarily disable the test |
…esources together with ResourceClaim
@yliaog: The following tests failed, say
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
…together with ResourceClaim
What type of PR is this?
/kind bug
What this PR does / why we need it:
comment out the device plugin test case: supports extended resources together with ResourceClaim, it conflicts with the other test case: "must run pods with extended resource on dra nodes and device plugin nodes"
Which issue(s) this PR is related to:
Special notes for your reviewer:
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: