improve skip devices allocation for running pods #133452
Conversation
Please note that we're already in Test Freeze for the release branch. Fast forwards are scheduled to happen every 6 hours; the most recent run was: Sat Aug 9 04:20:07 UTC 2025.
This issue is currently awaiting triage. If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
Welcome @daimaxiaxie!
Hi @daimaxiaxie. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test. Once the patch is verified, the new status will be reflected by the ok-to-test label.
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by: daimaxiaxie. The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
klog.V(4).InfoS("Container not present in the initial running set", "podUID", podUID, "containerName", cntName, "containerID", cntID) | ||
return false | ||
} | ||
found := false |
Thanks for your submission. We do indeed have more evidence that the original fix is incomplete, but I have reason to believe the actual problem is in the first part of this guard:
kubernetes/pkg/kubelet/cm/devicemanager/manager.go
Lines 598 to 601 in 948afe5

if !m.sourcesReady.AllReady() && m.isContainerAlreadyRunning(podUID, contName) {
	klog.V(3).InfoS("container detected running, nothing to do", "deviceNumber", needed, "resourceName", resource, "podUID", podUID, "containerName", contName)
	return nil, nil
}
In other words, I think we are misusing m.sourcesReady.AllReady(), because AllReady() actually turns true when the sources are all connected, not when they have processed all the pods. We assumed the latter; it turns out it is actually the former (pending final verification). This explains why we don't see the failure at every restart: we still have a race, just a less likely one (hopefully less likely?).
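For illustration, here is a toy model of the semantics being suspected above. It is purely an editor's sketch, not the real kubelet config.SourcesReady implementation: the readiness flag flips to true once every expected source has connected, independent of whether the pods from those sources have been processed yet.

package main

import "fmt"

// sourcesReadyStub is a toy stand-in for sources-readiness tracking:
// AllReady turns true once every expected source has connected, regardless
// of whether any pod from those sources was processed.
type sourcesReadyStub struct {
	expected  []string
	connected map[string]bool
}

func (s *sourcesReadyStub) AddSource(name string) { s.connected[name] = true }

func (s *sourcesReadyStub) AllReady() bool {
	for _, src := range s.expected {
		if !s.connected[src] {
			return false
		}
	}
	return true
}

func main() {
	s := &sourcesReadyStub{expected: []string{"api", "file"}, connected: map[string]bool{}}
	s.AddSource("api")
	fmt.Println(s.AllReady()) // false: "file" has not connected yet
	s.AddSource("file")
	fmt.Println(s.AllReady()) // true, even though no pod has been processed yet
}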
My initial thought is we can probably just write
if m.isContainerAlreadyRunning(podUID, contName) {
klog.V(3).InfoS("container detected running, nothing to do", "deviceNumber", needed, "resourceName", resource, "podUID", podUID, "containerName", contName)
return nil, nil
}
but this has to be very carefully validated.
Good idea. It seems that both AllReady and isContainerAlreadyRunning may have problems.
From my logs, it appears that I entered the isContainerAlreadyRunning function twice and found different results.
I0731 16:58:54.195904 909172 manager.go:1100] "container found in the initial set, assumed running" podUID="58cf408f-5297-4209-96a0-f2c367392151" containerName="app" containerID="f393bcca6c8028c74f8987165759a472b3085f4bdf26173eb3dbb4cbe1f6cc9d"
I0731 16:58:54.195953 909172 manager.go:1095] "container not present in the initial running set" podUID="58cf408f-5297-4209-96a0-f2c367392151" containerName="app" containerID="491ae36456e71a83a6c39270e364e85306f583f959b386660262898086462374"
From the #133382 perspective, it seems that AllReady also has problems, and I will investigate it as well.
I agree isContainerAlreadyRunning can have bugs. The intention to use it was and is:
- create the initial set at startup, once
- check for presence during the allocation flow
In other words, the set is supposed to be created once, using the data from the container runtime, and never mutated again, which greatly reduces the chance of introducing bugs (see the sketch below).
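A minimal sketch of that intended pattern, using illustrative names rather than the actual devicemanager types: the set is populated exactly once from the runtime's view at startup and is only read afterwards.

package main

import "fmt"

// runningSet stands in for the running-container set: built once at startup
// from the container runtime's view, then never mutated again.
type runningSet map[string]struct{} // keyed by containerID

func buildInitialRunningSet(runtimeContainerIDs []string) runningSet {
	s := make(runningSet, len(runtimeContainerIDs))
	for _, id := range runtimeContainerIDs {
		s[id] = struct{}{}
	}
	return s
}

// isContainerAlreadyRunning only checks membership; it does not update the set.
func (s runningSet) isContainerAlreadyRunning(containerID string) bool {
	_, ok := s[containerID]
	return ok
}

func main() {
	set := buildInitialRunningSet([]string{"id-running-1", "id-running-2"})
	fmt.Println(set.isContainerAlreadyRunning("id-running-1")) // true
	fmt.Println(set.isContainerAlreadyRunning("id-unknown"))   // false: not seen at startup
}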
In your observation, which flow can make the container running set inconsistent?
Ah, I see. There could be an inconsistency between containerMap and containerRunningSet; this is also what you mentioned in the commit message. Am I right?
It is still not very clear to me how the new code is more robust: it seems an equivalent rewrite with somewhat different pros and cons.
The original code was written trying to be as defensive as possible and to reuse as much as possible of the data we already collect in kubelet. The intention was to make a minimal, as-safe-as-possible incremental change, because this flow (kubelet restart) is ancient, rarely touched and hard to test. The fact that the code is hard to test is unfortunate and we should eventually rectify this.
It's possible the original intent backfired and led to a suboptimal, still buggy implementation.
Yes, you are right. There is an inconsistency between the map and the set. The map contains containers that have already stopped, which causes the same containerName to have two containerIDs (illustrated below).
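To make that failure mode concrete, here is a minimal, self-contained illustration with hypothetical IDs, not the real devicemanager types: the map keeps an entry for a stopped container, so one (podUID, containerName) pair resolves to two containerIDs, and the "already running" check gives different answers depending on which ID is consulted, matching the two log lines above.

package main

import "fmt"

type entry struct {
	podUID        string
	containerName string
}

func main() {
	// containerMap-like view: containerID -> (podUID, containerName),
	// still holding an entry for a container that has already stopped.
	containerMap := map[string]entry{
		"old-id-stopped": {podUID: "pod-1", containerName: "app"},
		"new-id-running": {podUID: "pod-1", containerName: "app"},
	}
	// containerRunningSet-like view: only containers the runtime reports as running.
	containerRunningSet := map[string]struct{}{
		"new-id-running": {},
	}

	// Resolving (podUID, containerName) yields two candidate IDs; the result
	// of the "already running" check differs depending on which one is used.
	for id, e := range containerMap {
		if e.podUID == "pod-1" && e.containerName == "app" {
			_, running := containerRunningSet[id]
			fmt.Printf("containerID=%s running=%v\n", id, running)
		}
	}
}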
/ok-to-test
@daimaxiaxie: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests.
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.
What type of PR is this?
/kind bug
What this PR does / why we need it:
Makes running pods more stable across kubelet restarts by improving how device allocation is skipped for containers that are already running.
Which issue(s) this PR is related to:
Fixes #133451
Special notes for your reviewer:
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: