fix: data race in UnschedulablePods by adding lock #132043
Conversation
This issue is currently awaiting triage. If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
[APPROVALNOTIFIER] This PR is NOT APPROVED.
This pull-request has been approved by: googs1025
The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
Referenced code: kubernetes/pkg/scheduler/schedule_one.go, lines 336 to 370 at 4832b57
@@ -1363,6 +1363,8 @@ func (p *PriorityQueue) newQueuedPodInfo(pod *v1.Pod, plugins ...string) *framew
 // UnschedulablePods holds pods that cannot be scheduled. This data structure
 // is used to implement unschedulablePods.
 type UnschedulablePods struct {
+	// lock synchronizes access to podInfoMap and ensures thread-safe operations.
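To make the direction of the proposed change concrete, here is a minimal sketch of lock-guarded UnschedulablePods-style operations, assuming a podInfoMap keyed by a keyFunc as in the snippet above. The type name unschedulablePodsSketch and its methods are simplified stand-ins for illustration, not the exact PR diff.

package queuesketch

import (
	"sync"

	v1 "k8s.io/api/core/v1"
	"k8s.io/kubernetes/pkg/scheduler/framework"
)

// unschedulablePodsSketch is a simplified, hypothetical stand-in for
// UnschedulablePods: a podInfoMap guarded by a read-write lock.
type unschedulablePodsSketch struct {
	lock       sync.RWMutex
	podInfoMap map[string]*framework.QueuedPodInfo
	keyFunc    func(pod *v1.Pod) string
}

// addOrUpdate stores pInfo under its key while holding the write lock.
func (u *unschedulablePodsSketch) addOrUpdate(pInfo *framework.QueuedPodInfo) {
	u.lock.Lock()
	defer u.lock.Unlock()
	u.podInfoMap[u.keyFunc(pInfo.Pod)] = pInfo
}

// delete removes the entry for pod while holding the write lock.
func (u *unschedulablePodsSketch) delete(pod *v1.Pod) {
	u.lock.Lock()
	defer u.lock.Unlock()
	delete(u.podInfoMap, u.keyFunc(pod))
}

// get returns the stored entry for pod, if any, under the read lock.
func (u *unschedulablePodsSketch) get(pod *v1.Pod) *framework.QueuedPodInfo {
	u.lock.RLock()
	defer u.lock.RUnlock()
	return u.podInfoMap[u.keyFunc(pod)]
}

As the review discussion below points out, a lock like this guards podInfoMap itself but does not remove the race on pInfo.Pod, which is written outside the queue.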
I understand it like activeQ and backoffQ: we need to use a read-write lock underneath 🤔
Have you tested if this change fixes the race?
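One hedged way to check this is a small reproducer run under the Go race detector (go test -race). The test below is a hypothetical sketch against the unschedulablePodsSketch type shown earlier, not the reproducer from #132025: one goroutine overwrites pInfo.Pod, roughly what the scheduling cycle does, while another calls a queue method whose key computation reads pInfo.Pod. The queue's internal lock does not silence this report, because the writer never acquires it.

package queuesketch

import (
	"sync"
	"testing"

	v1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/kubernetes/pkg/scheduler/framework"
)

// TestUnschedulablePodsSketchRace is a hypothetical reproducer for the kind of
// race discussed in this PR; run it with `go test -race`.
func TestUnschedulablePodsSketchRace(t *testing.T) {
	u := &unschedulablePodsSketch{
		podInfoMap: map[string]*framework.QueuedPodInfo{},
		keyFunc:    func(p *v1.Pod) string { return p.Namespace + "/" + p.Name },
	}
	pod := &v1.Pod{ObjectMeta: metav1.ObjectMeta{Name: "p", Namespace: "ns"}}
	pInfo := &framework.QueuedPodInfo{PodInfo: &framework.PodInfo{Pod: pod}}

	var wg sync.WaitGroup
	wg.Add(2)
	go func() { // writer outside the queue, no queue lock held
		defer wg.Done()
		pInfo.Pod = pod.DeepCopy()
	}()
	go func() { // reader inside the queue: keyFunc reads pInfo.Pod
		defer wg.Done()
		u.addOrUpdate(pInfo)
	}()
	wg.Wait()
}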
Force-pushed from 55a771e to 66d4f4c.
Thanks @macsko. I tried it; if I understand correctly, it seemed that adding a read-write lock would solve this problem (but it does not). Could you give me some suggestions for solving it? 🤔
Force-pushed from 66d4f4c to 8feedbb.
I see, the race might be different. It's between this line: kubernetes/pkg/scheduler/schedule_one.go, line 1069 at 849a82b
And this:
Precisely, it's a race on writing to the pInfo.Pod field.
You could check if this is the case. But if it is, I'm not sure how we could fix this.
Force-pushed from 8feedbb to f16f0b2.
Signed-off-by: googs1025 <googs1025@gmail.com>
Force-pushed from f16f0b2 to 760b454.
@@ -1017,7 +1017,7 @@ func (p *PriorityQueue) Update(logger klog.Logger, oldPod, newPod *v1.Pod) {
 	queue := p.requeuePodWithQueueingStrategy(logger, pInfo, hint, evt.Label())
 	if queue != unschedulablePods {
 		logger.V(5).Info("Pod moved to an internal scheduling queue because the Pod is updated", "pod", klog.KObj(newPod), "event", evt.Label(), "queue", queue)
-		p.unschedulablePods.delete(pInfo.Pod, gated)
+		p.unschedulablePods.delete(pInfo.GetPodCopy(), gated)
This seems a bit strange; we already use a deep copy, which I understand should avoid the data race 🤔
The race is on access to the pInfo.Pod field. If you try to copy it, you still reference it, causing the race.
If so, using pInfo.GetPodCopy() cannot avoid the data race either, since it accesses pInfo.Pod as well?
func (pi *PodInfo) GetPodCopy() *v1.Pod {
	if pi.Pod == nil {
		return nil
	}
	return pi.Pod.DeepCopy() // <-- access to pi.Pod
}
Yes
The scheduler receives a pod update and calls SchedulingQueue.Update() with it. Looking at the code, that seems to be the cause you described 🤔
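For contrast, here is a hedged sketch of the general pattern in which copying is safe: the writer and the copier synchronize on the same lock. The guardedPodHolder type is purely illustrative; it is not claimed to be the approach taken in #132451.

package queuesketch

import (
	"sync"

	v1 "k8s.io/api/core/v1"
)

// guardedPodHolder is a hypothetical holder in which every write to the pod
// pointer and every copy of it go through the same mutex, so DeepCopy never
// races with an update.
type guardedPodHolder struct {
	mu  sync.RWMutex
	pod *v1.Pod
}

// setPod replaces the stored pod under the write lock.
func (g *guardedPodHolder) setPod(p *v1.Pod) {
	g.mu.Lock()
	defer g.mu.Unlock()
	g.pod = p
}

// podCopy returns a deep copy taken under the read lock.
func (g *guardedPodHolder) podCopy() *v1.Pod {
	g.mu.RLock()
	defer g.mu.RUnlock()
	if g.pod == nil {
		return nil
	}
	return g.pod.DeepCopy()
}

In the scenario discussed above, by contrast, the scheduling cycle writes pInfo.Pod without holding any lock shared with the queue, so the DeepCopy inside GetPodCopy() is still an unsynchronized read.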
It seems that these two lines have multiple data races, including both reads of and writes to pInfo.Pod.
@googs1025 I found a fix for this race. See #132451
/close
There is a better way: #132451
@googs1025: Closed this PR.
In response to this:
/close
There is a better way: #132451
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
What type of PR is this?
/kind bug
What this PR does / why we need it:
Which issue(s) this PR fixes:
Fixes #132025
Special notes for your reviewer:
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: