fix: data race in UnschedulablePods by adding lock #132043


Closed

Conversation

googs1025
Member

What type of PR is this?

/kind bug

What this PR does / why we need it:

  • fix data race in UnschedulablePods by adding lock

Which issue(s) this PR fixes:

Fixes #132025

Special notes for your reviewer:

Does this PR introduce a user-facing change?

None

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

None

@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. kind/bug Categorizes issue or PR as related to a bug. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels May 31, 2025
@k8s-ci-robot
Contributor

This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added the needs-priority Indicates a PR lacks a `priority/foo` label and requires one. label May 31, 2025
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: googs1025
Once this PR has been reviewed and has the lgtm label, please assign macsko for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot requested review from dom4ha and macsko May 31, 2025 01:18
@k8s-ci-robot k8s-ci-robot added sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels May 31, 2025
@googs1025
Member Author

handleSchedulingFailure may be called from handleBindingCycleError, which runs in another goroutine (the binding cycle), via sched.FailureHandler:

func (sched *Scheduler) handleBindingCycleError(
	ctx context.Context,
	state fwk.CycleState,
	fwk framework.Framework,
	podInfo *framework.QueuedPodInfo,
	start time.Time,
	scheduleResult ScheduleResult,
	status *framework.Status) {
	logger := klog.FromContext(ctx)
	assumedPod := podInfo.Pod
	// trigger un-reserve plugins to clean up state associated with the reserved Pod
	fwk.RunReservePluginsUnreserve(ctx, state, assumedPod, scheduleResult.SuggestedHost)
	if forgetErr := sched.Cache.ForgetPod(logger, assumedPod); forgetErr != nil {
		logger.Error(forgetErr, "scheduler cache ForgetPod failed")
	} else {
		// "Forget"ing an assumed Pod in binding cycle should be treated as a PodDelete event,
		// as the assumed Pod had occupied a certain amount of resources in scheduler cache.
		//
		// Avoid moving the assumed Pod itself as it's always Unschedulable.
		// It's intentional to "defer" this operation; otherwise MoveAllToActiveOrBackoffQueue() would
		// add this event to in-flight events and thus move the assumed pod to backoffQ anyways if the plugins don't have appropriate QueueingHint.
		if status.IsRejected() {
			defer sched.SchedulingQueue.MoveAllToActiveOrBackoffQueue(logger, framework.EventAssignedPodDelete, assumedPod, nil, func(pod *v1.Pod) bool {
				return assumedPod.UID != pod.UID
			})
		} else {
			sched.SchedulingQueue.MoveAllToActiveOrBackoffQueue(logger, framework.EventAssignedPodDelete, assumedPod, nil, nil)
		}
	}
	sched.FailureHandler(ctx, fwk, podInfo, status, clearNominatedNode, start)
}

@@ -1363,6 +1363,8 @@ func (p *PriorityQueue) newQueuedPodInfo(pod *v1.Pod, plugins ...string) *framew
// UnschedulablePods holds pods that cannot be scheduled. This data structure
// is used to implement unschedulablePods.
type UnschedulablePods struct {
+	// lock synchronizes access to podInfoMap and ensures thread-safe operations.
Member Author

@googs1025 googs1025 May 31, 2025

My understanding is that, like activeQ and backoffQ, we need to use a read-write lock underneath 🤔
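
A minimal, self-contained sketch of that idea, using simplified stand-in types instead of the real framework.QueuedPodInfo and the metrics recorder fields: every access to podInfoMap goes through a sync.RWMutex, which is essentially what this PR adds to UnschedulablePods.

package main

import "sync"

// queuedPodInfo is a simplified stand-in for the scheduler's QueuedPodInfo.
type queuedPodInfo struct {
	key string // e.g. "namespace/name"
}

// unschedulablePods is a simplified stand-in for UnschedulablePods with the
// proposed lock: all reads and writes of podInfoMap are serialized by lock.
type unschedulablePods struct {
	lock       sync.RWMutex
	podInfoMap map[string]*queuedPodInfo
}

func (u *unschedulablePods) addOrUpdate(p *queuedPodInfo) {
	u.lock.Lock()
	defer u.lock.Unlock()
	u.podInfoMap[p.key] = p
}

func (u *unschedulablePods) get(key string) *queuedPodInfo {
	u.lock.RLock()
	defer u.lock.RUnlock()
	return u.podInfoMap[key]
}

func (u *unschedulablePods) delete(key string) {
	u.lock.Lock()
	defer u.lock.Unlock()
	delete(u.podInfoMap, key)
}

func main() {
	u := &unschedulablePods{podInfoMap: map[string]*queuedPodInfo{}}
	u.addOrUpdate(&queuedPodInfo{key: "default/nginx"})
	_ = u.get("default/nginx")
	u.delete("default/nginx")
}

As the rest of the thread shows, this alone did not silence the race detector, because the conflicting accesses do not both go through UnschedulablePods.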

@macsko
Member

macsko commented Jun 2, 2025

Have you tested if this change fixes the race?

@googs1025 googs1025 force-pushed the fix/scheduler/data_race branch 2 times, most recently from 55a771e to 66d4f4c on June 2, 2025 11:53
@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Jun 2, 2025
@googs1025
Member Author

Thanks, @macsko.

I used GOFLAGS="-race" make test-integration WHAT=test/integration/scheduler/preemption KUBE_TEST_ARGS="-test.run TestPreemption/basic_pod_preemption_with_preFilter_" to test this change, and it seems that the data race still occurs.

If I understand correctly, adding a read-write lock should solve this problem, but it does not. Could you give me some suggestions for solving it? 🤔

@googs1025 googs1025 force-pushed the fix/scheduler/data_race branch from 66d4f4c to 8feedbb on June 2, 2025 12:17
@k8s-ci-robot k8s-ci-robot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Jun 2, 2025
@macsko
Member

macsko commented Jun 2, 2025

I see the race might be different. It's between this line:

podInfo.PodInfo, _ = framework.NewPodInfo(cachedPod.DeepCopy())

And this:
p.unschedulablePods.delete(pInfo.Pod, gated)

Precisely, it's a race on writing to podInfo.PodInfo in the first and reading pInfo.Pod in the second. It's interesting, because the first writes to a pod info that can't be in a scheduling queue in the second. Probably, it's a very unlikely race when:

  1. Scheduler receives pod update and calls SchedulingQueue.Update() with it
  2. Pod is popped from the scheduling queue (either activeQ or backoffQ)
  3. Pod is unschedulable and goes to handleSchedulingFailure (first - schedule_one.go#L1069), but at the same time SchedulingQueue.Update() is on scheduling_queue.go#L1020 - RACE

You could check if this is the case. But if it is, I'm not sure how we could fix this.
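
A standalone sketch of that interleaving, using hypothetical stand-in types instead of the scheduler's QueuedPodInfo/PodInfo: one goroutine replaces the embedded pod-info pointer (as the failure path does) while another reads through it (as unschedulablePods.delete(pInfo.Pod, gated) does). Running it with go run -race reports a race on the shared pointer field.

package main

import "sync"

type pod struct{ name string }

// podInfo stands in for framework.PodInfo; queuedPodInfo embeds a pointer to
// it, mirroring how QueuedPodInfo embeds *framework.PodInfo.
type podInfo struct{ Pod *pod }

type queuedPodInfo struct{ *podInfo }

func main() {
	qp := &queuedPodInfo{podInfo: &podInfo{Pod: &pod{name: "p"}}}

	var wg sync.WaitGroup
	wg.Add(2)

	// Like the failure path: replace the embedded pod info with a freshly
	// built one (podInfo.PodInfo, _ = framework.NewPodInfo(...)).
	go func() {
		defer wg.Done()
		qp.podInfo = &podInfo{Pod: &pod{name: "p-updated"}}
	}()

	// Like Update(): read the pod through the embedded pointer
	// (p.unschedulablePods.delete(pInfo.Pod, gated)).
	go func() {
		defer wg.Done()
		_ = qp.Pod // unsynchronized read of the same field the other goroutine writes
	}()

	wg.Wait()
}

The detector points at the write of the embedded pointer in the first goroutine and the promoted-field read in the second, which matches the shape of the two accesses quoted above.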

@googs1025 googs1025 force-pushed the fix/scheduler/data_race branch from 8feedbb to f16f0b2 on June 2, 2025 13:48
Signed-off-by: googs1025 <googs1025@gmail.com>
@googs1025 googs1025 force-pushed the fix/scheduler/data_race branch from f16f0b2 to 760b454 on June 2, 2025 14:01
@@ -1017,7 +1017,7 @@ func (p *PriorityQueue) Update(logger klog.Logger, oldPod, newPod *v1.Pod) {
queue := p.requeuePodWithQueueingStrategy(logger, pInfo, hint, evt.Label())
if queue != unschedulablePods {
logger.V(5).Info("Pod moved to an internal scheduling queue because the Pod is updated", "pod", klog.KObj(newPod), "event", evt.Label(), "queue", queue)
-p.unschedulablePods.delete(pInfo.Pod, gated)
+p.unschedulablePods.delete(pInfo.GetPodCopy(), gated)
Member Author

This seems a bit strange; we already use DeepCopy, which I thought should avoid the data race 🤔

Member

@macsko macsko Jun 3, 2025

The race is on access to the pInfo.Pod field. If you try to copy it, you still have to reference (read) it, and that read is the racing access.

Member

If so, using pInfo.GetPodCopy() cannot avoid the data race either, since it accesses pInfo.Pod as well?

func (pi *PodInfo) GetPodCopy() *v1.Pod {
	if pi.Pod == nil {
		return nil
	}
	return pi.Pod.DeepCopy() // <-- access to pi.Pod
}

Member

Yes

@googs1025
Member Author

Precisely, it's a race on writing to podInfo.PodInfo in the first and reading pInfo.Pod in the second. It's interesting, because the first writes to a pod info that can't be in a scheduling queue in the second. Probably, it's a very unlikely race when:

Scheduler receives pod update and calls SchedulingQueue.Update() with it
Pod is popped from the scheduling queue (either activeQ or backoffQ)
Pod is unschedulable and goes to handleSchedulingFailure (first - schedule_one.go#L1069), but in the same time SchedulingQueue.Update() is on scheduling_queue.go#L1020 - RACE

Looking at the code, it seems to be the reason you described 🤔

@likakuli
Member

It seems that these two lines have multiple data races, including reads and writes to podInfo.PodInfo, reads and writes to the podInfo.Pod itself, as well as reads and writes to the pod's properties.
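
If that's the case, here is a sketch of why a lock confined to UnschedulablePods cannot help (same hypothetical stand-in types as in the earlier sketches): the writer mutates the shared QueuedPodInfo directly in the failure path and never calls into UnschedulablePods, so it never takes the lock; the reader's access to pInfo.Pod also happens while evaluating the argument, before delete() locks anything.

package main

import "sync"

type pod struct{ name string }

type podInfo struct{ Pod *pod }

type queuedPodInfo struct{ *podInfo }

// unschedulablePods guards its map with a lock, as proposed in this PR.
type unschedulablePods struct {
	lock       sync.RWMutex
	podInfoMap map[string]*queuedPodInfo
}

func (u *unschedulablePods) delete(p *pod) {
	u.lock.Lock()
	defer u.lock.Unlock()
	delete(u.podInfoMap, p.name)
}

func main() {
	qp := &queuedPodInfo{podInfo: &podInfo{Pod: &pod{name: "p"}}}
	u := &unschedulablePods{podInfoMap: map[string]*queuedPodInfo{"p": qp}}

	var wg sync.WaitGroup
	wg.Add(2)

	// Failure path: mutates the QueuedPodInfo directly; u.lock is never taken here.
	go func() {
		defer wg.Done()
		qp.podInfo = &podInfo{Pod: &pod{name: "p-updated"}}
	}()

	// Update() path: qp.Pod is read to build delete's argument before the lock
	// inside delete() is acquired, so the lock cannot order it after the write.
	go func() {
		defer wg.Done()
		u.delete(qp.Pod)
	}()

	wg.Wait() // go run -race still reports the race despite the lock
}

The lock only orders map operations against each other; it cannot synchronize them with accesses made to the QueuedPodInfo objects elsewhere in the scheduler.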

@macsko
Member

macsko commented Jun 23, 2025

@googs1025 I found a fix for this race. See #132451

@googs1025
Member Author

/close

There is a better way #132451

@k8s-ci-robot
Contributor

@googs1025: Closed this PR.

In response to this:

/close

There is a better way #132451

