
scheduler: stop clearing NominatedNodeName on all cases #132439


Closed
wants to merge 1 commit into from

Conversation

utam0k
Member

@utam0k utam0k commented Jun 21, 2025

What type of PR is this?

/kind feature
/sig scheduling
/cc sanposhiho

What this PR does / why we need it:

ref: #132384

Which issue(s) this PR is related to:

Fixes #132384

Special notes for your reviewer:

Does this PR introduce a user-facing change?

The scheduler no longer clears the `nominatedNodeName` field for Pods. External components (such as Cluster Autoscaler and Karpenter) are responsible for managing this field when needed.

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


@k8s-ci-robot k8s-ci-robot requested a review from sanposhiho June 21, 2025 04:14
@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/feature Categorizes issue or PR as related to a new feature. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Jun 21, 2025
@k8s-ci-robot
Contributor

This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added the needs-priority Indicates a PR lacks a `priority/foo` label and requires one. label Jun 21, 2025
@utam0k
Member Author

utam0k commented Jun 21, 2025

/test pull-kubernetes-unit

@utam0k utam0k force-pushed the not-to-clear-nnn branch from a06ef74 to f53ac40 Compare June 21, 2025 08:22
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Jun 21, 2025
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: utam0k
Once this PR has been reviewed and has the lgtm label, please assign sanposhiho for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added area/test sig/testing Categorizes an issue or PR as relevant to SIG Testing. labels Jun 21, 2025
@utam0k
Member Author

utam0k commented Jun 21, 2025

/test pull-kubernetes-e2e-kind

@lmktfy

lmktfy commented Jun 22, 2025

Is this relevant to #132443?

@lmktfy

lmktfy commented Jun 22, 2025

Changelog suggestion

-The scheduler no longer clears the NominatedNodeName field for pods. External components (like Cluster Autoscaler and Karpenter) are responsible for managing this field when needed.
+The scheduler no longer clears the `nominatedNodeName` field for Pods. External components (such as Cluster Autoscaler and Karpenter) are responsible for managing this field when needed.

However, see #132443 (comment)

We should align the two changelog entries.

@utam0k
Member Author

utam0k commented Jun 23, 2025

> Is this relevant to #132443?

Yes, it is. I've updated the release note.

@sanposhiho
Member

/cc @macsko @dom4ha

This is part of the NNN KEP.

@k8s-ci-robot k8s-ci-robot requested review from dom4ha and macsko June 23, 2025 22:11
@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 2, 2025
@utam0k utam0k force-pushed the not-to-clear-nnn branch from 9b2783b to cc23a20 Compare July 20, 2025 00:53
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 20, 2025
@utam0k
Member Author

utam0k commented Jul 21, 2025

/retest-required

@utam0k
Member Author

utam0k commented Jul 21, 2025

@macsko @dom4ha @sanposhiho PTAL 🙏

@utam0k utam0k force-pushed the not-to-clear-nnn branch from cc23a20 to 4463e7b Compare July 21, 2025 13:00
name: "pod with existing nominated node name clears NNN when feature gate is disabled",
sendPod: func() *v1.Pod {
p := podWithID("foo", "")
p.Status.NominatedNodeName = "existing-node"
Member

Can't we add the logic with NominatedNodeName to the test itself?

Member Author

@macsko Thank you for your review. I want to make sure I understand your suggestion correctly.

Currently, the test logic already handles NominatedNodeName dynamically for all test cases (lines 992-993):

if !nominatedNodeNameForExpectationEnabled && expectedNominatingInfo == nil {
    expectedNominatingInfo = &framework.NominatingInfo{NominatingMode: framework.ModeOverride, NominatedNodeName: ""}
}

Are you suggesting that:

  • We should remove the dedicated test case I added and rely on the existing test logic?
  • Or should we modify the existing test cases to include NominatedNodeName scenarios?

Member

I meant running even this dedicated test case with the feature both enabled and disabled, and checking everything required in the test code itself.
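
For illustration only, here is a minimal sketch of running one case under both feature-gate values and deriving the expectation inside the test code. It mirrors the condition quoted earlier in this thread, but `NominatingMode`, `NominatingInfo`, and `expectedInfo` are local stand-ins written for this example (assumptions, not the scheduler framework's real definitions or the PR's actual code).

```go
package main

import "fmt"

// NominatingMode and NominatingInfo are local stand-ins for the scheduler
// framework types quoted in the thread; they are NOT the real definitions.
type NominatingMode int

const (
	ModeNoop NominatingMode = iota
	ModeOverride
)

type NominatingInfo struct {
	NominatingMode    NominatingMode
	NominatedNodeName string
}

// expectedInfo (hypothetical helper) mirrors the quoted condition: when the
// gate is disabled and the case sets no explicit expectation, the scheduler
// is expected to override (clear) the nominated node name.
func expectedInfo(gateEnabled bool, want *NominatingInfo) *NominatingInfo {
	if !gateEnabled && want == nil {
		return &NominatingInfo{NominatingMode: ModeOverride, NominatedNodeName: ""}
	}
	return want
}

func main() {
	// Run the same case with the gate enabled and disabled.
	for _, enabled := range []bool{true, false} {
		got := expectedInfo(enabled, nil)
		fmt.Printf("gate=%v expectsClearingNNN=%v\n", enabled, got != nil)
	}
}
```

In a real test each iteration would normally be its own subtest, so a failure names the gate value it occurred under.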

Member

Now I think this test case can even be dropped. Its outcome is equivalent to the "schedule pod failed" test case, as the NominatedNodeName on sendPod is ignored in these tests anyway.

@utam0k utam0k force-pushed the not-to-clear-nnn branch 2 times, most recently from b5d0d32 to bb3dece Compare July 24, 2025 02:11
@utam0k
Member Author

utam0k commented Jul 24, 2025

/retest-required

@utam0k utam0k force-pushed the not-to-clear-nnn branch from bb3dece to 0ebd6ab Compare July 25, 2025 11:32
@utam0k utam0k force-pushed the not-to-clear-nnn branch from 0ebd6ab to 12a8d43 Compare July 25, 2025 13:38
Signed-off-by: utam0k <k0ma@utam0k.jp>
@utam0k utam0k force-pushed the not-to-clear-nnn branch from 12a8d43 to a468787 Compare July 26, 2025 01:09
@utam0k
Member Author

utam0k commented Jul 26, 2025

/test pull-kubernetes-unit

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 28, 2025
@k8s-ci-robot
Contributor

PR needs rebase.


@Vyom-Yadav
Member

/milestone v1.34

@k8s-ci-robot k8s-ci-robot added this to the v1.34 milestone Jul 28, 2025
@wendy-ha18
Member

/milestone v1.34

Comment on lines +857 to +859
}
if nominatedNodeNameForExpectationEnabled {
featuregatetesting.SetFeatureGateDuringTest(t, utilfeature.DefaultFeatureGate, features.NominatedNodeNameForExpectation, nominatedNodeNameForExpectationEnabled)
Member

Suggested change
}
if nominatedNodeNameForExpectationEnabled {
featuregatetesting.SetFeatureGateDuringTest(t, utilfeature.DefaultFeatureGate, features.NominatedNodeNameForExpectation, nominatedNodeNameForExpectationEnabled)
} else {
featuregatetesting.SetFeatureGateDuringTest(t, utilfeature.DefaultFeatureGate, features.NominatedNodeNameForExpectation, nominatedNodeNameForExpectationEnabled)

expectedNominatingInfo = &framework.NominatingInfo{NominatingMode: framework.ModeOverride, NominatedNodeName: ""}
}
if diff := cmp.Diff(expectedNominatingInfo, gotNominatingInfo); diff != "" {
t.Errorf("Unexpected nominatingInfo diff (-want +got):\n%s", diff)
Member

Suggested change
t.Errorf("Unexpected nominatingInfo diff (-want +got):\n%s", diff)
t.Errorf("Unexpected nominatingInfo (-want,+got):\n%s", diff)


for _, p := range pods {
if p.Status.NominatedNodeName != "" {
podsOnceNominated = append(podsOnceNominated, p.Name)
Member

Does this slice even populate at any point? p.Status.NominatedNodeName is likely always empty, as we don't overwrite these particular objects with the new value anywhere (not calling informers). Isn't it enough to check the "medium" pod only, similarly to the previous mechanism (in if !tt.enableNominatedNodeNameForExpectation)?
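
To illustrate the concern above with a self-contained sketch: if the loop iterates over the objects the test created locally, and nothing ever writes the updated status back into them, the collected slice stays empty even though the server-side copy carries a nomination. `Pod`, `nominatedNames`, and the `server` map are hypothetical stand-ins for this example only, not the test's real types or helpers.

```go
package main

import "fmt"

// Pod is a minimal stand-in for v1.Pod; only the fields used here are modelled.
type Pod struct {
	Name              string
	NominatedNodeName string
}

// nominatedNames returns the names of pods whose NominatedNodeName is set.
func nominatedNames(pods []Pod) []string {
	var names []string
	for _, p := range pods {
		if p.NominatedNodeName != "" {
			names = append(names, p.Name)
		}
	}
	return names
}

func main() {
	// Objects as the test created them; their status is never refreshed.
	local := []Pod{{Name: "low"}, {Name: "medium"}}

	// Hypothetical server-side state after the scheduler nominated a node.
	server := map[string]Pod{
		"low":    {Name: "low"},
		"medium": {Name: "medium", NominatedNodeName: "node-a"},
	}

	// Scanning the stale local copies finds no nominated pod.
	fmt.Println("nominated (local copies):", len(nominatedNames(local)))

	// Re-reading only the pod of interest observes the nomination.
	fmt.Println("medium nominated on:", server["medium"].NominatedNodeName)
}
```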

@@ -1735,6 +1767,23 @@ func TestNominatedNodeCleanUp(t *testing.T) {
}
}

if tt.enableNominatedNodeNameForExpectation {
Member

This code could be moved to the bottom of the testing code (where if !tt.enableNominatedNodeNameForExpectation resides). But we have to make sure this check isn't run too early, i.e. before the pod is actually re-processed.

@macsko
Member

macsko commented Jul 29, 2025

/close
In favor of #133276

@k8s-ci-robot
Contributor

@macsko: Closed this PR.

In response to this:

/close
In favor of #133276


@github-project-automation github-project-automation bot moved this from Tracked to Done in [sig-release] Bug Triage Jul 29, 2025
Labels
area/test cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/feature Categorizes issue or PR as related to a new feature. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. sig/testing Categorizes an issue or PR as relevant to SIG Testing. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

KEP-5278: Change the scheduler not to clear NNN
8 participants