Allow implicit extended resource name to be used no matter explicit extendedResourceName field is set or not in device class #133363

Open
wants to merge 1 commit into
base: master

Conversation

yliaog
Contributor

@yliaog yliaog commented Aug 3, 2025

What type of PR is this?

/kind bug
#133366

What this PR does / why we need it:

Fixes a bug so that the implicit extended resource name can always be used,
whether or not the explicit extendedResourceName field is set in the device class.

Which issue(s) this PR is related to:

Special notes for your reviewer:

Does this PR introduce a user-facing change?

The implicit extended resource name derived from a device class (deviceclass.resource.kubernetes.io/<device-class-name>) can be used to request DRA devices matching that device class.
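
As an illustration of the naming rule in the release note, here is a minimal Go sketch that builds the implicit name from a device class name. The helper `implicitExtendedResourceName` and the class name `gpu-class` are hypothetical, and the constant is only assumed to match the value of `resourceapi.ResourceDeviceClassPrefix`; it is copied from the release note text, not from the Kubernetes source.

```go
package main

import "fmt"

// deviceClassPrefix is taken from the release note text. It is assumed
// (not verified here) to equal resourceapi.ResourceDeviceClassPrefix.
const deviceClassPrefix = "deviceclass.resource.kubernetes.io/"

// implicitExtendedResourceName is a hypothetical helper that returns the
// implicit extended resource name for the given device class name.
func implicitExtendedResourceName(deviceClassName string) string {
	return deviceClassPrefix + deviceClassName
}

func main() {
	// "gpu-class" is an illustrative device class name.
	fmt.Println(implicitExtendedResourceName("gpu-class"))
}
```

A pod would then list that name under its container resource requests, the same way it would list any other extended resource.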

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. kind/bug Categorizes issue or PR as related to a bug. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Aug 3, 2025
@k8s-ci-robot k8s-ci-robot added sig/node Categorizes an issue or PR as relevant to SIG Node. sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. wg/device-management Categorizes an issue or PR as relevant to WG Device Management. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Aug 3, 2025
@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Aug 3, 2025
@yliaog
Contributor Author

yliaog commented Aug 3, 2025

/assign @johnbelamaric @macsko

@@ -430,7 +430,7 @@ func hasDeviceClassMappedExtendedResource(reqs v1.ResourceList, deviceClassMappi
// We only care about the resources requested by the pod we are trying to schedule.
continue
}
-if v1helper.IsExtendedResourceName(rName) {
+if v1helper.IsExtendedResourceName(rName) || strings.HasPrefix(string(rName), resourceapi.ResourceDeviceClassPrefix) {
Member

What about the withDeviceClass func in the NodeResourcesFit plugin?

Contributor Author

Right, we also need to add that. Added, PTAL.

Member

You can consider exporting v1helper.IsExtendedResourceName(rName) || strings.HasPrefix(string(rName), resourceapi.ResourceDeviceClassPrefix) to a separate function, e.g. in pkg/scheduler/util/utils.go
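
A rough sketch of what such a shared helper might look like follows. It is written to be self-contained, so `isExtendedResourceName` is a simplified stand-in for `v1helper.IsExtendedResourceName` (the upstream function also validates the name as a qualified name), the constant is assumed to match `resourceapi.ResourceDeviceClassPrefix`, and the helper name `isDRAExtendedResourceName` is hypothetical rather than anything proposed in this PR.

```go
package main

import (
	"fmt"
	"strings"
)

// Assumed to match resourceapi.ResourceDeviceClassPrefix.
const deviceClassPrefix = "deviceclass.resource.kubernetes.io/"

// isExtendedResourceName is a simplified stand-in for
// v1helper.IsExtendedResourceName: it treats a name as extended when it is
// domain-qualified, is not in a kubernetes.io domain, and does not start
// with "requests.". The upstream helper performs additional validation.
func isExtendedResourceName(name string) bool {
	native := !strings.Contains(name, "/") || strings.Contains(name, "kubernetes.io/")
	return !native && !strings.HasPrefix(name, "requests.")
}

// isDRAExtendedResourceName (hypothetical name) combines both checks from
// the diff: an explicit extended resource name, or an implicit name carrying
// the device class prefix.
func isDRAExtendedResourceName(name string) bool {
	return isExtendedResourceName(name) || strings.HasPrefix(name, deviceClassPrefix)
}

func main() {
	names := []string{
		"cpu",             // native resource
		"example.com/gpu", // illustrative explicit extended resource
		"deviceclass.resource.kubernetes.io/gpu-class", // implicit name
	}
	for _, name := range names {
		fmt.Printf("%s -> %t\n", name, isDRAExtendedResourceName(name))
	}
}
```

Under this stand-in, the implicit name is rejected by the extended-resource check alone because it sits in a kubernetes.io domain, which is why the prefix check is needed; both call sites (the DRA plugin and NodeResourcesFit) could then share the one function.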

@bart0sh
Contributor

bart0sh commented Aug 5, 2025

/triage accepted
/priority important-soon
/lgtm

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. labels Aug 5, 2025
@yliaog
Contributor Author

yliaog commented Aug 5, 2025

/retest

@yliaog
Contributor Author

yliaog commented Aug 5, 2025

/retest

@macsko
Member

macsko commented Aug 6, 2025

/approve
Scheduler changes

Please add a release note (Does this PR introduce a user-facing change? block)

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: macsko, yliaog
Once this PR has been reviewed and has the lgtm label, please ask for approval from klueska. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed release-note-none Denotes a PR that doesn't merit a release note. labels Aug 6, 2025
@yliaog
Contributor Author

yliaog commented Aug 6, 2025

@klueska could you take a look at this PR?

@yliaog
Contributor Author

yliaog commented Aug 6, 2025

/retest

@yliaog
Contributor Author

yliaog commented Aug 6, 2025

/retest

@natasha41575 natasha41575 moved this from Triage to Archive-it in SIG Node CI/Test Board Aug 6, 2025
@yliaog
Contributor Author

yliaog commented Aug 6, 2025

/retest

@yliaog
Contributor Author

yliaog commented Aug 6, 2025

/retest

@bart0sh
Contributor

bart0sh commented Aug 7, 2025

/assign @SergeyKanzhelev
for SIG-Node approval

@@ -1910,6 +1910,22 @@ var _ = framework.SIGDescribe("node")(framework.WithLabel("DRA"), func() {
b := drautils.NewBuilder(f, driver)
b.UseExtendedResourceName = true

ginkgo.It("must run a pod with implicit extended resource with one container one resource", func(ctx context.Context) {
Member

let's add a test that will confirm that a pod with both definitions works as described in #133366

Contributor Author

Sure, added. The test has two resources, one implicit and one explicit, both using the same device class.

@yliaog
Contributor Author

yliaog commented Aug 7, 2025

/retest

@yliaog
Contributor Author

yliaog commented Aug 7, 2025

/retest

@yliaog
Contributor Author

yliaog commented Aug 7, 2025

/retest

@yliaog
Contributor Author

yliaog commented Aug 8, 2025

/retest

@yliaog
Contributor Author

yliaog commented Aug 8, 2025

/retest

@yliaog
Contributor Author

yliaog commented Aug 9, 2025

/retest

@yliaog
Contributor Author

yliaog commented Aug 11, 2025

/retest

@k8s-ci-robot
Contributor

@yliaog: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| pull-kubernetes-dra-integration | d961af4 | link | false | /test pull-kubernetes-dra-integration |
| pull-kubernetes-kind-dra-n-1 | d961af4 | link | false | /test pull-kubernetes-kind-dra-n-1 |

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@yliaog
Contributor Author

yliaog commented Aug 11, 2025

The test failure is caused by the release state, as the error below shows. It is not related to the test or to the code change.
[FAILED] FATAL ERROR: get https://dl.k8s.io/release/stable-1.34.txt: 404 - 404 Not Found

@SergeyKanzhelev PTAL

@johnbelamaric
Member

> The test failure is due to the release, as seen from the error below. it is not related to the test, or code change.
> [FAILED] FATAL ERROR: get https://dl.k8s.io/release/stable-1.34.txt: 404 - 404 Not Found
>
> @SergeyKanzhelev PTAL

The upgrade/downgrade test will resolve when 1.34 is released. Right now in master it sees 1.35 and tries then to look for 1.34 to download, but it's not available yet.

Labels
area/kubelet area/test cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/node Categorizes an issue or PR as relevant to SIG Node. sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. sig/testing Categorizes an issue or PR as relevant to SIG Testing. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. triage/accepted Indicates an issue or PR is ready to be actively worked on. wg/device-management Categorizes an issue or PR as relevant to WG Device Management.
Projects
Status: 👀 In review
Status: Archive-it
Development

Successfully merging this pull request may close these issues.

8 participants