Labels
- kind/flake: Categorizes issue or PR as related to a flaky test.
- needs-triage: Indicates an issue or PR lacks a `triage/foo` label and requires one.
- priority/important-soon: Must be staffed and worked on either currently, or very soon, ideally in time for the next release.
- release-blocker
- wg/device-management: Categorizes an issue or PR as relevant to WG Device Management.
Description
Which jobs are flaking?
https://prow.k8s.io/view/gs/kubernetes-ci-logs/logs/ci-kubernetes-unit/1950385628969439232
https://prow.k8s.io/view/gs/kubernetes-ci-logs/logs/ci-kubernetes-unit/1950340329899036672
https://prow.k8s.io/view/gs/kubernetes-ci-logs/logs/ci-kubernetes-unit-ppc64le/1950346621065629696
Which tests are flaking?
k8s.io/kubernetes/pkg/scheduler/framework/plugins/dynamicresources/dynamicresources_test.go: TestPlugin
Since when has it been flaking?
The triage tool shows the earliest failures at 7/30/2025, 8:09:29 AM (see Link).
Testgrid link
https://testgrid.k8s.io/sig-release-master-blocking#ci-kubernetes-unit
Reason for failure (if possible)
{Failed === RUN TestPlugin/extended-resource-name-with-resources-delete-claim/postfilter
dynamicresources_test.go:1837: Assumed claims are different (- expected, + actual):
[]v1.Object(
- nil,
+ {
+ s"&ResourceClaim{ObjectMeta:{my-pod-extended-resources-0 my-pod-extended-resources- default UID-0 2 0 0001-01-01 00:00:00 +0000 U"...,
+ },
)
--- FAIL: TestPlugin/extended-resource-name-with-resources-delete-claim/postfilter (0.00s)
=== RUN TestPlugin/extended-resource-name-with-resources-delete-claim
=== PAUSE TestPlugin/extended-resource-name-with-resources-delete-claim
=== CONT TestPlugin/extended-resource-name-with-resources-delete-claim
--- FAIL: TestPlugin/extended-resource-name-with-resources-delete-claim (0.11s)
=== RUN TestPlugin
--- FAIL: TestPlugin (0.00s)
}
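The diff above is in go-cmp's "(- expected, + actual)" format: for the extended-resource-name-with-resources-delete-claim/postfilter case the test expected no assumed claims (nil), but the generated claim my-pod-extended-resources-0 was still present in the scheduler's assume cache. A minimal, self-contained sketch of that kind of comparison, just to show how the output is read (claimRef and the variable names are illustrative stand-ins, not the actual code in dynamicresources_test.go):

```go
package main

import (
	"fmt"

	"github.com/google/go-cmp/cmp"
)

// claimRef stands in for the metav1.Object values the test keeps in its
// expected/actual "assumed claims" lists.
type claimRef struct {
	Name      string
	Namespace string
}

func main() {
	// postfilter case: nothing should remain assumed after the claim is deleted...
	var expected []claimRef
	// ...but a generated claim is still in the cache.
	actual := []claimRef{{Name: "my-pod-extended-resources-0", Namespace: "default"}}

	// "-" lines show the expected value, "+" lines the actual one,
	// mirroring the failure message above.
	fmt.Println(cmp.Diff(expected, actual))
}
```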
Anything else we need to know?
We were able to reproduce this flake locally using the stress tool.
Steps to reproduce:
- Install the stress tool: go install golang.org/x/tools/cmd/stress@latest
- Clone the k8s repo: git clone https://github.com/kubernetes/kubernetes.git
- cd kubernetes
- go test ./pkg/scheduler/framework/plugins/dynamicresources/ -c -count=1 -race (compiles the race-enabled test binary dynamicresources.test without running it)
- stress ./dynamicresources.test -test.run TestPlugin (runs the binary repeatedly in parallel; output of failing runs is saved under /tmp/go-stress-*)
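If the full TestPlugin matrix is slow to hit the failure, the stress run can likely be narrowed to the affected cases, since the compiled test binary's -test.run flag accepts slash-separated subtest patterns (not verified here):
- stress ./dynamicresources.test -test.run 'TestPlugin/extended-resource-name-with-resources'
Session from the reproduction above; the first failure was captured within about 15 seconds: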
[root@varsharani1 kubernetes]# uname -a
Linux varsharani1.fyre.ibm.com 5.14.0-585.el9.x86_64 #1 SMP PREEMPT_DYNAMIC Wed May 14 18:37:27 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
[root@varsharani1 kubernetes]# stress ./dynamicresources.test -test.run TestPlugin
5s: 0 runs so far, 0 failures, 8 active
10s: 0 runs so far, 0 failures, 8 active
/tmp/go-stress-20250730T022752-2012972044
I0730 02:27:52.583774 3949024 reflector.go:358] "Starting reflector" type="*v1.DeviceClass" resyncPeriod="0s" reflector="k8s.io/client-go/informers/factory.go:160"
I0730 02:27:52.583961 3949024 reflector.go:404] "Listing and watching" type="*v1.DeviceClass" reflector="k8s.io/client-go/informers/factory.go:160"
I0730 02:27:52.584153 3949024 reflector.go:358] "Starting reflector" type="*v1alpha3.DeviceTaintRule" resyncPeriod="0s" reflector="k8s.io/client-go/informers/factory.go:160"
I0730 02:27:52.583773 3949024 reflector.go:358] "Starting reflector" type="*v1.ResourceClaim" resyncPeriod="0s" reflector="k8s.io/client-go/informers/factory.go:160"
I0730 02:27:52.584234 3949024 reflector.go:404] "Listing and watching" type="*v1alpha3.DeviceTaintRule" reflector="k8s.io/client-go/informers/factory.go:160"
I0730 02:27:52.583868 3949024 reflector.go:358] "Starting reflector" type="*v1alpha3.DeviceTaintRule" resyncPeriod="0s" reflector="k8s.io/client-go/informers/factory.go:160"
I0730 02:27:52.584333 3949024 reflector.go:404] "Listing and watching" type="*v1alpha3.DeviceTaintRule" reflector="k8s.io/client-go/informers/factory.go:160"
I0730 02:27:52.584547 3949024 reflector.go:358] "Starting reflector" type="*v1.DeviceClass" resyncPeriod="0s" reflector="k8s.io/client-go/informers/factory.go:160"
I0730 02:27:52.584576 3949024 reflector.go:358] "Starting reflector" type="*v1.ResourceSlice" resyncPeriod="0s" reflector="k8s.io/client-go/informers/factory.go:160"
I0730 02:27:52.584637 3949024 reflector.go:404] "Listing and watching" type="*v1.DeviceClass" reflector="k8s.io/client-go/informers/factory.go:160"
I0730 02:27:52.584744 3949024 reflector.go:358] "Starting reflector" type="*v1.ResourceClaim" resyncPeriod="0s" reflector="k8s.io/client-go/informers/factory.go:160"
I0730 02:27:52.584797 3949024 reflector.go:404] "Listing and watching" type="*v1.ResourceClaim" reflector="k8s.io/client-go/informers/factory.go:160"
I0730 02:27:52.584873 3949024 reflector.go:358] "Starting reflector" type="*v1.ResourceSlice" resyncPeriod="0s"
…
15s: 8 runs so far, 1 failures (12.50%), 8 active
20s: 8 runs so far, 1 failures (12.50%), 8 active
[root@varsharani1 kubernetes]# cat /tmp/go-stress-20250730T022752-2012972044
I0730 02:27:52.583774 3949024 reflector.go:358] "Starting reflector" type="*v1.DeviceClass" resyncPeriod="0s" reflector="k8s.io/client-go/informers/factory.go:160"
I0730 02:27:52.583961 3949024 reflector.go:404] "Listing and watching" type="*v1.DeviceClass" reflector="k8s.io/client-go/informers/factory.go:160"
I0730 02:27:52.584153 3949024 reflector.go:358] "Starting reflector" type="*v1alpha3.DeviceTaintRule" resyncPeriod="0s" reflector="k8s.io/client-go/informers/factory.go:160"
I0730 02:27:52.583773 3949024 reflector.go:358] "Starting reflector" type="*v1.ResourceClaim" resyncPeriod="0s" reflector="k8s.io/client-go/informers/factory.go:160"
I0730 02:27:52.584234 3949024 reflector.go:404] "Listing and watching" type="*v1alpha3.DeviceTaintRule" reflector="k8s.io/client-go/informers/factory.go:160"
I0730 02:27:52.583868 3949024 reflector.go:358] "Starting reflector" type="*v1alpha3.DeviceTaintRule" resyncPeriod="0s" reflector="k8s.io/client-go/informers/factory.go:160"
I0730 02:27:52.584333 3949024 reflector.go:404] "Listing and watching" type="*v1alpha3.DeviceTaintRule"
...............................
............................
.................................
I0730 02:28:03.418430 3949024 watch.go:218] "Stopping fake watcher"
I0730 02:28:03.418516 3949024 reflector.go:364] "Stopping reflector" type="*v1.DeviceClass" resyncPeriod="0s" reflector="k8s.io/client-go/informers/factory.go:160"
I0730 02:28:03.418566 3949024 reflector.go:364] "Stopping reflector" type="*v1.ResourceClaim" resyncPeriod="0s" reflector="k8s.io/client-go/informers/factory.go:160"
I0730 02:28:03.418566 3949024 reflector.go:364] "Stopping reflector" type="*v1.ResourceSlice" resyncPeriod="0s" reflector="k8s.io/client-go/informers/factory.go:160"
I0730 02:28:03.418384 3949024 watch.go:218] "Stopping fake watcher"
I0730 02:28:03.418717 3949024 reflector.go:364] "Stopping reflector" type="*v1alpha3.DeviceTaintRule" resyncPeriod="0s" reflector="k8s.io/client-go/informers/factory.go:160"
--- FAIL: TestPlugin (0.00s)
--- FAIL: TestPlugin/extended-resource-name-with-resources (0.13s)
tracker.go:499: I0730 02:27:53.485716] DeviceClass add class="my-resource-class"
tracker.go:390: I0730 02:27:53.485797] ResourceSlice add slice="worker-some-driver"
tracker.go:556: I0730 02:27:53.485866] syncing ResourceSlice resourceslice="worker-some-driver"
tracker.go:624: I0730 02:27:53.485959] ResourceSlice synced resourceslice="worker-some-driver"
tracker.go:556: I0730 02:27:53.485999] syncing ResourceSlice resourceslice="worker-some-driver"
tracker.go:624: I0730 02:27:53.486050] ResourceSlice synced resourceslice="worker-some-driver"
dynamicresources.go:561: I0730 02:27:53.588311] pod resource claims pod="default/my-pod" resourceclaims=[]
dynamicresources.go:648: I0730 02:27:53.588808] Preparing allocation with structured parameters pod="default/my-pod" resourceclaims=["default/<extended-resources>"]
allocator_stable.go:111: I0730 02:27:53.589524] Starting allocation node="worker" numClaims=1
allocator_stable.go:123: I0730 02:27:53.589621] Gathered pool information node="worker" numPools=1
allocator_stable.go:357: I0730 02:27:53.589711] Done with allocation node="worker" success=false err=<nil>: nil
dynamicresources.go:1184: I0730 02:27:53.590200] Reserved resource in allocation result claim="default/<extended-resources>" allocation=<
{
"devices": {
"results": [
{
"request": "container-0-request-0",
"driver": "some-driver",
"pool": "worker",
"device": "instance-1"
}
]
},
"nodeSelector": {
"nodeSelectorTerms": [
{
"matchFields": [
{
"key": "metadata.name",
"operator": "In",
"values": [
"worker"
]
}
]
}
]
}
},
"reservedFor": [
{
"resource": "pods",
"name": "my-pod",
"uid": "1234"
}
]
}
}
>
dynamicresources.go:1450: I0730 02:27:53.595551] Claim not stored in assume cache err=<nil>: nil
--- FAIL: TestPlugin/extended-resource-name-with-resources/prebind (0.01s)
dynamicresources_test.go:1824: Assumed claims are different (- expected, + actual):
[]v1.Object(
- {
- s"&ResourceClaim{ObjectMeta:{my-pod-extended-resources-0 my-pod-extended-resources- default 0 0001-01-01 00:00:00 +0000 UTC <ni"...,
- },
+ nil,
)
FAIL
ERROR: exit status 1
Relevant SIG(s)