
[Flaking Test] UT k8s.io/kubernetes/pkg/scheduler/framework/plugins: dynamicresources #133302

Description

@varshadeshmane92

Which jobs are flaking?

https://prow.k8s.io/view/gs/kubernetes-ci-logs/logs/ci-kubernetes-unit/1950385628969439232
https://prow.k8s.io/view/gs/kubernetes-ci-logs/logs/ci-kubernetes-unit/1950340329899036672
https://prow.k8s.io/view/gs/kubernetes-ci-logs/logs/ci-kubernetes-unit-ppc64le/1950346621065629696

Which tests are flaking?

k8s.io/kubernetes/pkg/scheduler/framework/plugins/dynamicresources/dynamicresources_test.go: TestPlugin

Since when has it been flaking?

The triage tool shows the earliest failures on 7/30/2025 at 8:09:29 AM (see the triage link).

Testgrid link

https://testgrid.k8s.io/sig-release-master-blocking#ci-kubernetes-unit

Reason for failure (if possible)

{Failed  === RUN   TestPlugin/extended-resource-name-with-resources-delete-claim/postfilter
    dynamicresources_test.go:1837: Assumed claims are different (- expected, + actual):
          []v1.Object(
        - 	nil,
        + 	{
        + 		s"&ResourceClaim{ObjectMeta:{my-pod-extended-resources-0 my-pod-extended-resources- default  UID-0 2 0 0001-01-01 00:00:00 +0000 U"...,
        + 	},
          )
--- FAIL: TestPlugin/extended-resource-name-with-resources-delete-claim/postfilter (0.00s)

=== RUN   TestPlugin/extended-resource-name-with-resources-delete-claim
=== PAUSE TestPlugin/extended-resource-name-with-resources-delete-claim
=== CONT  TestPlugin/extended-resource-name-with-resources-delete-claim
--- FAIL: TestPlugin/extended-resource-name-with-resources-delete-claim (0.11s)

=== RUN   TestPlugin
--- FAIL: TestPlugin (0.00s)
}
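
The "(- expected, + actual)" output above is the diff format produced by the go-cmp package (github.com/google/go-cmp/cmp), which suggests the test compares the claims currently held in the scheduler's assume cache against a per-extension-point expectation. The following is a minimal, hypothetical sketch of that kind of check, not the actual test code; the names and values are illustrative only:

    package example

    import (
        "testing"

        "github.com/google/go-cmp/cmp"
    )

    // TestAssumedClaimsSketch mimics the shape of the failing assertion: after the
    // postfilter step the expectation is that no claims are assumed, but under
    // stress the generated claim can still be present, producing a diff like the
    // one quoted above.
    func TestAssumedClaimsSketch(t *testing.T) {
        var expected []string                            // no assumed claims expected in this step
        actual := []string{"my-pod-extended-resources-0"} // claim still assumed (the flaky case)

        if diff := cmp.Diff(expected, actual); diff != "" {
            t.Errorf("Assumed claims are different (- expected, + actual):\n%s", diff)
        }
    }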

Anything else we need to know?

We were able to reproduce this flake locally using the stress tool.
Steps to reproduce:

  1. Install the stress tool: go install golang.org/x/tools/cmd/stress@latest
  2. Clone the Kubernetes repo: git clone https://github.com/kubernetes/kubernetes.git
  3. cd kubernetes
  4. Build the test binary with the race detector enabled: go test ./pkg/scheduler/framework/plugins/dynamicresources/ -c -count=1 -race
  5. Run the binary under stress: stress ./dynamicresources.test -test.run TestPlugin
[root@varsharani1 kubernetes]# uname -a
Linux varsharani1.fyre.ibm.com 5.14.0-585.el9.x86_64 #1 SMP PREEMPT_DYNAMIC Wed May 14 18:37:27 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux

[root@varsharani1 kubernetes]# stress ./dynamicresources.test -test.run TestPlugin
5s: 0 runs so far, 0 failures, 8 active
10s: 0 runs so far, 0 failures, 8 active

/tmp/go-stress-20250730T022752-2012972044
I0730 02:27:52.583774 3949024 reflector.go:358] "Starting reflector" type="*v1.DeviceClass" resyncPeriod="0s" reflector="k8s.io/client-go/informers/factory.go:160"
I0730 02:27:52.583961 3949024 reflector.go:404] "Listing and watching" type="*v1.DeviceClass" reflector="k8s.io/client-go/informers/factory.go:160"
I0730 02:27:52.584153 3949024 reflector.go:358] "Starting reflector" type="*v1alpha3.DeviceTaintRule" resyncPeriod="0s" reflector="k8s.io/client-go/informers/factory.go:160"
I0730 02:27:52.583773 3949024 reflector.go:358] "Starting reflector" type="*v1.ResourceClaim" resyncPeriod="0s" reflector="k8s.io/client-go/informers/factory.go:160"
I0730 02:27:52.584234 3949024 reflector.go:404] "Listing and watching" type="*v1alpha3.DeviceTaintRule" reflector="k8s.io/client-go/informers/factory.go:160"
I0730 02:27:52.583868 3949024 reflector.go:358] "Starting reflector" type="*v1alpha3.DeviceTaintRule" resyncPeriod="0s" reflector="k8s.io/client-go/informers/factory.go:160"
I0730 02:27:52.584333 3949024 reflector.go:404] "Listing and watching" type="*v1alpha3.DeviceTaintRule" reflector="k8s.io/client-go/informers/factory.go:160"
I0730 02:27:52.584547 3949024 reflector.go:358] "Starting reflector" type="*v1.DeviceClass" resyncPeriod="0s" reflector="k8s.io/client-go/informers/factory.go:160"
I0730 02:27:52.584576 3949024 reflector.go:358] "Starting reflector" type="*v1.ResourceSlice" resyncPeriod="0s" reflector="k8s.io/client-go/informers/factory.go:160"
I0730 02:27:52.584637 3949024 reflector.go:404] "Listing and watching" type="*v1.DeviceClass" reflector="k8s.io/client-go/informers/factory.go:160"
I0730 02:27:52.584744 3949024 reflector.go:358] "Starting reflector" type="*v1.ResourceClaim" resyncPeriod="0s" reflector="k8s.io/client-go/informers/factory.go:160"
I0730 02:27:52.584797 3949024 reflector.go:404] "Listing and watching" type="*v1.ResourceClaim" reflector="k8s.io/client-go/informers/factory.go:160"
I0730 02:27:52.584873 3949024 reflector.go:358] "Starting reflector" type="*v1.ResourceSlice" resyncPeriod="0s"
…
15s: 8 runs so far, 1 failures (12.50%), 8 active
20s: 8 runs so far, 1 failures (12.50%), 8 active
[root@varsharani1 kubernetes]# cat /tmp/go-stress-20250730T022752-2012972044
I0730 02:27:52.583774 3949024 reflector.go:358] "Starting reflector" type="*v1.DeviceClass" resyncPeriod="0s" reflector="k8s.io/client-go/informers/factory.go:160"
I0730 02:27:52.583961 3949024 reflector.go:404] "Listing and watching" type="*v1.DeviceClass" reflector="k8s.io/client-go/informers/factory.go:160"
I0730 02:27:52.584153 3949024 reflector.go:358] "Starting reflector" type="*v1alpha3.DeviceTaintRule" resyncPeriod="0s" reflector="k8s.io/client-go/informers/factory.go:160"
I0730 02:27:52.583773 3949024 reflector.go:358] "Starting reflector" type="*v1.ResourceClaim" resyncPeriod="0s" reflector="k8s.io/client-go/informers/factory.go:160"
I0730 02:27:52.584234 3949024 reflector.go:404] "Listing and watching" type="*v1alpha3.DeviceTaintRule" reflector="k8s.io/client-go/informers/factory.go:160"
I0730 02:27:52.583868 3949024 reflector.go:358] "Starting reflector" type="*v1alpha3.DeviceTaintRule" resyncPeriod="0s" reflector="k8s.io/client-go/informers/factory.go:160"
I0730 02:27:52.584333 3949024 reflector.go:404] "Listing and watching" type="*v1alpha3.DeviceTaintRule" 
...............................
............................
.................................
I0730 02:28:03.418430 3949024 watch.go:218] "Stopping fake watcher"
I0730 02:28:03.418516 3949024 reflector.go:364] "Stopping reflector" type="*v1.DeviceClass" resyncPeriod="0s" reflector="k8s.io/client-go/informers/factory.go:160"
I0730 02:28:03.418566 3949024 reflector.go:364] "Stopping reflector" type="*v1.ResourceClaim" resyncPeriod="0s" reflector="k8s.io/client-go/informers/factory.go:160"
I0730 02:28:03.418566 3949024 reflector.go:364] "Stopping reflector" type="*v1.ResourceSlice" resyncPeriod="0s" reflector="k8s.io/client-go/informers/factory.go:160"
I0730 02:28:03.418384 3949024 watch.go:218] "Stopping fake watcher"
I0730 02:28:03.418717 3949024 reflector.go:364] "Stopping reflector" type="*v1alpha3.DeviceTaintRule" resyncPeriod="0s" reflector="k8s.io/client-go/informers/factory.go:160"
--- FAIL: TestPlugin (0.00s)
    --- FAIL: TestPlugin/extended-resource-name-with-resources (0.13s)
        tracker.go:499: I0730 02:27:53.485716] DeviceClass add class="my-resource-class"
        tracker.go:390: I0730 02:27:53.485797] ResourceSlice add slice="worker-some-driver"
        tracker.go:556: I0730 02:27:53.485866] syncing ResourceSlice resourceslice="worker-some-driver"
        tracker.go:624: I0730 02:27:53.485959] ResourceSlice synced resourceslice="worker-some-driver"
        tracker.go:556: I0730 02:27:53.485999] syncing ResourceSlice resourceslice="worker-some-driver"
        tracker.go:624: I0730 02:27:53.486050] ResourceSlice synced resourceslice="worker-some-driver"
        dynamicresources.go:561: I0730 02:27:53.588311] pod resource claims pod="default/my-pod" resourceclaims=[]
        dynamicresources.go:648: I0730 02:27:53.588808] Preparing allocation with structured parameters pod="default/my-pod" resourceclaims=["default/<extended-resources>"]
        allocator_stable.go:111: I0730 02:27:53.589524] Starting allocation node="worker" numClaims=1
        allocator_stable.go:123: I0730 02:27:53.589621] Gathered pool information node="worker" numPools=1
        allocator_stable.go:357: I0730 02:27:53.589711] Done with allocation node="worker" success=false err=<nil>: nil
        dynamicresources.go:1184: I0730 02:27:53.590200] Reserved resource in allocation result claim="default/<extended-resources>" allocation=<
                {
                  "devices": {
                    "results": [
                      {
                        "request": "container-0-request-0",
                        "driver": "some-driver",
                        "pool": "worker",
                        "device": "instance-1"
                      }
                    ]
                  },
                  "nodeSelector": {
                    "nodeSelectorTerms": [
                      {
                        "matchFields": [
                          {
                            "key": "metadata.name",
                            "operator": "In",
                            "values": [
                              "worker"
                            ]
                               }
                            ]
                          }
                        ]
                      }
                    },
                    "reservedFor": [
                      {
                        "resource": "pods",
                        "name": "my-pod",
                        "uid": "1234"
                      }
                    ]
                  }
                }
             >
        dynamicresources.go:1450: I0730 02:27:53.595551] Claim not stored in assume cache err=<nil>: nil
        --- FAIL: TestPlugin/extended-resource-name-with-resources/prebind (0.01s)
            dynamicresources_test.go:1824: Assumed claims are different (- expected, + actual):
                  []v1.Object(
                -       {
                -               s"&ResourceClaim{ObjectMeta:{my-pod-extended-resources-0 my-pod-extended-resources- default    0 0001-01-01 00:00:00 +0000 UTC <ni"...,
                -       },
                +       nil,
                  )
FAIL


ERROR: exit status 1

Relevant SIG(s)

Labels

kind/flake: Categorizes issue or PR as related to a flaky test.
needs-triage: Indicates an issue or PR lacks a `triage/foo` label and requires one.
priority/important-soon: Must be staffed and worked on either currently, or very soon, ideally in time for the next release.
release-blocker
wg/device-management: Categorizes an issue or PR as relevant to WG Device Management.
