
reduced numPods to 5 from 10 to fix flaky test (supports reusing resources) due to timeout #133397


Merged: 1 commit, Aug 8, 2025


@yliaog
Contributor

yliaog commented Aug 6, 2025

What type of PR is this?

/kind bug

What this PR does / why we need it:

Reduces numPods from 10 to 5 to fix the flaky test "supports reusing resources", which was failing due to a timeout.

Which issue(s) this PR is related to:

Special notes for your reviewer:

Does this PR introduce a user-facing change?

NONE

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


@k8s-ci-robot added the release-note-none, size/XS, kind/bug, cncf-cla: yes, do-not-merge/needs-sig, needs-triage, needs-priority, area/test, sig/node, sig/testing, and wg/device-management labels and removed the do-not-merge/needs-sig label on Aug 6, 2025
@k8s-ci-robot requested review from bart0sh and pohly on August 6, 2025 07:46
@BenTheElder
Member

We should identify the presubmit jobs this runs in and invoke them additional times after they complete with /test $job to confirm that 5 is suitable. (I suspect it is to deflake and that this change makes sense, unless there's a reason we needed 10 for test purposes)

For the purposes of conformance, we should also still try to pick another test that is already proven to be stable. With the release ~3 weeks away, we do not have time to get 2 weeks of non-flaky results to prove that this test is stable.

Waiting for John & Patrick's feedback on the former, will continue that discussion in #133132.

Thanks for working on fixing the flaky 🙏

@BenTheElder
Member

/cc @dims

IMHO this should be in scope for milestone as a test deflake, once sufficiently tested, but holding off for now.

@k8s-ci-robot requested a review from dims on August 6, 2025 08:30
@yliaog
Contributor Author

yliaog commented Aug 6, 2025

/test pull-kubernetes-kind-dra-all

@yliaog
Contributor Author

yliaog commented Aug 6, 2025

/retest

@yliaog
Contributor Author

yliaog commented Aug 6, 2025

the picked test to promote below is not flaky.
Kubernetes e2e suite.[It] [sig-node] [DRA] control plane [ConformanceCandidate] supports claim and class parameters

the flaky test below is not promoted in PR #133132
Kubernetes e2e suite.[It] [sig-node] [DRA] control plane [ConformanceCandidate] supports reusing resources

@yliaog
Contributor Author

yliaog commented Aug 6, 2025

/retest

@BenTheElder
Member

> the picked test to promote below is not flaky.

yes, discussing the other aspects back in that PR.


pod timeout in a different test:

Kubernetes e2e suite: [It] [sig-node] [DRA] [FeatureGate:DRAExtendedResource] [Alpha] [Feature:OffByDefault] must run pods with extended resource on dra nodes and device plugin nodes [Serial]

https://prow.k8s.io/view/gs/kubernetes-ci-logs/pr-logs/pull/133397/pull-kubernetes-kind-dra-all/1953115552314560512


Let's make sure this change isn't flaky by testing it multiple times please, we usually do this with flake fixes, especially ones with arbitrary constants.

/test all

@BenTheElder
Member

Kubernetes e2e suite: [It] [sig-node] [DRA] ResourceSlice Controller creates slices [ConformanceCandidate] 

{ failed [FAILED] Failed after 0.000s.
Expected
    <resourceslice.Stats>: {NumCreates: 100, NumUpdates: 0, NumDeletes: 100}
to equal
    <resourceslice.Stats>: {NumCreates: 101, NumUpdates: 0, NumDeletes: 100}
In [It] at: k8s.io/kubernetes/test/e2e/dra/dra.go:2170 @ 08/06/25 17:18:16.436
}

https://prow.k8s.io/view/gs/kubernetes-ci-logs/pr-logs/pull/133397/pull-kubernetes-e2e-gce/1953137105446113280

/retest

@liggitt
Member

liggitt commented Aug 7, 2025

just saw this flake in https://prow.k8s.io/view/gs/kubernetes-ci-logs/pr-logs/pull/133431/pull-kubernetes-e2e-gce/1953575194064850944 ... is this fix intending to land for 1.34?

@yliaog
Contributor Author

yliaog commented Aug 7, 2025

Yes, it is intended for 1.34. It is a small fix.

@BenTheElder
Member

/triage accepted
/lgtm
/approve
/milestone v1.34

@k8s-ci-robot added this to the v1.34 milestone on Aug 8, 2025
@k8s-ci-robot added the triage/accepted label and removed the needs-triage label on Aug 8, 2025
@k8s-ci-robot added the lgtm label on Aug 8, 2025
@k8s-ci-robot
Contributor

LGTM label has been added.

Git tree hash: f6a497357e01e420f5748b4a94043b0f5e1916a6

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: BenTheElder, yliaog

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot added the approved label on Aug 8, 2025
@BenTheElder
Member

[It] [sig-node] Pods Extended Pod Container Status should never report container start when an init container fails

https://prow.k8s.io/view/gs/kubernetes-ci-logs/pr-logs/pull/133397/pull-kubernetes-e2e-gce/1953652309154074624

/retest

@k8s-ci-robot merged commit 1b7557e into kubernetes:master on Aug 8, 2025
19 checks passed
@github-project-automation moved this from Triage to Done in SIG Node CI/Test Board on Aug 8, 2025
Labels
approved, area/test, cncf-cla: yes, kind/bug, lgtm, needs-priority, release-note-none, sig/node, sig/testing, size/XS, triage/accepted, wg/device-management

4 participants