
node: unblock e2e serial lanes #133353

Merged: 1 commit merged into kubernetes:master on Aug 6, 2025

Conversation

Contributor

@ffromani ffromani commented Aug 1, 2025

What type of PR is this?

/kind bug
/kind failing-test

What this PR does / why we need it:

The sig-node serial lanes are currently severely broken, and we need to restore signal quickly.
Because of time pressure, this PR disables the problematic tests to recover signal.
We will still need to fix these tests once signal is restored, so this is a band-aid PR meant to address the worst of the emergency, not a complete solution.

Which issue(s) this PR is related to:

#133314

Special notes for your reviewer:

Will track long-term fixes by editing this message.

Does this PR introduce a user-facing change?

NONE

@k8s-ci-robot k8s-ci-robot added release-note-none, kind/bug, size/XS, kind/failing-test, cncf-cla: yes, do-not-merge/needs-sig, needs-triage, needs-priority labels Aug 1, 2025
@ffromani
Contributor Author

ffromani commented Aug 1, 2025

/test pull-kubernetes-node-kubelet-serial-containerd

@k8s-ci-robot k8s-ci-robot added area/test, sig/node labels Aug 1, 2025
@k8s-ci-robot k8s-ci-robot added the sig/testing label Aug 1, 2025
@k8s-ci-robot k8s-ci-robot added approved and removed do-not-merge/needs-sig labels Aug 1, 2025
@ffromani
Contributor Author

ffromani commented Aug 1, 2025

/test pull-kubernetes-e2e-gce

@ffromani
Contributor Author

ffromani commented Aug 1, 2025

/test pull-kubernetes-e2e-gce

unrelated failure (flaky tests? can't look into them now)

@ffromani
Contributor Author

ffromani commented Aug 1, 2025

/test pull-kubernetes-node-swap-fedora-serial

Skip problematic tests to recover signal, then we will
reintroduce them gradually

See: kubernetes#133314
See: kubernetes#133336

Signed-off-by: Francesco Romani <fromani@redhat.com>
@ffromani ffromani force-pushed the e2e-node-serial-unblock branch from daa674e to aca402f Compare August 1, 2025 11:59
@ffromani
Contributor Author

ffromani commented Aug 1, 2025

/test pull-kubernetes-node-kubelet-serial-containerd

@ffromani
Contributor Author

ffromani commented Aug 1, 2025

IIRC we recently switched machines for CI. This may be related: the memory manager settings are highly machine-dependent. Let's see the new CI run.

@ffromani
Contributor Author

ffromani commented Aug 1, 2025

/test pull-kubernetes-node-swap-fedora-serial

@ffromani
Contributor Author

ffromani commented Aug 1, 2025

/test pull-kubernetes-unit

@kannon92
Contributor

kannon92 commented Aug 5, 2025

/triage accepted
/priority critical-urgent

@k8s-ci-robot k8s-ci-robot added triage/accepted, priority/critical-urgent and removed needs-triage, needs-priority labels Aug 5, 2025
@kannon92
Contributor

kannon92 commented Aug 5, 2025

/milestone v1.34

@k8s-ci-robot k8s-ci-robot added this to the v1.34 milestone Aug 5, 2025
@kannon92
Contributor

kannon92 commented Aug 5, 2025

Can you please add a # before the GitHub issue number? It helps link the PR to the issue.

@kannon92
Contributor

kannon92 commented Aug 5, 2025

#133314

@ffromani
Contributor Author

ffromani commented Aug 5, 2025

/test pull-kubernetes-node-swap-conformance-fedora-serial
/test pull-kubernetes-node-swap-ubuntu-serial
/test pull-kubernetes-node-kubelet-serial-containerd

Were other fixes merged elsewhere? The lanes seem to be improving.

@kannon92
Contributor

kannon92 commented Aug 5, 2025

cc @SergeyKanzhelev @haircommander @mrunalp

It would be worth getting this in to help gain signal again in our serial test lanes.

Member

@SergeyKanzhelev SergeyKanzhelev left a comment

/lgtm
/approve

@kubernetes/release-managers any objections to merging this while we are investigating? We are losing a lot of test signal because of these failures.

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ffromani, SergeyKanzhelev

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@SergeyKanzhelev
Member

> It would be worth getting this in to help gain signal again in our serial test lanes.

Presubmits are still failing even with this skip, it looks like.

@kannon92
Contributor

kannon92 commented Aug 5, 2025

> > It would be worth getting this in to help gain signal again in our serial test lanes.
>
> Presubmits are still failing even with this skip it looks like

Yeah, unfortunately our serial tests have been failing for so long that we may have regressed. :(

@kannon92
Contributor

kannon92 commented Aug 5, 2025

image

As you can see, this job has been failing for at least 3 weeks. So I still think we should merge this fix and then we can start looking into the other failures.

@ffromani
Contributor Author

ffromani commented Aug 6, 2025

> image
>
> As you can see, this job has been failing for at least 3 weeks. So I still think we should merge this fix and then we can start looking into the other failures.

I agree. Restoring some signal is better than no signal. I'm also working on long-term fixes (no skips, no hacks) here: #133336

@SergeyKanzhelev
Member

/milestone v1.34
/unhole
/skip

These tests are making things very bad. We need a signal

@ffromani
Contributor Author

ffromani commented Aug 6, 2025

/unhold

typo in #133353 (comment)

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold label Aug 6, 2025
@natasha41575 natasha41575 moved this from Triage to PRs - Needs Reviewer in SIG Node CI/Test Board Aug 6, 2025
@natasha41575 natasha41575 moved this from PRs - Needs Reviewer to PRs - Needs Approver in SIG Node CI/Test Board Aug 6, 2025
@k8s-ci-robot
Contributor

k8s-ci-robot commented Aug 6, 2025

@ffromani: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| pull-kubernetes-node-swap-fedora-serial | aca402f | link | false | `/test pull-kubernetes-node-swap-fedora-serial` |
| pull-kubernetes-node-kubelet-serial-containerd | aca402f | link | false | `/test pull-kubernetes-node-kubelet-serial-containerd` |
| pull-kubernetes-node-swap-ubuntu-serial | aca402f | link | false | `/test pull-kubernetes-node-swap-ubuntu-serial` |

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@SergeyKanzhelev
Member

/skip
/retest

@k8s-ci-robot k8s-ci-robot merged commit 9d3fff5 into kubernetes:master Aug 6, 2025
17 checks passed
@github-project-automation github-project-automation bot moved this from PRs - Needs Approver to Done in SIG Node CI/Test Board Aug 6, 2025
Labels
- `approved`: Indicates a PR has been approved by an approver from all required OWNERS files.
- `area/test`
- `cncf-cla: yes`: Indicates the PR's author has signed the CNCF CLA.
- `kind/bug`: Categorizes issue or PR as related to a bug.
- `kind/failing-test`: Categorizes issue or PR as related to a consistently or frequently failing test.
- `lgtm`: "Looks good to me", indicates that a PR is ready to be merged.
- `priority/critical-urgent`: Highest priority. Must be actively worked on as someone's top priority right now.
- `release-note-none`: Denotes a PR that doesn't merit a release note.
- `sig/node`: Categorizes an issue or PR as relevant to SIG Node.
- `sig/testing`: Categorizes an issue or PR as relevant to SIG Testing.
- `size/XS`: Denotes a PR that changes 0-9 lines, ignoring generated files.
- `triage/accepted`: Indicates an issue or PR is ready to be actively worked on.
5 participants