WIP: check invariant metrics after e2e tests #133394
base: master
Conversation
Adding the "do-not-merge/release-note-label-needed" label because no release-note block was detected; please follow our release note process to remove it. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
This issue is currently awaiting triage. If a SIG or subproject determines this is a relevant issue, they will accept it by applying the appropriate triage label. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: BenTheElder
The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
for _, sample := range samples {
	// TODO: this logic is intentionally inverted to showcase the results of a failure
	// It should be > 0 instead
	if sample.Value <= 0 {
This assumes that "invariant" means the value is always 0, which I think is correct, but I just wanted to check whether we cover all use cases with this assumption.
YAGNI?
The two we have are "should remain 0"; if a new one is added that requires a different check, we can revisit with a refactor.
I want to keep the implementation as simple as possible to avoid flakes.
(This is also currently a dummy for testing purposes, per the comment; I'll follow up on this particular line.)
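(For reference, a minimal sketch of what the corrected, non-inverted check could look like once the rigged failure is removed, reusing `samples` and `framework.Failf` from the hunk above; the failure message is illustrative, not the PR's actual wording.)

// Sketch (assumption: the invariant is "this metric must remain 0"):
// once the intentional inversion is removed, any sample above 0 is a violation.
for _, sample := range samples {
	if sample.Value > 0 {
		framework.Failf("invariant metric expected to remain 0, got %v", sample.Value)
	}
}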
}
apiserverMetrics, err := grabber.GrabFromAPIServer(ctx)
if err != nil {
	framework.Failf("error grabbing api-server metrics: %v", err)
Should we fail if we do not get the invariants, or just log?
We actually already have default tests that require fetching metrics with the grabber, including from the apiserver specifically, and they seem to be reliable; it is a very simple API call.
I think it is better to fail and detect this unless it is proven to be unreliable, in which case we can either switch to logging or mark the test as flaky until it is improved. But I don't see any reason for this to fail other than an unhealthy API server.
If you mean should we fail when the specific metrics are missing from the response: I think so as well, since that would imply a bug.
Added an explicit check for whether we were able to read the metric under test, as opposed to the empty-samples case.
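(A rough sketch of what such a presence check could look like, assuming the grabbed api-server metrics can be indexed like a map of metric name to samples; the metric name below is a placeholder, not the one used in this PR.)

// Placeholder metric name for illustration only.
const invariantMetricName = "example_invariant_total"

// Distinguish "metric missing from the response" (likely a bug, fail loudly)
// from "metric present but with no samples".
samples, ok := apiserverMetrics[invariantMetricName]
if !ok || len(samples) == 0 {
	framework.Failf("expected invariant metric %q in api-server metrics, but it was not reported", invariantMetricName)
}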
b2874c0 to 65ee329 (Compare)
So this intentionally has the wrong logic for the metric value currently, to ensure that it will fail and surface the results in CI.
We can also see that it does NOT fail in the jobs specifically selecting Conformance, because the "test" that enables checking the metrics at the end is skipped, and checking the metrics is therefore skipped as well.
When all the tests have completed, the right jobs have passed and failed as expected, and we're otherwise happy with the shape, I'll fix the actual metric check line to be correct (there is a TODO line here: #133394 (comment)).
[Invoking additional optional jobs for clarity] /test pull-kubernetes-conformance-kind-ga-only
# special checks that run after the entire suite
# must be specially reviewed
"e2e_invariants.go":
  approvers:
Does this guarantee that top-level approvers are not able to approve?
No, but this config guarantees that auto-assignment will go to this group, and there are comments throughout saying that you must ask.
If we want to enforce that, we can refactor a bit to put the key details in a subdirectory and use no_parent_owners.
We already have no_parent_owners at the test/ level, but there are a lot of people listed there. Anyone there could already approve arbitrary changes to the test binary to add something like this, so that level of paranoia seemed unnecessary.
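(For illustration, a hedged sketch of the subdirectory variant mentioned above; the path and approver alias are placeholders, not the actual config in this PR.)

# hypothetical test/e2e/invariants/OWNERS
options:
  no_parent_owners: true
approvers:
  - example-invariants-approvers  # placeholder alias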
// this allows us to run it by default in most jobs, but it can be opted-out,
// does not run when selecting Conformance, and it can be tagged Flaky
// if we encounter issues with it
ginkgo.It(invariantsLeafText, func() {})
Should we also use a label for filtering?
We can add it as needed, but I didn't see a need yet. Did you have any in mind?
My goal is that it should run by default in most jobs, but not in Conformance runs (see which jobs just passed and failed with the check rigged to fail currently). It can be labeled Flaky if we find problems, and any job can also filter it out, at least with SKIP=invariants.
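(A hedged sketch of how the no-op marker spec plus a suite-level hook could be wired, based on the description above; the names, the flag, and the use of AfterSuite are assumptions, not necessarily the PR's actual implementation, and parallel-run coordination is omitted for brevity.)

// Sketch only: a no-op leaf spec acts as the opt-in marker. When it is
// filtered out (e.g. a Conformance-focused run, or -ginkgo.skip=invariants),
// the flag stays false and the post-suite metric check is skipped.
var invariantsEnabled bool

var _ = ginkgo.Describe("e2e invariants", func() {
	ginkgo.It(invariantsLeafText, func() {
		invariantsEnabled = true
	})
})

var _ = ginkgo.AfterSuite(func() {
	if !invariantsEnabled {
		return
	}
	// ... grab api-server metrics here and assert the invariants ...
})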
Will debug those verify checks. Otherwise we see that the conformance jobs didn't run the actual metrics grabbing and the other jobs did. We can also see no flakes so far, just the intentional failure at the end from the temporarily rigged, inverted metric-value assertion.
The verify failures required adjusting the code to detect dry-run mode and skip the invariant metrics check in that case. That's fixed now.
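(A small sketch of one way to detect dry-run mode, assuming ginkgo v2 is imported as ginkgo; the helper name is hypothetical and this is an assumption about the approach, not necessarily what the PR does.)

// invariantChecksWanted is a hypothetical helper: it returns false when the
// suite is running in ginkgo's dry-run mode (as verify scripts do when they
// only list specs), so the metric grabbing can be skipped in that case.
func invariantChecksWanted() bool {
	suiteConfig, _ := ginkgo.GinkgoConfiguration()
	return !suiteConfig.DryRun
}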
@BenTheElder: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
What type of PR is this?
What this PR does / why we need it:
Which issue(s) this PR is related to:
Special notes for your reviewer:
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: