test: Close response body in watch tests #131706


Merged

1 commit merged into kubernetes:master from karl-close-watch-tests on May 15, 2025

Conversation

karlkfi
Contributor

@karlkfi karlkfi commented May 9, 2025

What type of PR is this?

/kind cleanup
/kind flake

What this PR does / why we need it:

  • Close response body after http calls in the watch tests
  • Add apitesting utility methods for closing and testing errors from response body and websocket.
  • Validate read after close errors.

Does this PR introduce a user-facing change?

NONE

@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. kind/flake Categorizes issue or PR as related to a flaky test. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels May 9, 2025
@k8s-ci-robot
Contributor

This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added the needs-priority Indicates a PR lacks a `priority/foo` label and requires one. label May 9, 2025
@k8s-ci-robot k8s-ci-robot added area/apiserver sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels May 9, 2025
- Close response body after http calls in the watch tests
- Add apitesting utility methods for closing and testing errors
  from response body and websocket.
- Validate read after close errors.
@karlkfi
Contributor Author

karlkfi commented May 9, 2025

/assign @wojtek-t

Member

were the changes in this file just for hygiene or were they actually fixing flakes?

Contributor Author

In many of these test cases, the body was not being closed at all, but the test wasn't failing, because the connection wasn't being re-used.

However, a number of tests loop through multiple cases (like TestNotFound) sharing the same client and server without using sub-tests. In those cases, the defer calls only ran after all the cases passed or on fatal error. When the response body is not closed, the connection cannot be re-used, so the client opens a new connection for the next request. Closing the response body on the client tells the server it can stop serving, which should trigger the server handler's defers to clean up server-side resources, like storage watches.

So these changes primarily improve hygiene, making the tests more similar to ideal usage patterns, but they also slightly improve efficiency of the tests. More importantly from my perspective, making sure the storage watches are closed is a step towards unblocking cleaning up the fake watcher implementations later, which are a bit of a mess. I pulled this out of a set of larger changes.

Member

ok, cleanup is totally fine, this was just marked as /kind flake so I was trying to figure out which line was flaking and what the fix was

Member

same question on this file, just general cleanup or fixing specific flakes?

Contributor Author

This file is just cleanup, same as apiserver_test.go.

@@ -156,6 +158,11 @@ func TestWatchWebsocketClientClose(t *testing.T) {
if err != nil {
t.Fatalf("unexpected error: %v", err)
}
// Ensure the websocket is closed without error
closeWebSocket := apitesting.Close
Member

the only way this would get called is if the test already failed, right? just trying to understand if this change is cleanup or fixing a flake

Contributor Author

the only way this would get called is if the test already failed, right?

Correct. Closing just cleans up the client and server resources.

But I also added a check here to make sure the first ws.Close doesn't return an error and then updated the expectation after websocket.JSON.Receive to confirm that it errors with the expected error and not some unexpected one.

@@ -353,7 +367,10 @@ func TestWatchRead(t *testing.T) {
streamSerializer := info.StreamSerializer

r, contentType := protocol.fn(test.Accept)
defer r.Close()
Member

was this silently erroring before because of a double-close (but we weren't checking the error result)? trying to understand what was actually flaking

Contributor Author

Closing an http response body twice should not error, however closing a websocket twice DOES error, at least when using the golang.org/x/net/websocket library.

So I changed TestWatchRead to make sure Close was only called once, similar to how I changed TestWatchWebsocketClientClose.

// Websockets error if closed twice.
// So disable the Body.Close and use Decoder.Close instead.
closeBody = apitesting.CloseNoOp
defer apitesting.Close(t, d)
Member

this is adding a new Close() call where we didn't have one before? was nothing ever closing the decoder / serializer before, we were just tearing down the overall protocol readcloser before?

this test loops over websocket and non-websocket protocols... is it correct to do things that only apply to websockets like this?

Contributor Author

this is adding a new Close() call where we didn't have one before? was nothing ever closing the decoder / serializer before, we were just tearing down the overall protocol readcloser before?

Correct.

If the test had used the decoder after closing the response body or websocket, it would have errored (each of those cases produces a different error), but it doesn't. So the decoder just gets garbage collected when the test ends, without doing whatever internal cleanup it was supposed to do on Close and without its return error being checked.

For the specific streaming decoder being used here, its Close method just closes the reader (response body). So not calling decoder.Close wasn't actually producing any side effects, because the response body was already eventually being closed.

So effectively, this just updates the tests to demonstrate how you should actually be using the decoder, without knowing exactly what needs to be cleaned up underneath.

Contributor Author

this test loops over websocket and non-websocket protocols... is it correct to do things that only apply to websockets like this?

The decoder should be closed whether it uses http or websockets. And if the decoder is closed, then there's no need to close the response body directly. However, closing both only errors for websockets, because the underlying Reader errors if closed twice.

IMO, the decoder probably shouldn't close the response body on close, but removing that might cause problems in other cases that close the decoder and not the response body.

@karlkfi
Contributor Author

karlkfi commented May 14, 2025

@liggitt I don't know of any specific existing test flakes that this PR fixes, however cleaning up the response body and websocket should help avoid flakes in the future, and help ensure correctness as I make future refactors to the client and server watch code. I want to make sure the client and storage watches are being correctly closed, which requires the response/websocket to be closed first.

@liggitt
Member

liggitt commented May 14, 2025

@liggitt I don't know of any specific existing test flakes that this PR fixes, however cleaning up the response body and websocket should help avoid flakes in the future, and help ensure correctness as I make future refactors to the client and server watch code. I want to make sure the client and storage watches are being correctly closed, which requires the response/websocket to be closed first.

ok, if it's not fixing an existing flake, let's drop that label for clarity
/remove-kind flake
/kind cleanup

Would you mind running the modified tests with -race / stress just to double check these changes don't introduce any issues?

go install golang.org/x/tools/cmd/stress@latest
go test -race -c $path/to/pkg
stress ./$testbinary -test.run TestName

@k8s-ci-robot k8s-ci-robot removed the kind/flake Categorizes issue or PR as related to a flaky test. label May 14, 2025
@karlkfi
Contributor Author

karlkfi commented May 14, 2025

staging/src/k8s.io/apiserver/pkg/endpoints/apiserver_test.go

$ ~/go/bin/stress -count 200 ./endpoints.test -test.run TestNotFound
5s: 48 runs so far, 0 failures, 16 active
10s: 113 runs so far, 0 failures, 16 active
15s: 178 runs so far, 0 failures, 16 active
16s: 200 runs total, 0 failures
$ ~/go/bin/stress -count 200 ./endpoints.test -test.run TestUnimplementedRESTStorage
5s: 64 runs so far, 0 failures, 16 active
10s: 130 runs so far, 0 failures, 16 active
14s: 200 runs total, 0 failures
$ ~/go/bin/stress -count 200 ./endpoints.test -test.run TestSomeUnimplementedRESTStorage
5s: 64 runs so far, 0 failures, 16 active
10s: 130 runs so far, 0 failures, 16 active
14s: 200 runs total, 0 failures
$ ~/go/bin/stress -count 200 ./endpoints.test -test.run TestList
5s: 32 runs so far, 0 failures, 16 active
10s: 96 runs so far, 0 failures, 16 active
15s: 150 runs so far, 0 failures, 16 active
19s: 200 runs total, 0 failures
$ ~/go/bin/stress -count 200 ./endpoints.test -test.run TestRequestsWithInvalidQuery
5s: 64 runs so far, 0 failures, 16 active
10s: 128 runs so far, 0 failures, 16 active
14s: 200 runs total, 0 failures
$ ~/go/bin/stress -count 200 ./endpoints.test -test.run TestListCompression
5s: 56 runs so far, 0 failures, 16 active
10s: 128 runs so far, 0 failures, 16 active
15s: 194 runs so far, 0 failures, 6 active
15s: 200 runs total, 0 failures
$ ~/go/bin/stress -count 200 ./endpoints.test -test.run TestLogs
5s: 64 runs so far, 0 failures, 16 active
10s: 130 runs so far, 0 failures, 16 active
14s: 200 runs total, 0 failures
$ ~/go/bin/stress -count 200 ./endpoints.test -test.run TestErrorList
5s: 64 runs so far, 0 failures, 16 active
10s: 131 runs so far, 0 failures, 16 active
14s: 200 runs total, 0 failures
$ ~/go/bin/stress -count 200 ./endpoints.test -test.run TestNonEmptyList
5s: 64 runs so far, 0 failures, 16 active
10s: 128 runs so far, 0 failures, 16 active
15s: 199 runs so far, 0 failures, 1 active
15s: 200 runs total, 0 failures
$ ~/go/bin/stress -count 200 ./endpoints.test -test.run TestGet
5s: 48 runs so far, 0 failures, 16 active
10s: 98 runs so far, 0 failures, 16 active
15s: 150 runs so far, 0 failures, 16 active
20s: 198 runs so far, 0 failures, 2 active
20s: 200 runs total, 0 failures
$ ~/go/bin/stress -count 200 ./endpoints.test -test.run TestGetCompression
5s: 64 runs so far, 0 failures, 16 active
10s: 128 runs so far, 0 failures, 16 active
15s: 199 runs so far, 0 failures, 1 active
15s: 200 runs total, 0 failures
$ ~/go/bin/stress -count 200 ./endpoints.test -test.run TestWatchTable
5s: 64 runs so far, 0 failures, 16 active
10s: 128 runs so far, 0 failures, 16 active
15s: 198 runs so far, 0 failures, 2 active
15s: 200 runs total, 0 failures
$ ~/go/bin/stress -count 200 ./endpoints.test -test.run TestGetPartialObjectMetadata
5s: 64 runs so far, 0 failures, 16 active
10s: 129 runs so far, 0 failures, 16 active
14s: 200 runs total, 0 failures
$ ~/go/bin/stress -count 200 ./endpoints.test -test.run TestGetBinary
5s: 64 runs so far, 0 failures, 16 active
10s: 128 runs so far, 0 failures, 16 active
14s: 200 runs total, 0 failures
$ ~/go/bin/stress -count 200 ./endpoints.test -test.run TestGetWithOptions
5s: 48 runs so far, 0 failures, 16 active
10s: 118 runs so far, 0 failures, 16 active
15s: 192 runs so far, 0 failures, 8 active
16s: 200 runs total, 0 failures
$ ~/go/bin/stress -count 200 ./endpoints.test -test.run TestGetMissing
5s: 64 runs so far, 0 failures, 16 active
10s: 130 runs so far, 0 failures, 16 active
14s: 200 runs total, 0 failures
$ ~/go/bin/stress -count 200 ./endpoints.test -test.run TestGetRetryAfter
5s: 64 runs so far, 0 failures, 16 active
10s: 129 runs so far, 0 failures, 16 active
14s: 200 runs total, 0 failures
$ ~/go/bin/stress -count 200 ./endpoints.test -test.run TestConnect
5s: 64 runs so far, 0 failures, 16 active
10s: 128 runs so far, 0 failures, 16 active
15s: 194 runs so far, 0 failures, 6 active
15s: 200 runs total, 0 failures
$ ~/go/bin/stress -count 200 ./endpoints.test -test.run TestConnectResponderObject
5s: 64 runs so far, 0 failures, 16 active
10s: 128 runs so far, 0 failures, 16 active
15s: 196 runs so far, 0 failures, 4 active
15s: 200 runs total, 0 failures
$ ~/go/bin/stress -count 200 ./endpoints.test -test.run TestConnectResponderError
5s: 64 runs so far, 0 failures, 16 active
10s: 129 runs so far, 0 failures, 16 active
14s: 200 runs total, 0 failures
$ ~/go/bin/stress -count 200 ./endpoints.test -test.run TestConnectWithOptions
5s: 64 runs so far, 0 failures, 16 active
10s: 128 runs so far, 0 failures, 16 active
15s: 195 runs so far, 0 failures, 5 active
15s: 200 runs total, 0 failures
$ ~/go/bin/stress -count 200 ./endpoints.test -test.run TestConnectWithOptionsAndPath
5s: 64 runs so far, 0 failures, 16 active
10s: 129 runs so far, 0 failures, 16 active
14s: 200 runs total, 0 failures
$ ~/go/bin/stress -count 200 ./endpoints.test -test.run TestDelete
5s: 64 runs so far, 0 failures, 16 active
10s: 128 runs so far, 0 failures, 16 active
15s: 198 runs so far, 0 failures, 2 active
15s: 200 runs total, 0 failures
$ ~/go/bin/stress -count 200 ./endpoints.test -test.run TestDeleteWithOptions
5s: 64 runs so far, 0 failures, 16 active
10s: 128 runs so far, 0 failures, 16 active
14s: 200 runs total, 0 failures
$ ~/go/bin/stress -count 200 ./endpoints.test -test.run TestDeleteWithOptionsQuery
5s: 64 runs so far, 0 failures, 16 active
10s: 129 runs so far, 0 failures, 16 active
14s: 200 runs total, 0 failures
$ ~/go/bin/stress -count 200 ./endpoints.test -test.run TestDeleteWithOptionsQueryAndBody
5s: 65 runs so far, 0 failures, 16 active
10s: 130 runs so far, 0 failures, 16 active
15s: 195 runs so far, 0 failures, 5 active
15s: 200 runs total, 0 failures
$ ~/go/bin/stress -count 200 ./endpoints.test -test.run TestDeleteInvokesAdmissionControl
5s: 64 runs so far, 0 failures, 16 active
10s: 128 runs so far, 0 failures, 16 active
14s: 200 runs total, 0 failures
$ ~/go/bin/stress -count 200 ./endpoints.test -test.run TestDeleteMissing
5s: 64 runs so far, 0 failures, 16 active
10s: 129 runs so far, 0 failures, 16 active
14s: 200 runs total, 0 failures
$ ~/go/bin/stress -count 200 ./endpoints.test -test.run TestUpdate
5s: 48 runs so far, 0 failures, 16 active
10s: 119 runs so far, 0 failures, 16 active
15s: 192 runs so far, 0 failures, 8 active
15s: 200 runs total, 0 failures
$ ~/go/bin/stress -count 200 ./endpoints.test -test.run TestUpdateInvokesAdmissionControl
5s: 64 runs so far, 0 failures, 16 active
10s: 129 runs so far, 0 failures, 16 active
14s: 200 runs total, 0 failures
$ ~/go/bin/stress -count 200 ./endpoints.test -test.run TestUpdateRequiresMatchingName
5s: 64 runs so far, 0 failures, 16 active
10s: 131 runs so far, 0 failures, 16 active
14s: 200 runs total, 0 failures
$ ~/go/bin/stress -count 200 ./endpoints.test -test.run TestUpdateAllowsMissingNamespace
5s: 64 runs so far, 0 failures, 16 active
10s: 130 runs so far, 0 failures, 16 active
14s: 200 runs total, 0 failures
$ ~/go/bin/stress -count 200 ./endpoints.test -test.run TestUpdateDisallowsMismatchedNamespaceOnError
5s: 64 runs so far, 0 failures, 16 active
10s: 129 runs so far, 0 failures, 16 active
14s: 200 runs total, 0 failures
$ ~/go/bin/stress -count 200 ./endpoints.test -test.run TestUpdatePreventsMismatchedNamespace
5s: 64 runs so far, 0 failures, 16 active
10s: 129 runs so far, 0 failures, 16 active
14s: 200 runs total, 0 failures
$ ~/go/bin/stress -count 200 ./endpoints.test -test.run TestUpdateMissing
5s: 64 runs so far, 0 failures, 16 active
10s: 131 runs so far, 0 failures, 16 active
14s: 200 runs total, 0 failures
$ ~/go/bin/stress -count 200 ./endpoints.test -test.run TestCreateNotFound
5s: 64 runs so far, 0 failures, 16 active
10s: 129 runs so far, 0 failures, 16 active
14s: 200 runs total, 0 failures
$ ~/go/bin/stress -count 200 ./endpoints.test -test.run TestCreateChecksDecode
5s: 64 runs so far, 0 failures, 16 active
10s: 129 runs so far, 0 failures, 16 active
14s: 200 runs total, 0 failures
$ ~/go/bin/stress -count 200 ./endpoints.test -test.run TestNamedCreaterWithName
5s: 64 runs so far, 0 failures, 16 active
10s: 131 runs so far, 0 failures, 16 active
14s: 200 runs total, 0 failures
$ ~/go/bin/stress -count 200 ./endpoints.test -test.run TestNamedCreaterWithoutName
5s: 64 runs so far, 0 failures, 16 active
10s: 130 runs so far, 0 failures, 16 active
14s: 200 runs total, 0 failures
$ ~/go/bin/stress -count 200 ./endpoints.test -test.run TestNamedCreaterWithGenerateName
5s: 64 runs so far, 0 failures, 16 active
10s: 130 runs so far, 0 failures, 16 active
14s: 200 runs total, 0 failures
$ ~/go/bin/stress -count 200 ./endpoints.test -test.run TestUpdateChecksDecode
5s: 65 runs so far, 0 failures, 16 active
10s: 130 runs so far, 0 failures, 16 active
14s: 200 runs total, 0 failures
$ ~/go/bin/stress -count 200 ./endpoints.test -test.run TestCreate
5s: 48 runs so far, 0 failures, 16 active
10s: 113 runs so far, 0 failures, 16 active
15s: 180 runs so far, 0 failures, 16 active
16s: 200 runs total, 0 failures
$ ~/go/bin/stress -count 200 ./endpoints.test -test.run TestCreateYAML
5s: 64 runs so far, 0 failures, 16 active
10s: 131 runs so far, 0 failures, 16 active
14s: 200 runs total, 0 failures
$ ~/go/bin/stress -count 200 ./endpoints.test -test.run TestCreateInNamespace
5s: 64 runs so far, 0 failures, 16 active
10s: 129 runs so far, 0 failures, 16 active
14s: 200 runs total, 0 failures
$ ~/go/bin/stress -count 200 ./endpoints.test -test.run TestCreateInvokeAdmissionControl
5s: 64 runs so far, 0 failures, 16 active
10s: 128 runs so far, 0 failures, 16 active
15s: 197 runs so far, 0 failures, 3 active
15s: 200 runs total, 0 failures
$ ~/go/bin/stress -count 200 ./endpoints.test -test.run TestCreateChecksAPIVersion
5s: 64 runs so far, 0 failures, 16 active
10s: 130 runs so far, 0 failures, 16 active
14s: 200 runs total, 0 failures
$ ~/go/bin/stress -count 200 ./endpoints.test -test.run TestCreateDefaultsAPIVersion
5s: 64 runs so far, 0 failures, 16 active
10s: 131 runs so far, 0 failures, 16 active
14s: 200 runs total, 0 failures
$ ~/go/bin/stress -count 200 ./endpoints.test -test.run TestUpdateChecksAPIVersion
5s: 64 runs so far, 0 failures, 16 active
10s: 128 runs so far, 0 failures, 16 active
14s: 200 runs total, 0 failures
$ ~/go/bin/stress -count 200 ./endpoints.test -test.run TestXGSubresource
5s: 64 runs so far, 0 failures, 16 active
10s: 130 runs so far, 0 failures, 16 active
14s: 200 runs total, 0 failures
$ ~/go/bin/stress -count 200 ./endpoints.test -test.run TestFieldValidation
5s: 64 runs so far, 0 failures, 16 active
10s: 128 runs so far, 0 failures, 16 active
15s: 194 runs so far, 0 failures, 6 active
15s: 200 runs total, 0 failures

staging/src/k8s.io/apiserver/pkg/endpoints/watch_test.go

$ ~/go/bin/stress -count 200 ./endpoints.test -test.run TestWatchWebsocket
5s: 64 runs so far, 0 failures, 16 active
10s: 128 runs so far, 0 failures, 16 active
15s: 197 runs so far, 0 failures, 3 active
15s: 200 runs total, 0 failures
$ ~/go/bin/stress -count 200 ./endpoints.test -test.run TestWatchWebsocketClientClose
5s: 64 runs so far, 0 failures, 16 active
10s: 131 runs so far, 0 failures, 16 active
14s: 200 runs total, 0 failures
$ ~/go/bin/stress -count 200 ./endpoints.test -test.run TestWatchClientClose
5s: 64 runs so far, 0 failures, 16 active
10s: 128 runs so far, 0 failures, 16 active
14s: 200 runs total, 0 failures
$ ~/go/bin/stress -count 200 ./endpoints.test -test.run TestWatchRead
5s: 64 runs so far, 0 failures, 16 active
10s: 128 runs so far, 0 failures, 16 active
15s: 197 runs so far, 0 failures, 3 active
15s: 200 runs total, 0 failures
$ ~/go/bin/stress -count 200 ./endpoints.test -test.run TestWatchParamParsing
5s: 64 runs so far, 0 failures, 16 active
10s: 128 runs so far, 0 failures, 16 active
14s: 200 runs total, 0 failures

staging/src/k8s.io/apiserver/pkg/endpoints/handlers/watch_test.go

$ ~/go/bin/stress -count 200 ./handlers.test -test.run TestWatchHTTPErrors
5s: 64 runs so far, 0 failures, 16 active
10s: 128 runs so far, 0 failures, 16 active
14s: 200 runs total, 0 failures
$ ~/go/bin/stress -count 200 ./handlers.test -test.run TestWatchHTTPErrorsBeforeServe
5s: 64 runs so far, 0 failures, 16 active
10s: 133 runs so far, 0 failures, 16 active
14s: 200 runs total, 0 failures
$ ~/go/bin/stress -count 200 ./handlers.test -test.run TestWatchHTTPTimeout
5s: 64 runs so far, 0 failures, 16 active
10s: 132 runs so far, 0 failures, 16 active
14s: 200 runs total, 0 failures

@karlkfi
Contributor Author

karlkfi commented May 14, 2025

Endpoint Benchmarks

  • BenchmarkGet - 1.159s -> 1.130s
  • BenchmarkGetNoCompression - 1.175s -> 1.075s
  • BenchmarkUpdateProtobuf - 1.278s -> 1.101s
  • BenchmarkWatchWebsocket - 1.482s -> 1.421s

Summary

None of these changes significantly impact the speed of the existing benchmarks. This isn't surprising, because the benchmarks today don't use sub-tests or function closures to ensure the response body is closed before starting the next request. That means they almost always start a new connection for each attempt, instead of re-using the existing connection.

Note: The benchmark times can vary wildly, by up to 3x total time spent. So I ran them all multiple times and chose the lowest time to show below. There are definitely ways to make this more scientific, but it didn't seem worth the effort today. Perhaps after all my pending watch changes are in I'll do a before and after.

I plan to follow up by adding subtests in #131704

Branch karl-close-watch-tests (this PR)

BenchmarkGet

$ go test -benchmem -run=^$ -bench ^BenchmarkGet$ k8s.io/apiserver/pkg/endpoints
goos: linux
goarch: amd64
pkg: k8s.io/apiserver/pkg/endpoints
cpu: AMD Ryzen 7 PRO 7840U w/ Radeon 780M Graphics
BenchmarkGet-16             6944            153248 ns/op           21591 B/op        203 allocs/op
PASS
ok      k8s.io/apiserver/pkg/endpoints  1.130s

BenchmarkGetNoCompression

$ go test -benchmem -run=^$ -bench ^BenchmarkGetNoCompression$ k8s.io/apiserver/pkg/endpoints
goos: linux
goarch: amd64
pkg: k8s.io/apiserver/pkg/endpoints
cpu: AMD Ryzen 7 PRO 7840U w/ Radeon 780M Graphics
BenchmarkGetNoCompression-16                7927            127926 ns/op           21136 B/op        199 allocs/op
PASS
ok      k8s.io/apiserver/pkg/endpoints  1.075s

BenchmarkUpdateProtobuf

$ go test -benchmem -run=^$ -bench ^BenchmarkUpdateProtobuf$ k8s.io/apiserver/pkg/endpoints
goos: linux
goarch: amd64
pkg: k8s.io/apiserver/pkg/endpoints
cpu: AMD Ryzen 7 PRO 7840U w/ Radeon 780M Graphics
BenchmarkUpdateProtobuf-16          5476            188661 ns/op           29822 B/op        283 allocs/op
PASS
ok      k8s.io/apiserver/pkg/endpoints  1.101s

BenchmarkWatchWebsocket

$ go test -benchmem -run=^$ -bench ^BenchmarkWatchWebsocket$ k8s.io/apiserver/pkg/endpoints
goos: linux
goarch: amd64
pkg: k8s.io/apiserver/pkg/endpoints
cpu: AMD Ryzen 7 PRO 7840U w/ Radeon 780M Graphics
BenchmarkWatchWebsocket-16         46832             23721 ns/op            3276 B/op         23 allocs/op
PASS
ok      k8s.io/apiserver/pkg/endpoints  1.421s

Branch master

BenchmarkGet

$ go test -benchmem -run=^$ -bench ^BenchmarkGet$ k8s.io/apiserver/pkg/endpoints
goos: linux
goarch: amd64
pkg: k8s.io/apiserver/pkg/endpoints
cpu: AMD Ryzen 7 PRO 7840U w/ Radeon 780M Graphics
BenchmarkGet-16             7114            153236 ns/op           21566 B/op        201 allocs/op
PASS
ok      k8s.io/apiserver/pkg/endpoints  1.159s

BenchmarkGetNoCompression

$ go test -benchmem -run=^$ -bench ^BenchmarkGetNoCompression$ k8s.io/apiserver/pkg/endpoints
goos: linux
goarch: amd64
pkg: k8s.io/apiserver/pkg/endpoints
cpu: AMD Ryzen 7 PRO 7840U w/ Radeon 780M Graphics
BenchmarkGetNoCompression-16                7390            150368 ns/op           21126 B/op        197 allocs/op
PASS
ok      k8s.io/apiserver/pkg/endpoints  1.175s

BenchmarkUpdateProtobuf

$ go test -benchmem -run=^$ -bench ^BenchmarkUpdateProtobuf$ k8s.io/apiserver/pkg/endpoints
goos: linux
goarch: amd64
pkg: k8s.io/apiserver/pkg/endpoints
cpu: AMD Ryzen 7 PRO 7840U w/ Radeon 780M Graphics
BenchmarkUpdateProtobuf-16          6110            198784 ns/op           30353 B/op        281 allocs/op
PASS
ok      k8s.io/apiserver/pkg/endpoints  1.278s

BenchmarkWatchWebsocket

$ go test -benchmem -run=^$ -bench ^BenchmarkWatchWebsocket$ k8s.io/apiserver/pkg/endpoints
goos: linux
goarch: amd64
pkg: k8s.io/apiserver/pkg/endpoints
cpu: AMD Ryzen 7 PRO 7840U w/ Radeon 780M Graphics
BenchmarkWatchWebsocket-16         47685             24637 ns/op            3277 B/op         23 allocs/op
PASS
ok      k8s.io/apiserver/pkg/endpoints  1.482s

@liggitt
Member

liggitt commented May 15, 2025

thanks!

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label May 15, 2025
@k8s-ci-robot
Contributor

LGTM label has been added.

Git tree hash: 19d844a835cf55493a9f66981651d76000eea31c

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: karlkfi, liggitt

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 15, 2025
@k8s-triage-robot

The Kubernetes project has merge-blocking tests that are currently too flaky to consistently pass.

This bot retests PRs for certain kubernetes repos according to the following rules:

  • The PR does not have any do-not-merge/* labels
  • The PR does not have the needs-ok-to-test label
  • The PR is mergeable (does not have a needs-rebase label)
  • The PR is approved (has cncf-cla: yes, lgtm, approved labels)
  • The PR is failing tests required for merge

You can:

/retest

@k8s-ci-robot k8s-ci-robot merged commit 5f1a517 into kubernetes:master May 15, 2025
13 of 14 checks passed
@k8s-ci-robot k8s-ci-robot added this to the v1.34 milestone May 15, 2025
@karlkfi karlkfi deleted the karl-close-watch-tests branch May 15, 2025 18:05