test: Close response body in watch tests #131706
Conversation
This issue is currently awaiting triage. If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
- Close response body after http calls in the watch tests
- Add apitesting utility methods for closing and testing errors from response body and websocket.
- Validate read after close errors.
/assign @wojtek-t
were the changes in this file just for hygiene or were they actually fixing flakes?
In many of these test cases, the body was not being closed at all, but the test wasn't failing, because the connection wasn't being re-used.
However, there are a number of tests that loop through multiple cases (like TestNotFound) sharing the same client & server without using sub-tests. In these cases, the defer calls were only getting called after all the cases passed or on a fatal error. When you fail to close the response body, the connection cannot be re-used, so the client opens a new connection. Closing the response body on the client tells the server it can stop serving, which should trigger the server handler's defers to clean up server-side resources, like storage watches.
So these changes primarily improve hygiene, making the tests more similar to ideal usage patterns, but they also slightly improve the efficiency of the tests. More importantly from my perspective, making sure the storage watches are closed is a step towards unblocking cleanup of the fake watcher implementations later, which are a bit of a mess. I pulled this out of a set of larger changes.
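For illustration, here is a minimal Go sketch of the per-case close pattern described above; the helper closeAndCheck, the looped driver, and its parameters are hypothetical names for this example, not code from the PR:

```go
package endpointstest

import (
	"io"
	"net/http"
	"testing"
)

// closeAndCheck drains and closes an HTTP response body and fails the test if
// Close returns an error. Closing inside the loop (instead of deferring until
// the whole test returns) lets net/http re-use the connection and lets the
// server handler's defers run promptly.
func closeAndCheck(t *testing.T, resp *http.Response) {
	t.Helper()
	if _, err := io.Copy(io.Discard, resp.Body); err != nil {
		t.Errorf("unexpected error draining response body: %v", err)
	}
	if err := resp.Body.Close(); err != nil {
		t.Errorf("unexpected error closing response body: %v", err)
	}
}

// looped shows the shape of a test that shares one client and server across
// many cases without sub-tests: each case closes its own response body.
func looped(t *testing.T, client *http.Client, baseURL string, paths []string) {
	for _, p := range paths {
		resp, err := client.Get(baseURL + p)
		if err != nil {
			t.Fatalf("unexpected error: %v", err)
		}
		closeAndCheck(t, resp)
	}
}
```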
ok, cleanup is totally fine, this was just marked as /kind flake
so I was trying to figure out which line was flaking and what the fix was
same question on this file, just general cleanup or fixing specific flakes?
This file is just cleanup, same as apiserver_test.go.
@@ -156,6 +158,11 @@ func TestWatchWebsocketClientClose(t *testing.T) {
	if err != nil {
		t.Fatalf("unexpected error: %v", err)
	}
	// Ensure the websocket is closed without error
	closeWebSocket := apitesting.Close
the only way this would get called is if the test already failed, right? just trying to understand if this change is cleanup or fixing a flake
the only way this would get called is if the test already failed, right?
Correct. Closing just cleans up the client and server resources.
But I also added a check here to make sure the first ws.Close doesn't return an error and then updated the expectation after websocket.JSON.Receive to confirm that it errors with the expected error and not some unexpected one.
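As a rough sketch of that check (assuming golang.org/x/net/websocket; the function name and the wsURL/origin values are illustrative, not the actual test code):

```go
package endpointstest

import (
	"testing"

	"golang.org/x/net/websocket"
)

// closeThenReceive verifies that the first Close on a websocket succeeds and
// that a Receive after Close fails; the real test additionally asserts the
// specific error rather than accepting any error.
func closeThenReceive(t *testing.T, wsURL string) {
	ws, err := websocket.Dial(wsURL, "", "http://localhost")
	if err != nil {
		t.Fatalf("unexpected error dialing websocket: %v", err)
	}
	// Ensure the websocket is closed without error.
	if err := ws.Close(); err != nil {
		t.Fatalf("unexpected error closing websocket: %v", err)
	}
	// Reads after close must fail.
	var got map[string]interface{}
	if err := websocket.JSON.Receive(ws, &got); err == nil {
		t.Fatal("expected error receiving from a closed websocket, got none")
	}
}
```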
@@ -353,7 +367,10 @@ func TestWatchRead(t *testing.T) {
	streamSerializer := info.StreamSerializer

	r, contentType := protocol.fn(test.Accept)
	defer r.Close()
was this silently erroring before because of a double-close (but we weren't checking the error result)? trying to understand what was actually flaking
Closing an http response body twice should not error; however, closing a websocket twice DOES error, at least when using the golang.org/x/net/websocket library.
So I changed TestWatchRead to make sure Close was only called once, similar to how I changed TestWatchWebsocketClientClose.
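Based on how they are called in the snippets in this review, the apitesting helpers presumably look something like the following sketch (signatures are inferred from the diff, not copied from the actual package):

```go
package apitesting

import (
	"io"
	"testing"
)

// Close closes c exactly once and fails the test if Close returns an error.
func Close(t *testing.T, c io.Closer) {
	t.Helper()
	if err := c.Close(); err != nil {
		t.Errorf("unexpected error closing %T: %v", c, err)
	}
}

// CloseNoOp has the same signature as Close but does nothing. It is useful
// when the underlying stream (for example a websocket) must only be closed
// once, via some other closer such as a streaming decoder.
func CloseNoOp(*testing.T, io.Closer) {}
```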
	// Websockets error if closed twice.
	// So disable the Body.Close and use Decoder.Close instead.
	closeBody = apitesting.CloseNoOp
	defer apitesting.Close(t, d)
this is adding a new Close() call where we didn't have one before? was nothing ever closing the decoder / serializer before, we were just tearing down the overall protocol readcloser before?
this test loops over websocket and non-websocket protocols... is it correct to do things that only apply to websockets like this?
this is adding a new Close() call where we didn't have one before? was nothing ever closing the decoder / serializer before, we were just tearing down the overall protocol readcloser before?
Correct.
If the test was using the decoder after closing the response body or websocket, it would have errored (each of those cases produces a different error), but it isn't doing that. So the decoder would just get garbage collected when the test ended, without doing whatever internal cleanup it was supposed to do on Close and without checking the returned error.
For the specific streaming decoder being used here, its Close method just closes the reader (response body). So not calling decoder.Close wasn't actually producing any side effects, because the response body was already eventually being closed.
So effectively, this just updates the tests to demonstrate how you should actually be using the decoder, without knowing exactly what needs to be cleaned up underneath.
this test loops over websocket and non-websocket protocols... is it correct to do things that only apply to websockets like this?
The decoder should be closed whether it uses http or websockets. And if the decoder is closed, then there's no need to close the response body directly. However, closing both only errors for websockets, because the underlying Reader errors if closed twice.
IMO, the decoder probably shouldn't close the response body on close, but removing that might cause problems in other cases that close the decoder and not the response body.
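A condensed sketch of that per-protocol pattern, using a stand-in decoder type (watchDecoder and closeViaDecoder are illustrative names, not the real streaming decoder or test code):

```go
package endpointstest

import (
	"io"
	"testing"
)

// watchDecoder stands in for the streaming decoder discussed above: as
// described, its Close simply closes the underlying reader.
type watchDecoder struct {
	r io.ReadCloser
}

func (d *watchDecoder) Close() error { return d.r.Close() }

// closeViaDecoder closes the decoder in every case; for plain HTTP it also
// closes the body directly, which is harmless, while for websockets that
// second close would error, so it is skipped.
func closeViaDecoder(t *testing.T, body io.ReadCloser, isWebsocket bool) {
	d := &watchDecoder{r: body}
	defer func() {
		if err := d.Close(); err != nil {
			t.Errorf("unexpected error closing decoder: %v", err)
		}
	}()
	if !isWebsocket {
		defer func() {
			if err := body.Close(); err != nil {
				t.Errorf("unexpected error closing body: %v", err)
			}
		}()
	}
	// ... read and decode watch events from the body here ...
}
```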
@liggitt I don't know of any specific existing test flakes that this PR fixes; however, cleaning up the response body and websocket should help avoid flakes in the future, and help ensure correctness as I make future refactors to the client and server watch code. I want to make sure the client and storage watches are being correctly closed, which requires the response/websocket to be closed first.
ok, if it's not fixing an existing flake, let's drop that label for clarity. Would you mind running the modified tests with -race / stress just to double-check these changes don't introduce any issues?
staging/src/k8s.io/apiserver/pkg/endpoints/apiserver_test.go
staging/src/k8s.io/apiserver/pkg/endpoints/watch_test.go
staging/src/k8s.io/apiserver/pkg/endpoints/handlers/watch_test.go
Endpoint Benchmarks
Summary: None of these changes significantly impact the speed of the existing benchmarks. This isn't surprising, because the benchmarks today don't use sub-tests or function closures to ensure the response body is closed before starting the next request. That means they almost always start a new connection for each attempt, instead of re-using the existing connection.
Note: The benchmark times can vary wildly, by up to 3x total time spent. So I ran them all multiple times and chose the lowest time to show below. There are definitely ways to make this more scientific, but it didn't seem worth the effort today. Perhaps after all my pending watch changes are in I'll do a before & after.
I plan to follow up by adding subtests in #131704
Branch karl-close-watch-tests (this PR): BenchmarkGet, BenchmarkGetNoCompression, BenchmarkUpdateProtobuf, BenchmarkWatchWebsocket
Branch master: BenchmarkGet, BenchmarkGetNoCompression, BenchmarkUpdateProtobuf, BenchmarkWatchWebsocket
thanks! /lgtm
LGTM label has been added. Git tree hash: 19d844a835cf55493a9f66981651d76000eea31c
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: karlkfi, liggitt
The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
The Kubernetes project has merge-blocking tests that are currently too flaky to consistently pass. This bot retests PRs for certain kubernetes repos according to the following rules:
You can:
/retest
What type of PR is this?
/kind cleanup
/kind flake
What this PR does / why we need it:
Does this PR introduce a user-facing change?