-
Notifications
You must be signed in to change notification settings - Fork 41.1k
[WIP] Fix race conditions and deadlocks in Reflector #125107
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
23747ce
to
c53ee1f
Compare
/cc @p0lyn0mial Looks like @p0lyn0mial did the last big refactor here and might have the most context. |
c53ee1f
to
bb05d5b
Compare
It looks like this breaks TestWatchList, for multiple reasons. So I'm going to need to rework it a bit.
|
bb05d5b
to
5991f45
Compare
5991f45
to
193fc54
Compare
193fc54
to
a734401
Compare
a734401
to
fb39522
Compare
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: karlkfi The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Alright, I tried to fix the problem with the FakeWatcher closing the result channel immediately on Stop() without waiting for the producer to finish sending events, but that turned into a much larger change. So I walked that back and left some TODOs instead. To avoid needing all downstream users to refactor immediately on upgrade, it might be better in the future to add a new fixed FakeWatcher and deprecate the existing one. But I'll leave that for a separate PR to keep this one more reviewable. |
fb39522
to
135ed55
Compare
/triage accepted |
- Ensure watcher.Stop() is called after the consumer is done consuming events from the watcher. This is a signal to the producer that it should stop trying to send events on the result channel. Without it, a producer that sends on an unbuffered channel can get stuck waiting for an event to be read by a consumer who is no longer reading. - Ensure - Add extra checks to exit early when the stop channel has been closed, to avoid unnecessary API calls. - Refactor watchHandler to be easier to read. Use input and output bools, instead of a bool pointer. - Fix some of the unit tests to use the stop channel to stop ListAndWatch, as a real user would, instead of relying on the quirk that FakeWatcher.Stop() happens to close the result channel. - Add additional unit tests to confirm that the watch handler works correctly in various permutations where the result channel or stop channel is closed, before and after watching has started. - Add a MockWatcher for more explicit testing, including order of operations and result channel closure. - Remove a duplicate assertion in TestReflectorWatchHandler - Use assert instead of require in a few cases where debugging a failed test is easier if you can see all the failed assertions at once. - Fix some lint errors regarding error comparison (use errors.Is & As)
135ed55
to
8fd4501
Compare
Moved the FakeWatcher changes out to their own PR, since it requires updating all the tests that use it: #125204 |
Was asked to break this down. So here's the first few bits: |
PR needs rebase. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs. This bot triages PRs according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community. /close |
@k8s-triage-robot: Closed this PR. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
What type of PR is this?
/kind cleanup
/kind flake
What & Why?
events from the watcher. This is a signal to the producer that it
should stop trying to send events on the result channel. Without it,
a producer that sends on an unbuffered channel can get stuck waiting
for an event to be read by a consumer who is no longer reading.
to avoid unnecessary API calls.
bools, instead of a bool pointer.
as a real user would, instead of relying on the quirk that
FakeWatcher.Stop() happens to close the result channel.
correctly in various permutations where the result channel or stop
channel is closed, before and after watching has started.
operations and result channel closure.
test is easier if you can see all the failed assertions at once.
These changes make the informer/reflector safer to use with third-party watchers, which are often faked for testing controllers.
Does this PR introduce a user-facing change?