-
Notifications
You must be signed in to change notification settings - Fork 41.1k
Stop exposing list-via-watch from the server #131359
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Please note that we're already in Test Freeze for the Fast forwards are scheduled to happen every 6 hours, whereas the most recent run was: Thu Apr 17 19:35:30 UTC 2025. |
This issue is currently awaiting triage. If a SIG or subproject determines this is a relevant issue, they will accept it by applying the The Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
controller and gate defaulting to false for 1.33 look good (though very late ... post-rc1 changes like this are 😬) questions about retconning the 1.32 gate state, given the streaming collections improvement this is disabling in favor of is only in 1.33 |
With StreamingCollectionEncodingToJSON and StreamingCollectionEncodingToProtobuf, the WatchList must re-justify its necessity. To prevent an ecosystem from building around a feature that may not be promoted, we will stop serving list-via-watch until performance numbers can justify its inclusion. This also stops the kube-controller-manager from using the list-via-watch by default. The fallback is a regular list, so during the skew during an upgrade the "right" thing will happen and the new StreamingCollectionEncoding will be used.
I'd still do it in either 1.33.0 or 1.33.z. /milestone v1.33 |
/lgtm cc @kubernetes/release-managers |
LGTM label has been added. Git tree hash: 52758ffabbd3c5fb948b41b922442f81b5c8fa08
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: deads2k, liggitt The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Given this is relatively low risk and turns off the WatchListClient feature and skips the tests, I am fine including this in the v1.33 milestone. Slack context: https://kubernetes.slack.com/archives/C0EG7JC6T/p1744987693381559?thread_ts=1741283769.908819&cid=C0EG7JC6T /milestone v1.33 |
This is slightly different than the previous test prior to kubernetes/kubernetes#131359 because this change enables the streaming list for all kube binaries. For a test comparing results, this is probably better anyway though.
We're using the watch-list feature, and we can turn it on ourselves while we're using it, but it shouldn't be deprecated in future versions, right? |
Yes
That remains to be seen, this is just pausing feature progression and avoiding accumulating reliance on the default-enabled state while the relative performance and complexity of the watchlist feature is compared with the streaming list improvements in 1.33. If it is deprecated in a future release, it will be preannounced as deprecated and continue to be supported for the period of time described at https://kubernetes.io/docs/reference/using-api/deprecation-policy/ for beta features. |
ok, I'll keep an eye on the feature progress. Thanks |
@liggitt @deads2k there are some e2e test that started failing after this https://testgrid.k8s.io/sig-network-gce#gci-gce-serial-kube-dns-nodecache
The problem seems the be the assumption that
but it has to use the decorators in ( I'm not sure if we need a new one or the linked one is valid for this) kubernetes/test/e2e/feature/feature.go Lines 560 to 561 in 66931f0
since e2e filtering is based on labels and tags We need also to tag correct these tests, I'm AFK today and I will not be able to send a PR, but it will be good to get those te |
In that case it might be justified: I believe it changes a feature that changes the behavior of client-go and thus it applies to the following test code. Is the problem perhaps that the test relies on server-side behavior that was removed unconditionally in this PR? Then the test can be removed. |
The test does implicitly depend on server side behavior (WatchList feature) which was not removed, but disabled by default. It either needs to explicitly enable the server side gate in the cluster config or gate the test on the WatchList feature gate. I'll take a look later today if no one beats me to it. |
Fix in #131380 |
Discussing in slack also, but cross posting here given where we are in the release cycle: Turning on streaming encoding is separable in my mind to turning off ListWatch (in kube-apiserver). What is the reason for turning off list-watch - does sending the k8s.io/initial-events-end message have a big cost? I know I've been using rv=0 as a simpler API than LIST + WATCH, and eagerly awaiting the versions of apiserver that send k8s.io/initial-events-end ! |
The streaming collection encoders appear more efficient, don't require client changes, and avoid much complexity. The more technical reasons are listed in the disablement PR: kubernetes/kubernetes#131359
commit 9ba7dce Author: Kubernetes Release Robot <k8s-release-robot@users.noreply.github.com> Date: Tue Apr 22 16:27:30 2025 +0000 CHANGELOG: Update directory for v1.30.12 release commit 191c34e Author: Kubernetes Release Robot <k8s-release-robot@users.noreply.github.com> Date: Tue Apr 22 16:15:31 2025 +0000 CHANGELOG: Update directory for v1.31.8 release commit 7bf818f Author: Kubernetes Release Robot <k8s-release-robot@users.noreply.github.com> Date: Tue Apr 22 16:21:58 2025 +0000 CHANGELOG: Update directory for v1.32.4 release commit 680ea07 Merge: 0d9dccf e467c95 Author: Kubernetes Prow Robot <20407524+k8s-ci-robot@users.noreply.github.com> Date: Mon Apr 21 09:19:06 2025 -0700 Merge pull request kubernetes#131369 from ameukam/update-1242-master [Go] Bump dependencies, images and versions used to Go 1.24.2 and distroless iptables commit 0d9dccf Merge: 66931f0 95b926c Author: Kubernetes Prow Robot <20407524+k8s-ci-robot@users.noreply.github.com> Date: Sat Apr 19 16:20:59 2025 -0700 Merge pull request kubernetes#131382 from liggitt/watch-list-e2e Correctly feature-gate WatchList e2e commit 95b926c Author: Jordan Liggitt <liggitt@google.com> Date: Sat Apr 19 17:18:31 2025 -0400 Feature-gate watchlist e2e commit 66931f0 Merge: b53b9fb 660df22 Author: Kubernetes Prow Robot <20407524+k8s-ci-robot@users.noreply.github.com> Date: Fri Apr 18 07:55:08 2025 -0700 Merge pull request kubernetes#131359 from deads2k/disable Stop exposing list-via-watch from the server commit e467c95 Author: Arnaud Meukam <ameukam@gmail.com> Date: Fri Apr 18 15:49:48 2025 +0200 [Go] Bump dependencies, images and versions used to Go 1.24.2 and distroless-iptables Signed-off-by: Arnaud Meukam <ameukam@gmail.com> commit 660df22 Author: David Eads <deads@redhat.com> Date: Thu Apr 17 16:34:46 2025 -0400 Stop exposing list-via-watch from the server With StreamingCollectionEncodingToJSON and StreamingCollectionEncodingToProtobuf, the WatchList must re-justify its necessity. To prevent an ecosystem from building around a feature that may not be promoted, we will stop serving list-via-watch until performance numbers can justify its inclusion. This also stops the kube-controller-manager from using the list-via-watch by default. The fallback is a regular list, so during the skew during an upgrade the "right" thing will happen and the new StreamingCollectionEncoding will be used. commit b53b9fb Merge: 44c230b a8f6d77 Author: Kubernetes Prow Robot <20407524+k8s-ci-robot@users.noreply.github.com> Date: Wed Apr 16 11:05:08 2025 -0700 Merge pull request kubernetes#131015 from aojea/final_servicecidr Add the missing Conformance Test for ServiceCIDR and IPAddress APIS commit a8f6d77 Author: Antonio Ojea <aojea@google.com> Date: Tue Apr 15 14:52:14 2025 +0000 ServiceCIDR and IPAddess Conformance Change-Id: I6ee188cc8c163c312f8a8da9f1277d83e1ea634c commit 44c230b Author: Kubernetes Release Robot <k8s-release-robot@users.noreply.github.com> Date: Tue Apr 15 17:20:14 2025 +0000 CHANGELOG: Update directory for v1.33.0-rc.1 release commit 30469e1 Merge: cfed8c7 0266d3b Author: Kubernetes Prow Robot <20407524+k8s-ci-robot@users.noreply.github.com> Date: Mon Apr 14 11:17:06 2025 -0700 Merge pull request kubernetes#131263 from aojea/dualstack_upgrade Allow to convert clusters Service CIDRs from single to dual stack commit 0266d3b Author: Antonio Ojea <aojea@google.com> Date: Fri Apr 11 15:37:28 2025 +0000 Allow single-to-dual-stack reconfiguration for ServiceCIDR This change modifies the validation logic for ServiceCIDR updates (`ValidateServiceCIDRUpdate`) to specifically permit upgrading a single-stack ServiceCIDR (either IPv4 or IPv6) to a dual-stack configuration. This reconfiguration path is considered safe because it only involves adding a new CIDR range without altering the existing primary CIDR. This ensures that existing Service IP allocations are not disrupted. Other modifications, such as: - Downgrading from dual-stack to single-stack - Reordering CIDRs in a dual-stack configuration - Changing the primary CIDR during a single-to-dual-stack reconfiguration remain disallowed by the validation. These operations carry a higher risk of breaking existing Services or cluster networking configurations. Preventing these updates automatically encourages administrators to perform such changes manually after carefully assessing the potential impact on their specific cluster environment. The validation errors and controller logs provide guidance when such disallowed changes are attempted. Change-Id: I41dc09dfddb05f277925da2262f8114d6accbd1d commit cfed8c7 Merge: d6e3f34 7d7fc2d Author: Kubernetes Prow Robot <20407524+k8s-ci-robot@users.noreply.github.com> Date: Mon Apr 14 08:19:13 2025 -0700 Merge pull request kubernetes#131234 from carlory/fix-131229 Fix flaky test: Metrics should grab all metrics from kubelet /metrics/resource endpoint commit d6e3f34 Merge: b15dfce 18249aa Author: Kubernetes Prow Robot <20407524+k8s-ci-robot@users.noreply.github.com> Date: Mon Apr 14 08:19:06 2025 -0700 Merge pull request kubernetes#131211 from BenTheElder/max-pods fix flaky garbage collector tests commit b15dfce Merge: 908bdb3 e5a5f72 Author: Kubernetes Prow Robot <20407524+k8s-ci-robot@users.noreply.github.com> Date: Fri Apr 11 04:46:42 2025 -0700 Merge pull request kubernetes#131240 from jsafrane/selinux-tests-tag Tag SELinux tests that require SELinux warning controller commit 7d7fc2d Author: carlory <baofa.fan@daocloud.io> Date: Thu Apr 10 17:58:05 2025 +0800 Fix flaky test: Metrics should grab all metrics from kubelet /metrics/resource endpoint Signed-off-by: carlory <baofa.fan@daocloud.io> commit 908bdb3 Merge: 88dfcb2 505836c Author: Kubernetes Prow Robot <20407524+k8s-ci-robot@users.noreply.github.com> Date: Thu Apr 10 17:08:48 2025 -0700 Merge pull request kubernetes#131250 from jimangel/publishing-release-33 staging/publishing: add release-1.33 rules commit 505836c Author: Jim Angel <jameswangel@gmail.com> Date: Thu Apr 10 16:42:44 2025 -0500 staging/publishing: add release-1.33 rules commit e5a5f72 Author: Jan Safranek <jsafrane@redhat.com> Date: Thu Apr 10 14:28:25 2025 +0200 Tag SELinux tests that require SELinux warning controller The controller is available only with SELinuxChangePolicy FG enabled. Kubernetes e2e jobs do the right --focus & --skip explicitly in the SELinux jobs, this just helps other test runners to figure out what FGs are needed to run the tests. When at it, I noticed SELinuxMountReadWriteOncePod tag is always required, so add it on the common level and not in each test. commit 88dfcb2 Merge: cacd595 52298cf Author: Kubernetes Prow Robot <20407524+k8s-ci-robot@users.noreply.github.com> Date: Wed Apr 9 07:06:48 2025 -0700 Merge pull request kubernetes#131065 from pohly/dra-kubelet-registration-unit-test-fix DRA kubelet: fix potential flake in unit test commit cacd595 Author: Kubernetes Release Robot <k8s-release-robot@users.noreply.github.com> Date: Tue Apr 8 19:22:49 2025 +0000 CHANGELOG: Update directory for v1.33.0-rc.0 release commit 18249aa Author: Benjamin Elder <bentheelder@google.com> Date: Tue Apr 8 15:54:45 2025 -0700 hack/update-conformance-yaml.sh adds [Serial] tags to some tests that schedule lots of pods commit 640489a Merge: 92af6ab b1a9cc3 Author: Kubernetes Prow Robot <20407524+k8s-ci-robot@users.noreply.github.com> Date: Tue Apr 8 15:12:51 2025 -0700 Merge pull request kubernetes#131196 from siyuanfoundation/forward-api bug fix: fix version order in emulation forward compatibility. commit 1eab303 Author: Benjamin Elder <bentheelder@google.com> Date: Tue Apr 8 14:46:38 2025 -0700 mark tests that use estimateMaxPods as serial commit b2933c0 Author: Benjamin Elder <bentheelder@google.com> Date: Tue Apr 8 14:46:06 2025 -0700 estimate some system daemonset overhead for max pods commit b1a9cc3 Author: Siyuan Zhang <sizhang@google.com> Date: Mon Apr 7 18:38:43 2025 -0700 bug fix: fix version order in emulation forward compatibility. Signed-off-by: Siyuan Zhang <sizhang@google.com> commit 92af6ab Merge: ab3e83f 2ef4a84 Author: Kubernetes Prow Robot <20407524+k8s-ci-robot@users.noreply.github.com> Date: Tue Apr 8 12:18:44 2025 -0700 Merge pull request kubernetes#131204 from dims/move-to-released-version-of-prometheus/client_golang-v1.22.0-from-rc.0 Move to released version of prometheus/client_golang v1.22.0 from rc.0 commit 2ef4a84 Author: Davanum Srinivas <davanum@gmail.com> Date: Tue Apr 8 08:35:18 2025 -0400 Move to released version of prometheus/client_golang v1.22.0 from rc.0 Signed-off-by: Davanum Srinivas <davanum@gmail.com> commit ab3e83f Merge: 195803c 10a7d6f Author: Kubernetes Prow Robot <20407524+k8s-ci-robot@users.noreply.github.com> Date: Tue Apr 8 02:06:44 2025 -0700 Merge pull request kubernetes#131146 from mauriciopoppe/csi-proxy-1-2-1 Bump CSI Proxy to v1.2.1-gke.2 commit 10a7d6f Author: Mauricio Poppe <mauriciopoppe@google.com> Date: Tue Apr 1 20:22:44 2025 +0000 Update CSI Proxy to v1.2.1-gke.2 commit 52298cf Author: Patrick Ohly <patrick.ohly@intel.com> Date: Wed Mar 26 09:21:50 2025 +0100 DRA kubelet: fix potential flake in unit test If the test binary ran long enough after test completion to reach the ResourceSlice removal grace period, that background activity started and failed because it was using out-dated state and an invalid testing.T pointer, causing a panic. The root cause was to leave those background activities running. They need to be stopped before a test returns.
xref Slack discussion: https://kubernetes.slack.com/archives/C0EG7JC6T/p1745165276094689 |
Upstream disabled list-via-watch in kubernetes/kubernetes#131359 Reenabling would be possible through a feature gate but upstream doesn't seem keen on continuing the feature. Signed-off-by: Nelo-T. Wallus <n.wallus@sap.com>
Upstream disabled list-via-watch in kubernetes/kubernetes#131359 Reenabling would be possible through a feature gate but upstream doesn't seem keen on continuing the feature. Signed-off-by: Nelo-T. Wallus <n.wallus@sap.com>
Upstream disabled list-via-watch in kubernetes/kubernetes#131359 Reenabling would be possible through a feature gate but upstream doesn't seem keen on continuing the feature. Signed-off-by: Nelo-T. Wallus <n.wallus@sap.com>
Upstream disabled list-via-watch in kubernetes/kubernetes#131359 Reenabling would be possible through a feature gate but upstream doesn't seem keen on continuing the feature. Signed-off-by: Nelo-T. Wallus <n.wallus@sap.com>
Upstream disabled list-via-watch in kubernetes/kubernetes#131359 Reenabling would be possible through a feature gate but upstream doesn't seem keen on continuing the feature. Signed-off-by: Nelo-T. Wallus <n.wallus@sap.com>
Upstream disabled list-via-watch in kubernetes/kubernetes#131359 Reenabling would be possible through a feature gate but upstream doesn't seem keen on continuing the feature. Signed-off-by: Nelo-T. Wallus <n.wallus@sap.com>
Upstream disabled list-via-watch in kubernetes/kubernetes#131359 Reenabling would be possible through a feature gate but upstream doesn't seem keen on continuing the feature. Signed-off-by: Nelo-T. Wallus <n.wallus@sap.com>
Upstream disabled list-via-watch in kubernetes/kubernetes#131359 Reenabling would be possible through a feature gate but upstream doesn't seem keen on continuing the feature. Signed-off-by: Nelo-T. Wallus <n.wallus@sap.com>
With StreamingCollectionEncodingToJSON and
StreamingCollectionEncodingToProtobuf, the WatchList must re-justify its necessity. To prevent an ecosystem from building around a feature that may not be promoted, we will stop serving list-via-watch until performance numbers can justify its inclusion.
This also stops the kube-controller-manager from using the list-via-watch by default. The fallback is a regular list, so during the skew during an upgrade the "right" thing will happen and the new StreamingCollectionEncoding will be used.
We are doing this because the streaming json encoder performs better on memory metrics, it doesn't require any client changes, and is significantly easier on the apiserver side.
From @serathius in slack:
Streaming encoder here, pay attention to the far right of the graph showing the improved state.

watch list encoder here, notice that it consumes more memory than the streaming encoder

/sig-api-machinery
/kind cleanup
/priority important-soon
/assign @liggitt