-
Notifications
You must be signed in to change notification settings - Fork 41.1k
[WIP] reuse etcd client to monitor storage metric #125771
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
Adding the "do-not-merge/release-note-label-needed" label because no release-note block was detected, please follow our release note process to remove it. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
This issue is currently awaiting triage. If a SIG or subproject determines this is a relevant issue, they will accept it by applying the The Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
Welcome @Yiqing-Liu! |
Hi @Yiqing-Liu. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
:/cc @dims @chaochn47 |
/ok-to-test |
cc @serathius |
// get storage metric monitor is protected by mutex | ||
monitorMutex sync.Mutex |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see a mutex above monitorGetter already whose purpose seems similar. Can we reuse that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yea, also this testing PR changed a bit of etcd client behavior. (testing is telling us we need figure out how to shut down the "singleton" etcd client)
More code refactoring may be need if we confirmed this is the right fix.
currently we are looking for more evidence to build the causality between the gRPC error and the original PR https://github.com/kubernetes/kubernetes/pull/118812/commits.
The Kubernetes project currently lacks enough contributors to adequately respond to all PRs. This bot triages PRs according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale |
The Kubernetes project currently lacks enough active contributors to adequately respond to all PRs. This bot triages PRs according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle rotten |
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs. This bot triages PRs according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community. /close |
@k8s-triage-robot: Closed this PR. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
/reopen |
@chaochn47: You can't reopen an issue/PR unless you authored it or you are a collaborator. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
/reopen |
@Yiqing-Liu: Reopened this PR. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
Please note that we're already in Test Freeze for the Fast forwards are scheduled to happen every 6 hours, whereas the most recent run was: Mon Aug 11 22:23:43 UTC 2025. |
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: Yiqing-Liu The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
@Yiqing-Liu: The following tests failed, say
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
What type of PR is this?
/kind bug
What this PR does / why we need it:
motivation
We have recently noticed an increase in gRPC errors (specifically, use of closed network connection) in apiserver logs.
I0619 00:22:[31.927056 11](tel:3192705611) http2_client.go:959] "[transport] [client-transport 0xc004144900] Closing: connection error: desc = \"error reading from server: read tcp 10.0.33.210:43716→10.0.32.16:2379: use of closed network connection\"\n"
would like to rca for the recurring error message.
suspect
The following is
tcpdump
screenshot,noticed there is
etcdserverpb.Maintenance/Status
call around the grpc error occurrence timeline.We suspect the status call was introduced by PR: https://github.com/kubernetes/kubernetes/pull/118812/commits
where etcd client is frequently initialized and shutDown.
validation
the error message seems clear with this PR change (reuse the existing etcd cilent)
Which issue(s) this PR fixes:
Fixes #125770
Special notes for your reviewer:
We are looking for more evidence to build the causality between the grpc erros and the suspected PR https://github.com/kubernetes/kubernetes/pull/118812/commits.
Any insight is appreciated.
Does this PR introduce a user-facing change?
NONE
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:
NONE