fix volumeAttachment leak when kube-controller restarts during the execution of DetachVolume #130516
base: master
Conversation
…, it may result in a volumeAttachment leak
Adding the "do-not-merge/release-note-label-needed" label because no release-note block was detected; please follow our release note process to remove it.
This issue is currently awaiting triage. If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.
Hi @goushicui. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Once the patch is verified, the new status will be reflected by the ok-to-test label.
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by: goushicui, vahan-sahakyan-op. The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
/assign @gnufied
/uncc
/assgin @thockin
/assign @thockin
/ok-to-test
@goushicui: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.
Shouldn't that volume be detached by external-attacher anyway, regardless of the KCM restart? Once deletionTimestamp is set on a VA object, it will get detached by external-attacher. What exactly did we leak in this case? Are you saying the volume is not detaching in this case?
@gnufied Yes, the kube-controller-manager restarted before deletion of the volumeAttachment had started.
But you didn't answer the rest of my question.
@gnufied Are you referring to this issue? I suggest you take a closer look at the description above. If deletionTimestamp is set on the volumeAttachment, the external-attacher can indeed perform the detach operation in a timely manner. However, what if the deletion call fails, or if the KCM (Kubernetes Controller Manager) restarts before DetachVolume is executed, given that volumes are processed serially?
KCM rebuilds the ASW from node and VA objects; please see https://github.com/carlory/kubernetes/blob/master/pkg/controller/volume/attachdetach/attach_detach_controller.go#L683
Look at the check condition here. The volume has already been removed from the node status attachedVolumes before the detach. Do you think this check can hold? @carlory
This check is correct. If the ASW has been populated with the volume from the node status, the expected state of the volume is attached. If not, but the volume is found in a VA object, the volume's state is uncertain: we cannot say whether it is attached or detached. The reconciler will take care of it and mark it as attached or detached later. There is no problem.
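(For readers following along, a minimal, self-contained Go model of the classification described here. This is not the controller's code; the constant names echo the upstream pkg/controller/volume/attachdetach/cache package, and everything else is illustrative.)

```go
package main

import "fmt"

// AttachState mirrors the three states discussed in this thread.
type AttachState string

const (
	AttachStateAttached  AttachState = "Attached"
	AttachStateUncertain AttachState = "Uncertain"
	AttachStateDetached  AttachState = "Detached"
)

// classifyOnRestart applies the rule described above: a volume found in
// node.Status.VolumesAttached is Attached; one known only from a
// VolumeAttachment object is Uncertain; one in neither source is Detached.
func classifyOnRestart(inNodeStatus, hasVAObject bool) AttachState {
	switch {
	case inNodeStatus:
		return AttachStateAttached
	case hasVAObject:
		return AttachStateUncertain
	default:
		return AttachStateDetached
	}
}

func main() {
	// The disputed scenario: node status was cleaned up before the restart,
	// but the VA object still exists.
	fmt.Println(classifyOnRestart(false, true)) // Uncertain
}
```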
It is not correct.
After a KCM restart, I want to ask how the volume information in the ASW can be obtained through reconciliation. If it cannot be retrieved here, how can it be set to the Uncertain state?
If the KCM is restarted, the ADC controller will rebuild its cache before it starts the reconciler. If the volume has been removed from the node's attachedVolumes but the VA object still exists, the ADC controller will add the volume to the ASW and mark it as uncertain. After the ASW and DSW are populated, the reconciler is started; it will compare the ASW and DSW and then re-do the detach operation. A sketch of this ordering follows.
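(A compilable sketch of that startup ordering, under the assumption that the population steps behave as this comment describes. The types and method bodies are illustrative stubs, not the upstream API; of the names used here, only processVolumeAttachments appears elsewhere in this PR.)

```go
package main

type reconciler struct{}

// Run would loop, comparing ASW and DSW and issuing attach/detach calls.
func (r *reconciler) Run(stopCh <-chan struct{}) {}

type adController struct{ rec reconciler }

func (c *adController) populateActualStateOfWorld()  {} // node.Status.VolumesAttached -> ASW (Attached)
func (c *adController) processVolumeAttachments()    {} // VA objects not in ASW -> ASW (Uncertain)
func (c *adController) populateDesiredStateOfWorld() {} // scheduled pods -> DSW

func (c *adController) run(stopCh <-chan struct{}) {
	// 1. Rebuild the ASW from node status first ...
	c.populateActualStateOfWorld()
	// 2. ... then scan VolumeAttachment objects for volumes the node
	//    status no longer reports, marking them Uncertain.
	c.processVolumeAttachments()
	// 3. Rebuild the DSW from scheduled pods.
	c.populateDesiredStateOfWorld()
	// 4. Only then start the reconciler, which re-issues the detach for
	//    anything in the ASW that the DSW no longer wants.
	go c.rec.Run(stopCh)
}

func main() {
	stop := make(chan struct{})
	(&adController{}).run(stop)
	close(stop)
}
```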
If the volume has already been removed from the node status, it will not trigger a node status update.
@carlory
First question: Won't this be added to the ASW cache here?
Second question: Can it be marked as Uncertain here?
Third question: If it is not marked as Uncertain, will reconcile continue to process this volumeAttachment?
Could you please refer to the code and answer questions 1, 2, and 3 respectively? Thank you.
1. It won't add the volume to the cache. If the volume can be found in the node status, the volume should be added to the ASW and its state is attached.
2. Yes, it should be Uncertain. We don't know whether the detach operation was called. If it was called and failed due to a timeout, the volume may be detached; if it was never called, the volume is still attached. So its state is Uncertain. We cannot mark an attached volume as Uncertain if the volume is found in the node status, so we need this check.
3. No. If it is in neither the ASW nor the DSW, the VA won't be handled; the reconciler doesn't know about the VA concept.
@carlory I am not asking whether it should be marked as Uncertain, but rather whether attachState := adc.actualStateOfWorld.GetAttachState(volumeName, nodeName) can retrieve the volume from the cache here. Can we proceed further?
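(On that question: as I read the upstream cache, GetAttachState does not fail on a missing entry; a volume/node pair that is absent from the cache simply reads as AttachStateDetached. A toy model, with all types as illustrative stand-ins:)

```go
package main

import "fmt"

type AttachState string

const (
	AttachStateAttached  AttachState = "Attached"
	AttachStateUncertain AttachState = "Uncertain"
	AttachStateDetached  AttachState = "Detached"
)

type volumeNode struct{ volume, node string }

// toyASW stands in for the actualStateOfWorld cache; the value records
// whether the attach has been confirmed via node status.
type toyASW struct{ entries map[volumeNode]bool }

// GetAttachState never "fails" on a missing entry: a pair that is not in
// the cache at all simply reads as Detached.
func (a *toyASW) GetAttachState(volume, node string) AttachState {
	confirmed, ok := a.entries[volumeNode{volume, node}]
	switch {
	case !ok:
		return AttachStateDetached
	case confirmed:
		return AttachStateAttached
	default:
		return AttachStateUncertain
	}
}

func main() {
	asw := &toyASW{entries: map[volumeNode]bool{}}
	// The case under discussion: the volume was dropped from node status
	// before the restart, so the rebuild never added it to the cache.
	fmt.Println(asw.GetAttachState("pv-1", "node-1")) // Detached
}
```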
/assign jsafrane
The Kubernetes project currently lacks enough contributors to adequately respond to all PRs. This bot triages PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the PR is closed
You can:
- Mark this PR as fresh with /remove-lifecycle stale
- Close this PR with /close
Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale
PR needs rebase.
The Kubernetes project currently lacks enough active contributors to adequately respond to all PRs. This bot triages PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the PR is closed
You can:
- Mark this PR as fresh with /remove-lifecycle rotten
- Close this PR with /close
Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle rotten
What type of PR is this?
/kind bug
What this PR does / why we need it:
When the kube-controller-manager restarts during the execution of DetachVolume, orphaned volumeAttachment objects may persist in the API server, leading to resource leaks. This occurs due to inconsistencies between node status updates and volumeAttachment cleanup logic during controller recovery.
Workflow Leading to Leak:
1. DetachVolume Initiation
The volume is removed from node.Status.VolumesAttached before DetachVolume executes.
2. Controller Restart
If kube-controller-manager restarts at this point, attach_detach_controller rebuilds the actualStateOfWorld cache by iterating over node.Status.VolumesAttached. Since the volume was already removed from the node status, it is not added to the cache.
3. Orphaned volumeAttachment Handling
During processVolumeAttachments, the controller checks whether the volume exists in actualStateOfWorld with AttachStateDetached, roughly as in the sketch below:
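(A reconstruction of the check being referenced: the first line is quoted verbatim earlier in the conversation, while the rest paraphrases the upstream controller and may differ by version.)

```go
// Inside processVolumeAttachments, for each VolumeAttachment object:
attachState := adc.actualStateOfWorld.GetAttachState(volumeName, nodeName)
if attachState == cache.AttachStateDetached {
	// The volume is in neither node.Status.VolumesAttached nor the ASW.
	err = adc.actualStateOfWorld.MarkVolumeAsUncertain(logger, volumeName, volumeSpec, nodeName)
	if err != nil {
		logger.Error(err, "MarkVolumeAsUncertain failed to add the volume to the ASW")
	}
}
```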
Because the volume is absent from the cache (due to step 2), the orphaned volumeAttachment is not re-added to actualStateOfWorld, resulting in a persistent leak.
Which issue(s) this PR fixes:
Fixes #
Special notes for your reviewer:
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: