
fix volumeAttachment leak when kube-controller restarts during the execution of DetachVolume #130516





Open
wants to merge 1 commit into
base: master
from


goushicui
Contributor

What type of PR is this?

/kind bug

What this PR does / why we need it:

Which issue(s) this PR fixes:

When the kube-controller-manager restarts during the execution of DetachVolume, orphaned volumeAttachment objects may persist in the API server, leading to resource leaks. This occurs due to inconsistencies between node status updates and volumeAttachment cleanup logic during controller recovery.

Workflow leading to the leak:

1. DetachVolume initiation

The volume is removed from node.status.volumesAttached before DetachVolume is executed.

2. Controller restart

If kube-controller-manager restarts at this point, attach_detach_controller rebuilds the actualStateOfWorld cache by iterating over node.Status.VolumesAttached. Since the volume was already removed from the node status, it is not added to the cache.

3. Orphaned volumeAttachment handling

During processVolumeAttachments, the controller checks whether the volume exists in actualStateOfWorld with AttachStateDetached:

attachState := adc.actualStateOfWorld.GetAttachState(volumeName, nodeName)
if attachState == cache.AttachStateDetached {
	err = adc.actualStateOfWorld.MarkVolumeAsUncertain(logger, volumeName, volumeSpec, nodeName)
}

Because the volume is absent from the cache (due to step 2), the orphaned volumeAttachment is not re-added to actualStateOfWorld, resulting in a persistent leak.
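Steps 1 and 2 can be illustrated with a toy model. This is a minimal Go sketch, not the controller's code: `Node`, `AttachedVolume`, `volumeNode`, `removeFromNodeStatus`, and `rebuildASW` are hypothetical, simplified stand-ins for the node object, `node.Status.VolumesAttached`, and the step that rebuilds actualStateOfWorld from node status. It only shows that a volume dropped from the node status before a restart is absent from a cache rebuilt from that status; it does not model processVolumeAttachments.

```go
package main

import "fmt"

// AttachedVolume is a hypothetical, simplified stand-in for one entry of
// node.Status.VolumesAttached.
type AttachedVolume struct {
	Name string
}

// Node is a hypothetical, simplified node object.
type Node struct {
	Name            string
	VolumesAttached []AttachedVolume
}

// volumeNode identifies a (volume, node) pair in the toy cache.
type volumeNode struct {
	Volume string
	Node   string
}

// removeFromNodeStatus models step 1: the volume is dropped from the node's
// reported attached volumes before any detach call is made.
func removeFromNodeStatus(node *Node, volume string) {
	var kept []AttachedVolume
	for _, v := range node.VolumesAttached {
		if v.Name != volume {
			kept = append(kept, v)
		}
	}
	node.VolumesAttached = kept
}

// rebuildASW models step 2: after a restart, the toy cache is rebuilt only
// from node status, so it cannot see volumes already removed there.
func rebuildASW(nodes []Node) map[volumeNode]bool {
	asw := make(map[volumeNode]bool)
	for _, node := range nodes {
		for _, v := range node.VolumesAttached {
			asw[volumeNode{Volume: v.Name, Node: node.Name}] = true
		}
	}
	return asw
}

func main() {
	node := Node{
		Name:            "node-1",
		VolumesAttached: []AttachedVolume{{Name: "vol-a"}, {Name: "vol-b"}},
	}

	// Step 1: vol-a leaves the node status; the process "restarts" before
	// DetachVolume runs, so no detach happens in this model.
	removeFromNodeStatus(&node, "vol-a")

	// Step 2: the cache is rebuilt from node status alone.
	asw := rebuildASW([]Node{node})
	fmt.Println("vol-a in cache:", asw[volumeNode{Volume: "vol-a", Node: "node-1"}]) // false
	fmt.Println("vol-b in cache:", asw[volumeNode{Volume: "vol-b", Node: "node-1"}]) // true
}
```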

Fixes #

Special notes for your reviewer:

Does this PR introduce a user-facing change?


Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


@k8s-ci-robot
Contributor

Adding the "do-not-merge/release-note-label-needed" label because no release-note block was detected, please follow our release note process to remove it.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot added the following labels on Mar 1, 2025: kind/bug (categorizes issue or PR as related to a bug), size/XS (denotes a PR that changes 0-9 lines, ignoring generated files), do-not-merge/release-note-label-needed (indicates that a PR should not merge because it's missing one of the release note labels), cncf-cla: yes (indicates the PR's author has signed the CNCF CLA), do-not-merge/needs-sig (indicates an issue or PR lacks a `sig/foo` label and requires one), needs-triage (indicates an issue or PR lacks a `triage/foo` label and requires one).
@k8s-ci-robot
Contributor

This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.


@k8s-ci-robot added the needs-ok-to-test label (indicates a PR that requires an org member to verify it is safe to test) on Mar 1, 2025.
@k8s-ci-robot
Contributor

Hi @goushicui. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.



@k8s-ci-robot added the needs-priority (indicates a PR lacks a `priority/foo` label and requires one), sig/apps (categorizes an issue or PR as relevant to SIG Apps), and sig/storage (categorizes an issue or PR as relevant to SIG Storage) labels and removed the do-not-merge/needs-sig label on Mar 1, 2025.
@github-project-automation bot moved this to Needs Triage in SIG Apps on Mar 1, 2025.
@goushicui
Contributor Author

@gnufied

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: goushicui, vahan-sahakyan-op
Once this PR has been reviewed and has the lgtm label, please assign thockin for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@goushicui
Contributor Author

/assign @gnufied

@mauriciopoppe
Member

/uncc

@k8s-ci-robot removed the review request for mauriciopoppe on March 3, 2025 15:31.
@goushicui
Contributor Author

/assgin @thockin

@goushicui
Contributor Author

/assign @thockin

@gnufied
Member

gnufied commented Mar 7, 2025

/ok-to-test

@k8s-ci-robot added the ok-to-test label (indicates a non-member PR verified by an org member that is safe to test) and removed the needs-ok-to-test label on Mar 7, 2025.
@k8s-ci-robot
Contributor

@goushicui: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name             Commit   Details  Required  Rerun command
pull-kubernetes-unit  9479aef  link     true      /test pull-kubernetes-unit

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.


@gnufied
Member

gnufied commented Mar 7, 2025

Shouldn't that volume be detached by external-attacher anyway, regardless of a KCM restart? Once deletionTimestamp is set on a VA object, it will get detached by external-attacher.

What exactly did we leak in this case? Are you saying the volume is not being detached in this case?

@goushicui
Contributor Author

@gnufied Yes, the kube-controller restarted before the volumeAttachment started deleting.

@gnufied
Member

gnufied commented Mar 8, 2025

> @gnufied Yes, the kube-controller restarted before the volumeAttachment started deleting.

But you didn't answer the rest of my question.

@goushicui
Contributor Author

goushicui commented Mar 8, 2025

> Shouldn't that volume be detached by external-attacher anyway, regardless of a KCM restart? Once deletionTimestamp is set on a VA object, it will get detached by external-attacher.

@gnufied Are you referring to this issue? I suggest you take a closer look at the description above. If deletionTimestamp is set on the volumeAttachment, the external-attacher can indeed perform the detachment operation in a timely manner. However, what if the deletion call fails, or if the KCM (Kubernetes Controller Manager) restarts before detachVolume is executed due to serial processing of the volume?
If it is not re-added here, neither the desired cache nor the actual cache will contain information about this volume. How would the subsequent reconcile operation perform the so-called deletion then?

@carlory
Member

carlory commented Mar 10, 2025

@goushicui
Contributor Author

https://github.com/carlory/kubernetes/blob/master/pkg/controller/volume/attachdetach/attach_detach_controller.go#L737

attachState := adc.actualStateOfWorld.GetAttachState(volumeName, nodeName)
if attachState == cache.AttachStateDetached {

Look at the condition here. The volume has already been removed from the node's status.volumesAttached before detach. Do you think this condition can hold? @carlory

@carlory
Member

carlory commented Mar 10, 2025

This check is correct. If the ASW populated the volume from the node status, the expected state of the volume is attached. If not, but the volume is found in a VA object, its state is uncertain: we cannot say whether it is attached or detached. The reconciler will take care of it and later mark it as attached or detached. It is not a problem.
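The rule carlory describes for the state a (volume, node) pair should start in after the cache rebuild can be written as a small decision function. This is a minimal Go sketch under that description only; `AttachState`, its constants, and `initialState` are hypothetical names for illustration, not the definitions in the attach/detach controller's `cache` package.

```go
package main

import "fmt"

// AttachState is a hypothetical local enum mirroring the three states
// discussed in this thread.
type AttachState int

const (
	StateDetached AttachState = iota
	StateUncertain
	StateAttached
)

func (s AttachState) String() string {
	switch s {
	case StateAttached:
		return "Attached"
	case StateUncertain:
		return "Uncertain"
	default:
		return "Detached"
	}
}

// initialState encodes the rule: reported in node status => attached;
// only a VolumeAttachment object exists => uncertain, because a detach may
// or may not already have happened; neither => nothing to track.
func initialState(inNodeStatus, hasVolumeAttachment bool) AttachState {
	switch {
	case inNodeStatus:
		return StateAttached
	case hasVolumeAttachment:
		return StateUncertain
	default:
		return StateDetached
	}
}

func main() {
	fmt.Println(initialState(true, true))   // Attached
	fmt.Println(initialState(false, true))  // Uncertain
	fmt.Println(initialState(false, false)) // Detached
}
```

Checking node status first is what keeps a confirmed-attached volume from being downgraded to Uncertain, which is why the check only fires for pairs that are not already known as attached.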

@carlory
Member

carlory commented Mar 10, 2025

> it is not added to the cache.

That is not correct. MarkVolumeAsUncertain adds the volume to the ASW, but marks its state as Uncertain.

@goushicui
Contributor Author

goushicui commented Mar 10, 2025

err = rc.actualStateOfWorld.RemoveVolumeFromReportAsAttached(attachedVolume.VolumeName, attachedVolume.NodeName)
if err != nil {
	logger.V(5).Info("RemoveVolumeFromReportAsAttached failed while removing volume from node",
		"node", klog.KRef("", string(attachedVolume.NodeName)),
		"volumeName", attachedVolume.VolumeName,
		"err", err)
}

// Update Node Status to indicate volume is no longer safe to mount.
err = rc.nodeStatusUpdater.UpdateNodeStatusForNode(logger, attachedVolume.NodeName)

If KCM restarts at this point, how can the volume information in the ASW be recovered through reconciliation? If it cannot be retrieved here, how would it be set to the Uncertain state?

> If the asw has populated the volume from the node status, the expected state of the volume is attached.

@carlory

@carlory
Member

carlory commented Mar 10, 2025

https://github.com/carlory/kubernetes/blob/aab083972dbb5620b6daa62172aa1694e85facd7/pkg/controller/volume/attachdetach/attach_detach_controller.go#L347

If KCM is restarted, the ADC controller rebuilds its cache before it starts the reconciler. If the volume has been removed from the node's volumesAttached but the VA object still exists, the ADC controller adds the volume to the ASW and marks it as uncertain. After the ASW and DSW are populated, the reconciler starts; it compares the ASW and DSW and then redoes the detach operation.
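The recovery path carlory outlines can be modeled end to end in a few lines. This is a toy Go sketch under stated assumptions, not the controller's code: `key`, `asw`, `getAttachState`, `processVolumeAttachments`, and `volumesToDetach` are hypothetical, simplified stand-ins. In particular, it assumes, per carlory's explanation, that a (volume, node) pair missing from the cache reports as detached, so the orphaned VolumeAttachment is re-added as uncertain; the reconciler then detaches whatever is in the ASW but not in the DSW. Skipping the VolumeAttachment pass shows the case discussed in the thread where a VA known to neither cache is never handled.

```go
package main

import (
	"fmt"
	"sort"
)

// AttachState is a hypothetical local enum, not the real cache package's.
type AttachState int

const (
	StateDetached AttachState = iota
	StateUncertain
	StateAttached
)

// key identifies a (volume, node) pair.
type key struct{ Volume, Node string }

// asw is a toy actual state of world: pair -> attach state.
type asw map[key]AttachState

// getAttachState ASSUMES that a pair absent from the cache reports as detached.
func (a asw) getAttachState(k key) AttachState {
	if s, ok := a[k]; ok {
		return s
	}
	return StateDetached
}

// processVolumeAttachments re-adds every VolumeAttachment whose pair the
// cache reports as detached, marking it uncertain.
func processVolumeAttachments(a asw, vas []key) {
	for _, k := range vas {
		if a.getAttachState(k) == StateDetached {
			a[k] = StateUncertain
		}
	}
}

// volumesToDetach returns pairs present in the ASW but not desired in the DSW,
// sorted so the output is deterministic.
func volumesToDetach(a asw, dsw map[key]bool) []key {
	var out []key
	for k := range a {
		if !dsw[k] {
			out = append(out, k)
		}
	}
	sort.Slice(out, func(i, j int) bool {
		if out[i].Node != out[j].Node {
			return out[i].Node < out[j].Node
		}
		return out[i].Volume < out[j].Volume
	})
	return out
}

func main() {
	orphan := key{Volume: "vol-a", Node: "node-1"}
	inUse := key{Volume: "vol-b", Node: "node-1"}
	vas := []key{orphan, inUse}
	dsw := map[key]bool{inUse: true} // vol-a is no longer wanted

	// Cache rebuilt from node status: vol-a was already removed there.
	withPass := asw{inUse: StateAttached}
	processVolumeAttachments(withPass, vas)
	fmt.Println("with VA pass:", volumesToDetach(withPass, dsw)) // [{vol-a node-1}]

	// Same starting point, but the VolumeAttachment pass is skipped.
	withoutPass := asw{inUse: StateAttached}
	fmt.Println("without VA pass:", volumesToDetach(withoutPass, dsw)) // []
}
```

Whether the real controller reaches the first or the second outcome hinges on exactly the point under debate: what GetAttachState returns for a volume that the rebuilt cache does not contain.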

@carlory
Member

carlory commented Mar 10, 2025

If the volume has already been removed from the node status, it will not trigger a node status update.

@goushicui
Contributor Author

@carlory
I see you keep emphasizing that populateActualStateOfWorld -> processVolumeAttachments will handle this logic. As I mentioned earlier, if the volume has been removed from node.status.volumesAttached, populateActualStateOfWorld executes the following logic:

for _, node := range nodes {
	nodeName := types.NodeName(node.Name)

	for _, attachedVolume := range node.Status.VolumesAttached {
		uniqueName := attachedVolume.Name
		// ... (rest of the loop body omitted)
	}
}

First question: Won't this be added to the asw cache here?

attachState := adc.actualStateOfWorld.GetAttachState(volumeName, nodeName)
if attachState == cache.AttachStateDetached {

Second question: Can it be marked as Uncertain here?

Third question: If it is not marked as Uncertain, will reconcile continue to process this volumeattachment?

Could you please refer to the code and answer the above questions 1, 2, and 3 respectively? Thank you.

@carlory
Member

carlory commented Mar 10, 2025

> First question: Won't this be added to the asw cache here?

It won't add the volume to the cache. The volume is added to the ASW, with state attached, only if it can be found in the node status.

> Second question: Can it be marked as Uncertain here?

Yes, it should be Uncertain. We don't know whether the detach operation was called. If it was called and failed due to a timeout, the volume may be detached; if not, the volume is attached. So its state is Uncertain. We cannot mark an attached volume as Uncertain if the volume is found in the node status, so we need this check.

> Third question: If it is not marked as Uncertain, will reconcile continue to process this volume attachment?

No. If the volume is in neither the ASW nor the DSW, the VA won't be handled. The reconciler doesn't know about the VA concept.

@goushicui
Contributor Author

> Yes, it should be Uncertain. We don't know whether the detach operation was called. If it was called and failed due to a timeout, the volume may be detached; if not, the volume is attached. So its state is Uncertain. We cannot mark an attached volume as Uncertain if the volume is found in the node status, so we need this check.

@carlory I am not asking whether it should be marked as Uncertain, but whether attachState := adc.actualStateOfWorld.GetAttachState(volumeName, nodeName) can retrieve the volume from the cache here. Can execution proceed past this check?

@goushicui
Contributor Author

/assign jsafrane

@goushicui
Contributor Author

@yuga711

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all PRs.

This bot triages PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

  • Mark this PR as fresh with /remove-lifecycle stale
  • Close this PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot added the lifecycle/stale (denotes an issue or PR has remained open with no activity and has become stale) and needs-rebase (indicates a PR cannot be merged because it has merge conflicts with HEAD) labels on Jul 7, 2025.
@k8s-ci-robot
Contributor

PR needs rebase.


@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all PRs.

This bot triages PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

  • Mark this PR as fresh with /remove-lifecycle rotten
  • Close this PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot added the lifecycle/rotten label (denotes an issue or PR that has aged beyond stale and will be auto-closed) and removed the lifecycle/stale label on Aug 11, 2025.
Labels
cncf-cla: yes, do-not-merge/release-note-label-needed, kind/bug, lifecycle/rotten, needs-priority, needs-rebase, needs-triage, ok-to-test, sig/apps, sig/storage, size/XS
Projects
Status: Needs Triage

9 participants