
Conversation

@xiaoanyunfei (Contributor) commented Dec 30, 2020

What type of PR is this?

E1230 15:36:15.979242 331116 runtime.go:78] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference)
goroutine 1845 [running]:
dispatcher/vendor/k8s.io/apimachinery/pkg/util/runtime.logPanic(0x17da4a0, 0x278fa20)
/home/jenkins/go/src/dispatcher/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:74 +0xa6
dispatcher/vendor/k8s.io/apimachinery/pkg/util/runtime.HandleCrash(0x0, 0x0, 0x0)
/home/jenkins/go/src/dispatcher/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:48 +0x89
panic(0x17da4a0, 0x278fa20)
/usr/local/go/src/runtime/panic.go:969 +0x1b9
dispatcher/vendor/k8s.io/kubernetes/pkg/scheduler/framework/v1alpha1.(*NodeInfo).RemovePod(0xc943bafea0, 0xc5e8261800, 0xc3a153d380, 0x2b)
/home/jenkins/go/src/dispatcher/vendor/k8s.io/kubernetes/pkg/scheduler/framework/v1alpha1/types.go:553 +0x65d
dispatcher/pkg/cache.(*schedulerCache).removePod(0xc00106b4a0, 0xc5e8261800, 0xc3544d1260, 0x2b)
/home/jenkins/go/src/dispatcher/pkg/cache/cache.go:142 +0x85
dispatcher/pkg/cache.(*schedulerCache).updatePod(0xc00106b4a0, 0xc5e8261800, 0xc42ea66400, 0x1, 0x27e83e0)
/home/jenkins/go/src/dispatcher/pkg/cache/cache.go:127 +0x7b
dispatcher/pkg/cache.(*schedulerCache).UpdatePod(0xc00106b4a0, 0xc5e8261800, 0xc42ea66400, 0x0, 0x0)
/home/jenkins/go/src/dispatcher/pkg/cache/cache.go:203 +0x1d3
dispatcher/pkg/clusters/scheduler.(*Scheduler).updatePodInCache(0xc002a2a000, 0x19e1b80, 0xc5e8261800, 0x19e1b80, 0xc42ea66400)
/home/jenkins/go/src/dispatcher/pkg/clusters/scheduler/scheduler.go:385 +0x9b
dispatcher/vendor/k8s.io/client-go/tools/cache.ResourceEventHandlerFuncs.OnUpdate(...)
/home/jenkins/go/src/dispatcher/vendor/k8s.io/client-go/tools/cache/controller.go:234
dispatcher/vendor/k8s.io/client-go/tools/cache.FilteringResourceEventHandler.OnUpdate(0x1ac5c78, 0x1c33780, 0xc0005cd160, 0x19e1b80, 0xc5e8261800, 0x19e1b80, 0xc42ea66400)
/home/jenkins/go/src/dispatcher/vendor/k8s.io/client-go/tools/cache/controller.go:269 +0x122
dispatcher/vendor/k8s.io/client-go/tools/cache.(*processorListener).run.func1()
/home/jenkins/go/src/dispatcher/vendor/k8s.io/client-go/tools/cache/shared_informer.go:775 +0x1c5
dispatcher/vendor/k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0xc0007abf60)
/home/jenkins/go/src/dispatcher/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:155 +0x5f
dispatcher/vendor/k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0xc000827f60, 0x1bf1500, 0xc001c94060, 0x17a0e01, 0xc0014e24e0)
/home/jenkins/go/src/dispatcher/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:156 +0xad
dispatcher/vendor/k8s.io/apimachinery/pkg/util/wait.JitterUntil(0xc0007abf60, 0x3b9aca00, 0x0, 0x1ac9401, 0xc0014e24e0)
/home/jenkins/go/src/dispatcher/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:133 +0x98
dispatcher/vendor/k8s.io/apimachinery/pkg/util/wait.Until(...)
/home/jenkins/go/src/dispatcher/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:90
dispatcher/vendor/k8s.io/client-go/tools/cache.(*processorListener).run(0xc000a56500)
/home/jenkins/go/src/dispatcher/vendor/k8s.io/client-go/tools/cache/shared_informer.go:771 +0x95
dispatcher/vendor/k8s.io/apimachinery/pkg/util/wait.(*Group).Start.func1(0xc0010de290, 0xc001298160)
/home/jenkins/go/src/dispatcher/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:73 +0x51
created by dispatcher/vendor/k8s.io/apimachinery/pkg/util/wait.(*Group).Start
/home/jenkins/go/src/dispatcher/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:71 +0x65
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x20 pc=0x13b735d]

goroutine 1845 [running]:
dispatcher/vendor/k8s.io/apimachinery/pkg/util/runtime.HandleCrash(0x0, 0x0, 0x0)
/home/jenkins/go/src/dispatcher/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:55 +0x10c
panic(0x17da4a0, 0x278fa20)
/usr/local/go/src/runtime/panic.go:969 +0x1b9
dispatcher/vendor/k8s.io/kubernetes/pkg/scheduler/framework/v1alpha1.(*NodeInfo).RemovePod(0xc943bafea0, 0xc5e8261800, 0xc3a153d380, 0x2b)
/home/jenkins/go/src/dispatcher/vendor/k8s.io/kubernetes/pkg/scheduler/framework/v1alpha1/types.go:553 +0x65d
dispatcher/pkg/cache.(*schedulerCache).removePod(0xc00106b4a0, 0xc5e8261800, 0xc3544d1260, 0x2b)
/home/jenkins/go/src/dispatcher/pkg/cache/cache.go:142 +0x85
dispatcher/pkg/cache.(*schedulerCache).updatePod(0xc00106b4a0, 0xc5e8261800, 0xc42ea66400, 0x1, 0x27e83e0)
/home/jenkins/go/src/dispatcher/pkg/cache/cache.go:127 +0x7b
dispatcher/pkg/cache.(*schedulerCache).UpdatePod(0xc00106b4a0, 0xc5e8261800, 0xc42ea66400, 0x0, 0x0)
/home/jenkins/go/src/dispatcher/pkg/cache/cache.go:203 +0x1d3
dispatcher/pkg/clusters/scheduler.(*Scheduler).updatePodInCache(0xc002a2a000, 0x19e1b80, 0xc5e8261800, 0x19e1b80, 0xc42ea66400)
/home/jenkins/go/src/dispatcher/pkg/clusters/scheduler/scheduler.go:385 +0x9b
dispatcher/vendor/k8s.io/client-go/tools/cache.ResourceEventHandlerFuncs.OnUpdate(...)
/home/jenkins/go/src/dispatcher/vendor/k8s.io/client-go/tools/cache/controller.go:234
dispatcher/vendor/k8s.io/client-go/tools/cache.FilteringResourceEventHandler.OnUpdate(0x1ac5c78, 0x1c33780, 0xc0005cd160, 0x19e1b80, 0xc5e8261800, 0x19e1b80, 0xc42ea66400)
/home/jenkins/go/src/dispatcher/vendor/k8s.io/client-go/tools/cache/controller.go:269 +0x122
dispatcher/vendor/k8s.io/client-go/tools/cache.(*processorListener).run.func1()
/home/jenkins/go/src/dispatcher/vendor/k8s.io/client-go/tools/cache/shared_informer.go:775 +0x1c5
dispatcher/vendor/k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0xc0007abf60)
/home/jenkins/go/src/dispatcher/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:155 +0x5f
dispatcher/vendor/k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0xc000827f60, 0x1bf1500, 0xc001c94060, 0x17a0e01, 0xc0014e24e0)
/home/jenkins/go/src/dispatcher/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:156 +0xad
dispatcher/vendor/k8s.io/apimachinery/pkg/util/wait.JitterUntil(0xc0007abf60, 0x3b9aca00, 0x0, 0x1ac9401, 0xc0014e24e0)
/home/jenkins/go/src/dispatcher/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:133 +0x98
dispatcher/vendor/k8s.io/apimachinery/pkg/util/wait.Until(...)
/home/jenkins/go/src/dispatcher/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:90
dispatcher/vendor/k8s.io/client-go/tools/cache.(*processorListener).run(0xc000a56500)
/home/jenkins/go/src/dispatcher/vendor/k8s.io/client-go/tools/cache/shared_informer.go:771 +0x95
dispatcher/vendor/k8s.io/apimachinery/pkg/util/wait.(*Group).Start.func1(0xc0010de290, 0xc001298160)
/home/jenkins/go/src/dispatcher/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:73 +0x51
created by dispatcher/vendor/k8s.io/apimachinery/pkg/util/wait.(*Group).Start
/home/jenkins/go/src/dispatcher/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:71 +0x65

Add one of the following kinds:

/kind bug

What this PR does / why we need it:
When RemoveNode is called before RemovePod, the scheduler panics with "invalid memory address or nil pointer dereference" (see the trace above).

This PR fixes the nil pointer dereference in NodeInfo.RemovePod.
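For illustration, here is a minimal sketch of the failure mode with hypothetical, simplified types (not the upstream scheduler code): once RemoveNode clears the cached *Node, a RemovePod call that misses the pod and builds its error message from n.node.Name dereferences a nil pointer. A nil guard (or an independently stored node name, as in the diff below) avoids the panic:

```go
package main

import "fmt"

// Hypothetical stand-ins for the scheduler types; fields simplified.
type Node struct{ Name string }
type Pod struct{ Name string }

type NodeInfo struct {
	node *Node // cleared by RemoveNode
	pods []*Pod
}

func (n *NodeInfo) RemovePod(pod *Pod) error {
	for i, p := range n.pods {
		if p.Name == pod.Name {
			n.pods = append(n.pods[:i], n.pods[i+1:]...)
			return nil
		}
	}
	// Without this guard, the fmt.Errorf below panics whenever
	// RemoveNode has already run and n.node is nil.
	if n.node == nil {
		return fmt.Errorf("no corresponding pod %s; node already removed", pod.Name)
	}
	return fmt.Errorf("no corresponding pod %s in pods of node %s", pod.Name, n.node.Name)
}

func main() {
	ni := &NodeInfo{node: &Node{Name: "node-1"}}
	ni.node = nil // simulate RemoveNode running before RemovePod
	fmt.Println(ni.RemovePod(&Pod{Name: "pod-1"}))
}
```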

Which issue(s) this PR fixes:

Fixes #

Special notes for your reviewer:

Does this PR introduce a user-facing change?:

NONE

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


/sig-scheduler

@k8s-ci-robot added labels on Dec 30, 2020: release-note-none (denotes a PR that doesn't merit a release note), kind/bug (categorizes issue or PR as related to a bug), size/S (denotes a PR that changes 10-29 lines, ignoring generated files), cncf-cla: yes (indicates the PR's author has signed the CNCF CLA), do-not-merge/needs-sig (indicates an issue or PR lacks a `sig/foo` label and requires one), and needs-triage (indicates an issue or PR lacks a `triage/foo` label and requires one).
@k8s-ci-robot (Contributor):

@xiaoanyunfei: This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot added the needs-priority label (indicates a PR lacks a `priority/foo` label and requires one) on Dec 30, 2020.
@k8s-ci-robot (Contributor):

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: xiaoanyunfei
To complete the pull request process, please assign ravisantoshgudimetla after the PR has been reviewed.
You can assign the PR to them by writing /assign @ravisantoshgudimetla in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot added the sig/scheduling label (categorizes an issue or PR as relevant to SIG Scheduling) and removed do-not-merge/needs-sig on Dec 30, 2020.
@xiaoanyunfei (Contributor, Author):

/retest

@@ -581,7 +585,7 @@ func (n *NodeInfo) RemovePod(pod *v1.Pod) error {
 			return nil
 		}
 	}
-	return fmt.Errorf("no corresponding pod %s in pods of node %s", pod.Name, n.node.Name)
+	return fmt.Errorf("no corresponding pod %s in pods of node %s", pod.Name, n.name)
Review comment (Member):

Getting here means there is another bug, so before we decide whether or not it is worth carrying the node name in NodeInfo, let's debug why we are here in the first place.

/cc @alculquicondor

Review comment (Member):

This would only happen if the Pod was already removed, which in turn would happen when two delete events are received for the same Pod.
But we can use pod.Spec.NodeName, as sketched below.
Unless this is a bug that we already solved (can't find the PR).
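A sketch of that alternative with hypothetical, simplified types (the real objects are v1.Pod and its Spec.NodeName field): the pod itself records the node it was bound to, so the error message never needs the cached node pointer:

```go
package main

import "fmt"

// Simplified stand-ins for v1.Pod and v1.PodSpec.
type PodSpec struct{ NodeName string }
type Pod struct {
	Name string
	Spec PodSpec
}

// The error path reads the node name from the pod, which is always
// available here, instead of from a NodeInfo that may have lost its node.
func removePodError(pod *Pod) error {
	return fmt.Errorf("no corresponding pod %s in pods of node %s", pod.Name, pod.Spec.NodeName)
}

func main() {
	pod := &Pod{Name: "pod-1", Spec: PodSpec{NodeName: "node-1"}}
	fmt.Println(removePodError(pod))
}
```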

Review comment (Member):

I found the precedents: #89553 (not merged) and #89908 (merged to 1.19 and 1.18), but also reverted in #93938.

So it's important to know which version the OP was running to see if this is still an issue.

Review comment (Member):

is there a legitimate case where we get two delete events for a pod?

@alculquicondor (Member):

@xiaoanyunfei could you clarify which version you were using?

@xiaoanyunfei force-pushed the bugfix/RemovePod_nil_pointer branch from aaa0e3a to dec0193 on January 7, 2021 02:19.
@k8s-ci-robot added size/XS (denotes a PR that changes 0-9 lines, ignoring generated files) and removed size/S on Jan 7, 2021.
@xiaoanyunfei (Contributor, Author) commented Jan 7, 2021

> @xiaoanyunfei could you clarify which version you were using?

I'm using v1.19.0

@alculquicondor (Member):

> I'm using v1.19.0

Could you retry in the latest 1.19 release?

@alculquicondor (Member):

I'll take a look at the code just in case.

@alculquicondor (Member):

Also, I would appreciate it if you could provide repro steps (from a real-world scenario), including the initial distribution of pods on the node.

Also, the stack trace seems incomplete.

@alculquicondor (Member):

/triage needs-information

@k8s-ci-robot added the triage/needs-information label (indicates an issue needs more information in order to work on it) on Jan 7, 2021.
@k8s-ci-robot added the needs-rebase label (indicates a PR cannot be merged because it has merge conflicts with HEAD) on Apr 6, 2021.
@k8s-ci-robot (Contributor):

@xiaoanyunfei: PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@fejta-bot:

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale

@k8s-ci-robot added the lifecycle/stale label (denotes an issue or PR has remained open with no activity and has become stale) on Jul 5, 2021.
@k8s-ci-robot (Contributor):

Thanks for your pull request. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

📝 Please follow instructions at https://git.k8s.io/community/CLA.md#the-contributor-license-agreement to sign the CLA.

It may take a couple minutes for the CLA signature to be fully registered; after that, please reply here with a new comment and we'll verify. Thanks.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@k8s-ci-robot added cncf-cla: no (indicates the PR's author has not signed the CNCF CLA) and removed cncf-cla: yes on Jul 5, 2021.
@alculquicondor (Member):

/close
as there was no evidence provided that this happened in a production scenario.

@k8s-ci-robot (Contributor):

@alculquicondor: Closed this PR.

In response to this:

> /close
> as there was no evidence provided that this happened in a production scenario.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Labels: cncf-cla: no, kind/bug, lifecycle/stale, needs-priority, needs-rebase, needs-triage, release-note-none, sig/scheduling, size/XS, triage/needs-information