Init Kubelet runtime cache before dependent stats provider #62544

Merged
merged 1 commit into from
Sep 19, 2018

astefanutti
Contributor

What this PR does / why we need it:

This PR makes sure the kubelet runtime cache is initialized before the stats providers that depend on it.

A nil pointer dereference occurs when accessing container stats via the Kubelet API at the /{namespace}/{podName}/{uid}/{containerName} path, and the request returns 503 as a result:

Apr 11 15:08:30 minikube kubelet[3104]: I0411 15:08:30.596209    3104 logs.go:49] http: panic serving 192.168.64.20:47110: runtime error: invalid memory address or nil pointer dereference
Apr 11 15:08:30 minikube kubelet[3104]: goroutine 44964 [running]:
Apr 11 15:08:30 minikube kubelet[3104]: net/http.(*conn).serve.func1(0xc421b16be0)
Apr 11 15:08:30 minikube kubelet[3104]:         /usr/local/go/src/net/http/server.go:1697 +0xd0
Apr 11 15:08:30 minikube kubelet[3104]: panic(0x334ec60, 0x5a1dd80)
Apr 11 15:08:30 minikube kubelet[3104]:         /usr/local/go/src/runtime/panic.go:491 +0x283
Apr 11 15:08:30 minikube kubelet[3104]: k8s.io/kubernetes/pkg/kubelet/stats.(*StatsProvider).GetContainerInfo(0xc420b364b0, 0xc420e514d0, 0x2c, 0xc420bd9a20, 0x20, 0xc4213388dd, 0x17, 0xc420a37b80, 0x0, 0x0, ...)
Apr 11 15:08:30 minikube kubelet[3104]:         /workspace/anago-v1.10.0-rc.1.9+fc32d2f3698e36/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/kubelet/stats/stats_provider.go:146 +0x93
Apr 11 15:08:30 minikube kubelet[3104]: k8s.io/kubernetes/pkg/kubelet/server/stats.(*handler).handlePodContainer(0xc420b8cb40, 0xc42070b380, 0xc4214237a0)
Apr 11 15:08:30 minikube kubelet[3104]:         /workspace/anago-v1.10.0-rc.1.9+fc32d2f3698e36/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/kubelet/server/stats/handler.go:257 +0x6b9
Apr 11 15:08:30 minikube kubelet[3104]: k8s.io/kubernetes/pkg/kubelet/server/stats.(*handler).(k8s.io/kubernetes/pkg/kubelet/server/stats.handlePodContainer)-fm(0xc42070b380, 0xc4214237a0)
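
The ordering problem behind this trace can be illustrated with a minimal, self-contained sketch. It is not the kubelet code: `runtimeCache`, `fakeCache`, `statsProvider`, `kubelet`, `buildBroken`, and `buildFixed` are hypothetical names chosen for illustration. The assumption is simply that a provider copies a field at construction time, so it keeps a nil value if the field is only assigned afterwards.

```go
package main

import "fmt"

// runtimeCache is a hypothetical stand-in for the kubelet runtime cache.
type runtimeCache interface {
	GetPods() []string
}

// fakeCache is a trivial implementation used only for this sketch.
type fakeCache struct{}

func (fakeCache) GetPods() []string { return []string{"pod-a"} }

// statsProvider is a hypothetical provider that stores the cache value
// it receives when it is constructed.
type statsProvider struct {
	cache runtimeCache
}

func newStatsProvider(c runtimeCache) *statsProvider {
	return &statsProvider{cache: c}
}

// podCount recovers from a panic so the sketch can report the failure
// instead of crashing; calling a method on a nil interface panics.
func (p *statsProvider) podCount() (n int, err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("recovered: %v", r)
		}
	}()
	return len(p.cache.GetPods()), nil
}

// kubelet is a hypothetical container for both fields.
type kubelet struct {
	runtimeCache runtimeCache
	stats        *statsProvider
}

// buildBroken passes the field before it is set, so the provider
// keeps a nil cache even though the field is assigned right after.
func buildBroken() *kubelet {
	k := &kubelet{}
	k.stats = newStatsProvider(k.runtimeCache)
	k.runtimeCache = fakeCache{}
	return k
}

// buildFixed sets the field first, then builds the provider from it.
func buildFixed() *kubelet {
	k := &kubelet{}
	k.runtimeCache = fakeCache{}
	k.stats = newStatsProvider(k.runtimeCache)
	return k
}

func main() {
	_, err := buildBroken().stats.podCount()
	fmt.Println("broken ordering:", err)

	n, err := buildFixed().stats.podCount()
	fmt.Println("fixed ordering:", n, err)
}
```

In this sketch the broken ordering yields an error from the recovered panic, while the fixed ordering reports one pod with no error. Swapping the two assignment lines is the whole fix.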

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):

Fixes #56297

Release note:

Fix nil pointer dereference when accessing the container stats Kubelet endpoint

/sig instrumentation

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/instrumentation Categorizes an issue or PR as relevant to SIG Instrumentation. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Apr 13, 2018
@k8s-ci-robot k8s-ci-robot requested review from ncdc and tmrts April 13, 2018 14:46
@astefanutti
Contributor Author

/assign Random-Liu
/assign derekwaynecarr

@dims
Member

dims commented Apr 14, 2018

/ok-to-test

@k8s-ci-robot k8s-ci-robot removed the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Apr 14, 2018
@astefanutti
Contributor Author

/retest

@ncdc
Member

ncdc commented May 10, 2018

/uncc

@DirectXMan12
Contributor

/lgtm

This seems like a fairly straightforward case of "don't try to pass a field before actually setting it".

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jul 17, 2018
@lukaszo
Contributor

lukaszo commented Aug 17, 2018

@Random-Liu @derekwaynecarr any chance to merge it?

@jellonek
Contributor

ping

@DirectXMan12
Contributor

/milestone 1.12
/priority critical-urgent

@kubernetes/sig-node-bugs we probably want to merge this before code freeze, since it straight-up breaks metrics. I'm adding it to the milestone because it has sat here without any triage for a while.

@k8s-ci-robot
Contributor

@DirectXMan12: The provided milestone is not valid for this repository. Milestones in this repository: [next-candidate, v1.10, v1.11, v1.12, v1.13, v1.4, v1.5, v1.6, v1.7, v1.8, v1.9]

Use /milestone clear to clear the milestone.


@k8s-ci-robot k8s-ci-robot added priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. sig/node Categorizes an issue or PR as relevant to SIG Node. kind/bug Categorizes issue or PR as related to a bug. labels Sep 10, 2018
@DirectXMan12
Contributor

/milestone v1.12

@k8s-ci-robot k8s-ci-robot added this to the v1.12 milestone Sep 10, 2018
Member

@feiskyer feiskyer left a comment

/lgtm

@yujuhong
Contributor

/lgtm

Not sure how the tests missed this...
/cc @dashpole

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: astefanutti, DirectXMan12, feiskyer, yujuhong

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Sep 11, 2018
@dashpole
Contributor

We don't have tests for those endpoints... I plan to deprecate and remove this endpoint in the future, but we should fix it for now. We should cherry-pick this back to 1.9 as well.

@yujuhong
Contributor

We need to cherry-pick this to all supported branches. The bug was introduced in 1.8...

@yguo0905
Contributor

1.8 is unsupported now, so we'd need to cherry-pick this into 1.9 through 1.12.

@dims
Member

dims commented Sep 19, 2018

/test all

@k8s-ci-robot k8s-ci-robot merged commit 3429b9a into kubernetes:master Sep 19, 2018
@astefanutti astefanutti deleted the 56297 branch April 6, 2019 14:33
Development

Successfully merging this pull request may close these issues.

Detailed container stats via Kubelet API proxy returns 503 in 1.8 version