[FG:InPlacePodVerticalScaling] Move pod resource allocation management out of the status manager #130254
Conversation
/assign @SergeyKanzhelev
(force-pushed from ea3fad5 to 5bd8f29)
/cc
(force-pushed from 5bd8f29 to 4213a82)
stateImpl, err := state.NewStateCheckpoint(checkpointDirectory, podStatusManagerStateFile)
if err != nil {
	// This is a critical, non-recoverable failure.
	klog.ErrorS(err, "Could not initialize pod allocation checkpoint manager, please drain node and remove policy state file")
I note that https://kubernetes.io/docs/reference/node/kubelet-files/ doesn't mention what a policy state file is.
The details are in the error, so most of this message is redundant. I cleaned it up in another commit.
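For illustration, the shortened form might look something like this (a sketch, not the actual follow-up commit; it assumes the error already names the state file):

```go
// Sketch only: err already carries the file path and failure details,
// so the log message itself can stay short.
klog.ErrorS(err, "Failed to initialize pod allocation checkpoint manager", "checkpointDir", checkpointDirectory)
```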
(force-pushed from 4213a82 to 82ba31b)
(force-pushed from 82ba31b to 9024140)
/retest
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: felipeagger, tallclair

The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
/test pull-kubernetes-integration
/lgtm
@felipeagger: changing LGTM is restricted to collaborators.

In response to this:

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
@@ -1169,9 +1169,9 @@ func (kl *Kubelet) HandlePodCleanups(ctx context.Context) error {
	// desired pods. Pods that must be restarted due to UID reuse, or leftover
	// pods from previous runs, are not known to the pod worker.

-	allPodsByUID := make(map[types.UID]*v1.Pod)
+	allPodsByUID := make(sets.Set[types.UID])
note to reviewers: this was previously unused, so I repurposed it.
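For reviewers unfamiliar with the apimachinery sets helpers, a minimal sketch of how the repurposed variable behaves (the UIDs below are made up; this assumes k8s.io/apimachinery/pkg/util/sets):

```go
package main

import (
	"fmt"

	"k8s.io/apimachinery/pkg/types"
	"k8s.io/apimachinery/pkg/util/sets"
)

func main() {
	// Only membership matters here, so a set of UIDs replaces the old
	// map[types.UID]*v1.Pod.
	allPodsByUID := make(sets.Set[types.UID])
	allPodsByUID.Insert(types.UID("uid-1"), types.UID("uid-2"))

	if allPodsByUID.Has(types.UID("uid-1")) {
		fmt.Println("pod uid-1 is still desired")
	}
}
```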
func (m *manager) DeletePodAllocation(uid types.UID) {
	if err := m.state.Delete(string(uid), ""); err != nil {
		// If the deletion fails, it will be retried by RemoveOrphanedPods, so we can safely ignore the error.
		klog.V(3).ErrorS(err, "Failed to delete pod allocation", "podUID", uid)
Noting that this didn't happen even once in CI
https://storage.googleapis.com/kubernetes-ci-logs/pr-logs/pull/130254/pull-kubernetes-e2e-kind/1892992974711689216/artifacts/kind-worker/kubelet.log
gsutil cat gs://kubernetes-ci-logs/pr-logs/pull/130254/pull-kubernetes-e2e-kind/1892992974711689216/artifacts/kind-worker/kubelet.log | grep 'Failed to delete pod allocation'
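To make the retry relationship concrete, here is a hypothetical simplification (the checkpointState interface and method names below are invented for illustration, not the PR's code):

```go
package allocation

// checkpointState is a hypothetical stand-in for the real state checkpoint.
type checkpointState interface {
	Delete(podUID, containerName string) error
	PodUIDs() []string
}

type manager struct {
	state checkpointState
}

// removeOrphanedPods deletes checkpointed allocations for pods that are no
// longer active. A DeletePodAllocation call that failed earlier is simply
// picked up again here, which is why that failure is safe to ignore.
func (m *manager) removeOrphanedPods(activeUIDs map[string]bool) {
	for _, uid := range m.state.PodUIDs() {
		if !activeUIDs[uid] {
			// Best effort: if this also fails, the next cleanup pass retries.
			_ = m.state.Delete(uid, "")
		}
	}
}
```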
/lgtm
LGTM label has been added. Git tree hash: e30f706d4bf01f3ca427d268b1eda3264172663f
/test pull-kubernetes-cmd
/lgtm thanks @tallclair
What type of PR is this?
/kind cleanup
What this PR does / why we need it:
Refactoring pod resource allocation management out of the status manager, in preparation for expanding the scope of what the allocation manager needs to track. Pod allocations are used beyond setting the pod status, so storing them in the status manager doesn't make sense.
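Roughly, the allocation manager ends up owning an interface along these lines (a sketch with assumed method names, not the PR's exact API):

```go
package allocation

import (
	v1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/types"
)

// Manager is a sketch of the surface this refactor carves out of the status
// manager; the method names are illustrative assumptions.
type Manager interface {
	// GetContainerResourceAllocation returns the checkpointed resources
	// allocated to a container, if known.
	GetContainerResourceAllocation(podUID types.UID, containerName string) (v1.ResourceRequirements, bool)

	// SetPodAllocation checkpoints the resources allocated to the pod's containers.
	SetPodAllocation(pod *v1.Pod) error

	// DeletePodAllocation removes any checkpointed allocation for the pod.
	DeletePodAllocation(podUID types.UID)
}
```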
Special notes for your reviewer:
I've tried to organize the commits to make this easier to review:
1. Move the state package out of the status manager.
2. Introduce the allocation.Manager.
3. Use the allocation.Manager directly in status.Manager so that callers don't need to be updated (yet).

Does this PR introduce a user-facing change?
/sig node
/priority important-soon