Deletion and Garbage Collection of Kubernetes Objects

Dec 6th, 2017 9:00am by Maarten Hoogendoorn

Feature image via Pixabay.

This contributed article is part of a series, from members of the Cloud Native Computing Foundation (CNCF), about CNCF’s Kubecon/CloudNativeCon, taking place this week in Austin, Dec. 6 – 8.

Maarten Hoogendoorn

Maarten is an engineer at Container Solutions, where he helps clients with containerizers, build systems, orchestrators and CI/CD pipelines. Maarten enjoys programming in Rust, and building/deploying software declaratively with Nix. He also organizes the Amsterdam Nix and Rust meetups.

With the Kubernetes container orchestration engine, concepts and objects build on top of each other. An example we described previously is how deployments build on top of replica sets to ensure availability, and replica sets build on top of Pods to get scheduling for free.

What exactly happens when we delete a deployment? We would not only expect the deployment itself to be deleted, but also the replica sets and pods that are managed by the deployment.

This problem is solved by garbage collection (GC). Before GC was introduced in Kubernetes 1.8, cascading deletion was handled by the client and/or hardcoded in the controllers for a specific resource. The client could fail halfway through the deletion of the deployment and its components, leaving the system in a limbo state that had to be cleaned up manually afterward. That is not ideal for a system that aims to work reliably without human operators.

So, back to garbage collection. You’ve probably heard of it already in the context of programming languages.

The classic algorithm for garbage collection, mark-and-sweep, assumes that:

Each allocation/object knows which children objects it “owns.”
When the program is paused, we can inspect the “root set” (e.g. which variables are in scope).

The collection process then works by:

1. Pausing program execution,
2. Marking all references reachable from your current position as “alive,” starting from the root set,
3. Iterating through all allocations,
  1. Freeing those that are not alive,
  2. Marking the survivors as “dead” again, to prepare for the next GC round.

The animation below shows how this works.

Source
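The steps above can be sketched in a few lines of Python. This is a toy model under simplifying assumptions (a flat list stands in for the heap, and the class and function names are made up for illustration), not how any particular language runtime implements its collector.

```python
class Obj:
    """A heap allocation that knows which other objects it owns (its children)."""

    def __init__(self, name):
        self.name = name
        self.children = []
        self.alive = False  # mark bit, cleared between GC rounds


def mark(obj):
    """Mark everything reachable from obj as alive."""
    if obj.alive:
        return  # already visited; this also guards against reference cycles
    obj.alive = True
    for child in obj.children:
        mark(child)


def collect(heap, roots):
    """Run one mark-and-sweep round and return the surviving allocations."""
    for root in roots:
        mark(root)
    survivors = [obj for obj in heap if obj.alive]  # sweep: drop unmarked objects
    for obj in survivors:
        obj.alive = False  # reset the mark to prepare for the next round
    return survivors


a, b, c, d = Obj("a"), Obj("b"), Obj("c"), Obj("d")
a.children.append(b)  # a owns b
c.children.append(d)  # c owns d, but c is not reachable from any root
heap = collect([a, b, c, d], roots=[a])
print([obj.name for obj in heap])
```

Only `a` and `b` survive the round: `c` and `d` reference each other in one direction, but nothing in the root set leads to them.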

Ownership in Kubernetes

Kubernetes also has a garbage collection system, but it works the other way around! In classical GC each object knows which other objects it owns (left in the figure below), but in Kubernetes, the owned object contains an OwnerReference to its owner.
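Because the references point from the owned object to its owner, anything that wants to know "what does this object own?" has to invert them first. A minimal Python sketch of that inversion follows; the uids are hypothetical placeholders and the dictionaries are heavily trimmed compared to real Kubernetes objects.

```python
from collections import defaultdict

# Each object records its owners; nothing records its dependents.
# The uids below are invented for this example.
objects = [
    {"uid": "deploy-1", "kind": "Deployment", "ownerReferences": []},
    {"uid": "rs-1", "kind": "ReplicaSet", "ownerReferences": [{"uid": "deploy-1"}]},
    {"uid": "pod-1", "kind": "Pod", "ownerReferences": [{"uid": "rs-1"}]},
]


def dependents_by_owner(objs):
    """Invert the child-to-owner references into an owner-to-children index."""
    index = defaultdict(list)
    for obj in objs:
        for ref in obj["ownerReferences"]:
            index[ref["uid"]].append(obj["uid"])
    return dict(index)


index = dependents_by_owner(objects)
print(index)
```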

Let’s see what these references look like in practice.

Create a deployment via kubectl run, as shown below. This will cause the deployment controller to create a ReplicaSet, with one replica (which means it will only start one pod).

$ kubectl run my-nginx --image=nginx

deployment "my-nginx" created

Now let’s inspect the ownerReferences of the ReplicaSet. (If you want to know how Deployments, ReplicaSets and Pods relate to each other, check out our previous post.)

$ kubectl get replicaset -l"run=my-nginx" -o json | jq ".items[0].metadata.name, .items[0].metadata.ownerReferences"

"my-nginx-85584476c8"
[
  {
    "apiVersion": "extensions/v1beta1",
    "blockOwnerDeletion": true,
    "controller": true,
    "kind": "Deployment",
    "name": "my-nginx",
    "uid": "6e047451-cdca-11e7-83b5-080027d7dd6b"
  }
]

NOTE: We used the very handy jq utility here to get just the output we want. We get back both the metadata.name and the metadata.ownerReferences of the ReplicaSet object.

And yes, we can see that the replica set object has metadata.ownerReferences set, and that the owner is a deployment with the name ‘my-nginx’.

And now for the pod associated with the deployment:

$ kubectl get pod -l"run=my-nginx" -o json | jq ".items[0].metadata.ownerReferences"

[
  {
    "apiVersion": "extensions/v1beta1",
    "blockOwnerDeletion": true,
    "controller": true,
    "kind": "ReplicaSet",
    "name": "my-nginx-85584476c8",
    "uid": "6e04f916-cdca-11e7-83b5-080027d7dd6b"
  }
]

We see indeed that the owner is the replica set named “my-nginx-85584476c8.”
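Taken together, the two outputs describe a chain from the pod up to the deployment. The Python sketch below walks such a chain over trimmed copies of these objects. It is an illustration only: the `owner_chain` helper is made up, and it matches owners by kind and name for brevity, whereas the real references also carry a uid.

```python
# Trimmed-down copies of the objects in this walkthrough, keyed by (kind, name).
# Matching by name instead of uid is a simplification for this example.
cluster = {
    ("Deployment", "my-nginx"): {"ownerReferences": []},
    ("ReplicaSet", "my-nginx-85584476c8"): {
        "ownerReferences": [
            {"kind": "Deployment", "name": "my-nginx", "controller": True}
        ]
    },
    ("Pod", "my-nginx-85584476c8-gphf8"): {
        "ownerReferences": [
            {"kind": "ReplicaSet", "name": "my-nginx-85584476c8", "controller": True}
        ]
    },
}


def owner_chain(kind, name):
    """Follow controller owner references upwards until an object has no owner."""
    chain = [(kind, name)]
    while True:
        refs = cluster[(kind, name)]["ownerReferences"]
        controllers = [ref for ref in refs if ref.get("controller")]
        if not controllers:
            return chain
        kind, name = controllers[0]["kind"], controllers[0]["name"]
        chain.append((kind, name))


chain = owner_chain("Pod", "my-nginx-85584476c8-gphf8")
print(" -> ".join(f"{kind}/{name}" for kind, name in chain))
```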

Deleting Objects: Three Variants

There are three different ways to delete a Kubernetes object, by setting the propagationPolicy on the deletion request to one of the following options:

Foreground: The object itself cannot be deleted before all the objects that it owns are deleted.
Background: The object itself is deleted, after which the GC deletes the objects that it owned.
Orphan: The object itself is deleted. The owned objects are “orphaned” by removing the reference to the owner.

Let’s see how we can invoke them! Unfortunately, kubectl does not currently support setting the propagation policy. We need to access Kubernetes’ API server directly to be able to set it.

An easy solution to get access to the API server is via the kubectl proxy command, which will handle the authentication of all your requests.

Start the kubectl proxy, and keep it running whilst you’re performing the curl requests:

kubectl proxy -p 8080
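If you would rather script these requests than type them by hand, the same deletion request can be assembled in Python with the standard library. The sketch below only builds the request and does not send it. It assumes the proxy from above is listening on localhost:8080 and that the deployment lives in the default namespace; the helper name `build_delete_request` is made up for this example.

```python
import json
import urllib.request

VALID_POLICIES = {"Foreground", "Background", "Orphan"}


def build_delete_request(name, policy, namespace="default", host="localhost:8080"):
    """Build (but do not send) a DELETE request carrying DeleteOptions."""
    if policy not in VALID_POLICIES:
        raise ValueError(f"unknown propagationPolicy: {policy!r}")
    body = {"kind": "DeleteOptions", "apiVersion": "v1", "propagationPolicy": policy}
    url = (
        f"http://{host}/apis/extensions/v1beta1"
        f"/namespaces/{namespace}/deployments/{name}"
    )
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="DELETE",
    )


req = build_delete_request("my-nginx", "Foreground")
print(req.get_method(), req.full_url)
# To actually send it against a running proxy:
# urllib.request.urlopen(req)
```

Validating the policy up front keeps a typo from turning into a request with an unintended propagation policy.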

Foreground Policy

To delete an object with the foreground propagation policy, run the following curl command:

curl -X DELETE localhost:8080/apis/extensions/v1beta1/namespaces/default/deployments/my-nginx -d '{"kind":"DeleteOptions","apiVersion":"v1","propagationPolicy":"Foreground"}' -H "Content-Type: application/json"

It will respond with something similar to the following output (I removed some irrelevant output):

{
  "kind": "Deployment",
  "apiVersion": "extensions/v1beta1",
  "metadata": {
    "name": "my-nginx",
    "creationTimestamp": "2017-11-20T08:35:51Z",
    "deletionTimestamp": "2017-11-20T08:36:04Z",
    "finalizers": [
      "foregroundDeletion"
    ]
  },
  "spec": { … },
  "status": {
    "observedGeneration": 1,
    "replicas": 1,
    "updatedReplicas": 1,
    "readyReplicas": 1,
    "availableReplicas": 1,
    "conditions": [
      {
        "type": "Available",
        "status": "True",
        "lastUpdateTime": "2017-11-20T08:35:51Z",
        "lastTransitionTime": "2017-11-20T08:35:51Z",
        "reason": "MinimumReplicasAvailable",
        "message": "Deployment has minimum availability."
      }
    ]
  }
}

As you can see, there is now a deletionTimestamp, which marks the object read-only for users. A list of finalizers has also been added. The only operations that Kubernetes can still apply to the object are removing finalizers and updating its status. The foregroundDeletion finalizer is handled by the garbage collection system, which will delete the replica sets first, before removing the deployment. Once all finalizers have been removed, the object itself is removed from Kubernetes.
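The ordering that the finalizer enforces can be modeled with a small in-memory simulation. This is a deliberately simplified sketch, not the actual garbage collector: the store, the `owners` field (owners referenced by name) and both function names are assumptions made for illustration, and the default timestamp is simply the sample value from the output above.

```python
def request_foreground_delete(store, name, now="2017-11-20T08:36:04Z"):
    """Mark an object as terminating instead of removing it right away."""
    obj = store[name]
    obj.setdefault("deletionTimestamp", now)
    if "foregroundDeletion" not in obj["finalizers"]:
        obj["finalizers"].append("foregroundDeletion")


def gc_pass(store):
    """One pass of a toy collector over all terminating objects."""
    for name, obj in list(store.items()):
        if "foregroundDeletion" not in obj["finalizers"]:
            continue
        dependents = [n for n, o in store.items() if name in o["owners"]]
        if dependents:
            # Owned objects must go first, so start deleting them as well.
            for dep in dependents:
                request_foreground_delete(store, dep)
        else:
            # Nothing depends on this object any more: release the finalizer.
            obj["finalizers"].remove("foregroundDeletion")
            if not obj["finalizers"]:
                del store[name]  # no finalizers remain, so the object is gone


# Hypothetical in-memory "cluster"; owners are referenced by name for brevity.
store = {
    "my-nginx": {"owners": [], "finalizers": []},
    "my-nginx-85584476c8": {"owners": ["my-nginx"], "finalizers": []},
    "my-nginx-85584476c8-gphf8": {"owners": ["my-nginx-85584476c8"], "finalizers": []},
}

request_foreground_delete(store, "my-nginx")
history = []
while store:
    gc_pass(store)
    history.append(sorted(store))
print(history)
```

In this model the pod disappears first, then the replica set, and the deployment is the last object to be removed, which is the defining property of foreground deletion.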

Background Policy

A background deletion is a lot simpler.

curl -X DELETE localhost:8080/apis/extensions/v1beta1/namespaces/default/deployments/my-nginx -d '{"kind":"DeleteOptions","apiVersion":"v1","propagationPolicy":"Background"}' -H "Content-Type: application/json"

{
  "kind": "Status",
  "apiVersion": "v1",
  "metadata": { },
  "status": "Success",
  "details": {
    "name": "my-nginx",
    "group": "extensions",
    "kind": "deployments",
    "uid": "665086e3-cdcd-11e7-83b5-080027d7dd6b"
  }
}

It just deletes the deployment itself, after which the GC system has to figure out that the owner of the replica set is deleted. The replica set is then garbage collected.
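A rough Python model of this behavior: the owner is removed immediately, and a separate collection step later removes every object whose owners no longer exist. The store layout, the uids and the function names are hypothetical and chosen only to keep the sketch short.

```python
def delete_background(store, uid):
    """Remove the object right away; its dependents are left for the GC."""
    del store[uid]


def collect_garbage(store):
    """Delete objects whose owners are all gone; return what was removed."""
    removed = []
    changed = True
    while changed:
        changed = False
        for uid, obj in list(store.items()):
            owners = obj["owners"]
            # Objects without any owner reference are never collected.
            if owners and not any(owner in store for owner in owners):
                del store[uid]
                removed.append(uid)
                changed = True
    return removed


# Invented uids; each object lists the uids of its owners.
store = {
    "deploy-uid": {"kind": "Deployment", "owners": []},
    "rs-uid": {"kind": "ReplicaSet", "owners": ["deploy-uid"]},
    "pod-uid": {"kind": "Pod", "owners": ["rs-uid"]},
}

delete_background(store, "deploy-uid")
after_delete = sorted(store)       # the replica set and pod are still there
removed = collect_garbage(store)   # now the GC catches up
print(after_delete, removed)
```

Between the delete call and the collection step, the replica set and pod still exist with a dangling owner reference, which is the window the text above describes.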

Orphan Policy

The last option to delete an object is to use orphan propagation. This will remove the ownerReferences from the replica set, and delete the deployment.

curl -X DELETE localhost:8080/apis/extensions/v1beta1/namespaces/default/deployments/my-nginx -d '{"kind":"DeleteOptions","apiVersion":"v1","propagationPolicy":"Orphan"}' -H "Content-Type: application/json"

{
  "kind": "Deployment",
  "apiVersion": "extensions/v1beta1",
  "metadata": {
    "name": "my-nginx",
    "creationTimestamp": "2017-11-20T08:55:07Z",
    "deletionTimestamp": "2017-11-20T08:55:34Z",
    "finalizers": [
      "orphan"
    ]
  },
  "spec": { ... },
  "status": {
    "observedGeneration": 1,
    "replicas": 1,
    "updatedReplicas": 1,
    "readyReplicas": 1,
    "availableReplicas": 1,
    "conditions": [
      {
        "type": "Available",
        "status": "True",
        "lastUpdateTime": "2017-11-20T08:55:07Z",
        "lastTransitionTime": "2017-11-20T08:55:07Z",
        "reason": "MinimumReplicasAvailable",
        "message": "Deployment has minimum availability."
      }
    ]
  }
}

We now check all deployments, replica sets and pods. In the output we only see replica sets and pods, no deployments:

$ kubectl get deploy,rs,pod

NAME                     DESIRED   CURRENT   READY     AGE
rs/my-nginx-85584476c8   1         1         1         1m

NAME                           READY     STATUS    RESTARTS   AGE
po/my-nginx-85584476c8-gphf8   1/1       Running   0          1m

And indeed, the ownerReferences have been removed from the replica set…

$ kubectl get rs -o json | jq ".items[0].metadata.ownerReferences"

null
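The effect of the orphan policy can be summarized in a few lines of Python. As with any toy model, the details are assumptions: objects are plain dictionaries, owners are matched by name only, and `delete_orphan` is a made-up helper rather than anything Kubernetes exposes.

```python
def delete_orphan(store, name):
    """Strip references to `name` from its dependents, then delete the object."""
    for obj in store.values():
        obj["ownerReferences"] = [
            ref for ref in obj["ownerReferences"] if ref["name"] != name
        ]
    del store[name]


# Simplified copies of the objects from this walkthrough, keyed by name.
store = {
    "my-nginx": {"kind": "Deployment", "ownerReferences": []},
    "my-nginx-85584476c8": {
        "kind": "ReplicaSet",
        "ownerReferences": [{"kind": "Deployment", "name": "my-nginx"}],
    },
    "my-nginx-85584476c8-gphf8": {
        "kind": "Pod",
        "ownerReferences": [{"kind": "ReplicaSet", "name": "my-nginx-85584476c8"}],
    },
}

delete_orphan(store, "my-nginx")
print(sorted(store))
print(store["my-nginx-85584476c8"]["ownerReferences"])
```

The replica set ends up without an owner and therefore is not garbage collected, while the pod keeps its reference to the replica set and continues to be managed by it.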

Want to learn more? Check out the Kubernetes reference manual section on Garbage Collection.