Commit 3ddc56e

allow delete only if annotations meet configured criteria (zalando#1069)
* define annotations for delete protection
* change log level and reduce log lines for e2e tests
* reduce wait_for_pod_start even further
1 parent 0d81f97 commit 3ddc56e

18 files changed: +343 -19 lines

charts/postgres-operator/crds/operatorconfigurations.yaml

Lines changed: 4 additions & 0 deletions

@@ -117,6 +117,10 @@ spec:
           type: object
           additionalProperties:
             type: string
+        delete_annotation_date_key:
+          type: string
+        delete_annotation_name_key:
+          type: string
         downscaler_annotations:
           type: array
           items:

charts/postgres-operator/values-crd.yaml

Lines changed: 6 additions & 0 deletions

@@ -67,6 +67,12 @@ configKubernetes:
   #   keya: valuea
   #   keyb: valueb

+  # key name for annotation that compares manifest value with current date
+  # delete_annotation_date_key: "delete-date"
+
+  # key name for annotation that compares manifest value with cluster name
+  # delete_annotation_name_key: "delete-clustername"
+
   # list of annotations propagated from cluster manifest to statefulset and deployment
   # downscaler_annotations:
   #   - deployment-time

charts/postgres-operator/values.yaml

Lines changed: 6 additions & 0 deletions

@@ -63,6 +63,12 @@ configKubernetes:
   # annotations attached to each database pod
   # custom_pod_annotations: "keya:valuea,keyb:valueb"

+  # key name for annotation that compares manifest value with current date
+  # delete_annotation_date_key: "delete-date"
+
+  # key name for annotation that compares manifest value with cluster name
+  # delete_annotation_name_key: "delete-clustername"
+
   # list of annotations propagated from cluster manifest to statefulset and deployment
   # downscaler_annotations: "deployment-time,downscaler/*"

docs/administrator.md

Lines changed: 69 additions & 7 deletions

@@ -44,7 +44,7 @@ Once the validation is enabled it can only be disabled manually by editing or
 patching the CRD manifest:

 ```bash
-zk8 patch crd postgresqls.acid.zalan.do -p '{"spec":{"validation": null}}'
+kubectl patch crd postgresqls.acid.zalan.do -p '{"spec":{"validation": null}}'
 ```

 ## Non-default cluster domain

@@ -123,6 +123,68 @@ Every other Postgres cluster which lacks the annotation will be ignored by this
 operator. Conversely, operators without a defined `CONTROLLER_ID` will ignore
 clusters with defined ownership of another operator.

+## Delete protection via annotations
+
+To avoid accidental deletion of Postgres clusters, the operator can check the
+manifest for two annotations containing the cluster name and/or the current
+date (in YYYY-MM-DD format). The names of the annotation keys are defined in
+the configuration. By default, they are not set, which disables the delete
+protection. Thus, one could also choose to use only one of the two annotations.
+
+**postgres-operator ConfigMap**
+
+```yaml
+apiVersion: v1
+kind: ConfigMap
+metadata:
+  name: postgres-operator
+data:
+  delete_annotation_date_key: "delete-date"
+  delete_annotation_name_key: "delete-clustername"
+```
+
+**OperatorConfiguration**
+
+```yaml
+apiVersion: "acid.zalan.do/v1"
+kind: OperatorConfiguration
+metadata:
+  name: postgresql-operator-configuration
+configuration:
+  kubernetes:
+    delete_annotation_date_key: "delete-date"
+    delete_annotation_name_key: "delete-clustername"
+```
+
+Now, every cluster manifest must contain the configured annotation keys to
+trigger the delete process when running `kubectl delete pg`. Note that the
+`Postgresql` resource would still get deleted because the K8s API server does
+not block it. Only the operator logs will tell that the delete criteria were
+not met.
+
+**cluster manifest**
+
+```yaml
+apiVersion: "acid.zalan.do/v1"
+kind: postgresql
+metadata:
+  name: demo-cluster
+  annotations:
+    delete-date: "2020-08-31"
+    delete-clustername: "demo-cluster"
+spec:
+  ...
+```
+
+In case the resource has been deleted accidentally or the annotations were
+simply forgotten, it is safe to recreate the cluster with `kubectl create`.
+An existing Postgres cluster is not replaced by the operator. But, as the
+original cluster still exists, the status will show `CreateFailed` at first.
+On the next sync event it should change to `Running`. However, as it is in
+fact a new resource for K8s, the UID will differ, which can trigger a rolling
+update of the pods because the UID is used as part of the backup path to S3.
+
+
 ## Role-based access control for the operator

 The manifest [`operator-service-account-rbac.yaml`](../manifests/operator-service-account-rbac.yaml)

@@ -586,11 +648,11 @@ The configuration parameters that we will be using are:
 * `gcp_credentials`
 * `wal_gs_bucket`

-### Generate a K8 secret resource
+### Generate a K8s secret resource

-Generate the K8 secret resource that will contain your service account's
+Generate the K8s secret resource that will contain your service account's
 credentials. It's highly recommended to use a service account and limit its
-scope to just the WAL-E bucket.
+scope to just the WAL-E bucket.

 ```yaml
 apiVersion: v1

@@ -613,13 +675,13 @@ the operator's configuration is set up like the following:
 ...
 aws_or_gcp:
   additional_secret_mount: "pgsql-wale-creds"
-  additional_secret_mount_path: "/var/secrets/google"  # or where ever you want to mount the file
+  additional_secret_mount_path: "/var/secrets/google"  # or where ever you want to mount the file
   # aws_region: eu-central-1
   # kube_iam_role: ""
   # log_s3_bucket: ""
   # wal_s3_bucket: ""
-  wal_gs_bucket: "postgres-backups-bucket-28302F2"  # name of bucket on where to save the WAL-E logs
-  gcp_credentials: "/var/secrets/google/key.json"  # combination of the mount path & key in the K8 resource. (i.e. key.json)
+  wal_gs_bucket: "postgres-backups-bucket-28302F2"  # name of bucket on where to save the WAL-E logs
+  gcp_credentials: "/var/secrets/google/key.json"  # combination of the mount path & key in the K8s resource. (i.e. key.json)
 ...
 ```
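The gating logic itself is not part of the documentation excerpt above. Below is a minimal Go sketch of how such a check could look, assuming a helper that receives the manifest annotations, the two configured keys, and the cluster name; the function name, signature, and messages are illustrative, not the operator's actual API.

```go
package main

import (
	"fmt"
	"time"
)

// checkDeleteAnnotations is an illustrative sketch, not the operator's actual code.
// It returns nil only if every configured delete-protection annotation is present
// and matches the expected value (today's date resp. the cluster name).
func checkDeleteAnnotations(annotations map[string]string, dateKey, nameKey, clusterName string) error {
	if dateKey != "" {
		want := time.Now().Format("2006-01-02") // YYYY-MM-DD
		got, ok := annotations[dateKey]
		if !ok {
			return fmt.Errorf("annotation %q is missing", dateKey)
		}
		if got != want {
			return fmt.Errorf("annotation %q is %q, expected today's date %q", dateKey, got, want)
		}
	}
	if nameKey != "" {
		got, ok := annotations[nameKey]
		if !ok {
			return fmt.Errorf("annotation %q is missing", nameKey)
		}
		if got != clusterName {
			return fmt.Errorf("annotation %q is %q, expected cluster name %q", nameKey, got, clusterName)
		}
	}
	return nil // both checks disabled or satisfied
}

func main() {
	annotations := map[string]string{
		"delete-date":        time.Now().Format("2006-01-02"),
		"delete-clustername": "demo-cluster",
	}
	if err := checkDeleteAnnotations(annotations, "delete-date", "delete-clustername", "demo-cluster"); err != nil {
		fmt.Println("delete blocked:", err)
		return
	}
	fmt.Println("delete criteria met")
}
```

With an empty `dateKey` or `nameKey` the corresponding check is skipped, matching the documented default of disabled protection.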

docs/reference/operator_parameters.md

Lines changed: 10 additions & 0 deletions

@@ -200,6 +200,16 @@ configuration they are grouped under the `kubernetes` key.
   of a database created by the operator. If the annotation key is also provided
   by the database definition, the database definition value is used.

+* **delete_annotation_date_key**
+  key name for annotation that compares manifest value with current date in the
+  YYYY-MM-DD format. Allowed pattern: `'([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9]'`.
+  The default is empty which also disables this delete protection check.
+
+* **delete_annotation_name_key**
+  key name for annotation that compares manifest value with Postgres cluster name.
+  Allowed pattern: `'([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9]'`. The default is
+  empty which also disables this delete protection check.
+
 * **downscaler_annotations**
   An array of annotations that should be passed from Postgres CRD on to the
   statefulset and, if exists, to the connection pooler deployment as well.
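Both new parameters must match the annotation-key pattern quoted in their descriptions above. A small standalone sketch (not operator code) that anchors the quoted pattern and validates candidate key names against it:

```go
package main

import (
	"fmt"
	"regexp"
)

// Allowed pattern from the parameter description above, anchored here to force a full match.
var keyPattern = regexp.MustCompile(`^([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9]$`)

func main() {
	for _, key := range []string{"delete-date", "delete-clustername", "-bad-key-"} {
		fmt.Printf("%-20s valid: %v\n", key, keyPattern.MatchString(key))
	}
}
```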

e2e/requirements.txt

Lines changed: 2 additions & 2 deletions

@@ -1,3 +1,3 @@
-kubernetes==9.0.0
+kubernetes==11.0.0
 timeout_decorator==0.4.1
-pyyaml==5.1
+pyyaml==5.3.1

e2e/tests/test_e2e.py

Lines changed: 87 additions & 9 deletions

@@ -7,6 +7,7 @@
 import os
 import yaml

+from datetime import datetime
 from kubernetes import client, config


@@ -614,6 +615,71 @@ def test_infrastructure_roles(self):
             "Origin": 2,
         })

+    @timeout_decorator.timeout(TEST_TIMEOUT_SEC)
+    def test_x_cluster_deletion(self):
+        '''
+        Test deletion with configured protection
+        '''
+        k8s = self.k8s
+        cluster_label = 'application=spilo,cluster-name=acid-minimal-cluster'
+
+        # configure delete protection
+        patch_delete_annotations = {
+            "data": {
+                "delete_annotation_date_key": "delete-date",
+                "delete_annotation_name_key": "delete-clustername"
+            }
+        }
+        k8s.update_config(patch_delete_annotations)
+
+        # this delete attempt should be omitted because of missing annotations
+        k8s.api.custom_objects_api.delete_namespaced_custom_object(
+            "acid.zalan.do", "v1", "default", "postgresqls", "acid-minimal-cluster")
+
+        # check that pods and services are still there
+        k8s.wait_for_running_pods(cluster_label, 2)
+        k8s.wait_for_service(cluster_label)
+
+        # recreate Postgres cluster resource
+        k8s.create_with_kubectl("manifests/minimal-postgres-manifest.yaml")
+
+        # wait a little before proceeding
+        time.sleep(10)
+
+        # add annotations to manifest
+        deleteDate = datetime.today().strftime('%Y-%m-%d')
+        pg_patch_delete_annotations = {
+            "metadata": {
+                "annotations": {
+                    "delete-date": deleteDate,
+                    "delete-clustername": "acid-minimal-cluster",
+                }
+            }
+        }
+        k8s.api.custom_objects_api.patch_namespaced_custom_object(
+            "acid.zalan.do", "v1", "default", "postgresqls", "acid-minimal-cluster", pg_patch_delete_annotations)
+
+        # wait a little before proceeding
+        time.sleep(10)
+        k8s.wait_for_running_pods(cluster_label, 2)
+        k8s.wait_for_service(cluster_label)
+
+        # now delete process should be triggered
+        k8s.api.custom_objects_api.delete_namespaced_custom_object(
+            "acid.zalan.do", "v1", "default", "postgresqls", "acid-minimal-cluster")
+
+        # wait until cluster is deleted
+        time.sleep(120)
+
+        # check if everything has been deleted
+        self.assertEqual(0, k8s.count_pods_with_label(cluster_label))
+        self.assertEqual(0, k8s.count_services_with_label(cluster_label))
+        self.assertEqual(0, k8s.count_endpoints_with_label(cluster_label))
+        self.assertEqual(0, k8s.count_statefulsets_with_label(cluster_label))
+        self.assertEqual(0, k8s.count_deployments_with_label(cluster_label))
+        self.assertEqual(0, k8s.count_pdbs_with_label(cluster_label))
+        self.assertEqual(0, k8s.count_secrets_with_label(cluster_label))
+
     def get_failover_targets(self, master_node, replica_nodes):
         '''
         If all pods live on the same node, failover will happen to other worker(s)

@@ -700,11 +766,12 @@ def __init__(self):
         self.apps_v1 = client.AppsV1Api()
         self.batch_v1_beta1 = client.BatchV1beta1Api()
         self.custom_objects_api = client.CustomObjectsApi()
+        self.policy_v1_beta1 = client.PolicyV1beta1Api()


 class K8s:
     '''
-    Wraps around K8 api client and helper methods.
+    Wraps around K8s api client and helper methods.
     '''

     RETRY_TIMEOUT_SEC = 10

@@ -755,14 +822,6 @@ def wait_for_pod_start(self, pod_labels, namespace='default'):
             if pods:
                 pod_phase = pods[0].status.phase

-            if pods and pod_phase != 'Running':
-                pod_name = pods[0].metadata.name
-                response = self.api.core_v1.read_namespaced_pod(
-                    name=pod_name,
-                    namespace=namespace
-                )
-                print("Pod description {}".format(response))
-
             time.sleep(self.RETRY_TIMEOUT_SEC)

     def get_service_type(self, svc_labels, namespace='default'):

@@ -824,6 +883,25 @@ def get_services():
     def count_pods_with_label(self, labels, namespace='default'):
         return len(self.api.core_v1.list_namespaced_pod(namespace, label_selector=labels).items)

+    def count_services_with_label(self, labels, namespace='default'):
+        return len(self.api.core_v1.list_namespaced_service(namespace, label_selector=labels).items)
+
+    def count_endpoints_with_label(self, labels, namespace='default'):
+        return len(self.api.core_v1.list_namespaced_endpoints(namespace, label_selector=labels).items)
+
+    def count_secrets_with_label(self, labels, namespace='default'):
+        return len(self.api.core_v1.list_namespaced_secret(namespace, label_selector=labels).items)
+
+    def count_statefulsets_with_label(self, labels, namespace='default'):
+        return len(self.api.apps_v1.list_namespaced_stateful_set(namespace, label_selector=labels).items)
+
+    def count_deployments_with_label(self, labels, namespace='default'):
+        return len(self.api.apps_v1.list_namespaced_deployment(namespace, label_selector=labels).items)
+
+    def count_pdbs_with_label(self, labels, namespace='default'):
+        return len(self.api.policy_v1_beta1.list_namespaced_pod_disruption_budget(
+            namespace, label_selector=labels).items)
+
     def wait_for_pod_failover(self, failover_targets, labels, namespace='default'):
         pod_phase = 'Failing over'
         new_pod_node = ''

manifests/complete-postgres-manifest.yaml

Lines changed: 3 additions & 1 deletion

@@ -6,6 +6,8 @@ metadata:
 # environment: demo
 # annotations:
 #   "acid.zalan.do/controller": "second-operator"
+#   "delete-date": "2020-08-31"  # can only be deleted on that day if "delete-date" key is configured
+#   "delete-clustername": "acid-test-cluster"  # can only be deleted when name matches if "delete-clustername" key is configured
 spec:
   dockerImage: registry.opensource.zalan.do/acid/spilo-12:1.6-p3
   teamId: "acid"

@@ -34,7 +36,7 @@ spec:
       defaultUsers: false
   postgresql:
     version: "12"
-    parameters:  # Expert section
+    parameters:  # Expert section
       shared_buffers: "32MB"
       max_connections: "10"
       log_statement: "all"

manifests/configmap.yaml

Lines changed: 2 additions & 0 deletions

@@ -29,6 +29,8 @@ data:
   # default_cpu_request: 100m
   # default_memory_limit: 500Mi
   # default_memory_request: 100Mi
+  # delete_annotation_date_key: delete-date
+  # delete_annotation_name_key: delete-clustername
   docker_image: registry.opensource.zalan.do/acid/spilo-12:1.6-p3
   # downscaler_annotations: "deployment-time,downscaler/*"
   # enable_admin_role_for_users: "true"

manifests/operatorconfiguration.crd.yaml

Lines changed: 4 additions & 0 deletions

@@ -113,6 +113,10 @@ spec:
           type: object
           additionalProperties:
             type: string
+        delete_annotation_date_key:
+          type: string
+        delete_annotation_name_key:
+          type: string
         downscaler_annotations:
           type: array
           items:

manifests/postgresql-operator-default-configuration.yaml

Lines changed: 2 additions & 0 deletions

@@ -31,6 +31,8 @@ configuration:
     # custom_pod_annotations:
     #   keya: valuea
     #   keyb: valueb
+    # delete_annotation_date_key: delete-date
+    # delete_annotation_name_key: delete-clustername
    # downscaler_annotations:
    #   - deployment-time
    #   - downscaler/*

pkg/apis/acid.zalan.do/v1/crds.go

Lines changed: 6 additions & 0 deletions

@@ -888,6 +888,12 @@ var OperatorConfigCRDResourceValidation = apiextv1beta1.CustomResourceValidation
 					},
 				},
 			},
+			"delete_annotation_date_key": {
+				Type: "string",
+			},
+			"delete_annotation_name_key": {
+				Type: "string",
+			},
 			"downscaler_annotations": {
 				Type: "array",
 				Items: &apiextv1beta1.JSONSchemaPropsOrArray{

pkg/apis/acid.zalan.do/v1/operator_configuration_type.go

Lines changed: 2 additions & 0 deletions

@@ -66,6 +66,8 @@ type KubernetesMetaConfiguration struct {
 	InheritedLabels []string `json:"inherited_labels,omitempty"`
 	DownscalerAnnotations []string `json:"downscaler_annotations,omitempty"`
 	ClusterNameLabel string `json:"cluster_name_label,omitempty"`
+	DeleteAnnotationDateKey string `json:"delete_annotation_date_key,omitempty"`
+	DeleteAnnotationNameKey string `json:"delete_annotation_name_key,omitempty"`
 	NodeReadinessLabel map[string]string `json:"node_readiness_label,omitempty"`
 	CustomPodAnnotations map[string]string `json:"custom_pod_annotations,omitempty"`
 	// TODO: use a proper toleration structure?
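The json tags map the new struct fields to the snake_case option names used in the ConfigMap and OperatorConfiguration examples above. A self-contained sketch with a trimmed-down stand-in struct (only the two new fields, not the full KubernetesMetaConfiguration) that shows this mapping:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// kubeConfig is a trimmed-down stand-in for KubernetesMetaConfiguration,
// carrying only the two fields added by this commit.
type kubeConfig struct {
	DeleteAnnotationDateKey string `json:"delete_annotation_date_key,omitempty"`
	DeleteAnnotationNameKey string `json:"delete_annotation_name_key,omitempty"`
}

func main() {
	raw := []byte(`{"delete_annotation_date_key": "delete-date", "delete_annotation_name_key": "delete-clustername"}`)
	var cfg kubeConfig
	if err := json.Unmarshal(raw, &cfg); err != nil {
		panic(err)
	}
	fmt.Printf("date key: %q, name key: %q\n", cfg.DeleteAnnotationDateKey, cfg.DeleteAnnotationNameKey)
}
```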
