-
Notifications
You must be signed in to change notification settings - Fork 25
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
PDB Reaper #52
PDB Reaper #52
Conversation
ReapMisconfigured bool | ||
ReapMultiple bool | ||
ReapCrashLoop bool | ||
ReapablePodDisruptionBudgets []policyv1beta1.PodDisruptionBudget |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're copying a lot of the same data into multiple maps/slices. It would be more efficient if you stored here just the namespace & name in this ReapablePodDisruptionBudgets
slice as that's all that's required for the Delete call or use pointers *policyv1beta1.PodDisruptionBudget
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you elaborate?
Data is copied a single time when it's added to the slice of PDBs when it's detected
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It appears you store the full object policyv1beta1.PodDisruptionBudget
in multiple places, maybe in a map you have the full object then you append it to the slice of ReapablePodDisruptionBudgets
, instead of copying by value multiple places you can use refs or maybe have a slice of namespace/name of pdb b/c that slice is only used for a delete call where namespace
and name
are required only.
Signed-off-by: Eytan Avisror <eytan_avisror@intuit.com>
@@ -14,6 +14,10 @@ Two common problems observed in large Kubernetes clusters are: | |||
**node-reaper** provides the capability for worker nodes to be force terminated so that replacement ones come up. | |||
**pod-reaper** does a force termination of pods stuck in Terminating state for a certain amount of time. | |||
|
|||
In some cases, on multi-tenant platforms, users can own PDBs which block node rotation/drain if they are misconfigured, pods are in crashloop backoff, or if multiple PDBs are targeting the same pods. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Misbehaving worker nodes are okay to be terminated because they are brought back by various autoscaling mechanisms. Misbehaving pods are okay to be terminated because various higher level objects (Deployments, replicasets, etc.) bring back the terminated pods. Nodes and pods can be considered ephemeral in that sense.
However, this is not true with PDBs. If PDBs are deleted, users will have to find ways of bringing them back. It might be okay if they use GitOPS but in non-gitops envs, objects getting deleted could be problematic.
Not sure what's a good answer for preventing misbehaving PDBs from stopping node drains though. If there was a way to temporarily disable these PDBs during upgrades, that would've been ideal.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I added a comment on the above in the individual readme
Since pdb-reaper will not recreate the PDBs it deletes, deletion is
particularly useful in cases where GitOps is used, which can re-create the PDBs at a later time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Users who just want visibility can run with --dry-run
and avoid deletion (which will publish events as well)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe a future enhancement would be to save the PDB and then restore it later, presumably after the node was successfully drained? But then pdbreaper would need to be triggered when a node is about to be drained so the 'bad' PDBs can be temporarily deleted, like maybe triggered by upgrade-manager or lifecycle-manager. This is essentially what is recommended by this GCP Anthos link.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This adds a lot of complexity, you would have to track node drains and figure out the right time to recreate - this is easy when doing this manually (such as the doc you linked), but doing it programmatically is much more complex ( complexity which we might not want to have).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. We might consider an enhancement to only delete "misconfigured" PDBs after some number of runs of pdbreaper.
Signed-off-by: Eytan Avisror <eytan_avisror@intuit.com>
Codecov Report
@@ Coverage Diff @@
## master #52 +/- ##
==========================================
+ Coverage 55.42% 57.81% +2.39%
==========================================
Files 4 7 +3
Lines 821 1183 +362
==========================================
+ Hits 455 684 +229
- Misses 317 416 +99
- Partials 49 83 +34
Continue to review full report at Codecov.
|
Signed-off-by: Eytan Avisror <eytan_avisror@intuit.com>
Signed-off-by: Eytan Avisror <eytan_avisror@intuit.com>
Signed-off-by: Eytan Avisror <eytan_avisror@intuit.com>
Signed-off-by: Eytan Avisror <eytan_avisror@intuit.com>
* initial impl. * add unit tests * Documentation * event publication Signed-off-by: Eytan Avisror <eytan_avisror@intuit.com> * Update unit-test.yaml Signed-off-by: Eytan Avisror <eytan_avisror@intuit.com> * Add disclaimer for production + extra flags Signed-off-by: Eytan Avisror <eytan_avisror@intuit.com> * minor fixes Signed-off-by: Eytan Avisror <eytan_avisror@intuit.com> * Update pdbreaper.go Signed-off-by: Eytan Avisror <eytan_avisror@intuit.com> * exclude namespaces Signed-off-by: Eytan Avisror <eytan_avisror@intuit.com> Signed-off-by: shaoxt <shaoxt@gmail.com>
TBD: