-
Notifications
You must be signed in to change notification settings - Fork 82
OADP-6653: CloudStorage exponential backoff by removing RequeueAfter #1937
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: oadp-dev
Are you sure you want to change the base?
Conversation
The operator was repeatedly logging "Secret already exists, updating" and "Following standardized STS workflow, secret created successfully" even when the secret content hadn't changed. This was happening because the CloudStorage controller calls STSStandardizedFlow() on every reconciliation, which always attempted to create the secret first, then caught the AlreadyExists error and performed an update. Changed the approach to: - First check if the secret exists - Compare existing data with desired data - Only update when there are actual differences - Skip updates and avoid logging when content is identical - Changed CloudStorage controller to use Debug level and more accurate message when STS secret is available (not necessarily created) This eliminates unnecessary API calls to the Kubernetes cluster and reduces noise in the operator logs. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com>
Skipping CI for Draft Pull Request. |
@kaovilai: This pull request references OADP-6653 which is a valid jira issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
from scrum backoff for transient/unknown 500 errs |
dbbfdab
to
3a9b0b4
Compare
- Add Conditions field to CloudStorageStatus for better observability - Implement exponential backoff by returning errors on bucket operations - Controller-runtime automatically handles retries (5ms to 1000s max) - Add condition constants for type-safe reason strings - Create mock bucket client for improved testing - Add comprehensive tests for backoff behavior and conditions Key improvements: - Standard Kubernetes pattern using built-in workqueue backoff - Self-healing: continues retrying with increasing delays - Better observability through status conditions - Per-item backoff: each CloudStorage CR gets independent retry timing 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com>
3a9b0b4
to
bb041c9
Compare
@kaovilai: This pull request references OADP-6653 which is a valid jira issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
if updateErr := b.Client.Status().Update(ctx, &bucket); updateErr != nil { | ||
logger.Error(updateErr, "failed to update CloudStorage status") | ||
} | ||
return ctrl.Result{}, err |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return ctrl.Result{}, err
makes it requeue with exponential backoff if err != nil
.
// Return error to trigger exponential backoff | ||
if err != nil { | ||
return ctrl.Result{}, err | ||
} | ||
return ctrl.Result{}, fmt.Errorf("bucket creation failed") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return ctrl.Result{}, err
makes it requeue with exponential backoff if err != nil
.
// Return error to trigger exponential backoff | ||
return ctrl.Result{}, err |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return ctrl.Result{}, err
makes it requeue with exponential backoff if err != nil
.
/hold to rebase against #1936 |
Reviewers are asked to review only second commit for this PR. |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: kaovilai, sseago The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
@kaovilai: The following tests failed, say
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
Description
This PR implements exponential backoff for the CloudStorage controller using Kubernetes-native libraries. The controller leverages controller-runtime's built-in workqueue backoff mechanism for automatic retries with increasing delays when bucket operations fail.
Key Changes
Exponential Backoff Implementation
Status Conditions for Observability
Conditions
field toCloudStorageStatus
for standard Kubernetes status reportingBucketReady
condition with multiple reason constants:ReasonBucketCreated
: Bucket successfully createdReasonBucketReady
: Bucket exists and is readyReasonBucketCreationFailed
: Bucket creation failed (triggers backoff)ReasonBucketCheckError
: Error checking bucket status (triggers backoff)ReasonSTSSecretError
: STS secret operation failed (triggers backoff)Improved Testing
mockBucketClient
for dependency injection in testsBucketClientFactory
to CloudStorageReconciler for test mockingBenefits
Testing
Related Issues
Note
Responses generated with Claude