OADP-6653: CloudStorage stop retrying after 3 errors #1937
base: oadp-dev
The operator was repeatedly logging "Secret already exists, updating" and "Following standardized STS workflow, secret created successfully" even when the secret content hadn't changed. This was happening because the CloudStorage controller calls STSStandardizedFlow() on every reconciliation, which always attempted to create the secret first, then caught the AlreadyExists error and performed an update.

Changed the approach to:
- First check if the secret exists
- Compare existing data with desired data
- Only update when there are actual differences
- Skip updates and avoid logging when content is identical
- Changed CloudStorage controller to use Debug level and more accurate message when STS secret is available (not necessarily created)

This eliminates unnecessary API calls to the Kubernetes cluster and reduces noise in the operator logs.

🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
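The check-compare-update flow described in this commit can be sketched roughly as follows. This is a minimal illustration only, not the operator's code: `fakeStore`, `secretData`, `equalData`, `ensureSecret`, and the secret name `cloud-credentials` are hypothetical stand-ins, and an in-memory map replaces the Kubernetes API so the example stays self-contained.

```go
package main

import (
	"bytes"
	"fmt"
)

// secretData mirrors the shape of a Secret's Data field (key -> raw bytes).
type secretData map[string][]byte

// fakeStore is a hypothetical in-memory stand-in for the cluster API.
type fakeStore struct {
	secrets map[string]secretData
	writes  int // number of create/update calls issued
}

// equalData reports whether two payloads have identical keys and values.
func equalData(a, b secretData) bool {
	if len(a) != len(b) {
		return false
	}
	for k, av := range a {
		bv, ok := b[k]
		if !ok || !bytes.Equal(av, bv) {
			return false
		}
	}
	return true
}

// ensureSecret creates the secret if it is missing, updates it only when the
// content differs, and otherwise does nothing. It returns what it did.
func (s *fakeStore) ensureSecret(name string, desired secretData) string {
	existing, found := s.secrets[name]
	switch {
	case !found:
		s.secrets[name] = desired
		s.writes++
		return "created"
	case equalData(existing, desired):
		return "unchanged" // no write and nothing worth logging
	default:
		s.secrets[name] = desired
		s.writes++
		return "updated"
	}
}

func main() {
	store := &fakeStore{secrets: map[string]secretData{}}
	want := secretData{"credentials": []byte("example")}

	fmt.Println(store.ensureSecret("cloud-credentials", want)) // first reconcile
	fmt.Println(store.ensureSecret("cloud-credentials", want)) // repeat reconcile
	fmt.Println("writes:", store.writes)
}
```

In this sketch a repeated reconcile with identical content issues no write, which is the behavior the commit message is after.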
- Limit bucket creation retries to 3 attempts with exponential backoff (30s, 1m, 2m)
- Add Conditions and RetryCount fields to CloudStorageStatus for tracking state
- Set BucketReady=True when bucket is available, BucketCreationFailed=True after retry limit
- Provide clear user guidance to recreate CR once permissions are fixed
- Reset retry count on successful bucket creation or when bucket already exists
- Update CRDs and OLM bundle with new status fields

Resolves infinite retry loops on permission denied errors like GCP 403 responses.

🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
Skipping CI for Draft Pull Request.
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: kaovilai
@kaovilai: This pull request references OADP-6653, which is a valid Jira issue.
- Add mock bucket client implementation for testing
- Refactor all tests to use dependency injection via BucketClientFactory
- Extract mock AWS credentials to a constant
- Create helper function for test cloud credentials secret creation
- Add helper function to find conditions by type
- Improve test reliability by using mocks instead of actual bucket operations
- Add retry logic and status conditions to CloudStorage controller

🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
Remove unused namespace parameter since it always receives the same value (test-namespace)

🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
From scrum: backoff for transient/unknown 500 errors.
Why the changes were made
How to test the changes made