@kaovilai kaovilai commented Sep 3, 2025

Why the changes were made

How to test the changes made

kaovilai and others added 2 commits September 3, 2025 08:39
The operator was repeatedly logging "Secret already exists, updating"
and "Following standardized STS workflow, secret created successfully"
even when the secret content hadn't changed. This was happening because
the CloudStorage controller calls STSStandardizedFlow() on every
reconciliation, which always attempted to create the secret first,
then caught the AlreadyExists error and performed an update.

Changed the approach to:
- First check if the secret exists
- Compare existing data with desired data
- Only update when there are actual differences
- Skip updates and avoid logging when content is identical
- Changed CloudStorage controller to use Debug level and more accurate
  message when STS secret is available (not necessarily created)

This eliminates unnecessary API calls to the Kubernetes cluster and
reduces noise in the operator logs.
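
The check-compare-update flow described above can be sketched as follows. This is a minimal illustration, not the operator's actual code: `secretStore`, `dataEqual`, and `ensureSecret` are hypothetical names, and an in-memory map stands in for the Kubernetes API so the example runs on its own.

```go
package main

import (
	"bytes"
	"fmt"
)

// secretStore is a hypothetical in-memory stand-in for the Kubernetes API.
// writes counts how many create/update calls were actually issued.
type secretStore struct {
	data   map[string]map[string][]byte
	writes int
}

// dataEqual reports whether two secret data maps hold identical keys and values.
func dataEqual(a, b map[string][]byte) bool {
	if len(a) != len(b) {
		return false
	}
	for key, av := range a {
		bv, ok := b[key]
		if !ok || !bytes.Equal(av, bv) {
			return false
		}
	}
	return true
}

// ensureSecret checks for the secret first, compares its data with the desired
// data, and writes only when the secret is missing or differs.
// It returns "created", "updated", or "unchanged".
func ensureSecret(s *secretStore, name string, desired map[string][]byte) string {
	existing, found := s.data[name]
	if !found {
		s.data[name] = desired
		s.writes++
		return "created"
	}
	if dataEqual(existing, desired) {
		// Identical content: no API call and nothing worth logging.
		return "unchanged"
	}
	s.data[name] = desired
	s.writes++
	return "updated"
}

func main() {
	store := &secretStore{data: map[string]map[string][]byte{}}
	desired := map[string][]byte{"credentials": []byte("role_arn=example")}

	fmt.Println(ensureSecret(store, "cloud-credentials", desired))
	fmt.Println(ensureSecret(store, "cloud-credentials", desired))
	fmt.Println("writes:", store.writes)
}
```

Returning the outcome lets the caller decide the log level, for example Debug for "unchanged" and Info for "created" or "updated".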

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
- Limit bucket creation retries to 3 attempts with exponential backoff (30s, 1m, 2m)
- Add Conditions and RetryCount fields to CloudStorageStatus for tracking state
- Set BucketReady=True when bucket is available, BucketCreationFailed=True after retry limit
- Provide clear user guidance to recreate CR once permissions are fixed
- Reset retry count on successful bucket creation or when bucket already exists
- Update CRDs and OLM bundle with new status fields

Resolves infinite retry loops on permission denied errors like GCP 403 responses.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress label (Indicates that a PR should not merge because it is a work in progress.) Sep 3, 2025

openshift-ci bot commented Sep 3, 2025

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

openshift-ci bot commented Sep 3, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: kaovilai

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved label (Indicates a PR has been approved by an approver from all required OWNERS files.) Sep 3, 2025
@kaovilai kaovilai changed the title from "CloudStorage LimitedRetries" to "OADP-6653: CloudStorage stop retrying after 3 errors" Sep 3, 2025
@openshift-ci-robot openshift-ci-robot added the jira/valid-reference label (Indicates that this PR references a valid Jira ticket of any type.) Sep 3, 2025

openshift-ci-robot commented Sep 3, 2025

@kaovilai: This pull request references OADP-6653 which is a valid jira issue.

In response to this:

Why the changes were made

How to test the changes made

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

kaovilai and others added 2 commits September 3, 2025 14:37
- Add mock bucket client implementation for testing
- Refactor all tests to use dependency injection via BucketClientFactory
- Extract mock AWS credentials to a constant
- Create helper function for test cloud credentials secret creation
- Add helper function to find conditions by type
- Improve test reliability by using mocks instead of actual bucket operations
- Add retry logic and status conditions to CloudStorage controller
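
The dependency-injection approach in the list above might look roughly like this. Only the name BucketClientFactory comes from the commit message; its signature here, plus `bucketClient`, `mockBucketClient`, `ensureBucket`, and `findCondition`, are assumptions made for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// bucketClient is a hypothetical minimal interface for bucket operations.
type bucketClient interface {
	Exists() (bool, error)
	Create() error
}

// bucketClientFactory lets tests inject a mock instead of a real cloud client.
// The signature is an assumption, not the operator's actual type.
type bucketClientFactory func(bucketName string) (bucketClient, error)

// errForbidden simulates a permission error such as a GCP 403 response.
var errForbidden = errors.New("403 permission denied")

// mockBucketClient records calls and returns preconfigured results.
type mockBucketClient struct {
	exists      bool
	createErr   error
	createCalls int
}

func (m *mockBucketClient) Exists() (bool, error) { return m.exists, nil }

func (m *mockBucketClient) Create() error {
	m.createCalls++
	if m.createErr != nil {
		return m.createErr
	}
	m.exists = true
	return nil
}

// condition is a simplified stand-in for a status condition.
type condition struct {
	Type   string
	Status bool
}

// findCondition returns a pointer to the condition with the given type, or nil.
func findCondition(conds []condition, condType string) *condition {
	for i := range conds {
		if conds[i].Type == condType {
			return &conds[i]
		}
	}
	return nil
}

// ensureBucket creates the bucket if needed and reports a BucketReady condition.
func ensureBucket(factory bucketClientFactory, name string) ([]condition, error) {
	client, err := factory(name)
	if err != nil {
		return nil, err
	}
	exists, err := client.Exists()
	if err != nil {
		return nil, err
	}
	if !exists {
		if err := client.Create(); err != nil {
			return []condition{{Type: "BucketReady", Status: false}}, err
		}
	}
	return []condition{{Type: "BucketReady", Status: true}}, nil
}

func main() {
	mock := &mockBucketClient{createErr: errForbidden}
	factory := func(string) (bucketClient, error) { return mock, nil }

	conds, err := ensureBucket(factory, "test-bucket")
	fmt.Println(err, findCondition(conds, "BucketReady").Status, mock.createCalls)
}
```

Because the controller logic only sees the factory, a test can assert on `createCalls` and on the resulting conditions without touching a real bucket.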

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
Remove unused namespace parameter since it always receives the same value (test-namespace)

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

kaovilai commented Sep 4, 2025

From scrum: use backoff for transient/unknown 500 errors.
If it is a known issue, use a very long backoff. This would cover the case where there is nothing to watch at all, such as STS tokens, which on rotation do not result in secret updates.
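
One way to picture the idea from scrum, as a sketch only: choose the requeue delay from the error class. Everything here is hypothetical, including the function name `requeueAfter`, the status codes treated as "known issues", and all durations; none of it is taken from the PR.

```go
package main

import (
	"fmt"
	"time"
)

// All durations below are illustrative placeholders, not values from the PR.
const (
	transientBase   = 30 * time.Second
	transientMax    = 10 * time.Minute
	knownIssueDelay = 6 * time.Hour
)

// requeueAfter picks a delay based on the HTTP status of the failed call.
// Permission errors are treated as known issues and get a very long delay,
// so the controller still rechecks eventually even when no watched object
// changes. Anything else is treated as transient and backs off exponentially
// up to a cap.
func requeueAfter(statusCode, attempt int) time.Duration {
	if statusCode == 401 || statusCode == 403 {
		return knownIssueDelay
	}
	delay := transientBase
	for i := 0; i < attempt && delay < transientMax; i++ {
		delay *= 2
	}
	if delay > transientMax {
		delay = transientMax
	}
	return delay
}

func main() {
	fmt.Println(requeueAfter(500, 0))
	fmt.Println(requeueAfter(500, 2))
	fmt.Println(requeueAfter(403, 0))
}
```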
