Fix flaky tests by running slow deletion as a background task #7159
Conversation
This seems OK to me, but I'll defer to @averikitsch or @grayside.
If this is a best practice for Cloud Run products, or for samples that invoke gcloud internally, should we add it to the authoring guide?
We usually adjust timeouts and add retries per gcloud command in an effort to guarantee the process completes successfully before the next test run. Cloud Run automatically persists unspecified configuration to new deployments, so if a deletion fails, any configuration bits that weren't explicitly set for a new test will be inherited and can disrupt service behavior. This is not a problem if the Cloud Run service is named per build. If deletions do tend to fail, there's also a risk of reaching the 1000-service and 1000-revision quota limits, which would lead to test failures.
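As a sketch of the timeout-and-retry pattern described above, using only the standard library (this is illustrative, not the repo's actual test utilities; the attempt counts, timeout, and backoff values are assumptions):

```python
import subprocess
import time


def run_with_retries(cmd, attempts=3, timeout=120, backoff=10):
    """Run cmd, retrying on a non-zero exit or a timeout.

    Raises the last error if all attempts fail, so the test run
    surfaces the failure instead of silently leaking resources.
    """
    for attempt in range(1, attempts + 1):
        try:
            subprocess.run(cmd, check=True, timeout=timeout)
            return
        except (subprocess.CalledProcessError, subprocess.TimeoutExpired):
            if attempt == attempts:
                raise
            # Linear backoff between attempts.
            time.sleep(backoff * attempt)
```

A teardown would then call something like `run_with_retries(["gcloud", "run", "services", "delete", service, "--quiet"])`, trading longer test runs for a stronger guarantee that cleanup finished.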
So far, the deletion has never failed to complete after bubbling up the error, and there are no lingering services in the project. Additionally, the services are named per test run. I think any risks associated with this change are quite low, and it will fix the failing tests.
There have been many flaky tests failing on `gcloud run services delete`. The underlying error suggests that the deletion call is slow but eventually succeeds. I am proposing running the deletion asynchronously and trusting that it will complete.
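A minimal sketch of the fire-and-forget deletion described above, assuming gcloud is on the test runner's PATH; the helper names, service name, and region here are hypothetical, not the PR's actual code:

```python
import subprocess


def run_in_background(cmd):
    """Start cmd and return the Popen handle without waiting for it."""
    return subprocess.Popen(
        cmd,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )


def delete_service_in_background(service, region="us-central1"):
    # --quiet suppresses the interactive confirmation prompt.
    # The slow deletion proceeds in the background; the test teardown
    # returns immediately instead of blocking on (and flaking over) it.
    return run_in_background(
        ["gcloud", "run", "services", "delete", service,
         "--region", region, "--quiet"]
    )
```

Since the services are named per test run, an occasional slow background deletion cannot collide with configuration of the next run's freshly named service.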