Skip to content

chore: retry TestAgent_Dial subtests #19387

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Aug 18, 2025
Merged

chore: retry TestAgent_Dial subtests #19387

merged 5 commits into from
Aug 18, 2025

Conversation

deansheather
Copy link
Member

Adds a new wrapper function testutil.RunRetry that will run the provided function multiple times until the test succeeds or the limit is reached. To accomplish this without failing the parent test, we use a fake testing.TB implementation that swallows failures until the final attempt.

Updates the TestAgent_Dial subtests to use this new wrapper. I believe the failures are coming from dropped UDP packets due to high load on the CI runner.

Closes coder/internal#595

Adds a new wrapper function testutil.RunRetry that will run the provided
function multiple times until the test succeeds or the limit is reached.
To accomplish this without failing the parent test, we use a fake
testing.TB implementation that swallows failures until the final
attempt.

Updates the TestAgent_Dial subtests to use this new wrapper. I believe
the failures are coming from dropped UDP packets due to high load on the
CI runner.
Copy link
Member

@ethanndickson ethanndickson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your flake hypothesis sounds very much plausible, and this solution seems fine 👍 Just two minor comments.

Copy link
Member

@mafredri mafredri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think having the fakeT implementation will be a very useful addition to the testing package, thanks!

Although, I wonder if it's the right solution here. We could also fake the network stack instead, but at the same time we obviously lose a bit of realism. So just to be clear, and considering that, I'm fine with this solution.

I left a few suggestions, and I think we should definitely change the ctx passing in RunRetry, but the other part of that suggestion is optional/for your consideration.

t.mu.Lock()
defer t.mu.Unlock()
t.failed = true
t.T.Log("WARN: t.Fail called in testutil.RunRetry closure")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: We could give a hint here, like: rewrite test with error+early return if needed.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not 100% sure what you mean by this comment. I refactored the handler a bit though, so let me know if you still want changes.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I simply meant the instances of WARN: XXX called in testutil.RunRetry closure message may be a bit confusing when followed by runtime.Goexit. So adding a little tip there how the user could rewrite their retrying test may be beneficial. But feel free to ignore.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm, in the stdlib testing package I don't think failing logs at all, so this is most certainly an improvement over that at least. Other than the log message getting added this matches the behavior of stdlib now

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think let's just merge this, and if anyone gets confused we can change the logs very easily in the future.

@deansheather deansheather requested a review from mafredri August 18, 2025 13:40
Copy link
Member

@mafredri mafredri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great 👍🏻

@deansheather deansheather enabled auto-merge (squash) August 18, 2025 13:45
@deansheather deansheather merged commit e2ba9e7 into main Aug 18, 2025
30 checks passed
@deansheather deansheather deleted the dean/flake-agent-dial branch August 18, 2025 13:51
@github-actions github-actions bot locked and limited conversation to collaborators Aug 18, 2025
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

flake: TestAgent_Dial/UDP
3 participants