-
Notifications
You must be signed in to change notification settings - Fork 970
chore: retry TestAgent_Dial subtests #19387
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Adds a new wrapper function testutil.RunRetry that will run the provided function multiple times until the test succeeds or the limit is reached. To accomplish this without failing the parent test, we use a fake testing.TB implementation that swallows failures until the final attempt. Updates the TestAgent_Dial subtests to use this new wrapper. I believe the failures are coming from dropped UDP packets due to high load on the CI runner.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Your flake hypothesis sounds very much plausible, and this solution seems fine 👍 Just two minor comments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think having the fakeT implementation will be a very useful addition to the testing package, thanks!
Although, I wonder if it's the right solution here. We could also fake the network stack instead, but at the same time we obviously lose a bit of realism. So just to be clear, and considering that, I'm fine with this solution.
I left a few suggestions, and I think we should definitely change the ctx passing in RunRetry
, but the other part of that suggestion is optional/for your consideration.
testutil/retry.go
Outdated
t.mu.Lock() | ||
defer t.mu.Unlock() | ||
t.failed = true | ||
t.T.Log("WARN: t.Fail called in testutil.RunRetry closure") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggestion: We could give a hint here, like: rewrite test with error+early return if needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not 100% sure what you mean by this comment. I refactored the handler a bit though, so let me know if you still want changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I simply meant the instances of WARN: XXX called in testutil.RunRetry closure
message may be a bit confusing when followed by runtime.Goexit
. So adding a little tip there how the user could rewrite their retrying test may be beneficial. But feel free to ignore.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmmm, in the stdlib testing package I don't think failing logs at all, so this is most certainly an improvement over that at least. Other than the log message getting added this matches the behavior of stdlib now
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think let's just merge this, and if anyone gets confused we can change the logs very easily in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great 👍🏻
Adds a new wrapper function testutil.RunRetry that will run the provided function multiple times until the test succeeds or the limit is reached. To accomplish this without failing the parent test, we use a fake testing.TB implementation that swallows failures until the final attempt.
Updates the TestAgent_Dial subtests to use this new wrapper. I believe the failures are coming from dropped UDP packets due to high load on the CI runner.
Closes coder/internal#595