-
Notifications
You must be signed in to change notification settings - Fork 904
chore: refactor agent connection updates #11301
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Current dependencies on/for this PR: This stack of pull requests is managed by Graphite. |
AnyTimes(). | ||
Return(database.WorkspaceBuild{ID: build.ID}, nil) | ||
|
||
go uut.mind(ctx) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Relying on context to shut down this method its racy in that we may trigger goleak sporadically. Might be best to synchronize this to the test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
goleak waits for goroutines to complete up to a timeout. I don't believe we are in danger of triggering it unless the routine deadlocks, in which case goleak will have found a legit bug.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By default for only 2 seconds I believe, and I don’t think it’s configurable. Plenty of time for a slow env like Windows runner to trigger the edge case, I’d say.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Even on a slow Windows runner, 2 seconds is absolutely ages. I've never seen goleak give a false positive on a goroutine that was unblocked and just waiting to be scheduled by the runtime and to finish its work. Have you?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have seen such failures in the past, which is why I raised the concern. In my experience it can and has resulted in rare flakes. I'd like to concretely say which scenarios, but truth be told I don't remember.
I also can't say whether or not it's the case for the current Windows runners, the only thing I can clearly say is that for the old GH Actions Windows runner 2 seconds was definitely not a long time and such expectations often resulted in a flake. 😄 We run so many things in parallel it's unclear how much time is needed, the more things exit leaving other things behind, the closer we get to filling up that 2 second gap, I suppose.
e1f4d89
to
1f31780
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One last thing, but other than that this looks good 👍🏻.
PS. Now that I know about the uut
pattern, I kinda like it.
return | ||
} | ||
lastPing.Store(ptr.Ref(time.Now())) | ||
func (api *API) startAgentWebsocketMonitor(ctx context.Context, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we now have a close method, I don't think we need to take a context here anymore, wdyt? For start
we could simply create the context/cancel func based on api context here and pass it along.
PS. Thanks for the naming change, this now feels very obvious what it does when reading the code!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, the API context is used when sending the final updated that the agent has been disconnected.
The context we accept here, and want for start()
, is tied to the connection.
1f31780
to
61f5d26
Compare
Merge activity
|
Refactors the code that handles monitoring an agent websocket with pings and updating the connection times in the DB.
Consolidates v1 and v2 agent APIs under the same code for this.
One substantive change (not just a refactor) is that I've made it so that we actually disconnect if the agent fails to respond to our pings, rather than the old behavior where we would update the database, but not actually tear down the websocket.