-
Notifications
You must be signed in to change notification settings - Fork 886
test flake: TestAgent_UpdatedDERP #9018
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
Ok, what's happening is a deadlock in tailscale while shutting down. tailscale calls However, the DERP receive function has side effects beyond just reading from DERP, including sometimes triggering a wireguard update.
The wireguard update blocks on (you guessed it!) acquiring the |
Looks like this was introduced recently on the upstream wireguard-go and pulled into our fork: coder/wireguard-go@b7cd547 However, it has not yet made it to tailscale's fork, so opening a bug against tailscale seems like poor form. |
Since the upstream commit claims that allowing IPC updates while shutting down is a problem, I think we shouldn't revert the commit. (It would also just be tech debt preventing us from taking upstream updates.) I don't see any reason why the |
Here's the original issue I opened against Tailscale a few months ago: tailscale/tailscale#8059 The commit I pulled in was originally meant to address this issue, but I don't think it fully solves it. My last attempt at upgrading Tailscale around then was fully stopped by this issue as CI was failing ~90% of the time with this issue, but I was only able to get the deadlock a handful of times this time. |
I see multiple goroutines inside I can't find an instance where this lock isn't taken, nor can I find something spuriously unlocking this mutex. Is the attached goroutine stack coming from a process running tests with t.Parallel() ? |
There are multiple instances of the userspace engine running (it's an integration test, so multiple speakers are present), so presence of multiple calls doesn't indicate that it isn't taking the lock. Closing the userspace engine closes the e.g.
But, it's definitely not as simple as just taking the lock while closing
so, also holding I'm currently testing a fix that has |
c.f. https://github.com/coder/coder/actions/runs/5817048961/job/15771103773?pr=8939
It looks like it might be an issue closing down the agent:
The text was updated successfully, but these errors were encountered: