Skip to content

fix: avoid data race in session signal channel register/deregister #5

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Nov 28, 2023

Conversation

mafredri
Copy link
Member

We recently added signal support to agentssh, however, I noticed the following goleak flake today:

=== Failed
=== FAIL: agent/agentssh  (0.00s)
PASS
goleak: Errors on successful test run: found unexpected goroutines:
[Goroutine 228 in state chan send (nil chan), with github.com/gliderlabs/ssh.(*session).Signals.func1 on top of the stack:
goroutine 228 [chan send (nil chan)]:
github.com/gliderlabs/ssh.(*session).Signals.func1()
	/home/runner/go/pkg/mod/github.com/coder/ssh@v0.0.0-20230621095435-9a7e23486f1c/session.go:256 +0xcf
created by github.com/gliderlabs/ssh.(*session).Signals
	/home/runner/go/pkg/mod/github.com/coder/ssh@v0.0.0-20230621095435-9a7e23486f1c/session.go:254 +0x176
]
FAIL	github.com/coder/coder/v2/agent/agentssh	1.870s

DONE 5618 tests, 48 skipped, 1 failure in 230.997s
Error: Process completed with exit code 1.

https://github.com/coder/coder/actions/runs/7003589751/job/19049631559?pr=10883

This made me take a closer look at the implementation and saw that it was racy. This PR fixes the race.

@mafredri mafredri force-pushed the mafredri/fix-signals-register-and-deregister branch from 0786b47 to f6b35a4 Compare November 27, 2023 15:25
@mafredri mafredri force-pushed the mafredri/fix-signals-register-and-deregister branch from f6b35a4 to 9fde237 Compare November 27, 2023 15:25
Copy link

@spikecurtis spikecurtis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. One suggestion inline

}
go func() {
defer sess.sigMu.Unlock()

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is unorthodox enough to warrant a comment; it looked like a bug to me at first.

Copy link
Member Author

@mafredri mafredri Nov 28, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to captured the motivation in 45772fa (and 773ac57) now, thoughts?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what I meant is the fact that you're calling

defer sess.sigMu.Unlock()

without the corresponding Lock() in the same function looks very strange to me! We often talk about a function "holding" the lock while it runs, but here you "transfer" the lock from the Signals() function to the child goroutine so the parent can return.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I understood that's what you meant. Did you feel the motivation for it was lacking?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As an aside, this is explicitly documented in https://pkg.go.dev/sync#Mutex.Unlock

A locked Mutex is not associated with a particular goroutine. It is allowed for one goroutine to lock a Mutex and then arrange for another goroutine to unlock it.

Not that I'm making any claims about this being standard in any way... I was simply trying to retrofit an edge-cases covered fix into the existing code without too many changes.

Interestingly, there's been a proposal to disallow this, but it didn't gain traction: golang/go#9201

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was actually just about the mechanics --- like

// here we're relying on the fact that the mutex was locked in the outer `Signal()`
// function, so this goroutine/function just has to unlock the mutex when it is done

@mafredri mafredri merged commit 70855de into master Nov 28, 2023
@mafredri mafredri deleted the mafredri/fix-signals-register-and-deregister branch November 28, 2023 19:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants