-
Notifications
You must be signed in to change notification settings - Fork 885
fix: Close peer negotiate mutex if we haven't negotiated #1774
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There seems to be some condition we can hit fairly often that will cause us trying to unlock the negotiateMutex twice.
Maybe instead of an atomic plus mutex, we should wrap a regular bool in a RWMutex instead?
I tried just throwing in a recover
in another defer
but that didn't seem to help here.
peer/conn.go
Outdated
c.negotiateMutex.Lock() | ||
} | ||
c.hasNegotiated = true | ||
defer c.negotiateMutex.Unlock() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately I can fairly reliably reproduce this issue:
fatal error: sync: unlock of unlocked mutex
goroutine 34 [running]:
runtime.throw({0x100648300?, 0x10050a930?})
/opt/homebrew/Cellar/go/1.18.2/libexec/src/runtime/panic.go:992 +0x50 fp=0x140000a7640 sp=0x140000a7610 pc=0x100141ec0
sync.throw({0x100648300?, 0x1009447ae?})
/opt/homebrew/Cellar/go/1.18.2/libexec/src/runtime/panic.go:978 +0x24 fp=0x140000a7660 sp=0x140000a7640 pc=0x10016fd44
sync.(*Mutex).unlockSlow(0x140000e60a0, 0xffffffff)
/opt/homebrew/Cellar/go/1.18.2/libexec/src/sync/mutex.go:220 +0x3c fp=0x140000a7690 sp=0x140000a7660 pc=0x10017e9dc
sync.(*Mutex).Unlock(0x0?)
/opt/homebrew/Cellar/go/1.18.2/libexec/src/sync/mutex.go:214 +0x64 fp=0x140000a76b0 sp=0x140000a7690 pc=0x10017e974
github.com/coder/coder/peer.(*Conn).negotiate.func1()
/Users/cian/src/cdr/coder/peer/conn.go:296 +0x2c fp=0x140000a76d0 sp=0x140000a76b0 pc=0x10050b7ac
github.com/coder/coder/peer.(*Conn).negotiate(0x140000e6000)
/Users/cian/src/cdr/coder/peer/conn.go:302 +0x2f0 fp=0x140000a7ec0 sp=0x140000a76d0 pc=0x10050a950
github.com/coder/coder/peer.(*Conn).negotiate-fm()
<autogenerated>:1 +0x2c fp=0x140000a7ee0 sp=0x140000a7ec0 pc=0x10050e3fc
github.com/pion/webrtc/v3.(*PeerConnection).negotiationNeededOp(0x140000bc000)
/Users/cian/go/pkg/mod/github.com/pion/webrtc/v3@v3.1.41/peerconnection.go:353 +0x174 fp=0x140000a7f50 sp=0x140000a7ee0 pc=0x1004c5c24
github.com/pion/webrtc/v3.(*PeerConnection).negotiationNeededOp-fm()
<autogenerated>:1 +0x2c fp=0x140000a7f70 sp=0x140000a7f50 pc=0x1004f171c
github.com/pion/webrtc/v3.(*operations).start(0x140000b2018)
/Users/cian/go/pkg/mod/github.com/pion/webrtc/v3@v3.1.41/operations.go:90 +0x60 fp=0x140000a7fb0 sp=0x140000a7f70 pc=0x1004c4290
github.com/pion/webrtc/v3.(*operations).Enqueue.func1()
/Users/cian/go/pkg/mod/github.com/pion/webrtc/v3@v3.1.41/operations.go:38 +0x2c fp=0x140000a7fd0 sp=0x140000a7fb0 pc=0x1004c3d1c
runtime.goexit()
/opt/homebrew/Cellar/go/1.18.2/libexec/src/runtime/asm_arm64.s:1263 +0x4 fp=0x140000a7fd0 sp=0x140000a7fd0 pc=0x1001756e4
created by github.com/pion/webrtc/v3.(*operations).Enqueue
/Users/cian/go/pkg/mod/github.com/pion/webrtc/v3@v3.1.41/operations.go:38 +0x274
EDIT: see Dean's comment below
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ahh oops, yes I have this backwards!
peer/conn.go
Outdated
// It's possible the connection can be closed before negotiation has | ||
// began. In this case, we want to unlock the mutex to unblock any | ||
// pending candidates being flushed. | ||
if !c.hasNegotiated.Load() { | ||
c.negotiateMutex.Unlock() | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You shouldn't need this code at all. None of the negotiation code or ICE code should be blocking forever on close (or else we have bigger problems), and the unlocks are deferred so it should be fine.
This is what causes the panics btw
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It can block on AddRemoteCandidate
since that needs to wait for negotiation to complete.
peer/conn.go
Outdated
case <-c.negotiated: | ||
default: | ||
close(c.negotiated) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't you close(c.closed)
sooner?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nah, but you did bring up a good point about the second close not being needed. Fixed!
Closes #1706 and #1644.
The sleep was taken out a while back due to the belief that it might be fixed upstream, but that turned out to not be true 😢.