Skip to content

flake: GetWorkspaceAgentScriptTimingsByBuildID can deadlock with dbmem #202

Closed
coder/coder
#15405
@spikecurtis

Description

@spikecurtis

causes flakes like here: https://github.com/coder/coder/runs/32527171455

The problem is that since coder/coder#14923, GetWorkspaceAgentScriptTimingsByBuildID() is implemented in dbmem by calling other Get* functions from within the main function. This causes us to attempt to lock the RWMutex a second time, event though we have already locked it.

goroutine 74405 [sync.RWMutex.RLock, 9 minutes]:
sync.runtime_SemacquireRWMutexR(0x0?, 0x0?, 0x1401f64d0c8?)
	/Users/runner/hostedtoolcache/go/1.22.8/arm64/src/runtime/sema.go:82 +0x28
sync.(*RWMutex).RLock(0x5?)
	/Users/runner/hostedtoolcache/go/1.22.8/arm64/src/sync/rwmutex.go:70 +0x64
github.com/coder/coder/v2/coderd/database/dbmem.(*FakeQuerier).GetWorkspaceBuildByID(_, {_, _}, {0x6, 0xf0, 0xd7, 0x9f, 0x7f, 0xa7, 0x4d, ...})
	/Users/runner/work/coder/coder/coderd/database/dbmem/dbmem.go:6406 +0x64
github.com/coder/coder/v2/coderd/database/dbmem.(*FakeQuerier).GetWorkspaceAgentScriptTimingsByBuildID(0x140044200d8, {0x1078e3298, 0x1401f7a1770}, {0x6, 0xf0, 0xd7, 0x9f, 0x7f, 0xa7, 0x4d, ...})
	/Users/runner/work/coder/coder/coderd/database/dbmem/dbmem.go:5857 +0xc8
github.com/coder/coder/v2/coderd/database/dbauthz.(*querier).GetWorkspaceAgentScriptTimingsByBuildID(0x14002a19830, {0x1078e3298, 0x1401f7a1770}, {0x6, 0xf0, 0xd7, 0x9f, 0x7f, 0xa7, 0x4d, ...})

At first glance, this might seem to work, since it's an RWMutex, and that allows multiple calls to RLock() to succeed simultaneously. However, if anyone attempts to Lock() the mutex for writing, this causes all further RLock() calls to wait, giving the write lock priority. (If you have a system with many readers running concurrently, it could take an arbitrarily long time for them all to unlock simultaneously, and writes could hang indefinitely if they are not given priority.) This then causes a deadlock, because the re-entrant RLock call blocks waiting for the write, which is blocked by the RLock further up the stack.

So, the rule is: you must never attempt to lock the mutex if it is already locked.

Metadata

Metadata

Type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions