Description
This causes flakes like this one: https://github.com/coder/coder/runs/32527171455
The problem is that since coder/coder#14923, GetWorkspaceAgentScriptTimingsByBuildID() is implemented in dbmem by calling other Get* functions from within the main function. This causes us to attempt to lock the RWMutex a second time, even though we have already locked it.
```
goroutine 74405 [sync.RWMutex.RLock, 9 minutes]:
sync.runtime_SemacquireRWMutexR(0x0?, 0x0?, 0x1401f64d0c8?)
	/Users/runner/hostedtoolcache/go/1.22.8/arm64/src/runtime/sema.go:82 +0x28
sync.(*RWMutex).RLock(0x5?)
	/Users/runner/hostedtoolcache/go/1.22.8/arm64/src/sync/rwmutex.go:70 +0x64
github.com/coder/coder/v2/coderd/database/dbmem.(*FakeQuerier).GetWorkspaceBuildByID(_, {_, _}, {0x6, 0xf0, 0xd7, 0x9f, 0x7f, 0xa7, 0x4d, ...})
	/Users/runner/work/coder/coder/coderd/database/dbmem/dbmem.go:6406 +0x64
github.com/coder/coder/v2/coderd/database/dbmem.(*FakeQuerier).GetWorkspaceAgentScriptTimingsByBuildID(0x140044200d8, {0x1078e3298, 0x1401f7a1770}, {0x6, 0xf0, 0xd7, 0x9f, 0x7f, 0xa7, 0x4d, ...})
	/Users/runner/work/coder/coder/coderd/database/dbmem/dbmem.go:5857 +0xc8
github.com/coder/coder/v2/coderd/database/dbauthz.(*querier).GetWorkspaceAgentScriptTimingsByBuildID(0x14002a19830, {0x1078e3298, 0x1401f7a1770}, {0x6, 0xf0, 0xd7, 0x9f, 0x7f, 0xa7, 0x4d, ...})
```
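For illustration, the pattern looks roughly like this. This is a simplified sketch, not the actual dbmem code; the types and method bodies are hypothetical, only the locking structure matters:

```go
package main

import (
	"fmt"
	"sync"
)

// fakeQuerier is a stand-in for dbmem's FakeQuerier.
type fakeQuerier struct {
	mutex  sync.RWMutex
	builds map[string]string
}

// Inner getter: takes its own read lock.
func (q *fakeQuerier) GetWorkspaceBuildByID(id string) (string, bool) {
	q.mutex.RLock() // second RLock on the same mutex when called from below
	defer q.mutex.RUnlock()
	b, ok := q.builds[id]
	return b, ok
}

// Outer getter: takes the read lock, then calls the other exported getter,
// which tries to RLock the same mutex again.
func (q *fakeQuerier) GetWorkspaceAgentScriptTimingsByBuildID(id string) (string, bool) {
	q.mutex.RLock()
	defer q.mutex.RUnlock()
	return q.GetWorkspaceBuildByID(id)
}

func main() {
	q := &fakeQuerier{builds: map[string]string{"b1": "build one"}}
	// With no concurrent writer this appears to work, which is why the
	// pattern is easy to miss.
	fmt.Println(q.GetWorkspaceAgentScriptTimingsByBuildID("b1"))
}
```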
At first glance, this might seem to work: it's an RWMutex, and multiple RLock() calls can succeed simultaneously. However, once any goroutine attempts to Lock() the mutex for writing, all further RLock() calls wait, giving the write lock priority. (Without that priority, on a system with many readers running concurrently there might never be a moment when no reader holds the lock, and writes could hang indefinitely.) This then causes a deadlock: the re-entrant RLock call blocks waiting for the write, and the write is blocked by the RLock further up the stack.
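The sequence can be reproduced in isolation. The following is a self-contained repro of the interleaving described above, not the actual test; running it ends in the runtime's "all goroutines are asleep - deadlock!" error:

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

func main() {
	var mu sync.RWMutex

	mu.RLock() // outer read lock, as in GetWorkspaceAgentScriptTimingsByBuildID

	// A writer arrives while the read lock is held. Lock() blocks, and from
	// now on new RLock() calls queue behind it.
	go func() {
		mu.Lock()
		defer mu.Unlock()
		fmt.Println("writer acquired the lock")
	}()
	time.Sleep(100 * time.Millisecond) // give the writer time to start waiting

	// The re-entrant read lock, as in the nested GetWorkspaceBuildByID call.
	// It waits for the writer, the writer waits for the outer RUnlock, and
	// the outer RUnlock never happens: deadlock.
	mu.RLock()
	fmt.Println("never reached")
}
```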
So, the rule is: never attempt to lock the mutex from a call path that already holds it; sync.RWMutex is not re-entrant.
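One conventional way to follow that rule, continuing the hypothetical fakeQuerier sketch above (the actual fix in dbmem may differ), is to move the shared logic into an unexported helper that assumes the caller already holds the lock, so each exported method locks exactly once:

```go
// getWorkspaceBuildByIDNoLock holds the shared logic and assumes the caller
// already holds q.mutex. (Hypothetical name, for illustration only.)
func (q *fakeQuerier) getWorkspaceBuildByIDNoLock(id string) (string, bool) {
	b, ok := q.builds[id]
	return b, ok
}

// Exported methods take the lock once and delegate to the helper instead of
// calling each other.
func (q *fakeQuerier) GetWorkspaceBuildByID(id string) (string, bool) {
	q.mutex.RLock()
	defer q.mutex.RUnlock()
	return q.getWorkspaceBuildByIDNoLock(id)
}

func (q *fakeQuerier) GetWorkspaceAgentScriptTimingsByBuildID(id string) (string, bool) {
	q.mutex.RLock()
	defer q.mutex.RUnlock()
	return q.getWorkspaceBuildByIDNoLock(id)
}
```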