Skip to content

Commit 1de023a

Browse files
authored
chore: add README to clock testing (#13583)
Adds README with some draft content explaining why the library exists. Will be most relevant when we spin out into a standalone library.
1 parent 1d3642d commit 1de023a

File tree

1 file changed

+269
-0
lines changed

1 file changed

+269
-0
lines changed

clock/README.md

+269
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,269 @@
1+
# Quartz
2+
3+
A Go time testing library for writing deterministic unit tests
4+
5+
_Note: Quartz is the name I'm targeting for the standalone open source project when we spin this
6+
out._
7+
8+
## Why another time testing library?
9+
10+
Writing good unit tests for components and functions that use the `time` package is difficult, even
11+
though several open source libraries exist. In building Quartz, we took some inspiration from
12+
13+
- [github.com/benbjohnson/clock](https://github.com/benbjohnson/clock)
14+
- Tailscale's [tstest.Clock](https://github.com/coder/tailscale/blob/main/tstest/clock.go)
15+
- [github.com/aspenmesh/tock](https://github.com/aspenmesh/tock)
16+
17+
Quartz shares the high level design of a `Clock` interface that closely resembles the functions in
18+
the `time` standard library, and a "real" clock passes thru to the standard library in production,
19+
while a mock clock gives precise control in testing.
20+
21+
Our high level goal is to write unit tests that
22+
23+
1. execute quickly
24+
2. don't flake
25+
3. are straightforward to write and understand
26+
27+
For several reasons, this is a tall order when it comes to code that depends on time, and we found
28+
the existing libraries insufficient for our goals.
29+
30+
For tests to execute quickly without flakes, we want to focus on _determinism_: the test should run
31+
the same each time, and it should be easy to force the system into a known state (no races) before
32+
executing test assertions. `time.Sleep`, `runtime.Gosched()`, and
33+
polling/[Eventually](https://pkg.go.dev/github.com/stretchr/testify/assert#Eventually) are all
34+
symptoms of an inability to do this easily.
35+
36+
### Preventing test flakes
37+
38+
The following example comes from the README from benbjohnson/clock:
39+
40+
```go
41+
mock := clock.NewMock()
42+
count := 0
43+
44+
// Kick off a timer to increment every 1 mock second.
45+
go func() {
46+
ticker := mock.Ticker(1 * time.Second)
47+
for {
48+
<-ticker.C
49+
count++
50+
}
51+
}()
52+
runtime.Gosched()
53+
54+
// Move the clock forward 10 seconds.
55+
mock.Add(10 * time.Second)
56+
57+
// This prints 10.
58+
fmt.Println(count)
59+
```
60+
61+
The first race condition is fairly obvious: moving the clock forward 10 seconds may generate 10
62+
ticks on the `ticker.C` channel, but there is no guarantee that `count++` executes before
63+
`fmt.Println(count)`.
64+
65+
The second race condition is more subtle, but `runtime.Gosched()` is the tell. Since the ticker
66+
is started on a separate goroutine, there is no guarantee that `mock.Ticker()` executes before
67+
`mock.Add()`. `runtime.Gosched()` is an attempt to get this to happen, but it makes no hard
68+
promises. On a busy system, especially when running tests in parallel, this can flake, advance the
69+
time 10 seconds first, then start the ticker and never generate a tick.
70+
71+
Let's talk about how Quartz tackles these problems.
72+
73+
In our experience, an extremely common use case is creating a ticker then doing a 2-arm `select`
74+
with ticks in one and context expiring in another, i.e.
75+
76+
```go
77+
t := time.NewTicker(duration)
78+
for {
79+
select {
80+
case <-ctx.Done():
81+
return ctx.Err()
82+
case <-t.C:
83+
err := do()
84+
if err != nil {
85+
return err
86+
}
87+
}
88+
}
89+
```
90+
91+
In Quartz, we refactor this to be more compact and testing friendly:
92+
93+
```go
94+
t := clock.TickerFunc(ctx, duration, do)
95+
return t.Wait()
96+
```
97+
98+
This affords the mock `Clock` the ability to explicitly know when processing of a tick is finished
99+
because it's wrapped in the function passed to `TickerFunc` (`do()` in this example).
100+
101+
In Quartz, when you advance the clock, you are returned an object you can `Wait()` on to ensure all
102+
ticks and timers triggered are finished. This solves the first race condition in the example.
103+
104+
(As an aside, we still support a traditional standard library-style `Ticker`. You may find it useful
105+
if you want to keep your code as close as possible to the standard library, or if you need to use
106+
the channel in a larger `select` block. In that case, you'll have to find some other mechanism to
107+
sync tick processing to your test code.)
108+
109+
To prevent race conditions related to the starting of the ticker, Quartz allows you to set "traps"
110+
for calls that access the clock.
111+
112+
```go
113+
func TestTicker(t *testing.T) {
114+
mClock := quartz.NewMock(t)
115+
trap := mClock.Trap().TickerFunc()
116+
defer trap.Close() // stop trapping at end
117+
go runMyTicker(mClock) // async calls TickerFunc()
118+
call := trap.Wait(context.Background()) // waits for a call and blocks its return
119+
call.Release() // allow the TickerFunc() call to return
120+
// optionally check the duration using call.Duration
121+
// Move the clock forward 1 tick
122+
mClock.Advance(time.Second).MustWait(context.Background())
123+
// assert results of the tick
124+
}
125+
```
126+
127+
Trapping and then releasing the call to `TickerFunc()` ensures the ticker is started at a
128+
deterministic time, so our calls to `Advance()` will have a predictable effect.
129+
130+
Take a look at `TestExampleTickerFunc` in `example_test.go` for a complete worked example.
131+
132+
### Complex time dependence
133+
134+
Another difficult issue to handle when unit testing is when some code under test makes multiple
135+
calls that depend on the time, and you want to simulate some time passing between them.
136+
137+
A very basic example is measuring how long something took:
138+
139+
```go
140+
var measurement time.Duration
141+
go func(clock quartz.Clock) {
142+
start := clock.Now()
143+
doSomething()
144+
measurement = clock.Since(start)
145+
}(mClock)
146+
147+
// how to get measurement to be, say, 5 seconds?
148+
```
149+
150+
The two calls into the clock happen asynchronously, so we need to be able to advance the clock after
151+
the first call to `Now()` but before the call to `Since()`. Doing this with the libraries we
152+
mentioned above means that you have to be able to mock out or otherwise block the completion of
153+
`doSomething()`.
154+
155+
But, with the trap functionality we mentioned in the previous section, you can deterministically
156+
control the time each call sees.
157+
158+
```go
159+
trap := mClock.Trap().Since()
160+
var measurement time.Duration
161+
go func(clock quartz.Clock) {
162+
start := clock.Now()
163+
doSomething()
164+
measurement = clock.Since(start)
165+
}(mClock)
166+
167+
c := trap.Wait(ctx)
168+
mClock.Advance(5*time.Second)
169+
c.Release()
170+
```
171+
172+
We wait until we trap the `clock.Since()` call, which implies that `clock.Now()` has completed, then
173+
advance the mock clock 5 seconds. Finally, we release the `clock.Since()` call. Any changes to the
174+
clock that happen _before_ we release the call will be included in the time used for the
175+
`clock.Since()` call.
176+
177+
As a more involved example, consider an inactivity timeout: we want something to happen if there is
178+
no activity recorded for some period, say 10 minutes in the following example:
179+
180+
```go
181+
type InactivityTimer struct {
182+
mu sync.Mutex
183+
activity time.Time
184+
clock quartz.Clock
185+
}
186+
187+
func (i *InactivityTimer) Start() {
188+
i.mu.Lock()
189+
defer i.mu.Unlock()
190+
next := i.clock.Until(i.activity.Add(10*time.Minute))
191+
t := i.clock.AfterFunc(next, func() {
192+
i.mu.Lock()
193+
defer i.mu.Unlock()
194+
next := i.clock.Until(i.activity.Add(10*time.Minute))
195+
if next == 0 {
196+
i.timeoutLocked()
197+
return
198+
}
199+
t.Reset(next)
200+
})
201+
}
202+
```
203+
204+
The actual contents of `timeoutLocked()` doesn't matter for this example, and assume there are other
205+
functions that record the latest `activity`.
206+
207+
We found that some time testing libraries hold a lock on the mock clock while calling the function
208+
passed to `AfterFunc`, resulting in a deadlock if you made clock calls from within.
209+
210+
Others allow this sort of thing, but don't have the flexibility to test edge cases. There is a
211+
subtle bug in our `Start()` function. The timer may pop a little late, and/or some measurable real
212+
time may elapse before `Until()` gets called inside the `AfterFunc`. If there hasn't been activity,
213+
`next` might be negative.
214+
215+
To test this in Quartz, we'll use a trap. We only want to trap the inner `Until()` call, not the
216+
initial one, so to make testing easier we can "tag" the call we want. Like this:
217+
218+
```go
219+
func (i *InactivityTimer) Start() {
220+
i.mu.Lock()
221+
defer i.mu.Unlock()
222+
next := i.clock.Until(i.activity.Add(10*time.Minute))
223+
t := i.clock.AfterFunc(next, func() {
224+
i.mu.Lock()
225+
defer i.mu.Unlock()
226+
next := i.clock.Until(i.activity.Add(10*time.Minute), "inner")
227+
if next == 0 {
228+
i.timeoutLocked()
229+
return
230+
}
231+
t.Reset(next)
232+
})
233+
}
234+
```
235+
236+
All Quartz `Clock` functions, and functions on returned timers and tickers support zero or more
237+
string tags that allow traps to match on them.
238+
239+
```go
240+
func TestInactivityTimer_Late(t *testing.T) {
241+
// set a timeout on the test itself, so that if Wait functions get blocked, we don't have to
242+
// wait for the default test timeout of 10 minutes.
243+
ctx, cancel := context.WithTimeout(10*time.Second)
244+
defer cancel()
245+
mClock := quartz.NewMock(t)
246+
trap := mClock.Trap.Until("inner")
247+
defer trap.Close()
248+
249+
it := &InactivityTimer{
250+
activity: mClock.Now(),
251+
clock: mClock,
252+
}
253+
it.Start()
254+
255+
// Trigger the AfterFunc
256+
w := mClock.Advance(10*time.Minute)
257+
c := trap.Wait(ctx)
258+
// Advance the clock a few ms to simulate a busy system
259+
mClock.Advance(3*time.Millisecond)
260+
c.Release() // Until() returns
261+
w.MustWait(ctx) // Wait for the AfterFunc to wrap up
262+
263+
// Assert that the timeoutLocked() function was called
264+
}
265+
```
266+
267+
This test case will fail with our bugged implementation, since the triggered AfterFunc won't call
268+
`timeoutLocked()` and instead will reset the timer with a negative number. The fix is easy, use
269+
`next <= 0` as the comparison.

0 commit comments

Comments
 (0)