Skip to content

Commit 741d7f1

Browse files
committed
Use annotations to reduce instability of isolation-test results.
We've long contended with isolation test results that aren't entirely stable. Some test scripts insert long delays to try to force stable results, which is not terribly desirable; but other erratic failure modes remain, causing unrepeatable buildfarm failures. I've spent a fair amount of time trying to solve this by improving the server-side support code, without much success: that way is fundamentally unable to cope with diffs that stem from chance ordering of arrival of messages from different server processes. We can improve matters on the client side, however, by annotating the test scripts themselves to show the desired reporting order of events that might occur in different orders. This patch adds three types of annotations to deal with (a) test steps that might or might not complete their waits before the isolationtester can see them waiting; (b) test steps in different sessions that can legitimately complete in either order; and (c) NOTIFY messages that might arrive before or after the completion of a step in another session. We might need more annotation types later, but this seems to be enough to deal with the instabilities we've seen in the buildfarm. It also lets us get rid of all the long delays that were previously used, cutting more than a minute off the runtime of the isolation tests. Back-patch to all supported branches, because the buildfarm instabilities affect all the branches, and because it seems desirable to keep isolationtester's capabilities the same across all branches to simplify possible future back-patching of tests. Discussion: https://postgr.es/m/327948.1623725828@sss.pgh.pa.us
1 parent d102aaf commit 741d7f1

Some content is hidden

Large Commits have some content hidden by default. Use the searchbox below for content that may be hidden.

49 files changed

+1077
-723
lines changed

contrib/test_decoding/expected/concurrent_ddl_dml.out

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -257,7 +257,7 @@ step s1_insert_tbl2: INSERT INTO tbl2 (val1, val2) VALUES (1, 1);
257257
step s2_alter_tbl1_boolean: ALTER TABLE tbl1 ALTER COLUMN val2 TYPE boolean; <waiting ...>
258258
step s1_commit: COMMIT;
259259
step s2_alter_tbl1_boolean: <... completed>
260-
error in steps s1_commit s2_alter_tbl1_boolean: ERROR: column "val2" cannot be cast automatically to type boolean
260+
ERROR: column "val2" cannot be cast automatically to type boolean
261261
step s2_get_changes: SELECT data FROM pg_logical_slot_get_changes('isolation_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
262262
data
263263

src/test/isolation/README

Lines changed: 79 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -57,11 +57,16 @@ Test specification
5757
==================
5858

5959
Each isolation test is defined by a specification file, stored in the specs
60-
subdirectory. A test specification consists of four parts, in this order:
60+
subdirectory. A test specification defines some SQL "steps", groups them
61+
into "sessions" (where all the steps of one session will be run in the
62+
same backend), and specifies the "permutations" or orderings of the steps
63+
that are to be run.
64+
65+
A test specification consists of four parts, in this order:
6166

6267
setup { <SQL> }
6368

64-
The given SQL block is executed once, in one session only, before running
69+
The given SQL block is executed once (per permutation) before running
6570
the test. Create any test tables or other required objects here. This
6671
part is optional. Multiple setup blocks are allowed if needed; each is
6772
run separately, in the given order. (The reason for allowing multiple
@@ -81,8 +86,8 @@ session "<name>"
8186
session is executed in its own connection. A session part consists
8287
of three parts: setup, teardown and one or more "steps". The per-session
8388
setup and teardown parts have the same syntax as the per-test setup and
84-
teardown described above, but they are executed in each session. The
85-
setup part typically contains a "BEGIN" command to begin a transaction.
89+
teardown described above, but they are executed in each session. The setup
90+
part might, for example, contain a "BEGIN" command to begin a transaction.
8691

8792
Each step has the syntax
8893

@@ -101,7 +106,8 @@ permutation "<step name>" ...
101106
order). Note that the list of steps in a manually specified
102107
"permutation" line doesn't actually have to be a permutation of the
103108
available steps; it could for instance repeat some steps more than once,
104-
or leave others out.
109+
or leave others out. Also, each step name can be annotated with some
110+
parenthesized markers, which are described below.
105111

106112
Lines beginning with a # are considered comments.
107113

@@ -110,7 +116,8 @@ specified in the spec file, or automatically generated), the isolation
110116
tester runs the main setup part, then per-session setup parts, then
111117
the selected session steps, then per-session teardown, then the main
112118
teardown script. Each selected step is sent to the connection associated
113-
with its session.
119+
with its session. The main setup and teardown scripts are run in a
120+
separate "control" session.
114121

115122

116123
Support for blocking commands
@@ -129,3 +136,69 @@ tests take a very long time to run, and they serve no useful testing purpose.
129136
Note that isolationtester recognizes that a command has blocked by looking
130137
to see if it is shown as waiting in the pg_locks view; therefore, only
131138
blocks on heavyweight locks will be detected.
139+
140+
141+
Dealing with race conditions
142+
============================
143+
144+
In some cases, the isolationtester's output for a test script may vary
145+
due to timing issues. One way to deal with that is to create variant
146+
expected-files, which follow the usual PG convention that variants for
147+
foo.spec are named foo_1.out, foo_2.out, etc. However, this method is
148+
discouraged since the extra files are a nuisance for maintenance.
149+
Instead, it's usually possible to stabilize the test output by applying
150+
special markers to some of the step names listed in a permutation line.
151+
152+
The general form of a permutation entry is
153+
154+
<step name> [ ( <marker> [ , <marker> ... ] ) ]
155+
156+
where each marker defines a "blocking condition". The step will not be
157+
reported as completed before all the blocking conditions are satisfied.
158+
The possible markers are:
159+
160+
*
161+
<other step name>
162+
<other step name> notices <n>
163+
164+
An asterisk marker, such as mystep(*), forces the isolationtester to
165+
report the step as "waiting" as soon as it's been launched, regardless of
166+
whether it would have been detected as waiting later. This is useful for
167+
stabilizing cases that are sometimes reported as waiting and other times
168+
reported as immediately completing, depending on the relative speeds of
169+
the step and the isolationtester's status-monitoring queries.
170+
171+
A marker consisting solely of a step name indicates that this step may
172+
not be reported as completing until that other step has completed.
173+
This allows stabilizing cases where two queries might be seen to complete
174+
in either order. Note that this step can be *launched* before the other
175+
step has completed. (If the other step is used more than once in the
176+
current permutation, this step cannot complete while any of those
177+
instances is active.)
178+
179+
A marker of the form "<other step name> notices <n>" (where <n> is a
180+
positive integer) indicates that this step may not be reported as
181+
completing until the other step's session has returned at least <n>
182+
NOTICE messages, counting from when this step is launched. This is useful
183+
for stabilizing cases where a step can return NOTICE messages before it
184+
actually completes, and those messages must be synchronized with the
185+
completions of other steps.
186+
187+
Notice that these markers can only delay reporting of the completion
188+
of a step, not the launch of a step. The isolationtester will launch
189+
the next step in a permutation as soon as (A) all prior steps of the
190+
same session are done, and (B) the immediately preceding step in the
191+
permutation is done or deemed blocked. For this purpose, "deemed
192+
blocked" means that it has been seen to be waiting on a database lock,
193+
or that it is complete but the report of its completion is delayed by
194+
one of these markers.
195+
196+
In some cases it is important not to launch a step until after the
197+
completion of a step in another session that could have been deemed
198+
blocked. An example is that if step s1 in session A is issuing a
199+
cancel for step s2 in session B, we'd better not launch B's next step
200+
till we're sure s1 is done. If s1 is blockable, trouble could ensue.
201+
The best way to prevent that is to create an empty step in session A
202+
and run it, without any markers, just before the next session B step.
203+
The empty step cannot be launched until s1 is done, and in turn the
204+
next session B step cannot be launched until the empty step finishes.

src/test/isolation/expected/alter-table-3.out

Lines changed: 16 additions & 16 deletions
Original file line numberDiff line numberDiff line change
@@ -54,7 +54,7 @@ i
5454
step s2c: INSERT INTO a VALUES (0); <waiting ...>
5555
step s1d: COMMIT;
5656
step s2c: <... completed>
57-
error in steps s1d s2c: ERROR: duplicate key value violates unique constraint "a_pkey"
57+
ERROR: duplicate key value violates unique constraint "a_pkey"
5858
step s2d: COMMIT;
5959

6060
starting permutation: s1a s1b s2a s1c s1d s2b s2c s2d
@@ -97,7 +97,7 @@ i
9797
step s2c: INSERT INTO a VALUES (0); <waiting ...>
9898
step s1d: COMMIT;
9999
step s2c: <... completed>
100-
error in steps s1d s2c: ERROR: duplicate key value violates unique constraint "a_pkey"
100+
ERROR: duplicate key value violates unique constraint "a_pkey"
101101
step s2d: COMMIT;
102102

103103
starting permutation: s1a s1b s2a s2b s1c s1d s2c s2d
@@ -126,7 +126,7 @@ step s1c: ALTER TABLE a ENABLE TRIGGER t;
126126
step s2c: INSERT INTO a VALUES (0); <waiting ...>
127127
step s1d: COMMIT;
128128
step s2c: <... completed>
129-
error in steps s1d s2c: ERROR: duplicate key value violates unique constraint "a_pkey"
129+
ERROR: duplicate key value violates unique constraint "a_pkey"
130130
step s2d: COMMIT;
131131

132132
starting permutation: s1a s1b s2a s2b s2c s1c s1d s2d
@@ -141,7 +141,7 @@ step s2c: INSERT INTO a VALUES (0); <waiting ...>
141141
step s1c: ALTER TABLE a ENABLE TRIGGER t;
142142
step s1d: COMMIT;
143143
step s2c: <... completed>
144-
error in steps s1d s2c: ERROR: duplicate key value violates unique constraint "a_pkey"
144+
ERROR: duplicate key value violates unique constraint "a_pkey"
145145
step s2d: COMMIT;
146146

147147
starting permutation: s1a s2a s1b s1c s1d s2b s2c s2d
@@ -184,7 +184,7 @@ i
184184
step s2c: INSERT INTO a VALUES (0); <waiting ...>
185185
step s1d: COMMIT;
186186
step s2c: <... completed>
187-
error in steps s1d s2c: ERROR: duplicate key value violates unique constraint "a_pkey"
187+
ERROR: duplicate key value violates unique constraint "a_pkey"
188188
step s2d: COMMIT;
189189

190190
starting permutation: s1a s2a s1b s2b s1c s1d s2c s2d
@@ -213,7 +213,7 @@ step s1c: ALTER TABLE a ENABLE TRIGGER t;
213213
step s2c: INSERT INTO a VALUES (0); <waiting ...>
214214
step s1d: COMMIT;
215215
step s2c: <... completed>
216-
error in steps s1d s2c: ERROR: duplicate key value violates unique constraint "a_pkey"
216+
ERROR: duplicate key value violates unique constraint "a_pkey"
217217
step s2d: COMMIT;
218218

219219
starting permutation: s1a s2a s1b s2b s2c s1c s1d s2d
@@ -228,7 +228,7 @@ step s2c: INSERT INTO a VALUES (0); <waiting ...>
228228
step s1c: ALTER TABLE a ENABLE TRIGGER t;
229229
step s1d: COMMIT;
230230
step s2c: <... completed>
231-
error in steps s1d s2c: ERROR: duplicate key value violates unique constraint "a_pkey"
231+
ERROR: duplicate key value violates unique constraint "a_pkey"
232232
step s2d: COMMIT;
233233

234234
starting permutation: s1a s2a s2b s1b s1c s1d s2c s2d
@@ -257,7 +257,7 @@ step s1c: ALTER TABLE a ENABLE TRIGGER t;
257257
step s2c: INSERT INTO a VALUES (0); <waiting ...>
258258
step s1d: COMMIT;
259259
step s2c: <... completed>
260-
error in steps s1d s2c: ERROR: duplicate key value violates unique constraint "a_pkey"
260+
ERROR: duplicate key value violates unique constraint "a_pkey"
261261
step s2d: COMMIT;
262262

263263
starting permutation: s1a s2a s2b s1b s2c s1c s1d s2d
@@ -272,7 +272,7 @@ step s2c: INSERT INTO a VALUES (0); <waiting ...>
272272
step s1c: ALTER TABLE a ENABLE TRIGGER t;
273273
step s1d: COMMIT;
274274
step s2c: <... completed>
275-
error in steps s1d s2c: ERROR: duplicate key value violates unique constraint "a_pkey"
275+
ERROR: duplicate key value violates unique constraint "a_pkey"
276276
step s2d: COMMIT;
277277

278278
starting permutation: s1a s2a s2b s2c s1b s1c s1d s2d
@@ -371,7 +371,7 @@ i
371371
step s2c: INSERT INTO a VALUES (0); <waiting ...>
372372
step s1d: COMMIT;
373373
step s2c: <... completed>
374-
error in steps s1d s2c: ERROR: duplicate key value violates unique constraint "a_pkey"
374+
ERROR: duplicate key value violates unique constraint "a_pkey"
375375
step s2d: COMMIT;
376376

377377
starting permutation: s2a s1a s1b s2b s1c s1d s2c s2d
@@ -400,7 +400,7 @@ step s1c: ALTER TABLE a ENABLE TRIGGER t;
400400
step s2c: INSERT INTO a VALUES (0); <waiting ...>
401401
step s1d: COMMIT;
402402
step s2c: <... completed>
403-
error in steps s1d s2c: ERROR: duplicate key value violates unique constraint "a_pkey"
403+
ERROR: duplicate key value violates unique constraint "a_pkey"
404404
step s2d: COMMIT;
405405

406406
starting permutation: s2a s1a s1b s2b s2c s1c s1d s2d
@@ -415,7 +415,7 @@ step s2c: INSERT INTO a VALUES (0); <waiting ...>
415415
step s1c: ALTER TABLE a ENABLE TRIGGER t;
416416
step s1d: COMMIT;
417417
step s2c: <... completed>
418-
error in steps s1d s2c: ERROR: duplicate key value violates unique constraint "a_pkey"
418+
ERROR: duplicate key value violates unique constraint "a_pkey"
419419
step s2d: COMMIT;
420420

421421
starting permutation: s2a s1a s2b s1b s1c s1d s2c s2d
@@ -444,7 +444,7 @@ step s1c: ALTER TABLE a ENABLE TRIGGER t;
444444
step s2c: INSERT INTO a VALUES (0); <waiting ...>
445445
step s1d: COMMIT;
446446
step s2c: <... completed>
447-
error in steps s1d s2c: ERROR: duplicate key value violates unique constraint "a_pkey"
447+
ERROR: duplicate key value violates unique constraint "a_pkey"
448448
step s2d: COMMIT;
449449

450450
starting permutation: s2a s1a s2b s1b s2c s1c s1d s2d
@@ -459,7 +459,7 @@ step s2c: INSERT INTO a VALUES (0); <waiting ...>
459459
step s1c: ALTER TABLE a ENABLE TRIGGER t;
460460
step s1d: COMMIT;
461461
step s2c: <... completed>
462-
error in steps s1d s2c: ERROR: duplicate key value violates unique constraint "a_pkey"
462+
ERROR: duplicate key value violates unique constraint "a_pkey"
463463
step s2d: COMMIT;
464464

465465
starting permutation: s2a s1a s2b s2c s1b s1c s1d s2d
@@ -544,7 +544,7 @@ step s1c: ALTER TABLE a ENABLE TRIGGER t;
544544
step s2c: INSERT INTO a VALUES (0); <waiting ...>
545545
step s1d: COMMIT;
546546
step s2c: <... completed>
547-
error in steps s1d s2c: ERROR: duplicate key value violates unique constraint "a_pkey"
547+
ERROR: duplicate key value violates unique constraint "a_pkey"
548548
step s2d: COMMIT;
549549

550550
starting permutation: s2a s2b s1a s1b s2c s1c s1d s2d
@@ -559,7 +559,7 @@ step s2c: INSERT INTO a VALUES (0); <waiting ...>
559559
step s1c: ALTER TABLE a ENABLE TRIGGER t;
560560
step s1d: COMMIT;
561561
step s2c: <... completed>
562-
error in steps s1d s2c: ERROR: duplicate key value violates unique constraint "a_pkey"
562+
ERROR: duplicate key value violates unique constraint "a_pkey"
563563
step s2d: COMMIT;
564564

565565
starting permutation: s2a s2b s1a s2c s1b s1c s1d s2d

src/test/isolation/expected/alter-table-4.out

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -50,7 +50,7 @@ step s1modc1a: ALTER TABLE c1 ALTER COLUMN a TYPE float;
5050
step s2sel: SELECT SUM(a) FROM p; <waiting ...>
5151
step s1c: COMMIT;
5252
step s2sel: <... completed>
53-
error in steps s1c s2sel: ERROR: attribute "a" of relation "c1" does not match parent's type
53+
ERROR: attribute "a" of relation "c1" does not match parent's type
5454
step s2sel: SELECT SUM(a) FROM p;
5555
sum
5656

src/test/isolation/expected/deadlock-hard.out

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -18,8 +18,8 @@ step s6a7: LOCK TABLE a7; <waiting ...>
1818
step s7a8: LOCK TABLE a8; <waiting ...>
1919
step s8a1: LOCK TABLE a1; <waiting ...>
2020
step s8a1: <... completed>
21+
ERROR: deadlock detected
2122
step s7a8: <... completed>
22-
error in steps s8a1 s7a8: ERROR: deadlock detected
2323
step s8c: COMMIT;
2424
step s7c: COMMIT;
2525
step s6a7: <... completed>

src/test/isolation/expected/deadlock-simple.out

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -5,7 +5,7 @@ step s1as: LOCK TABLE a1 IN ACCESS SHARE MODE;
55
step s2as: LOCK TABLE a1 IN ACCESS SHARE MODE;
66
step s1ae: LOCK TABLE a1 IN ACCESS EXCLUSIVE MODE; <waiting ...>
77
step s2ae: LOCK TABLE a1 IN ACCESS EXCLUSIVE MODE;
8+
ERROR: deadlock detected
89
step s1ae: <... completed>
9-
error in steps s2ae s1ae: ERROR: deadlock detected
1010
step s1c: COMMIT;
1111
step s2c: COMMIT;

0 commit comments

Comments
 (0)