Skip to content

Commit eadfafd

Browse files
committed
Improve isolationtester's timeout management.
isolationtester.c had a hard-wired limit of 3 minutes per test step. It now emerges that this isn't quite enough for some of the slowest buildfarm animals. This isn't the first time we've had to raise this limit (cf. 1db439a), so let's make it configurable. This patch raises the default to 5 minutes, and introduces an environment variable PGISOLATIONTIMEOUT that can be set if more time is needed, following the precedent of PGCTLTIMEOUT. Also, modify isolationtester so that when the timeout is hit, it explicitly reports having sent a cancel. This makes the regression failure log considerably more intelligible. (In the worst case, a timed-out test might actually be reported as "passing" without this extra output, so arguably this is a bug fix in itself.) In passing, update the README file, which had apparently not gotten touched when we added "make check" support here. Back-patch to 9.6; older versions don't have comparable timeout logic. Discussion: https://postgr.es/m/22964.1575842935@sss.pgh.pa.us
1 parent a7472d2 commit eadfafd

File tree

2 files changed

+48
-14
lines changed

2 files changed

+48
-14
lines changed

src/test/isolation/README

Lines changed: 21 additions & 8 deletions
Original file line numberDiff line numberDiff line change
@@ -11,18 +11,26 @@ comes from the fact that the original motivation was to test the
1111
serializable isolation level; but tests for other sorts of concurrent
1212
behaviors have been added as well.
1313

14-
To run the tests, you need to have a server running at the default port
15-
expected by libpq. (You can set PGPORT and so forth in your environment
16-
to control this.) Then run
14+
You can run the tests against the current build tree by typing
15+
make check
16+
Alternatively, you can run against an existing installation by typing
1717
make installcheck
18-
To run just specific test(s), you can do something like
18+
(This will contact a server at the default port expected by libpq.
19+
You can set PGPORT and so forth in your environment to control this.)
20+
21+
To run just specific test(s) against an installed server,
22+
you can do something like
1923
./pg_isolation_regress fk-contention fk-deadlock
2024
(look into the specs/ subdirectory to see the available tests).
2125

2226
The prepared-transactions test requires the server's
2327
max_prepared_transactions parameter to be set to at least 3; therefore it
2428
is not run by default. To include it in the test run, use
29+
make check-prepared-txns
30+
or
2531
make installcheck-prepared-txns
32+
after making sure the server configuration is correct (see TEMP_CONFIG
33+
to adjust this in the "check" case).
2634

2735
To define tests with overlapping transactions, we use test specification
2836
files with a custom syntax, which is described in the next section. To add
@@ -39,6 +47,11 @@ pg_isolation_regress is a tool similar to pg_regress, but instead of using
3947
psql to execute a test, it uses isolationtester. It accepts all the same
4048
command-line arguments as pg_regress.
4149

50+
By default, isolationtester will wait at most 300 seconds (5 minutes)
51+
for any one test step to complete. If you need to adjust this, set
52+
the environment variable PGISOLATIONTIMEOUT to the desired timeout
53+
in seconds.
54+
4255

4356
Test specification
4457
==================
@@ -108,10 +121,10 @@ Each step may contain commands that block until further action has been taken
108121
deadlock). A test that uses this ability must manually specify valid
109122
permutations, i.e. those that would not expect a blocked session to execute a
110123
command. If a test fails to follow that rule, isolationtester will cancel it
111-
after 180 seconds. If the cancel doesn't work, isolationtester will exit
112-
uncleanly after a total of 200 seconds of wait time. Testing invalid
113-
permutations should be avoided because they can make the isolation tests take
114-
a very long time to run, and they serve no useful testing purpose.
124+
after PGISOLATIONTIMEOUT seconds. If the cancel doesn't work, isolationtester
125+
will exit uncleanly after a total of twice PGISOLATIONTIMEOUT. Testing
126+
invalid permutations should be avoided because they can make the isolation
127+
tests take a very long time to run, and they serve no useful testing purpose.
115128

116129
Note that isolationtester recognizes that a command has blocked by looking
117130
to see if it is shown as waiting in the pg_locks view; therefore, only

src/test/isolation/isolationtester.c

Lines changed: 27 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -34,6 +34,10 @@ static int nconns = 0;
3434
/* In dry run only output permutations to be run by the tester. */
3535
static int dry_run = false;
3636

37+
/* Maximum time to wait before giving up on a step (in usec) */
38+
static int64 max_step_wait = 300 * USECS_PER_SEC;
39+
40+
3741
static void run_testspec(TestSpec *testspec);
3842
static void run_all_permutations(TestSpec *testspec);
3943
static void run_all_permutations_recurse(TestSpec *testspec, int nsteps,
@@ -66,6 +70,7 @@ int
6670
main(int argc, char **argv)
6771
{
6872
const char *conninfo;
73+
const char *env_wait;
6974
TestSpec *testspec;
7075
int i,
7176
j;
@@ -110,6 +115,14 @@ main(int argc, char **argv)
110115
else
111116
conninfo = "dbname = postgres";
112117

118+
/*
119+
* If PGISOLATIONTIMEOUT is set in the environment, adopt its value (given
120+
* in seconds) as the max time to wait for any one step to complete.
121+
*/
122+
env_wait = getenv("PGISOLATIONTIMEOUT");
123+
if (env_wait != NULL)
124+
max_step_wait = ((int64) atoi(env_wait)) * USECS_PER_SEC;
125+
113126
/* Read the test spec from stdin */
114127
spec_yyparse();
115128
testspec = &parseresult;
@@ -779,15 +792,15 @@ try_complete_step(TestSpec *testspec, Step *step, int flags)
779792
td += (int64) current_time.tv_usec - (int64) start_time.tv_usec;
780793

781794
/*
782-
* After 180 seconds, try to cancel the query.
795+
* After max_step_wait microseconds, try to cancel the query.
783796
*
784797
* If the user tries to test an invalid permutation, we don't want
785798
* to hang forever, especially when this is running in the
786799
* buildfarm. This will presumably lead to this permutation
787800
* failing, but remaining permutations and tests should still be
788801
* OK.
789802
*/
790-
if (td > 180 * USECS_PER_SEC && !canceled)
803+
if (td > max_step_wait && !canceled)
791804
{
792805
PGcancel *cancel = PQgetCancel(conn);
793806

@@ -796,24 +809,32 @@ try_complete_step(TestSpec *testspec, Step *step, int flags)
796809
char buf[256];
797810

798811
if (PQcancel(cancel, buf, sizeof(buf)))
812+
{
813+
/*
814+
* print to stdout not stderr, as this should appear
815+
* in the test case's results
816+
*/
817+
printf("isolationtester: canceling step %s after %d seconds\n",
818+
step->name, (int) (td / USECS_PER_SEC));
799819
canceled = true;
820+
}
800821
else
801822
fprintf(stderr, "PQcancel failed: %s\n", buf);
802823
PQfreeCancel(cancel);
803824
}
804825
}
805826

806827
/*
807-
* After 200 seconds, just give up and die.
828+
* After twice max_step_wait, just give up and die.
808829
*
809830
* Since cleanup steps won't be run in this case, this may cause
810831
* later tests to fail. That stinks, but it's better than waiting
811832
* forever for the server to respond to the cancel.
812833
*/
813-
if (td > 200 * USECS_PER_SEC)
834+
if (td > 2 * max_step_wait)
814835
{
815-
fprintf(stderr, "step %s timed out after 200 seconds\n",
816-
step->name);
836+
fprintf(stderr, "step %s timed out after %d seconds\n",
837+
step->name, (int) (td / USECS_PER_SEC));
817838
exit(1);
818839
}
819840
}

0 commit comments

Comments
 (0)