
Commit 60690a6

Make stats regression test robust in the face of parallel query.
Historically, the wait_for_stats() function in this test has simply checked for a report of an indexscan on tenk2, corresponding to the last command issued before we expect stats updates to appear. However, with parallel query that indexscan could be done by a parallel worker that will emit its stats counters to the collector before the session's main backend does (a full second before, in fact, thanks to the "pg_sleep(1.0)" added by commit 957d08c). That leaves a sizable window in which an autovacuum-triggered write of the stats files would present a state in which the indexscan on tenk2 appears to have been done, but none of the write updates performed by the test have been. This is evidently the explanation for intermittent failures seen by me and on buildfarm member mandrill.

To fix, we should check separately for both the tenk2 seqscan and indexscan counts, since those might be reported by different processes that could be delayed arbitrarily on an overloaded test machine. And we need to check for at least one update-related count. If we ever allow parallel workers to do writes, this will get even more complicated ... but in view of all the other hard problems that will entail, I don't feel a need to solve this one today.

Per research by Rahila Syed and myself; part of this patch is Rahila's.
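The race described in this message can be modelled without a database. The Python sketch below is a toy model, not PostgreSQL code: `StatsSnapshot`, `old_check`, `new_check` and `replay` are hypothetical names, and the arrival order of reports is an assumed worst case in which a parallel worker's scan counters reach the collector before the main backend's write counters. The two predicates mirror the pre-fix and post-fix exit conditions of wait_for_stats().

```python
from dataclasses import dataclass, replace
from typing import Callable, List, Optional, Tuple


@dataclass
class StatsSnapshot:
    """Hypothetical stand-in for the counters wait_for_stats() reads."""
    seq_scan: int = 0   # pg_stat_user_tables.seq_scan for tenk2
    idx_scan: int = 0   # pg_stat_user_tables.idx_scan for tenk2
    n_tup_ins: int = 0  # pg_stat_user_tables.n_tup_ins for trunc_stats_test


Check = Callable[[StatsSnapshot, StatsSnapshot], bool]


def old_check(prev: StatsSnapshot, cur: StatsSnapshot) -> bool:
    # Pre-fix condition: only the tenk2 indexscan is looked for.
    return cur.idx_scan >= prev.idx_scan + 1


def new_check(prev: StatsSnapshot, cur: StatsSnapshot) -> bool:
    # Post-fix condition: seqscan, indexscan, and one update-related count.
    return (cur.seq_scan >= prev.seq_scan + 1
            and cur.idx_scan >= prev.idx_scan + 1
            and cur.n_tup_ins > 0)


def replay(events: List[Tuple[str, str, int]], prev: StatsSnapshot,
           check: Check) -> Optional[str]:
    """Apply (label, field, delta) reports in arrival order and return the
    label of the first report after which `check` passes, or None."""
    cur = replace(prev)  # copy, so `prev` stays the baseline
    for label, field_name, delta in events:
        setattr(cur, field_name, getattr(cur, field_name) + delta)
        if check(prev, cur):
            return label
    return None


# Assumed worst case: workers report the tenk2 scans first, and the
# main backend's insert counters arrive later.
PREV = StatsSnapshot(seq_scan=5, idx_scan=3, n_tup_ins=0)
EVENTS = [
    ("worker: seqscan", "seq_scan", 1),
    ("worker: indexscan", "idx_scan", 1),
    ("main backend: inserts", "n_tup_ins", 3),
]

print(replay(EVENTS, PREV, old_check))  # worker: indexscan (inserts not yet visible)
print(replay(EVENTS, PREV, new_check))  # main backend: inserts
```

With the old predicate the loop would stop on a snapshot where `n_tup_ins` is still 0, which is exactly the "indexscan visible, writes missing" state the commit blames for the intermittent failures.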
1 parent 708020e commit 60690a6

2 files changed: +40 -8 lines

src/test/regress/expected/stats.out

+20 -4

```diff
@@ -37,17 +37,33 @@ SELECT t.seq_scan, t.seq_tup_read, t.idx_scan, t.idx_tup_fetch,
 create function wait_for_stats() returns void as $$
 declare
   start_time timestamptz := clock_timestamp();
-  updated bool;
+  updated1 bool;
+  updated2 bool;
+  updated3 bool;
 begin
   -- we don't want to wait forever; loop will exit after 30 seconds
   for i in 1 .. 300 loop
 
+    -- With parallel query, the seqscan and indexscan on tenk2 might be done
+    -- in parallel worker processes, which will send their stats counters
+    -- asynchronously to what our own session does. So we must check for
+    -- those counts to be registered separately from the update counts.
+
+    -- check to see if seqscan has been sensed
+    SELECT (st.seq_scan >= pr.seq_scan + 1) INTO updated1
+      FROM pg_stat_user_tables AS st, pg_class AS cl, prevstats AS pr
+     WHERE st.relname='tenk2' AND cl.relname='tenk2';
+
     -- check to see if indexscan has been sensed
-    SELECT (st.idx_scan >= pr.idx_scan + 1) INTO updated
+    SELECT (st.idx_scan >= pr.idx_scan + 1) INTO updated2
       FROM pg_stat_user_tables AS st, pg_class AS cl, prevstats AS pr
      WHERE st.relname='tenk2' AND cl.relname='tenk2';
 
-    exit when updated;
+    -- check to see if updates have been sensed
+    SELECT (n_tup_ins > 0) INTO updated3
+      FROM pg_stat_user_tables WHERE relname='trunc_stats_test';
+
+    exit when updated1 and updated2 and updated3;
 
     -- wait a little
     perform pg_sleep(0.1);
@@ -127,7 +143,7 @@ SELECT count(*) FROM tenk2 WHERE unique1 = 1;
      1
 (1 row)
 
--- force the rate-limiting logic in pgstat_report_tabstat() to time out
+-- force the rate-limiting logic in pgstat_report_stat() to time out
 -- and send a message
 SELECT pg_sleep(1.0);
  pg_sleep
```
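The patched wait_for_stats() has a simple shape: up to 300 polls, 0.1 s apart, evaluating every check on each pass and exiting only when all of them hold. A Python approximation of that pattern follows for illustration; `wait_for_all` and its parameters are hypothetical, and the injectable `sleep` argument exists only so the example can run instantly with a fake clock.

```python
import time
from typing import Callable, Sequence


def wait_for_all(conditions: Sequence[Callable[[], bool]],
                 max_iterations: int = 300,
                 interval: float = 0.1,
                 sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll until every condition holds, giving up after max_iterations.

    With the defaults this sleeps at most 300 * 0.1 s = 30 s in total, the
    same budget as wait_for_stats(). Returns True if all conditions were
    met and False on timeout.
    """
    for _ in range(max_iterations):
        # Evaluate every check on each pass (like updated1..updated3),
        # then exit only if all of them are true.
        results = [cond() for cond in conditions]
        if all(results):
            return True
        sleep(interval)  # "wait a little"
    return False


# Usage with a fake clock: each "sleep" advances a poll counter, and each
# counter becomes visible at a different (hypothetical) poll, as if it
# were reported by a different process.
state = {"polls": 0}


def fake_sleep(_seconds: float) -> None:
    state["polls"] += 1


def seqscan_seen() -> bool:
    return state["polls"] >= 2


def idxscan_seen() -> bool:
    return state["polls"] >= 1


def inserts_seen() -> bool:
    return state["polls"] >= 12


ok = wait_for_all([seqscan_seen, idxscan_seen, inserts_seen], sleep=fake_sleep)
print(ok, state["polls"])  # True 12
```

Requiring all conditions, rather than the last one expected, is what makes the loop indifferent to the order in which independent processes happen to report.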

src/test/regress/sql/stats.sql

+20 -4

```diff
@@ -32,17 +32,33 @@ SELECT t.seq_scan, t.seq_tup_read, t.idx_scan, t.idx_tup_fetch,
 create function wait_for_stats() returns void as $$
 declare
   start_time timestamptz := clock_timestamp();
-  updated bool;
+  updated1 bool;
+  updated2 bool;
+  updated3 bool;
 begin
   -- we don't want to wait forever; loop will exit after 30 seconds
   for i in 1 .. 300 loop
 
+    -- With parallel query, the seqscan and indexscan on tenk2 might be done
+    -- in parallel worker processes, which will send their stats counters
+    -- asynchronously to what our own session does. So we must check for
+    -- those counts to be registered separately from the update counts.
+
+    -- check to see if seqscan has been sensed
+    SELECT (st.seq_scan >= pr.seq_scan + 1) INTO updated1
+      FROM pg_stat_user_tables AS st, pg_class AS cl, prevstats AS pr
+     WHERE st.relname='tenk2' AND cl.relname='tenk2';
+
     -- check to see if indexscan has been sensed
-    SELECT (st.idx_scan >= pr.idx_scan + 1) INTO updated
+    SELECT (st.idx_scan >= pr.idx_scan + 1) INTO updated2
       FROM pg_stat_user_tables AS st, pg_class AS cl, prevstats AS pr
      WHERE st.relname='tenk2' AND cl.relname='tenk2';
 
-    exit when updated;
+    -- check to see if updates have been sensed
+    SELECT (n_tup_ins > 0) INTO updated3
+      FROM pg_stat_user_tables WHERE relname='trunc_stats_test';
+
+    exit when updated1 and updated2 and updated3;
 
     -- wait a little
     perform pg_sleep(0.1);
@@ -121,7 +137,7 @@ SELECT count(*) FROM tenk2;
 -- do an indexscan
 SELECT count(*) FROM tenk2 WHERE unique1 = 1;
 
--- force the rate-limiting logic in pgstat_report_tabstat() to time out
+-- force the rate-limiting logic in pgstat_report_stat() to time out
 -- and send a message
 SELECT pg_sleep(1.0);
 
```
