Commit 456bf26

Ignore old stats file timestamps when starting the stats collector.

The stats collector disregards inquiry messages that bear a cutoff_time before when it last wrote the relevant stats file. That's fine, but at startup when it reads the "permanent" stats files, it absorbed their timestamps as if they were the times at which the corresponding temporary stats files had been written. In reality, of course, there's no data out there at all. This led to disregarding inquiry messages soon after startup if the postmaster had been shut down and restarted within less than PGSTAT_STAT_INTERVAL; which is a pretty common scenario, both for testing and in the field. Requesting backends would hang for 10 seconds and then report failure to read statistics, unless they got bailed out by some other backend coming along and making a newer request within that interval.

I came across this through investigating unexpected delays in the src/test/recovery TAP tests: it manifests there because the autovacuum launcher hangs for 10 seconds when it can't get statistics at startup, thus preventing a second shutdown from occurring promptly. We might want to do some things in the autovac code to make it less prone to getting stuck that way, but this change is a good bug fix regardless.

In passing, also fix pgstat_read_statsfiles() to ensure that it re-zeroes its global stats variables if they are corrupted by a short read from the stats file. (Other reads in that function go into temp variables, so that the issue doesn't arise.)

This has been broken since we created the separation between permanent and temporary stats files in 8.4, so back-patch to all supported branches.

Discussion: https://postgr.es/m/16860.1498442626@sss.pgh.pa.us
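
The startup failure described in the message can be illustrated with a minimal C sketch of the cutoff comparison. This is not the collector's actual code: the type `SketchTimestamp`, both function names, and the 500 ms value standing in for PGSTAT_STAT_INTERVAL are assumptions made for illustration, and the comparison is reduced to the single rule the message states.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical simplified timestamp, in microseconds. */
typedef int64_t SketchTimestamp;

/* Assumed stand-in for PGSTAT_STAT_INTERVAL: 500 ms, in microseconds. */
#define SKETCH_STAT_INTERVAL_US ((SketchTimestamp) 500 * 1000)

/*
 * A requesting backend accepts stats no older than one interval,
 * so its cutoff is "now minus the interval".
 */
static SketchTimestamp
sketch_cutoff_for(SketchTimestamp now)
{
    assert(now >= SKETCH_STAT_INTERVAL_US);
    return now - SKETCH_STAT_INTERVAL_US;
}

/*
 * The collector disregards an inquiry whose cutoff is not later than
 * the time it believes it last wrote the stats file.  Returns true if
 * the inquiry would be acted on.
 */
static bool
sketch_inquiry_is_honored(SketchTimestamp cutoff_time,
                          SketchTimestamp last_write)
{
    return cutoff_time > last_write;
}
```

Suppose the permanent file was written at t = 1.0 s and the server restarts 0.2 s later. `sketch_cutoff_for(1200000)` gives a cutoff of 0.7 s, so `sketch_inquiry_is_honored(700000, 1000000)` is false: the inquiry is dropped even though no temporary file exists yet. With the timestamp reset to 0, as the patch does in the collector, `sketch_inquiry_is_honored(700000, 0)` is true and the first request triggers a write.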
1 parent 2563f07 commit 456bf26

1 file changed: +20 -0 lines changed
src/backend/postmaster/pgstat.c

Lines changed: 20 additions & 0 deletions
@@ -3883,9 +3883,20 @@ pgstat_read_statsfiles(Oid onlydb, bool permanent, bool deep)
     {
         ereport(pgStatRunningInCollector ? LOG : WARNING,
                 (errmsg("corrupted statistics file \"%s\"", statfile)));
+        memset(&globalStats, 0, sizeof(globalStats));
         goto done;
     }

+    /*
+     * In the collector, disregard the timestamp we read from the permanent
+     * stats file; we should be willing to write a temp stats file immediately
+     * upon the first request from any backend. This only matters if the old
+     * file's timestamp is less than PGSTAT_STAT_INTERVAL ago, but that's not
+     * an unusual scenario.
+     */
+    if (pgStatRunningInCollector)
+        globalStats.stats_timestamp = 0;
+
     /*
      * We found an existing collector stats file. Read it and put all the
      * hashtable entries into place.
@@ -3927,6 +3938,15 @@ pgstat_read_statsfiles(Oid onlydb, bool permanent, bool deep)
                 dbentry->tables = NULL;
                 dbentry->functions = NULL;

+                /*
+                 * In the collector, disregard the timestamp we read from the
+                 * permanent stats file; we should be willing to write a temp
+                 * stats file immediately upon the first request from any
+                 * backend.
+                 */
+                if (pgStatRunningInCollector)
+                    dbentry->stats_timestamp = 0;
+
                 /*
                  * Don't create tables/functions hashtables for uninteresting
                  * databases.
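
The short-read fix in the first hunk can be illustrated with a small self-contained C sketch. It is not taken from pgstat.c: the struct `SketchGlobalStats`, its fields, and both helper functions are hypothetical stand-ins. The point it shows is that `fread` into a global may fill part of the struct before hitting end of file, so the error path has to zero the struct again.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, much-reduced stand-in for the global stats struct. */
typedef struct SketchGlobalStats
{
    long checkpoints;
    long buffers_written;
    long stats_timestamp;
} SketchGlobalStats;

static SketchGlobalStats sketch_global_stats;

/*
 * Read directly into the global.  On a short read, re-zero the global
 * so that no partially read data survives the error path.
 */
static bool
sketch_read_global_stats(FILE *fp)
{
    assert(fp != NULL);
    if (fread(&sketch_global_stats, 1, sizeof(sketch_global_stats), fp)
        != sizeof(sketch_global_stats))
    {
        memset(&sketch_global_stats, 0, sizeof(sketch_global_stats));
        return false;
    }
    return true;
}

/* Write only the first nbytes of src to a temp file, then rewind it. */
static FILE *
sketch_make_truncated_file(const SketchGlobalStats *src, size_t nbytes)
{
    FILE *fp = tmpfile();

    if (fp == NULL)
        return NULL;
    if (fwrite(src, 1, nbytes, fp) != nbytes)
    {
        fclose(fp);
        return NULL;
    }
    rewind(fp);
    return fp;
}
```

Reading a file truncated to `sizeof(long)` bytes makes `sketch_read_global_stats` return false. Without the `memset`, the `checkpoints` field would keep the value read from the truncated file while the other fields kept whatever they held before.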
