Commit fc4da1f

Ignore old stats file timestamps when starting the stats collector.
The stats collector disregards inquiry messages that bear a cutoff_time before when it last wrote the relevant stats file. That's fine, but at startup when it reads the "permanent" stats files, it absorbed their timestamps as if they were the times at which the corresponding temporary stats files had been written. In reality, of course, there's no data out there at all.

This led to disregarding inquiry messages soon after startup if the postmaster had been shut down and restarted within less than PGSTAT_STAT_INTERVAL; which is a pretty common scenario, both for testing and in the field. Requesting backends would hang for 10 seconds and then report failure to read statistics, unless they got bailed out by some other backend coming along and making a newer request within that interval.

I came across this through investigating unexpected delays in the src/test/recovery TAP tests: it manifests there because the autovacuum launcher hangs for 10 seconds when it can't get statistics at startup, thus preventing a second shutdown from occurring promptly. We might want to do some things in the autovac code to make it less prone to getting stuck that way, but this change is a good bug fix regardless.

In passing, also fix pgstat_read_statsfiles() to ensure that it re-zeroes its global stats variables if they are corrupted by a short read from the stats file. (Other reads in that function go into temp variables, so that the issue doesn't arise.)

This has been broken since we created the separation between permanent and temporary stats files in 8.4, so back-patch to all supported branches.

Discussion: https://postgr.es/m/16860.1498442626@sss.pgh.pa.us
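
The startup problem described above can be illustrated with a small standalone model. This is only a sketch, not the pgstat.c code: the names ts_t, stats_entry, inquiry_is_ignored, and load_permanent_entry are hypothetical, and the comparison is assumed to be a simple "cutoff no later than last write" test.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a timestamp type (arbitrary time units). */
typedef int64_t ts_t;

/* Simplified model of one stats entry tracked by the collector. */
typedef struct
{
    ts_t stats_timestamp;   /* last write of the temp stats file; 0 = never */
} stats_entry;

/*
 * Assumed form of the collector's check: an inquiry is disregarded when
 * its cutoff_time is no later than the recorded last-write time, because
 * the existing temp file is presumed fresh enough.
 */
static bool
inquiry_is_ignored(const stats_entry *entry, ts_t cutoff_time)
{
    return cutoff_time <= entry->stats_timestamp;
}

/*
 * Model of loading an entry from the permanent stats file at startup.
 * With apply_fix set, the file's timestamp is discarded, since no temp
 * stats file has been written yet in this collector's lifetime.
 */
static void
load_permanent_entry(stats_entry *entry, ts_t timestamp_in_file, bool apply_fix)
{
    entry->stats_timestamp = apply_fix ? 0 : timestamp_in_file;
}
```

In this model, a permanent file stamped at time 1000 followed by an inquiry with cutoff 900 is ignored when the old timestamp is absorbed, and honored when the timestamp is reset to zero.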
1 parent 52a5eac commit fc4da1f

1 file changed, +21 -0 lines changed

src/backend/postmaster/pgstat.c

Lines changed: 21 additions & 0 deletions
@@ -3977,16 +3977,28 @@ pgstat_read_statsfiles(Oid onlydb, bool permanent, bool deep)
     {
         ereport(pgStatRunningInCollector ? LOG : WARNING,
                 (errmsg("corrupted statistics file \"%s\"", statfile)));
+        memset(&globalStats, 0, sizeof(globalStats));
         goto done;
     }
 
+    /*
+     * In the collector, disregard the timestamp we read from the permanent
+     * stats file; we should be willing to write a temp stats file immediately
+     * upon the first request from any backend. This only matters if the old
+     * file's timestamp is less than PGSTAT_STAT_INTERVAL ago, but that's not
+     * an unusual scenario.
+     */
+    if (pgStatRunningInCollector)
+        globalStats.stats_timestamp = 0;
+
     /*
      * Read archiver stats struct
      */
     if (fread(&archiverStats, 1, sizeof(archiverStats), fpin) != sizeof(archiverStats))
     {
         ereport(pgStatRunningInCollector ? LOG : WARNING,
                 (errmsg("corrupted statistics file \"%s\"", statfile)));
+        memset(&archiverStats, 0, sizeof(archiverStats));
         goto done;
     }
 
@@ -4031,6 +4043,15 @@ pgstat_read_statsfiles(Oid onlydb, bool permanent, bool deep)
                 dbentry->tables = NULL;
                 dbentry->functions = NULL;
 
+                /*
+                 * In the collector, disregard the timestamp we read from the
+                 * permanent stats file; we should be willing to write a temp
+                 * stats file immediately upon the first request from any
+                 * backend.
+                 */
+                if (pgStatRunningInCollector)
+                    dbentry->stats_timestamp = 0;
+
                 /*
                  * Don't create tables/functions hashtables for uninteresting
                  * databases.
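
The memset() additions in the first hunk guard against a partial fread() leaving a global struct half-overwritten. A minimal standalone sketch of that pattern follows; demo_stats, demo_global, and read_demo_stats are hypothetical names used only for illustration and do not appear in pgstat.c.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a global stats struct such as globalStats. */
typedef struct
{
    long timed_checkpoints;
    long buf_alloc;
} demo_stats;

static demo_stats demo_global;

/*
 * Read one demo_stats record from fpin directly into the global.
 * Returns 1 on success. On a short read, fread() may already have
 * overwritten part of the struct, so re-zero all of it and return 0.
 */
static int
read_demo_stats(FILE *fpin)
{
    if (fread(&demo_global, 1, sizeof(demo_global), fpin) != sizeof(demo_global))
    {
        memset(&demo_global, 0, sizeof(demo_global));
        return 0;
    }
    return 1;
}
```

Reading into a local temporary and copying it to the global only on success would avoid the problem as well, which matches the commit message's remark that the other reads in the function are not affected.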
