Skip to content

Commit 4b02922

Browse files
committed
Improve autoprewarm's handling of early-shutdown scenarios.
Bad things happen if the DBA issues "pg_ctl stop -m fast" before autoprewarm finishes loading its list of blocks to prewarm. The current worker process successfully terminates early, but (if this wasn't the last database with blocks to prewarm) the leader process will just try to launch another worker for the next database. Since the postmaster is now in PM_WAIT_BACKENDS state, it ignores the launch request, and the leader just sits until it's killed manually. This is mostly the fault of our half-baked design for launching background workers, but a proper fix for that is likely to be too invasive to be back-patchable. To ameliorate the situation, fix apw_load_buffers() to check whether SIGTERM has arrived just before trying to launch another worker. That leaves us with only a very narrow window in each worker launch where SIGTERM could occur between the launch request and successful worker start. Another issue is that if the leader process does manage to exit, it unconditionally rewrites autoprewarm.blocks with only the blocks currently in shared buffers, thus forgetting any blocks that we hadn't reached yet while prewarming. This seems quite unhelpful, since the next database start will then not have the expected prewarming benefit. Fix it to not modify the file if we shut down before the initial load attempt is complete. Per bug #16785 from John Thompson. Back-patch to v11 where the autoprewarm code was introduced. Discussion: https://postgr.es/m/16785-c0207d8c67fb5f25@postgresql.org
1 parent 336879f commit 4b02922

File tree

1 file changed

+25
-5
lines changed

1 file changed

+25
-5
lines changed

contrib/pg_prewarm/autoprewarm.c

Lines changed: 25 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -157,6 +157,7 @@ void
157157
autoprewarm_main(Datum main_arg)
158158
{
159159
bool first_time = true;
160+
bool final_dump_allowed = true;
160161
TimestampTz last_dump_time = 0;
161162

162163
/* Establish signal handlers; once that's done, unblock signals. */
@@ -197,10 +198,15 @@ autoprewarm_main(Datum main_arg)
197198
* There's not much point in performing a dump immediately after we finish
198199
* preloading; so, if we do end up preloading, consider the last dump time
199200
* to be equal to the current time.
201+
*
202+
* If apw_load_buffers() is terminated early by a shutdown request,
203+
* prevent dumping out our state below the loop, because we'd effectively
204+
* just truncate the saved state to however much we'd managed to preload.
200205
*/
201206
if (first_time)
202207
{
203208
apw_load_buffers();
209+
final_dump_allowed = !got_sigterm;
204210
last_dump_time = GetCurrentTimestamp();
205211
}
206212

@@ -258,7 +264,8 @@ autoprewarm_main(Datum main_arg)
258264
* Dump one last time. We assume this is probably the result of a system
259265
* shutdown, although it's possible that we've merely been terminated.
260266
*/
261-
apw_dump_now(true, true);
267+
if (final_dump_allowed)
268+
apw_dump_now(true, true);
262269
}
263270

264271
/*
@@ -391,6 +398,18 @@ apw_load_buffers(void)
391398
if (!have_free_buffer())
392399
break;
393400

401+
/*
402+
* Likewise, don't launch if we've already been told to shut down.
403+
*
404+
* There is a race condition here: if the postmaster has received a
405+
* fast-shutdown signal, but we've not heard about it yet, then the
406+
* postmaster will ignore our worker start request and we'll wait
407+
* forever. However, that's a bug in the general background-worker
408+
* logic, not the fault of this module.
409+
*/
410+
if (got_sigterm)
411+
break;
412+
394413
/*
395414
* Start a per-database worker to load blocks for this database; this
396415
* function will return once the per-database worker exits.
@@ -408,10 +427,11 @@ apw_load_buffers(void)
408427
apw_state->pid_using_dumpfile = InvalidPid;
409428
LWLockRelease(&apw_state->lock);
410429

411-
/* Report our success. */
412-
ereport(LOG,
413-
(errmsg("autoprewarm successfully prewarmed %d of %d previously-loaded blocks",
414-
apw_state->prewarmed_blocks, num_elements)));
430+
/* Report our success, if we were able to finish. */
431+
if (!got_sigterm)
432+
ereport(LOG,
433+
(errmsg("autoprewarm successfully prewarmed %d of %d previously-loaded blocks",
434+
apw_state->prewarmed_blocks, num_elements)));
415435
}
416436

417437
/*

0 commit comments

Comments
 (0)