Commit ff76983

Improve autoprewarm's handling of early-shutdown scenarios.
Bad things happen if the DBA issues "pg_ctl stop -m fast" before autoprewarm finishes loading its list of blocks to prewarm. The current worker process successfully terminates early, but (if this wasn't the last database with blocks to prewarm) the leader process will just try to launch another worker for the next database. Since the postmaster is now in PM_WAIT_BACKENDS state, it ignores the launch request, and the leader just sits until it's killed manually.

This is mostly the fault of our half-baked design for launching background workers, but a proper fix for that is likely to be too invasive to be back-patchable. To ameliorate the situation, fix apw_load_buffers() to check whether SIGTERM has arrived just before trying to launch another worker. That leaves us with only a very narrow window in each worker launch where SIGTERM could occur between the launch request and successful worker start.

Another issue is that if the leader process does manage to exit, it unconditionally rewrites autoprewarm.blocks with only the blocks currently in shared buffers, thus forgetting any blocks that we hadn't reached yet while prewarming. This seems quite unhelpful, since the next database start will then not have the expected prewarming benefit. Fix it to not modify the file if we shut down before the initial load attempt is complete.

Per bug #16785 from John Thompson. Back-patch to v11 where the autoprewarm code was introduced.

Discussion: https://postgr.es/m/16785-c0207d8c67fb5f25@postgresql.org
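
For readers unfamiliar with the pattern, the two fixes described in the message can be illustrated outside PostgreSQL with a small standalone sketch. It is not the autoprewarm code: `shutdown_requested` stands in for `ShutdownRequestPending`, and `simulated_worker`, `load_databases`, `LoadSummary`, and `sigterm_during_db` are hypothetical names invented for this illustration. The sketch assumes a worker launch is a synchronous call, which sidesteps the narrow launch-versus-SIGTERM window the message mentions.

```c
#include <assert.h>
#include <signal.h>
#include <stdbool.h>

/* Stand-in for PostgreSQL's ShutdownRequestPending; set only by the handler. */
static volatile sig_atomic_t shutdown_requested = 0;

/* Hypothetical test knob: database index during which SIGTERM "arrives". */
static int sigterm_during_db = -1;

/* SIGTERM handler: just record the request, as a leader process would. */
void handle_sigterm(int signo)
{
    (void) signo;
    shutdown_requested = 1;
}

/* Hypothetical per-database worker; may be interrupted by a simulated SIGTERM. */
void simulated_worker(int db_index)
{
    if (db_index == sigterm_during_db)
        raise(SIGTERM);     /* the installed handler runs before raise() returns */
}

typedef struct LoadSummary
{
    int  workers_launched;    /* how many per-database workers were started */
    bool final_dump_allowed;  /* false if the load was cut short by shutdown */
} LoadSummary;

/* Leader loop: check for shutdown just before each launch, then decide
 * whether the saved block list may be rewritten at exit. */
LoadSummary load_databases(int n_dbs)
{
    LoadSummary s = {0, true};

    assert(n_dbs >= 0);
    for (int i = 0; i < n_dbs; i++)
    {
        /* Don't launch another worker once shutdown has been requested. */
        if (shutdown_requested)
            break;
        simulated_worker(i);
        s.workers_launched++;
    }

    /* A shutdown during the initial load means the dump file must be kept. */
    s.final_dump_allowed = !shutdown_requested;
    return s;
}
```

A caller would install the handler with `signal(SIGTERM, handle_sigterm)`, run `load_databases()`, and rewrite its state file at exit only if `final_dump_allowed` is true. That mirrors the intent of the commit: an interrupted load leaves the previously saved block list untouched.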
1 parent 08dde1b commit ff76983

1 file changed: +25 -5 lines changed

contrib/pg_prewarm/autoprewarm.c

Lines changed: 25 additions & 5 deletions
@@ -153,6 +153,7 @@ void
 autoprewarm_main(Datum main_arg)
 {
     bool first_time = true;
+    bool final_dump_allowed = true;
     TimestampTz last_dump_time = 0;
 
     /* Establish signal handlers; once that's done, unblock signals. */
@@ -193,10 +194,15 @@ autoprewarm_main(Datum main_arg)
      * There's not much point in performing a dump immediately after we finish
      * preloading; so, if we do end up preloading, consider the last dump time
      * to be equal to the current time.
+     *
+     * If apw_load_buffers() is terminated early by a shutdown request,
+     * prevent dumping out our state below the loop, because we'd effectively
+     * just truncate the saved state to however much we'd managed to preload.
      */
     if (first_time)
     {
         apw_load_buffers();
+        final_dump_allowed = !ShutdownRequestPending;
         last_dump_time = GetCurrentTimestamp();
     }
 
@@ -254,7 +260,8 @@ autoprewarm_main(Datum main_arg)
      * Dump one last time. We assume this is probably the result of a system
      * shutdown, although it's possible that we've merely been terminated.
      */
-    apw_dump_now(true, true);
+    if (final_dump_allowed)
+        apw_dump_now(true, true);
 }
 
 /*
@@ -387,6 +394,18 @@ apw_load_buffers(void)
         if (!have_free_buffer())
             break;
 
+        /*
+         * Likewise, don't launch if we've already been told to shut down.
+         *
+         * There is a race condition here: if the postmaster has received a
+         * fast-shutdown signal, but we've not heard about it yet, then the
+         * postmaster will ignore our worker start request and we'll wait
+         * forever. However, that's a bug in the general background-worker
+         * logic, not the fault of this module.
+         */
+        if (ShutdownRequestPending)
+            break;
+
         /*
          * Start a per-database worker to load blocks for this database; this
          * function will return once the per-database worker exits.
@@ -404,10 +423,11 @@ apw_load_buffers(void)
     apw_state->pid_using_dumpfile = InvalidPid;
     LWLockRelease(&apw_state->lock);
 
-    /* Report our success. */
-    ereport(LOG,
-            (errmsg("autoprewarm successfully prewarmed %d of %d previously-loaded blocks",
-                    apw_state->prewarmed_blocks, num_elements)));
+    /* Report our success, if we were able to finish. */
+    if (!ShutdownRequestPending)
+        ereport(LOG,
+                (errmsg("autoprewarm successfully prewarmed %d of %d previously-loaded blocks",
+                        apw_state->prewarmed_blocks, num_elements)));
 }
 
 /*
