Skip to content

Commit 0fae846

Browse files
committed
Fix some minor postmaster-state-machine issues.
In sigusr1_handler, don't ignore PMSIGNAL_ADVANCE_STATE_MACHINE based on pmState. The restriction is unnecessary (PostmasterStateMachine should work in any state), not future-proof (since it makes too many assumptions about why the signal might be sent), and broken even today because a race condition can make it necessary to respond to the signal in PM_WAIT_READONLY state. The race condition seems unlikely, but if it did happen, a hot-standby postmaster could fail to shut down after receiving a smart-shutdown request. In MaybeStartWalReceiver, don't clear the WalReceiverRequested flag if the fork attempt fails. Leaving it set allows us to try again in future iterations of the postmaster idle loop. (The startup process would eventually send a fresh request signal, but this change may allow us to retry the fork sooner.) Remove an obsolete comment and unnecessary test in PostmasterStateMachine's handling of PM_SHUTDOWN_2 state. It's not possible to have a live walreceiver in that state, and AFAICT has not been possible since commit 5e85315. This isn't a live bug, but the false comment is quite confusing to readers. In passing, rearrange sigusr1_handler's CheckPromoteSignal tests so that we don't uselessly perform stat() calls that we're going to ignore the results of. Add some comments clarifying the behavior of MaybeStartWalReceiver; I very nearly rearranged it in a way that'd reintroduce the race condition fixed in e5d494d. Mea culpa for not commenting that properly at the time. Back-patch to all supported branches. The PMSIGNAL_ADVANCE_STATE_MACHINE change is the only one of even minor significance, but we might as well keep this code in sync across branches. Discussion: https://postgr.es/m/9001.1556046681@sss.pgh.pa.us
1 parent 0a999e1 commit 0fae846

File tree

1 file changed

+26
-11
lines changed

1 file changed

+26
-11
lines changed

src/backend/postmaster/postmaster.c

Lines changed: 26 additions & 11 deletions
Original file line numberDiff line numberDiff line change
@@ -3811,12 +3811,8 @@ PostmasterStateMachine(void)
38113811
* dead_end children left. There shouldn't be any regular backends
38123812
* left by now anyway; what we're really waiting for is walsenders and
38133813
* archiver.
3814-
*
3815-
* Walreceiver should normally be dead by now, but not when a fast
3816-
* shutdown is performed during recovery.
38173814
*/
3818-
if (PgArchPID == 0 && CountChildren(BACKEND_TYPE_ALL) == 0 &&
3819-
WalReceiverPID == 0)
3815+
if (PgArchPID == 0 && CountChildren(BACKEND_TYPE_ALL) == 0)
38203816
{
38213817
pmState = PM_WAIT_DEAD_END;
38223818
}
@@ -5213,16 +5209,25 @@ sigusr1_handler(SIGNAL_ARGS)
52135209
MaybeStartWalReceiver();
52145210
}
52155211

5216-
if (CheckPostmasterSignal(PMSIGNAL_ADVANCE_STATE_MACHINE) &&
5217-
(pmState == PM_WAIT_BACKUP || pmState == PM_WAIT_BACKENDS))
5212+
/*
5213+
* Try to advance postmaster's state machine, if a child requests it.
5214+
*
5215+
* Be careful about the order of this action relative to sigusr1_handler's
5216+
* other actions. Generally, this should be after other actions, in case
5217+
* they have effects PostmasterStateMachine would need to know about.
5218+
* However, we should do it before the CheckPromoteSignal step, which
5219+
* cannot have any (immediate) effect on the state machine, but does
5220+
* depend on what state we're in now.
5221+
*/
5222+
if (CheckPostmasterSignal(PMSIGNAL_ADVANCE_STATE_MACHINE))
52185223
{
5219-
/* Advance postmaster's state machine */
52205224
PostmasterStateMachine();
52215225
}
52225226

5223-
if (CheckPromoteSignal() && StartupPID != 0 &&
5227+
if (StartupPID != 0 &&
52245228
(pmState == PM_STARTUP || pmState == PM_RECOVERY ||
5225-
pmState == PM_HOT_STANDBY || pmState == PM_WAIT_READONLY))
5229+
pmState == PM_HOT_STANDBY || pmState == PM_WAIT_READONLY) &&
5230+
CheckPromoteSignal())
52265231
{
52275232
/* Tell startup process to finish recovery */
52285233
signal_child(StartupPID, SIGUSR2);
@@ -5518,6 +5523,14 @@ StartAutovacuumWorker(void)
55185523
/*
55195524
* MaybeStartWalReceiver
55205525
* Start the WAL receiver process, if not running and our state allows.
5526+
*
5527+
* Note: if WalReceiverPID is already nonzero, it might seem that we should
5528+
* clear WalReceiverRequested. However, there's a race condition if the
5529+
* walreceiver terminates and the startup process immediately requests a new
5530+
* one: it's quite possible to get the signal for the request before reaping
5531+
* the dead walreceiver process. Better to risk launching an extra
5532+
* walreceiver than to miss launching one we need. (The walreceiver code
5533+
* has logic to recognize that it should go away if not needed.)
55215534
*/
55225535
static void
55235536
MaybeStartWalReceiver(void)
@@ -5528,7 +5541,9 @@ MaybeStartWalReceiver(void)
55285541
Shutdown == NoShutdown)
55295542
{
55305543
WalReceiverPID = StartWalReceiver();
5531-
WalReceiverRequested = false;
5544+
if (WalReceiverPID != 0)
5545+
WalReceiverRequested = false;
5546+
/* else leave the flag set, so we'll try again later */
55325547
}
55335548
}
55345549

0 commit comments

Comments
 (0)