Commit b81d08d

Fix handling of synchronous replication for stopping WAL senders
This fixes an oversight from c6c3334 which has introduced a more strict ordering in the way WAL senders are stopped to prevent current WAL activity when a shutdown checkpoint is created. After all backends are stopped, all WAL senders are requested to stop which makes them stop any activity, and switching their state as stopping. Once the checkpointer knows that all WAL senders are in a stopping state, the shutdown checkpoint can begin, with all WAL senders activated, waiting for their clients to flush the shutdown checkpoint record.

If a subset of WAL senders are stopping and in a sync state, other WAL senders could still be waiting for a WAL position to be synced while committing a transaction, however the subset of stopping senders would not release waiters, potentially breaking synchronous replication guarantees. This commit makes sure that even WAL senders stopping are able to release waiters properly.

On 9.4, this can also trigger an assertion failure when setting for example max_wal_senders to 1 where a WAL sender is not able to find itself as in synchronous state when the instance stops.

Reported-by: Paul Guo
Author: Paul Guo, Michael Paquier
Discussion: https://postgr.es/m/CAEET0ZEv8VFqT3C-cQm6byOB4r4VYWcef1J21dOX-gcVhCSpmA@mail.gmail.com
Backpatch-through: 9.4
1 parent c1a5cae commit b81d08d

2 files changed: +10 -5 lines changed

src/backend/replication/syncrep.c

Lines changed: 6 additions & 3 deletions
@@ -379,10 +379,12 @@ SyncRepReleaseWaiters(void)
      * If this WALSender is serving a standby that is not on the list of
      * potential sync standbys then we have nothing to do. If we are still
      * starting up, still running base backup or the current flush position
-     * is still invalid, then leave quickly also.
+     * is still invalid, then leave quickly also. Streaming or stopping WAL
+     * senders are allowed to release waiters.
      */
     if (MyWalSnd->sync_standby_priority == 0 ||
-        MyWalSnd->state < WALSNDSTATE_STREAMING ||
+        (MyWalSnd->state != WALSNDSTATE_STREAMING &&
+         MyWalSnd->state != WALSNDSTATE_STOPPING) ||
         XLogRecPtrIsInvalid(MyWalSnd->flush))
         return;
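
The rewritten early-exit test in this hunk can be compared with the old one in isolation. The following is a minimal sketch, not PostgreSQL code: the enum is a local copy whose ordering is assumed to match the server's `WalSndState`, and both helper functions are hypothetical names introduced for illustration. Under that assumed ordering the two forms let the same states through; the new form simply names the accepted states instead of relying on enum order.

```c
#include <stdbool.h>

/*
 * Local model of the WAL sender states. The ordering is an assumption
 * made for this sketch (stopping is taken to be the last value).
 */
typedef enum
{
    WALSNDSTATE_STARTUP = 0,
    WALSNDSTATE_BACKUP,
    WALSNDSTATE_CATCHUP,
    WALSNDSTATE_STREAMING,
    WALSNDSTATE_STOPPING
} WalSndState;

/* Hypothetical helper: state test as written before this commit. */
static bool
state_allows_release_old(WalSndState state)
{
    /* The old code returned early when state < WALSNDSTATE_STREAMING. */
    return !(state < WALSNDSTATE_STREAMING);
}

/* Hypothetical helper: state test as written after this commit. */
static bool
state_allows_release_new(WalSndState state)
{
    /* The new code returns early unless streaming or stopping. */
    return !(state != WALSNDSTATE_STREAMING &&
             state != WALSNDSTATE_STOPPING);
}
```

Only the state part of the condition is modeled; the priority and flush-position checks from the hunk are left out.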

@@ -400,7 +402,8 @@ SyncRepReleaseWaiters(void)
         volatile WalSnd *walsnd = &walsndctl->walsnds[i];

         if (walsnd->pid != 0 &&
-            walsnd->state == WALSNDSTATE_STREAMING &&
+            (walsnd->state == WALSNDSTATE_STREAMING ||
+             walsnd->state == WALSNDSTATE_STOPPING) &&
             walsnd->sync_standby_priority > 0 &&
             (priority == 0 ||
              priority > walsnd->sync_standby_priority) &&
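
The effect of this hunk on the single-sender case from the commit message can be sketched as a standalone model. This is an illustration under stated assumptions, not the server's implementation: `SenderSlot` and `pick_sync_sender` are hypothetical names, the enum is a local copy with an assumed ordering, and the condition that follows the trailing `&&` in the hunk is not shown in the diff, so it is omitted here. The `accept_stopping` flag switches between the old and new state test.

```c
#include <stdbool.h>
#include <stddef.h>

/* Local model of the WAL sender states (ordering assumed). */
typedef enum
{
    WALSNDSTATE_STARTUP = 0,
    WALSNDSTATE_BACKUP,
    WALSNDSTATE_CATCHUP,
    WALSNDSTATE_STREAMING,
    WALSNDSTATE_STOPPING
} WalSndState;

/* Hypothetical, trimmed-down stand-in for one WAL sender slot. */
typedef struct
{
    int         pid;                    /* 0 means the slot is unused */
    WalSndState state;
    int         sync_standby_priority;  /* 0 means asynchronous */
} SenderSlot;

/*
 * Return the index of the active sender with the lowest non-zero
 * priority value, or -1 if none qualifies. With accept_stopping set to
 * false this mirrors the old state test, with true the new one.
 */
static int
pick_sync_sender(const SenderSlot *slots, size_t n, bool accept_stopping)
{
    int         best = -1;
    int         priority = 0;

    for (size_t i = 0; i < n; i++)
    {
        const SenderSlot *s = &slots[i];
        bool        state_ok =
            s->state == WALSNDSTATE_STREAMING ||
            (accept_stopping && s->state == WALSNDSTATE_STOPPING);

        if (s->pid != 0 &&
            state_ok &&
            s->sync_standby_priority > 0 &&
            (priority == 0 || priority > s->sync_standby_priority))
        {
            priority = s->sync_standby_priority;
            best = (int) i;
        }
    }
    return best;
}
```

With a single slot in the stopping state and priority 1, the old test finds no synchronous sender at all, which is consistent with the commit message's description of a WAL sender that cannot find itself in synchronous state at shutdown; the new test selects that slot.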

src/backend/replication/walsender.c

Lines changed: 4 additions & 2 deletions
@@ -2941,12 +2941,14 @@ pg_stat_get_wal_senders(PG_FUNCTION_ARGS)
             /*
              * Treat a standby such as a pg_basebackup background process
              * which always returns an invalid flush location, as an
-             * asynchronous standby.
+             * asynchronous standby. WAL sender must be streaming or
+             * stopping.
              */
             sync_priority[i] = XLogRecPtrIsInvalid(walsnd->flush) ?
                 0 : walsnd->sync_standby_priority;

-            if (walsnd->state == WALSNDSTATE_STREAMING &&
+            if ((walsnd->state == WALSNDSTATE_STREAMING ||
+                 walsnd->state == WALSNDSTATE_STOPPING) &&
                 walsnd->sync_standby_priority > 0 &&
                 (priority == 0 ||
                  priority > walsnd->sync_standby_priority) &&
