Skip to content

Commit 85cb4ec

Browse files
committed
walsnd: Don't set waiting_for_ping_response spuriously
Ashutosh Bapat noticed that when logical walsender needs to wait for WAL, and it realizes that it must send a keepalive message to walreceiver to update the sent-LSN, which *does not* request a reply from walreceiver, it wrongly sets the flag that it's going to wait for that reply. That means that any future would-be sender of feedback messages ends up not sending a feedback message, because they all believe that a reply is expected. With built-in logical replication there's not much harm in this, because WalReceiverMain will send a ping-back every wal_receiver_timeout/2 anyway; but with other logical replication systems (e.g. pglogical) it can cause significant pain. This problem was introduced in commit 41d5f8a, where the request-reply flag was changed from true to false to WalSndKeepalive, without at the same time removing the line that sets waiting_for_ping_response. Just removing that line would be a sufficient fix, but it seems better to shift the responsibility of setting the flag to WalSndKeepalive itself instead of requiring caller to do it; this is clearly less error-prone. Author: Álvaro Herrera <alvherre@alvh.no-ip.org> Reported-by: Ashutosh Bapat <ashutosh.bapat@2ndquadrant.com> Backpatch: 9.5 and up Discussion: https://postgr.es/m/20200806225558.GA22401@alvherre.pgsql
1 parent 4f26932 commit 85cb4ec

File tree

1 file changed

+16
-14
lines changed

1 file changed

+16
-14
lines changed

src/backend/replication/walsender.c

Lines changed: 16 additions & 14 deletions
Original file line numberDiff line numberDiff line change
@@ -154,7 +154,7 @@ static XLogRecPtr sendTimeLineValidUpto = InvalidXLogRecPtr;
154154
* How far have we sent WAL already? This is also advertised in
155155
* MyWalSnd->sentPtr. (Actually, this is the next WAL location to send.)
156156
*/
157-
static XLogRecPtr sentPtr = 0;
157+
static XLogRecPtr sentPtr = InvalidXLogRecPtr;
158158

159159
/* Buffers for constructing outgoing messages and processing reply messages. */
160160
static StringInfoData output_message;
@@ -1358,16 +1358,15 @@ WalSndWaitForWal(XLogRecPtr loc)
13581358
/*
13591359
* We only send regular messages to the client for full decoded
13601360
* transactions, but a synchronous replication and walsender shutdown
1361-
* possibly are waiting for a later location. So we send pings
1362-
* containing the flush location every now and then.
1361+
* possibly are waiting for a later location. So, before sleeping, we
1362+
* send a ping containing the flush location. If the receiver is
1363+
* otherwise idle, this keepalive will trigger a reply. Processing the
1364+
* reply will update these MyWalSnd locations.
13631365
*/
13641366
if (MyWalSnd->flush < sentPtr &&
13651367
MyWalSnd->write < sentPtr &&
13661368
!waiting_for_ping_response)
1367-
{
13681369
WalSndKeepalive(false);
1369-
waiting_for_ping_response = true;
1370-
}
13711370

13721371
/* check whether we're done */
13731372
if (loc <= RecentFlushPtr)
@@ -2917,10 +2916,7 @@ WalSndDone(WalSndSendDataCallback send_data)
29172916
proc_exit(0);
29182917
}
29192918
if (!waiting_for_ping_response)
2920-
{
29212919
WalSndKeepalive(true);
2922-
waiting_for_ping_response = true;
2923-
}
29242920
}
29252921

29262922
/*
@@ -3419,10 +3415,13 @@ pg_stat_get_wal_senders(PG_FUNCTION_ARGS)
34193415
}
34203416

34213417
/*
3422-
* This function is used to send a keepalive message to standby.
3423-
* If requestReply is set, sets a flag in the message requesting the standby
3424-
* to send a message back to us, for heartbeat purposes.
3425-
*/
3418+
* Send a keepalive message to standby.
3419+
*
3420+
* If requestReply is set, the message requests the other party to send
3421+
* a message back to us, for heartbeat purposes. We also set a flag to
3422+
* let nearby code that we're waiting for that response, to avoid
3423+
* repeated requests.
3424+
*/
34263425
static void
34273426
WalSndKeepalive(bool requestReply)
34283427
{
@@ -3437,6 +3436,10 @@ WalSndKeepalive(bool requestReply)
34373436

34383437
/* ... and send it wrapped in CopyData */
34393438
pq_putmessage_noblock('d', output_message.data, output_message.len);
3439+
3440+
/* Set local flag */
3441+
if (requestReply)
3442+
waiting_for_ping_response = true;
34403443
}
34413444

34423445
/*
@@ -3467,7 +3470,6 @@ WalSndKeepaliveIfNecessary(void)
34673470
if (last_processing >= ping_time)
34683471
{
34693472
WalSndKeepalive(true);
3470-
waiting_for_ping_response = true;
34713473

34723474
/* Try to flush pending output to the client */
34733475
if (pq_flush_if_writable() != 0)

0 commit comments

Comments
 (0)