Commit acf1dd4

Don't retry restore_command while reading ahead.
Suppress further attempts to read ahead in the WAL if we run out of data, until the records already decoded have been replayed.

This restores the traditional behavior for continuous archive recovery, which is to retry the failing restore_command only every 5 seconds. With the coding in 5dc0418, we would start retrying every time through the recovery loop when our WAL decoding window hit the end of the current segment and we tried to look ahead into a not-yet-available next file. That was very slow.

Also change the no_readahead_until mechanism to use <= rather than <, which seems more useful. Otherwise we'd either get one extra unwanted retry of restore_command, or we'd need to add 1 to an LSN.

No change in behavior for regular streaming. That was already limited by the flushedUpto variable, which won't be updated until we replay what we have already.

Reported by Andres Freund while analyzing the failure of a TAP test on build farm animal skink (investigation ongoing but probably due to otherwise unrelated timing bugs triggered by this slowness magnified by valgrind).

Discussion: https://postgr.es/m/20220409005910.alw46xqmmgny2sgr%40alap3.anarazel.de

1 parent 4a736a1 commit acf1dd4

1 file changed: +8 -3 lines changed

src/backend/access/transam/xlogprefetcher.c

Lines changed: 8 additions & 3 deletions
@@ -487,17 +487,22 @@ XLogPrefetcherNextBlock(uintptr_t pgsr_private, XLogRecPtr *lsn)
             */
            nonblocking = XLogReaderHasQueuedRecordOrError(reader);
 
-           /* Certain records act as barriers for all readahead. */
-           if (nonblocking && replaying_lsn < prefetcher->no_readahead_until)
+           /* Readahead is disabled until we replay past a certain point. */
+           if (nonblocking && replaying_lsn <= prefetcher->no_readahead_until)
                return LRQ_NEXT_AGAIN;
 
            record = XLogReadAhead(prefetcher->reader, nonblocking);
            if (record == NULL)
            {
                /*
                 * We can't read any more, due to an error or lack of data in
-                * nonblocking mode.
+                * nonblocking mode. Don't try to read ahead again until
+                * we've replayed everything already decoded.
                 */
+               if (nonblocking && prefetcher->reader->decode_queue_tail)
+                   prefetcher->no_readahead_until =
+                       prefetcher->reader->decode_queue_tail->lsn;
+
                return LRQ_NEXT_AGAIN;
            }
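
To make the effect of the change easier to see, here is a minimal standalone sketch of the gating logic. It is a toy model, not PostgreSQL code: `ToyPrefetcher`, `toy_read_ahead`, `toy_failed_reads_after`, and the fields `last_decoded_lsn`, `available_upto`, and `failed_reads` are all hypothetical names invented for this illustration. It assumes that records have consecutive integer LSNs starting at 1, and that each failed read corresponds roughly to one attempt to fetch a missing segment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t ToyLsn;        /* hypothetical stand-in for XLogRecPtr */

typedef struct ToyPrefetcher
{
    ToyLsn  no_readahead_until; /* readahead is off while replaying_lsn <= this */
    ToyLsn  last_decoded_lsn;   /* models reader->decode_queue_tail->lsn */
    ToyLsn  available_upto;     /* records with LSN <= this can be read */
    int     failed_reads;       /* models attempts that found no data */
} ToyPrefetcher;

/*
 * Try to decode one more record ahead of replay.  Returns true if a record
 * was decoded, false if readahead was skipped or no data was available.
 */
static bool
toy_read_ahead(ToyPrefetcher *p, ToyLsn replaying_lsn)
{
    /* Decoded-but-unreplayed records exist, so we must not wait for data. */
    bool    nonblocking = p->last_decoded_lsn > replaying_lsn;

    assert(replaying_lsn > 0);  /* LSN 0 is treated as invalid in this model */

    /* Readahead is disabled until we replay past a certain point. */
    if (nonblocking && replaying_lsn <= p->no_readahead_until)
        return false;

    if (p->last_decoded_lsn + 1 > p->available_upto)
    {
        /* Out of data: remember not to retry until the queue is drained. */
        p->failed_reads++;
        if (nonblocking)
            p->no_readahead_until = p->last_decoded_lsn;
        return false;
    }

    p->last_decoded_lsn++;
    return true;
}

/* Poll readahead `polls` times while replay is stuck at LSN 1. */
static int
toy_failed_reads_after(int polls)
{
    ToyPrefetcher p = {
        .no_readahead_until = 0,
        .last_decoded_lsn = 1,
        .available_upto = 3,
        .failed_reads = 0
    };

    for (int i = 0; i < polls; i++)
        (void) toy_read_ahead(&p, 1);
    return p.failed_reads;
}
```

In this model, calling `toy_failed_reads_after` with a large poll count still records only one failed read, because the first failure sets `no_readahead_until` to the last decoded LSN and every later nonblocking call returns early. Without the assignment inside the failure branch, each poll would count as another failed read, which is the retry-on-every-loop behavior the commit message describes.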
