Skip to content

Commit 82c75f9

Browse files
committed
Fix bug in determining when recovery has reached consistency.
When starting WAL replay from an online checkpoint, the last replayed WAL record variable was initialized using the checkpoint record's location, even though the records between the REDO location and the checkpoint record had not been replayed yet. That was noted as "slightly confusing" but harmless in the comment, but in some cases, it fooled CheckRecoveryConsistency to incorrectly conclude that we had already reached a consistent state immediately at the beginning of WAL replay. That caused the system to accept read-only connections in hot standby mode too early, and also PANICs with message "WAL contains references to invalid pages". Fix by initializing the variables to the REDO location instead. In 9.2 and above, change CheckRecoveryConsistency() to use lastReplayedEndRecPtr variable when checking if backup end location has been reached. It was inconsistently using EndRecPtr for that check, but lastReplayedEndRecPtr when checking min recovery point. It made no difference before this patch, because in all the places where CheckRecoveryConsistency was called the two variables were the same, but it was always an accident waiting to happen, and would have been wrong after this patch anyway. Report and analysis by Tomonari Katsumata, bug #8686. Backpatch to 9.0, where hot standby was introduced.
1 parent 8aa6912 commit 82c75f9

File tree

1 file changed

+18
-18
lines changed
  • src/backend/access/transam

1 file changed

+18
-18
lines changed

src/backend/access/transam/xlog.c

Lines changed: 18 additions & 18 deletions
Original file line numberDiff line numberDiff line change
@@ -6750,20 +6750,12 @@ StartupXLOG(void)
67506750
}
67516751

67526752
/*
6753-
* Initialize shared replayEndRecPtr, lastReplayedEndRecPtr, and
6754-
* recoveryLastXTime.
6755-
*
6756-
* This is slightly confusing if we're starting from an online
6757-
* checkpoint; we've just read and replayed the checkpoint record, but
6758-
* we're going to start replay from its redo pointer, which precedes
6759-
* the location of the checkpoint record itself. So even though the
6760-
* last record we've replayed is indeed ReadRecPtr, we haven't
6761-
* replayed all the preceding records yet. That's OK for the current
6762-
* use of these variables.
6753+
* Initialize shared variables for tracking progress of WAL replay,
6754+
* as if we had just replayed the record before the REDO location.
67636755
*/
67646756
SpinLockAcquire(&xlogctl->info_lck);
6765-
xlogctl->replayEndRecPtr = ReadRecPtr;
6766-
xlogctl->lastReplayedEndRecPtr = EndRecPtr;
6757+
xlogctl->replayEndRecPtr = checkPoint.redo;
6758+
xlogctl->lastReplayedEndRecPtr = checkPoint.redo;
67676759
xlogctl->recoveryLastXTime = 0;
67686760
xlogctl->currentChunkStartTime = 0;
67696761
xlogctl->recoveryPause = false;
@@ -7307,18 +7299,26 @@ StartupXLOG(void)
73077299
static void
73087300
CheckRecoveryConsistency(void)
73097301
{
7302+
XLogRecPtr lastReplayedEndRecPtr;
7303+
73107304
/*
73117305
* During crash recovery, we don't reach a consistent state until we've
73127306
* replayed all the WAL.
73137307
*/
73147308
if (XLogRecPtrIsInvalid(minRecoveryPoint))
73157309
return;
73167310

7311+
/*
7312+
* assume that we are called in the startup process, and hence don't need
7313+
* a lock to read lastReplayedEndRecPtr
7314+
*/
7315+
lastReplayedEndRecPtr = XLogCtl->lastReplayedEndRecPtr;
7316+
73177317
/*
73187318
* Have we reached the point where our base backup was completed?
73197319
*/
73207320
if (!XLogRecPtrIsInvalid(ControlFile->backupEndPoint) &&
7321-
XLByteLE(ControlFile->backupEndPoint, EndRecPtr))
7321+
XLByteLE(ControlFile->backupEndPoint, lastReplayedEndRecPtr))
73227322
{
73237323
/*
73247324
* We have reached the end of base backup, as indicated by pg_control.
@@ -7331,8 +7331,8 @@ CheckRecoveryConsistency(void)
73317331

73327332
LWLockAcquire(ControlFileLock, LW_EXCLUSIVE);
73337333

7334-
if (XLByteLT(ControlFile->minRecoveryPoint, EndRecPtr))
7335-
ControlFile->minRecoveryPoint = EndRecPtr;
7334+
if (XLByteLT(ControlFile->minRecoveryPoint, lastReplayedEndRecPtr))
7335+
ControlFile->minRecoveryPoint = lastReplayedEndRecPtr;
73367336

73377337
MemSet(&ControlFile->backupStartPoint, 0, sizeof(XLogRecPtr));
73387338
MemSet(&ControlFile->backupEndPoint, 0, sizeof(XLogRecPtr));
@@ -7350,7 +7350,7 @@ CheckRecoveryConsistency(void)
73507350
* consistent yet.
73517351
*/
73527352
if (!reachedConsistency && !ControlFile->backupEndRequired &&
7353-
XLByteLE(minRecoveryPoint, XLogCtl->lastReplayedEndRecPtr) &&
7353+
XLByteLE(minRecoveryPoint, lastReplayedEndRecPtr) &&
73547354
XLogRecPtrIsInvalid(ControlFile->backupStartPoint))
73557355
{
73567356
/*
@@ -7362,8 +7362,8 @@ CheckRecoveryConsistency(void)
73627362
reachedConsistency = true;
73637363
ereport(LOG,
73647364
(errmsg("consistent recovery state reached at %X/%X",
7365-
XLogCtl->lastReplayedEndRecPtr.xlogid,
7366-
XLogCtl->lastReplayedEndRecPtr.xrecoff)));
7365+
lastReplayedEndRecPtr.xlogid,
7366+
lastReplayedEndRecPtr.xrecoff)));
73677367
}
73687368

73697369
/*

0 commit comments

Comments
 (0)