Commit f663b00

Fix failure at promotion with 2PC transactions and archiving enabled
When archiving is enabled, a promotion request would fail with the following error when some 2PC transaction needs to be recovered from WAL, preventing the promotion from completing:
FATAL: requested WAL segment pg_wal/000000010000000000000001 has already been removed

The origin of the problem is that the last partial segment of the old timeline is renamed before recovering the 2PC data via RecoverPreparedTransactions() at the end of recovery, causing the FATAL because the wanted segment has already been renamed with a .partial suffix.

This commit reorders the end-of-recovery actions so that the execution of recovery_end_command, the cleanup of the old segments of the old timeline (RemoveNonParentXlogFiles) and the rename of the last partial segment are done after the 2PC transaction data is recovered with RecoverPreparedTransactions(). This makes the order of these end-of-recovery actions more consistent with v15 and newer versions, with the exception of the end-of-recovery checkpoint, which in v13 and v14 still needs to happen before all the actions reordered here, contrary to what v15 and newer versions do.

v15 and newer versions have "fixed" this problem somewhat accidentally with 811051c, where the end-of-recovery actions got reordered. In this case, the recovery of 2PC transactions happens before the renaming of the last partial segment of the old timeline.

v13 and v14 are the versions that can easily see this problem, as per the refactoring of 38a9573 where XLogReaderState is reset in XLogBeginRead() before reading the 2PC transaction data. v11 and v12 could also see this problem, but may finish by reading the 2PC data from some of the WAL buffers instead. Perhaps something could be done for these two branches, but I am not really excited about doing something on these, given the lack of complaints and the fact that v11 is going to be EOL'd soon (there is always a risk of breaking something).

Note that the TAP test 009_twophase.pl is able to exhibit the issue if it enables archiving on the primary node, which does not impact the test coverage as restore_command would remain unused. This is something that should be changed on v15 and HEAD as well, so this will be changed in a separate commit for clarity.

Author: Julian Markwort
Reviewed-by: Kyotaro Horiguchi, Michael Paquier
Discussion: https://postgr.es/m/743b9b45a2d4013bd90b6a5cba8d6faeb717ee34.camel@cybertec.at
Backpatch-through: 13
1 parent 73f1c17 commit f663b00
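
The ordering problem described in the commit message can be illustrated with a toy model. This is only a sketch and not PostgreSQL code: every `toy_*` name and the `ToySegment` type are hypothetical, a plain file stands in for the last WAL segment of the old timeline, and a failed `fopen()` stands in for the FATAL error. It assumes the current directory is writable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for the last WAL segment of the old timeline. */
typedef struct
{
	char		path[256];		/* name the 2PC reader tries to open */
	char		partial[512];	/* name after the ".partial" rename */
} ToySegment;

/* Create a small file that plays the role of the WAL segment. */
static bool
toy_create_segment(ToySegment *seg, const char *path)
{
	FILE	   *f;

	assert(path != NULL);
	snprintf(seg->path, sizeof(seg->path), "%s", path);
	snprintf(seg->partial, sizeof(seg->partial), "%s.partial", path);

	f = fopen(seg->path, "w");
	if (f == NULL)
		return false;
	fputs("prepared-transaction-record\n", f);
	return fclose(f) == 0;
}

/* Models the rename of the last partial segment of the old timeline. */
static bool
toy_rename_to_partial(const ToySegment *seg)
{
	return rename(seg->path, seg->partial) == 0;
}

/* Models reading 2PC data: the segment must exist under its original name. */
static bool
toy_recover_prepared(const ToySegment *seg)
{
	FILE	   *f = fopen(seg->path, "r");

	if (f == NULL)
		return false;			/* plays the role of the FATAL error */
	fclose(f);
	return true;
}

/* Remove whichever of the two names still exists. */
static void
toy_cleanup(const ToySegment *seg)
{
	remove(seg->path);
	remove(seg->partial);
}

/* Order before the fix: rename first, then try to read the 2PC data. */
bool
toy_old_order(const char *path)
{
	ToySegment	seg;
	bool		ok;

	if (!toy_create_segment(&seg, path))
		return false;
	ok = toy_rename_to_partial(&seg) && toy_recover_prepared(&seg);
	toy_cleanup(&seg);
	return ok;
}

/* Order after the fix: read the 2PC data first, then rename. */
bool
toy_new_order(const char *path)
{
	ToySegment	seg;
	bool		ok;

	if (!toy_create_segment(&seg, path))
		return false;
	ok = toy_recover_prepared(&seg) && toy_rename_to_partial(&seg);
	toy_cleanup(&seg);
	return ok;
}
```

Calling `toy_old_order()` with a scratch file name is expected to report failure, because the read happens after the rename, while `toy_new_order()` is expected to succeed. The real code path differs in many ways (timelines, archiving state, the WAL reader), so this only captures the dependency between the two steps.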

1 file changed: +51 -51 lines changed
  • src/backend/access/transam

src/backend/access/transam/xlog.c

Lines changed: 51 additions & 51 deletions
@@ -8016,6 +8016,57 @@ StartupXLOG(void)
 			CreateCheckPoint(CHECKPOINT_END_OF_RECOVERY | CHECKPOINT_IMMEDIATE);
 	}
 
+	/*
+	 * Preallocate additional log files, if wanted.
+	 */
+	PreallocXlogFiles(EndOfLog);
+
+	/*
+	 * Okay, we're officially UP.
+	 */
+	InRecovery = false;
+
+	/* start the archive_timeout timer and LSN running */
+	XLogCtl->lastSegSwitchTime = (pg_time_t) time(NULL);
+	XLogCtl->lastSegSwitchLSN = EndOfLog;
+
+	/* also initialize latestCompletedXid, to nextXid - 1 */
+	LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
+	ShmemVariableCache->latestCompletedXid = ShmemVariableCache->nextXid;
+	FullTransactionIdRetreat(&ShmemVariableCache->latestCompletedXid);
+	LWLockRelease(ProcArrayLock);
+
+	/*
+	 * Start up subtrans, if not already done for hot standby.  (commit
+	 * timestamps are started below, if necessary.)
+	 */
+	if (standbyState == STANDBY_DISABLED)
+		StartupSUBTRANS(oldestActiveXID);
+
+	/*
+	 * Perform end of recovery actions for any SLRUs that need it.
+	 */
+	TrimCLOG();
+	TrimMultiXact();
+
+	/* Reload shared-memory state for prepared transactions */
+	RecoverPreparedTransactions();
+
+	/* Shut down xlogreader */
+	if (readFile >= 0)
+	{
+		close(readFile);
+		readFile = -1;
+	}
+	XLogReaderFree(xlogreader);
+
+	/*
+	 * If any of the critical GUCs have changed, log them before we allow
+	 * backends to write WAL.
+	 */
+	LocalSetXLogInsertAllowed();
+	XLogReportParameters();
+
 	if (ArchiveRecoveryRequested)
 	{
 		/*
@@ -8097,57 +8148,6 @@ StartupXLOG(void)
 		}
 	}
 
-	/*
-	 * Preallocate additional log files, if wanted.
-	 */
-	PreallocXlogFiles(EndOfLog);
-
-	/*
-	 * Okay, we're officially UP.
-	 */
-	InRecovery = false;
-
-	/* start the archive_timeout timer and LSN running */
-	XLogCtl->lastSegSwitchTime = (pg_time_t) time(NULL);
-	XLogCtl->lastSegSwitchLSN = EndOfLog;
-
-	/* also initialize latestCompletedXid, to nextXid - 1 */
-	LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
-	ShmemVariableCache->latestCompletedXid = ShmemVariableCache->nextXid;
-	FullTransactionIdRetreat(&ShmemVariableCache->latestCompletedXid);
-	LWLockRelease(ProcArrayLock);
-
-	/*
-	 * Start up subtrans, if not already done for hot standby.  (commit
-	 * timestamps are started below, if necessary.)
-	 */
-	if (standbyState == STANDBY_DISABLED)
-		StartupSUBTRANS(oldestActiveXID);
-
-	/*
-	 * Perform end of recovery actions for any SLRUs that need it.
-	 */
-	TrimCLOG();
-	TrimMultiXact();
-
-	/* Reload shared-memory state for prepared transactions */
-	RecoverPreparedTransactions();
-
-	/* Shut down xlogreader */
-	if (readFile >= 0)
-	{
-		close(readFile);
-		readFile = -1;
-	}
-	XLogReaderFree(xlogreader);
-
-	/*
-	 * If any of the critical GUCs have changed, log them before we allow
-	 * backends to write WAL.
-	 */
-	LocalSetXLogInsertAllowed();
-	XLogReportParameters();
-
 	/*
 	 * Local WAL inserts enabled, so it's time to finish initialization of
 	 * commit timestamp.
