Commit 8f8c666

Truncate pg_multixact/'s contents during crash recovery

Commit 9dc842f of the 8.2 era prevented MultiXact truncation during crash recovery, because there was no guarantee that enough state had been set up, and because it wasn't deemed a good idea to remove data during crash recovery anyway. Since then, due to Hot Standby, streaming replication and PITR, the amount of time a cluster can spend doing crash recovery has increased significantly, to the point that a cluster may never come out of it. That makes it no longer defensible to skip truncating the contents of pg_multixact/.

To fix, take care to set up enough state for multixact truncation before crash recovery starts (easy, since checkpoints contain the required information), and move the current end-of-recovery actions to a new TrimMultiXact() function, analogous to TrimCLOG().

At some later point, this should probably be done the way clog.c does it, which is to just WAL-log truncations, but we can't do that in the back branches.

Back-patch to 9.0. 8.4 also has the problem, but since there's no hot standby there, it's much less pressing. In 9.2 and earlier, this patch is simpler than in newer branches, because multixact access during recovery isn't required. Add appropriate checks to make sure that's not happening.

Andres Freund

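
The ordering described in the message can be illustrated with a small toy model. This is only a sketch under stated assumptions: `ToySlru`, `toy_startup`, `toy_truncate` and `toy_trim` are hypothetical names invented for illustration, not PostgreSQL code, and the real SLRU truncation logic is considerably more involved. The point it tries to show is that truncation is only safe once the latest page number is known, which is why that state is now initialized before replay rather than at the end of recovery.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for one SLRU area; this is NOT PostgreSQL's real struct. */
typedef struct ToySlru
{
	int		latest_page_number;	/* -1 until startup code initializes it */
	int		oldest_page;		/* oldest page still kept on disk */
} ToySlru;

/* Models StartupMultiXact(): runs before replay, only records the page. */
static void
toy_startup(ToySlru *slru, int next_page)
{
	assert(slru != NULL);
	slru->latest_page_number = next_page;
}

/* Models truncation at a checkpoint or restartpoint. */
static bool
toy_truncate(ToySlru *slru, int cutoff_page)
{
	assert(slru != NULL);
	if (slru->latest_page_number < 0)
		return false;			/* state not set up: truncating is unsafe */
	if (cutoff_page > slru->latest_page_number)
		return false;			/* would remove the page still in use */
	if (cutoff_page > slru->oldest_page)
		slru->oldest_page = cutoff_page;	/* drop everything older */
	return true;
}

/* Models TrimMultiXact(): end-of-recovery re-initialization. */
static void
toy_trim(ToySlru *slru, int next_page)
{
	assert(slru != NULL);
	slru->latest_page_number = next_page;
}
```

In this model, calling `toy_truncate` on a `ToySlru` initialized to `{ -1, 0 }` is refused, while the same call after `toy_startup` succeeds. That loosely mirrors why the old code had to skip truncation during recovery and the new code does not.
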
1 parent 19af7d4 commit 8f8c666

1 file changed, +40 -14 lines changed

src/backend/access/transam/multixact.c

Lines changed: 40 additions & 14 deletions
@@ -826,6 +826,10 @@ GetNewMultiXactId(int nxids, MultiXactOffset *offset)
 	/* MultiXactIdSetOldestMember() must have been called already */
 	Assert(MultiXactIdIsValid(OldestMemberMXactId[MyBackendId]));
 
+	/* safety check, we should never get this far in a HS slave */
+	if (RecoveryInProgress())
+		elog(ERROR, "cannot assign MultiXactIds during recovery");
+
 	LWLockAcquire(MultiXactGenLock, LW_EXCLUSIVE);
 
 	/* Handle wraparound of the nextMXact counter */
@@ -913,6 +917,10 @@ GetMultiXactIdMembers(MultiXactId multi, TransactionId **xids)
 
 	Assert(MultiXactIdIsValid(multi));
 
+	/* safety check, we should never get this far in a HS slave */
+	if (RecoveryInProgress())
+		elog(ERROR, "cannot GetMultiXactIdMembers() during recovery");
+
 	/* See if the MultiXactId is in the local cache */
 	length = mXactCacheGetById(multi, xids);
 	if (length >= 0)
@@ -1512,14 +1520,37 @@ ZeroMultiXactMemberPage(int pageno, bool writeXlog)
  * This must be called ONCE during postmaster or standalone-backend startup.
  *
  * StartupXLOG has already established nextMXact/nextOffset by calling
- * MultiXactSetNextMXact and/or MultiXactAdvanceNextMXact. Note that we
- * may already have replayed WAL data into the SLRU files.
- *
- * We don't need any locks here, really; the SLRU locks are taken
- * only because slru.c expects to be called with locks held.
+ * MultiXactSetNextMXact and/or MultiXactAdvanceNextMXact, but we haven't yet
+ * replayed WAL.
  */
 void
 StartupMultiXact(void)
+{
+	MultiXactId multi = MultiXactState->nextMXact;
+	MultiXactOffset offset = MultiXactState->nextOffset;
+	int pageno;
+
+	/*
+	 * Initialize our idea of the latest page number.
+	 */
+	pageno = MultiXactIdToOffsetPage(multi);
+	MultiXactOffsetCtl->shared->latest_page_number = pageno;
+
+	/*
+	 * Initialize our idea of the latest page number.
+	 */
+	pageno = MXOffsetToMemberPage(offset);
+	MultiXactMemberCtl->shared->latest_page_number = pageno;
+}
+
+/*
+ * This must be called ONCE at the end of startup/recovery.
+ *
+ * We don't need any locks here, really; the SLRU locks are taken only because
+ * slru.c expects to be called with locks held.
+ */
+void
+TrimMultiXact(void)
 {
 	MultiXactId multi = MultiXactState->nextMXact;
 	MultiXactOffset offset = MultiXactState->nextOffset;
@@ -1530,7 +1561,7 @@ StartupMultiXact(void)
 	LWLockAcquire(MultiXactOffsetControlLock, LW_EXCLUSIVE);
 
 	/*
-	 * Initialize our idea of the latest page number.
+	 * (Re-)Initialize our idea of the latest page number.
 	 */
 	pageno = MultiXactIdToOffsetPage(multi);
 	MultiXactOffsetCtl->shared->latest_page_number = pageno;
@@ -1560,7 +1591,7 @@ StartupMultiXact(void)
 	LWLockAcquire(MultiXactMemberControlLock, LW_EXCLUSIVE);
 
 	/*
-	 * Initialize our idea of the latest page number.
+	 * (Re-)Initialize our idea of the latest page number.
 	 */
 	pageno = MXOffsetToMemberPage(offset);
 	MultiXactMemberCtl->shared->latest_page_number = pageno;
@@ -1639,14 +1670,9 @@ CheckPointMultiXact(void)
 
 	/*
 	 * Truncate the SLRU files. This could be done at any time, but
-	 * checkpoint seems a reasonable place for it. There is one exception: if
-	 * we are called during xlog recovery, then shared->latest_page_number
-	 * isn't valid (because StartupMultiXact hasn't been called yet) and so
-	 * SimpleLruTruncate would get confused. It seems best not to risk
-	 * removing any data during recovery anyway, so don't truncate.
+	 * checkpoint seems a reasonable place for it.
 	 */
-	if (!RecoveryInProgress())
-		TruncateMultiXact();
+	TruncateMultiXact();
 
 	TRACE_POSTGRESQL_MULTIXACT_CHECKPOINT_DONE(true);
 }
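
For readers unfamiliar with the two page-number computations that StartupMultiXact() and TrimMultiXact() perform, the arithmetic can be sketched as below. This is a standalone approximation, not the macros from multixact.c: the `Toy*` names are hypothetical, and it assumes the default 8192-byte block size and the pre-9.3 on-disk layout, in which an offsets page holds 4-byte offsets and a members page holds 4-byte transaction IDs.

```c
#include <stdint.h>

/* Assumed default block size; the real value comes from the build config. */
#define TOY_BLCKSZ 8192

/* Assumed 4-byte types, standing in for the PostgreSQL typedefs. */
typedef uint32_t ToyMultiXactId;
typedef uint32_t ToyMultiXactOffset;
typedef uint32_t ToyTransactionId;

/* Entries per page for each SLRU area under the assumptions above. */
#define TOY_OFFSETS_PER_PAGE (TOY_BLCKSZ / sizeof(ToyMultiXactOffset))
#define TOY_MEMBERS_PER_PAGE (TOY_BLCKSZ / sizeof(ToyTransactionId))

/* Approximates MultiXactIdToOffsetPage(): page holding a multi's offset. */
static int
toy_multi_to_offset_page(ToyMultiXactId multi)
{
	return (int) (multi / TOY_OFFSETS_PER_PAGE);
}

/* Approximates MXOffsetToMemberPage(): page holding a member entry. */
static int
toy_offset_to_member_page(ToyMultiXactOffset offset)
{
	return (int) (offset / TOY_MEMBERS_PER_PAGE);
}
```

Under these assumptions each page holds 2048 entries, so a nextMXact of 2048 falls on offsets page 1 and a nextOffset of 5000 falls on members page 2.
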
