Commit 66235ba

Fix rare assertion failure in standby, if primary is restarted
During hot standby, ExpireAllKnownAssignedTransactionIds() and ExpireOldKnownAssignedTransactionIds() functions mark old transactions as no-longer running, but they failed to update xactCompletionCount and latestCompletedXid. AFAICS it would not lead to incorrect query results, because those functions effectively turn in-progress transactions into aborted transactions and an MVCC snapshot considers both as "not visible". But it could surprise GetSnapshotDataReuse() and trigger the "TransactionIdPrecedesOrEquals(TransactionXmin, RecentXmin))" assertion in it, if the apparent xmin in a backend would move backwards. We saw this happen when GetCatalogSnapshot() would reuse an older catalog snapshot, when GetTransactionSnapshot() had already advanced TransactionXmin.

The bug goes back all the way to commit 623a9ba in v14 that introduced the snapshot reuse mechanism, but it started to happen more frequently with commit 952365c which removed a GetTransactionSnapshot() call from backend startup. That made it more likely for ExpireOldKnownAssignedTransactionIds() to be called between GetCatalogSnapshot() and the first GetTransactionSnapshot() in a backend.

Andres Freund first spotted this assertion failure on buildfarm member 'skink'. Reproduction and analysis by Tomas Vondra.

Backpatch-through: 14
Discussion: https://www.postgresql.org/message-id/oey246mcw43cy4qw2hqjmurbd62lfdpcuxyqiu7botx3typpax%40h7o7mfg5zmdj
1 parent c196c61 commit 66235ba
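
The failure mode described in the commit message can be sketched with a toy model. This is not PostgreSQL code: every `Toy*` type and `toy_*` function below is hypothetical, XID comparison ignores wraparound, and the model only assumes that a cached snapshot is reused whenever the shared completion counter has not changed since it was built.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-ins for shared state; NOT PostgreSQL's real structures. */
typedef struct ToyShared
{
    uint64_t xact_completion_count; /* bumped when the running set changes */
    uint32_t oldest_running_xid;    /* oldest XID still treated as running */
} ToyShared;

typedef struct ToySnapshot
{
    bool     valid;
    uint64_t built_at_count;        /* counter value when snapshot was built */
    uint32_t xmin;
} ToySnapshot;

/* Reuse the cached snapshot if the counter is unchanged, else rebuild it. */
static bool
toy_get_snapshot(const ToyShared *shared, ToySnapshot *snap)
{
    if (snap->valid && snap->built_at_count == shared->xact_completion_count)
        return true;                /* reused as-is, xmin not recomputed */

    snap->valid = true;
    snap->built_at_count = shared->xact_completion_count;
    snap->xmin = shared->oldest_running_xid;
    return false;
}

/* Expire old XIDs; bump_count = false models the pre-fix behavior. */
static void
toy_expire_old(ToyShared *shared, uint32_t new_oldest, bool bump_count)
{
    shared->oldest_running_xid = new_oldest;
    if (bump_count)
        shared->xact_completion_count++;
}

/* Returns true if the reused catalog snapshot's xmin is behind the xmin
 * already established by the transaction snapshot. */
static bool
toy_xmin_moves_backwards(bool bump_count)
{
    ToyShared   shared = {1, 100};
    ToySnapshot catalog_snap = {false, 0, 0};
    ToySnapshot xact_snap = {false, 0, 0};
    uint32_t    transaction_xmin;

    toy_get_snapshot(&shared, &catalog_snap);   /* built with xmin 100 */
    toy_expire_old(&shared, 200, bump_count);   /* old XIDs expired */
    toy_get_snapshot(&shared, &xact_snap);      /* built fresh, xmin 200 */
    transaction_xmin = xact_snap.xmin;
    toy_get_snapshot(&shared, &catalog_snap);   /* reused only if counter unchanged */

    return catalog_snap.xmin < transaction_xmin;
}
```

In this model, `toy_xmin_moves_backwards(false)` reports the backwards move that the real assertion guards against, while `toy_xmin_moves_backwards(true)` forces the catalog snapshot to be rebuilt, which is the effect of incrementing xactCompletionCount in the patch.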

1 file changed, +24 -0 lines changed

src/backend/storage/ipc/procarray.c

Lines changed: 24 additions & 0 deletions
@@ -4609,9 +4609,23 @@ ExpireTreeKnownAssignedTransactionIds(TransactionId xid, int nsubxids,
 void
 ExpireAllKnownAssignedTransactionIds(void)
 {
+	FullTransactionId latestXid;
+
 	LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
 	KnownAssignedXidsRemovePreceding(InvalidTransactionId);
 
+	/* Reset latestCompletedXid to nextXid - 1 */
+	Assert(FullTransactionIdIsValid(ShmemVariableCache->nextXid));
+	latestXid = ShmemVariableCache->nextXid;
+	FullTransactionIdRetreat(&latestXid);
+	ShmemVariableCache->latestCompletedXid = latestXid;
+
+	/*
+	 * Any transactions that were in-progress were effectively aborted, so
+	 * advance xactCompletionCount.
+	 */
+	ShmemVariableCache->xactCompletionCount++;
+
 	/*
 	 * Reset lastOverflowedXid. Currently, lastOverflowedXid has no use after
 	 * the call of this function. But do this for unification with what
@@ -4629,8 +4643,18 @@ ExpireAllKnownAssignedTransactionIds(void)
 void
 ExpireOldKnownAssignedTransactionIds(TransactionId xid)
 {
+	TransactionId latestXid;
+
 	LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
 
+	/* As in ProcArrayEndTransaction, advance latestCompletedXid */
+	latestXid = xid;
+	TransactionIdRetreat(latestXid);
+	MaintainLatestCompletedXidRecovery(latestXid);
+
+	/* ... and xactCompletionCount */
+	ShmemVariableCache->xactCompletionCount++;
+
 	/*
 	 * Reset lastOverflowedXid if we know all transactions that have been
 	 * possibly running are being gone. Not doing so could cause an incorrect
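
Both hunks compute "the XID just before" a boundary (nextXid or xid) before recording it as latestCompletedXid. For 32-bit XIDs this is not a plain decrement: to my understanding, TransactionIdRetreat() steps back and then skips the reserved XIDs below FirstNormalTransactionId (3), wrapping around if needed. A minimal sketch of that behavior follows, using the hypothetical names `toy_xid_retreat` and `TOY_FIRST_NORMAL_XID` rather than the real macro.

```c
#include <stdint.h>

/* Assumed to mirror FirstNormalTransactionId; XIDs 0..2 are reserved. */
#define TOY_FIRST_NORMAL_XID 3u

/* Step back one XID, skipping reserved values and wrapping at zero.
 * A toy function-style equivalent, not PostgreSQL's macro itself. */
static uint32_t
toy_xid_retreat(uint32_t xid)
{
    do
    {
        xid--;                      /* unsigned arithmetic wraps 0 -> UINT32_MAX */
    } while (xid < TOY_FIRST_NORMAL_XID);

    return xid;
}
```

For an ordinary value such as 101 this simply yields 100; for the first normal XID it wraps to the largest 32-bit value instead of landing on a reserved XID.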
