Skip to content

Commit 2393c7d

Browse files
committed
Fix a couple of bugs in MultiXactId freezing
Both heap_freeze_tuple() and heap_tuple_needs_freeze() neglected to look into a multixact to check the members against cutoff_xid. This means that a very old Xid could survive hidden within a multi, possibly outliving its CLOG storage. In the distant future, this would cause clog lookup failures: ERROR: could not access status of transaction 3883960912 DETAIL: Could not open file "pg_clog/0E78": No such file or directory. This mostly was problematic when the updating transaction aborted, since in that case the row wouldn't get pruned away earlier in vacuum and the multixact could possibly survive for a long time. In many cases, data that is inaccessible for this reason way can be brought back heuristically. As a second bug, heap_freeze_tuple() didn't properly handle multixacts that need to be frozen according to cutoff_multi, but whose updater xid is still alive. Instead of preserving the update Xid, it just set Xmax invalid, which leads to both old and new tuple versions becoming visible. This is pretty rare in practice, but a real threat nonetheless. Existing corrupted rows, unfortunately, cannot be repaired in an automated fashion. Existing physical replicas might have already incorrectly frozen tuples because of different behavior than in master, which might only become apparent in the future once pg_multixact/ is truncated; it is recommended that all clones be rebuilt after upgrading. Following code analysis caused by bug report by J Smith in message CADFUPgc5bmtv-yg9znxV-vcfkb+JPRqs7m2OesQXaM_4Z1JpdQ@mail.gmail.com and privately by F-Secure. Backpatch to 9.3, where freezing of MultiXactIds was introduced. Analysis and patch by Andres Freund, with some tweaks by Álvaro.
1 parent 1ce150b commit 2393c7d

File tree

2 files changed

+151
-23
lines changed

2 files changed

+151
-23
lines changed

src/backend/access/heap/heapam.c

Lines changed: 142 additions & 18 deletions
Original file line numberDiff line numberDiff line change
@@ -5264,6 +5264,8 @@ heap_inplace_update(Relation relation, HeapTuple tuple)
52645264
* so we need no external state checks to decide what to do. (This is good
52655265
* because this function is applied during WAL recovery, when we don't have
52665266
* access to any such state, and can't depend on the hint bits to be set.)
5267+
* There is an exception we make which is to assume GetMultiXactIdMembers can
5268+
* be called during recovery.
52675269
*
52685270
* Similarly, cutoff_multi must be less than or equal to the smallest
52695271
* MultiXactId used by any transaction currently open.
@@ -5279,14 +5281,19 @@ heap_inplace_update(Relation relation, HeapTuple tuple)
52795281
* exclusive lock ensures no other backend is in process of checking the
52805282
* tuple status. Also, getting exclusive lock makes it safe to adjust the
52815283
* infomask bits.
5284+
*
5285+
* NB: Cannot rely on hint bits here, they might not be set after a crash or
5286+
* on a standby.
52825287
*/
52835288
bool
52845289
heap_freeze_tuple(HeapTupleHeader tuple, TransactionId cutoff_xid,
52855290
MultiXactId cutoff_multi)
52865291
{
52875292
bool changed = false;
5293+
bool freeze_xmax = false;
52885294
TransactionId xid;
52895295

5296+
/* Process xmin */
52905297
xid = HeapTupleHeaderGetXmin(tuple);
52915298
if (TransactionIdIsNormal(xid) &&
52925299
TransactionIdPrecedes(xid, cutoff_xid))
@@ -5303,16 +5310,112 @@ heap_freeze_tuple(HeapTupleHeader tuple, TransactionId cutoff_xid,
53035310
}
53045311

53055312
/*
5306-
* Note that this code handles IS_MULTI Xmax values, too, but only to mark
5307-
* the tuple as not updated if the multixact is below the cutoff Multixact
5308-
* given; it doesn't remove dead members of a very old multixact.
5313+
* Process xmax. To thoroughly examine the current Xmax value we need to
5314+
* resolve a MultiXactId to its member Xids, in case some of them are
5315+
* below the given cutoff for Xids. In that case, those values might need
5316+
* freezing, too. Also, if a multi needs freezing, we cannot simply take
5317+
* it out --- if there's a live updater Xid, it needs to be kept.
5318+
*
5319+
* Make sure to keep heap_tuple_needs_freeze in sync with this.
53095320
*/
53105321
xid = HeapTupleHeaderGetRawXmax(tuple);
5311-
if ((tuple->t_infomask & HEAP_XMAX_IS_MULTI) ?
5312-
(MultiXactIdIsValid(xid) &&
5313-
MultiXactIdPrecedes(xid, cutoff_multi)) :
5314-
(TransactionIdIsNormal(xid) &&
5315-
TransactionIdPrecedes(xid, cutoff_xid)))
5322+
5323+
if (tuple->t_infomask & HEAP_XMAX_IS_MULTI)
5324+
{
5325+
if (!MultiXactIdIsValid(xid))
5326+
{
5327+
/* no xmax set, ignore */
5328+
;
5329+
}
5330+
else if (MultiXactIdPrecedes(xid, cutoff_multi))
5331+
{
5332+
/*
5333+
* This old multi cannot possibly be running. If it was a locker
5334+
* only, it can be removed without much further thought; but if it
5335+
* contained an update, we need to preserve it.
5336+
*/
5337+
if (HEAP_XMAX_IS_LOCKED_ONLY(tuple->t_infomask))
5338+
freeze_xmax = true;
5339+
else
5340+
{
5341+
TransactionId update_xid;
5342+
5343+
update_xid = HeapTupleGetUpdateXid(tuple);
5344+
5345+
/*
5346+
* The multixact has an update hidden within. Get rid of it.
5347+
*
5348+
* If the update_xid is below the cutoff_xid, it necessarily
5349+
* must be an aborted transaction. In a primary server, such
5350+
* an Xmax would have gotten marked invalid by
5351+
* HeapTupleSatisfiesVacuum, but in a replica that is not
5352+
* called before we are, so deal with it in the same way.
5353+
*
5354+
* If not below the cutoff_xid, then the tuple would have been
5355+
* pruned by vacuum, if the update committed long enough ago,
5356+
* and we wouldn't be freezing it; so it's either recently
5357+
* committed, or in-progress. Deal with this by setting the
5358+
* Xmax to the update Xid directly and remove the IS_MULTI
5359+
* bit. (We know there cannot be running lockers in this
5360+
* multi, because it's below the cutoff_multi value.)
5361+
*/
5362+
5363+
if (TransactionIdPrecedes(update_xid, cutoff_xid))
5364+
{
5365+
Assert(InRecovery || TransactionIdDidAbort(update_xid));
5366+
freeze_xmax = true;
5367+
}
5368+
else
5369+
{
5370+
Assert(InRecovery || !TransactionIdIsInProgress(update_xid));
5371+
tuple->t_infomask &= ~HEAP_XMAX_BITS;
5372+
HeapTupleHeaderSetXmax(tuple, update_xid);
5373+
changed = true;
5374+
}
5375+
}
5376+
}
5377+
else if (HEAP_XMAX_IS_LOCKED_ONLY(tuple->t_infomask))
5378+
{
5379+
/* newer than the cutoff, so don't touch it */
5380+
;
5381+
}
5382+
else
5383+
{
5384+
TransactionId update_xid;
5385+
5386+
/*
5387+
* This is a multixact which is not marked LOCK_ONLY, but which
5388+
* is newer than the cutoff_multi. If the update_xid is below the
5389+
* cutoff_xid point, then we can just freeze the Xmax in the
5390+
* tuple, removing it altogether. This seems simple, but there
5391+
* are several underlying assumptions:
5392+
*
5393+
* 1. A tuple marked by an multixact containing a very old
5394+
* committed update Xid would have been pruned away by vacuum; we
5395+
* wouldn't be freezing this tuple at all.
5396+
*
5397+
* 2. There cannot possibly be any live locking members remaining
5398+
* in the multixact. This is because if they were alive, the
5399+
* update's Xid would had been considered, via the lockers'
5400+
* snapshot's Xmin, as part the cutoff_xid.
5401+
*
5402+
* 3. We don't create new MultiXacts via MultiXactIdExpand() that
5403+
* include a very old aborted update Xid: in that function we only
5404+
* include update Xids corresponding to transactions that are
5405+
* committed or in-progress.
5406+
*/
5407+
update_xid = HeapTupleGetUpdateXid(tuple);
5408+
if (TransactionIdPrecedes(update_xid, cutoff_xid))
5409+
freeze_xmax = true;
5410+
}
5411+
}
5412+
else if (TransactionIdIsNormal(xid) &&
5413+
TransactionIdPrecedes(xid, cutoff_xid))
5414+
{
5415+
freeze_xmax = true;
5416+
}
5417+
5418+
if (freeze_xmax)
53165419
{
53175420
HeapTupleHeaderSetXmax(tuple, InvalidTransactionId);
53185421

@@ -5632,6 +5735,9 @@ ConditionalMultiXactIdWait(MultiXactId multi, MultiXactStatus status,
56325735
*
56335736
* It doesn't matter whether the tuple is alive or dead, we are checking
56345737
* to see if a tuple needs to be removed or frozen to avoid wraparound.
5738+
*
5739+
* NB: Cannot rely on hint bits here, they might not be set after a crash or
5740+
* on a standby.
56355741
*/
56365742
bool
56375743
heap_tuple_needs_freeze(HeapTupleHeader tuple, TransactionId cutoff_xid,
@@ -5644,24 +5750,42 @@ heap_tuple_needs_freeze(HeapTupleHeader tuple, TransactionId cutoff_xid,
56445750
TransactionIdPrecedes(xid, cutoff_xid))
56455751
return true;
56465752

5647-
if (!(tuple->t_infomask & HEAP_XMAX_INVALID))
5753+
/*
5754+
* The considerations for multixacts are complicated; look at
5755+
* heap_freeze_tuple for justifications. This routine had better be in
5756+
* sync with that one!
5757+
*/
5758+
if (tuple->t_infomask & HEAP_XMAX_IS_MULTI)
56485759
{
5649-
if (!(tuple->t_infomask & HEAP_XMAX_IS_MULTI))
5760+
MultiXactId multi;
5761+
5762+
multi = HeapTupleHeaderGetRawXmax(tuple);
5763+
if (!MultiXactIdIsValid(multi))
56505764
{
5651-
xid = HeapTupleHeaderGetRawXmax(tuple);
5652-
if (TransactionIdIsNormal(xid) &&
5653-
TransactionIdPrecedes(xid, cutoff_xid))
5654-
return true;
5765+
/* no xmax set, ignore */
5766+
;
5767+
}
5768+
else if (MultiXactIdPrecedes(multi, cutoff_multi))
5769+
return true;
5770+
else if (HEAP_XMAX_IS_LOCKED_ONLY(tuple->t_infomask))
5771+
{
5772+
/* only-locker multis don't need internal examination */
5773+
;
56555774
}
56565775
else
56575776
{
5658-
MultiXactId multi;
5659-
5660-
multi = HeapTupleHeaderGetRawXmax(tuple);
5661-
if (MultiXactIdPrecedes(multi, cutoff_multi))
5777+
if (TransactionIdPrecedes(HeapTupleGetUpdateXid(tuple),
5778+
cutoff_xid))
56625779
return true;
56635780
}
56645781
}
5782+
else
5783+
{
5784+
xid = HeapTupleHeaderGetRawXmax(tuple);
5785+
if (TransactionIdIsNormal(xid) &&
5786+
TransactionIdPrecedes(xid, cutoff_xid))
5787+
return true;
5788+
}
56655789

56665790
if (tuple->t_infomask & HEAP_MOVED)
56675791
{

src/backend/access/transam/multixact.c

Lines changed: 9 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -434,11 +434,14 @@ MultiXactIdExpand(MultiXactId multi, TransactionId xid, MultiXactStatus status)
434434
* Determine which of the members of the MultiXactId are still of
435435
* interest. This is any running transaction, and also any transaction
436436
* that grabbed something stronger than just a lock and was committed. (An
437-
* update that aborted is of no interest here.)
437+
* update that aborted is of no interest here; and having more than one
438+
* update Xid in a multixact would cause errors elsewhere.)
438439
*
439-
* (Removing dead members is just an optimization, but a useful one. Note
440-
* we have the same race condition here as above: j could be 0 at the end
441-
* of the loop.)
440+
* Removing dead members is not just an optimization: freezing of tuples
441+
* whose Xmax are multis depends on this behavior.
442+
*
443+
* Note we have the same race condition here as above: j could be 0 at the
444+
* end of the loop.
442445
*/
443446
newMembers = (MultiXactMember *)
444447
palloc(sizeof(MultiXactMember) * (nmembers + 1));
@@ -1052,7 +1055,8 @@ GetMultiXactIdMembers(MultiXactId multi, MultiXactMember **members,
10521055

10531056
debug_elog3(DEBUG2, "GetMembers: asked for %u", multi);
10541057

1055-
Assert(MultiXactIdIsValid(multi));
1058+
if (!MultiXactIdIsValid(multi))
1059+
return -1;
10561060

10571061
/* See if the MultiXactId is in the local cache */
10581062
length = mXactCacheGetById(multi, members);

0 commit comments

Comments
 (0)