Skip to content

Commit 2e51ae1

Browse files
committed
Fix torn-page, unlogged xid and further risks from heap_update().
When heap_update needs to look for a page for the new tuple version, because the current one doesn't have sufficient free space, or when columns have to be processed by the tuple toaster, it has to release the lock on the old page during that. Otherwise there'd be lock ordering and lock nesting issues. To avoid concurrent sessions from trying to update / delete / lock the tuple while the page's content lock is released, the tuple's xmax is set to the current session's xid. That unfortunately was done without any WAL logging, thereby violating the rule that no XIDs may appear on disk, without an according WAL record. If the database were to crash / fail over when the page level lock is released, and some activity lead to the page being written out to disk, the xid could end up being reused; potentially leading to the row becoming invisible. There might be additional risks by not having t_ctid point at the tuple itself, without having set the appropriate lock infomask fields. To fix, compute the appropriate xmax/infomask combination for locking the tuple, and perform WAL logging using the existing XLOG_HEAP_LOCK record. That allows the fix to be backpatched. This issue has existed for a long time. There appears to have been partial attempts at preventing dangers, but these never have fully been implemented, and were removed a long time ago, in 1191916 (cf. HEAP_XMAX_UNLOGGED). In master / 9.6, there's an additional issue, namely that the visibilitymap's freeze bit isn't reset at that point yet. Since that's a new issue, introduced only in a892234, that'll be fixed in a separate commit. Author: Masahiko Sawada and Andres Freund Reported-By: Different aspects by Thomas Munro, Noah Misch, and others Discussion: CAEepm=3fWAbWryVW9swHyLTY4sXVf0xbLvXqOwUoDiNCx9mBjQ@mail.gmail.com Backpatch: 9.1/all supported versions
1 parent 46acbeb commit 2e51ae1

File tree

1 file changed

+82
-23
lines changed

1 file changed

+82
-23
lines changed

src/backend/access/heap/heapam.c

Lines changed: 82 additions & 23 deletions
Original file line numberDiff line numberDiff line change
@@ -2985,8 +2985,7 @@ heap_update(Relation relation, ItemPointer otid, HeapTuple newtup,
29852985
newbuf,
29862986
vmbuffer = InvalidBuffer,
29872987
vmbuffer_new = InvalidBuffer;
2988-
bool need_toast,
2989-
already_marked;
2988+
bool need_toast;
29902989
Size newtupsize,
29912990
pagefree;
29922991
bool have_tuple_lock = false;
@@ -3413,8 +3412,8 @@ heap_update(Relation relation, ItemPointer otid, HeapTuple newtup,
34133412
* on the same page as the old, then we need to release the content lock
34143413
* (but not the pin!) on the old tuple's buffer while we are off doing
34153414
* TOAST and/or table-file-extension work. We must mark the old tuple to
3416-
* show that it's already being updated, else other processes may try to
3417-
* update it themselves.
3415+
* show that it's locked, else other processes may try to update it
3416+
* themselves.
34183417
*
34193418
* We need to invoke the toaster if there are already any out-of-line
34203419
* toasted values present, or if the new tuple is over-threshold.
@@ -3438,19 +3437,82 @@ heap_update(Relation relation, ItemPointer otid, HeapTuple newtup,
34383437

34393438
if (need_toast || newtupsize > pagefree)
34403439
{
3440+
TransactionId xmax_lock_old_tuple;
3441+
uint16 infomask_lock_old_tuple,
3442+
infomask2_lock_old_tuple;
3443+
3444+
/*
3445+
* To prevent concurrent sessions from updating the tuple, we have to
3446+
* temporarily mark it locked, while we release the lock.
3447+
*
3448+
* To satisfy the rule that any xid potentially appearing in a buffer
3449+
* written out to disk, we unfortunately have to WAL log this
3450+
* temporary modification. We can reuse xl_heap_lock for this
3451+
* purpose. If we crash/error before following through with the
3452+
* actual update, xmax will be of an aborted transaction, allowing
3453+
* other sessions to proceed.
3454+
*/
3455+
3456+
/*
3457+
* Compute xmax / infomask appropriate for locking the tuple. This has
3458+
* to be done separately from the lock, because the potentially
3459+
* created multixact would otherwise be wrong.
3460+
*/
3461+
compute_new_xmax_infomask(HeapTupleHeaderGetRawXmax(oldtup.t_data),
3462+
oldtup.t_data->t_infomask,
3463+
oldtup.t_data->t_infomask2,
3464+
xid, *lockmode, false,
3465+
&xmax_lock_old_tuple, &infomask_lock_old_tuple,
3466+
&infomask2_lock_old_tuple);
3467+
3468+
Assert(HEAP_XMAX_IS_LOCKED_ONLY(infomask_lock_old_tuple));
3469+
3470+
START_CRIT_SECTION();
3471+
34413472
/* Clear obsolete visibility flags ... */
34423473
oldtup.t_data->t_infomask &= ~(HEAP_XMAX_BITS | HEAP_MOVED);
34433474
oldtup.t_data->t_infomask2 &= ~HEAP_KEYS_UPDATED;
34443475
HeapTupleClearHotUpdated(&oldtup);
34453476
/* ... and store info about transaction updating this tuple */
3446-
Assert(TransactionIdIsValid(xmax_old_tuple));
3447-
HeapTupleHeaderSetXmax(oldtup.t_data, xmax_old_tuple);
3448-
oldtup.t_data->t_infomask |= infomask_old_tuple;
3449-
oldtup.t_data->t_infomask2 |= infomask2_old_tuple;
3477+
Assert(TransactionIdIsValid(xmax_lock_old_tuple));
3478+
HeapTupleHeaderSetXmax(oldtup.t_data, xmax_lock_old_tuple);
3479+
oldtup.t_data->t_infomask |= infomask_lock_old_tuple;
3480+
oldtup.t_data->t_infomask2 |= infomask2_lock_old_tuple;
34503481
HeapTupleHeaderSetCmax(oldtup.t_data, cid, iscombo);
3451-
/* temporarily make it look not-updated */
3482+
3483+
/* temporarily make it look not-updated, but locked */
34523484
oldtup.t_data->t_ctid = oldtup.t_self;
3453-
already_marked = true;
3485+
3486+
MarkBufferDirty(buffer);
3487+
3488+
if (RelationNeedsWAL(relation))
3489+
{
3490+
xl_heap_lock xlrec;
3491+
XLogRecPtr recptr;
3492+
XLogRecData rdata[2];
3493+
3494+
xlrec.target.node = relation->rd_node;
3495+
xlrec.target.tid = oldtup.t_self;
3496+
xlrec.locking_xid = xmax_lock_old_tuple;
3497+
xlrec.infobits_set = compute_infobits(oldtup.t_data->t_infomask,
3498+
oldtup.t_data->t_infomask2);
3499+
rdata[0].data = (char *) &xlrec;
3500+
rdata[0].len = SizeOfHeapLock;
3501+
rdata[0].buffer = InvalidBuffer;
3502+
rdata[0].next = &(rdata[1]);
3503+
3504+
rdata[1].data = NULL;
3505+
rdata[1].len = 0;
3506+
rdata[1].buffer = buffer;
3507+
rdata[1].buffer_std = true;
3508+
rdata[1].next = NULL;
3509+
3510+
recptr = XLogInsert(RM_HEAP_ID, XLOG_HEAP_LOCK, rdata);
3511+
PageSetLSN(page, recptr);
3512+
}
3513+
3514+
END_CRIT_SECTION();
3515+
34543516
LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
34553517

34563518
/*
@@ -3521,7 +3583,6 @@ heap_update(Relation relation, ItemPointer otid, HeapTuple newtup,
35213583
else
35223584
{
35233585
/* No TOAST work needed, and it'll fit on same page */
3524-
already_marked = false;
35253586
newbuf = buffer;
35263587
heaptup = newtup;
35273588
}
@@ -3600,18 +3661,16 @@ heap_update(Relation relation, ItemPointer otid, HeapTuple newtup,
36003661

36013662
RelationPutHeapTuple(relation, newbuf, heaptup); /* insert new tuple */
36023663

3603-
if (!already_marked)
3604-
{
3605-
/* Clear obsolete visibility flags ... */
3606-
oldtup.t_data->t_infomask &= ~(HEAP_XMAX_BITS | HEAP_MOVED);
3607-
oldtup.t_data->t_infomask2 &= ~HEAP_KEYS_UPDATED;
3608-
/* ... and store info about transaction updating this tuple */
3609-
Assert(TransactionIdIsValid(xmax_old_tuple));
3610-
HeapTupleHeaderSetXmax(oldtup.t_data, xmax_old_tuple);
3611-
oldtup.t_data->t_infomask |= infomask_old_tuple;
3612-
oldtup.t_data->t_infomask2 |= infomask2_old_tuple;
3613-
HeapTupleHeaderSetCmax(oldtup.t_data, cid, iscombo);
3614-
}
3664+
3665+
/* Clear obsolete visibility flags, possibly set by ourselves above... */
3666+
oldtup.t_data->t_infomask &= ~(HEAP_XMAX_BITS | HEAP_MOVED);
3667+
oldtup.t_data->t_infomask2 &= ~HEAP_KEYS_UPDATED;
3668+
/* ... and store info about transaction updating this tuple */
3669+
Assert(TransactionIdIsValid(xmax_old_tuple));
3670+
HeapTupleHeaderSetXmax(oldtup.t_data, xmax_old_tuple);
3671+
oldtup.t_data->t_infomask |= infomask_old_tuple;
3672+
oldtup.t_data->t_infomask2 |= infomask2_old_tuple;
3673+
HeapTupleHeaderSetCmax(oldtup.t_data, cid, iscombo);
36153674

36163675
/* record address of new tuple in t_ctid of old one */
36173676
oldtup.t_data->t_ctid = heaptup->t_self;

0 commit comments

Comments
 (0)