Skip to content

Commit 232c7cb

Browse files
committed
Fix freezing of a dead HOT-updated tuple
Vacuum calls page-level HOT prune to remove dead HOT tuples before doing liveness checks (HeapTupleSatisfiesVacuum) on the remaining tuples. But concurrent transaction commit/abort may turn DEAD some of the HOT tuples that survived the prune, before HeapTupleSatisfiesVacuum tests them. This happens to activate the code that decides to freeze the tuple ... which resuscitates it, duplicating data. (This is especially bad if there's any unique constraints, because those are now internally violated due to the duplicate entries, though you won't know until you try to REINDEX or dump/restore the table.) One possible fix would be to simply skip doing anything to the tuple, and hope that the next HOT prune would remove it. But there is a problem: if the tuple is older than freeze horizon, this would leave an unfrozen XID behind, and if no HOT prune happens to clean it up before the containing pg_clog segment is truncated away, it'd later cause an error when the XID is looked up. Fix the problem by having the tuple freezing routines cope with the situation: don't freeze the tuple (and keep it dead). In the cases that the XID is older than the freeze age, set the HEAP_XMAX_COMMITTED flag so that there is no need to look up the XID in pg_clog later on. An isolation test is included, authored by Michael Paquier, loosely based on Daniel Wood's original reproducer. It only tests one particular scenario, though, not all the possible ways for this problem to surface; it be good to have a more reliable way to test this more fully, but it'd require more work. In message https://postgr.es/m/20170911140103.5akxptyrwgpc25bw@alvherre.pgsql I outlined another test case (more closely matching Dan Wood's) that exposed a few more ways for the problem to occur. Backpatch all the way back to 9.3, where this problem was introduced by multixact juggling. In branches 9.3 and 9.4, this includes a backpatch of commit e5ff9fefcd50 (of 9.5 era), since the original is not correctable without matching the coding pattern in 9.5 up. Reported-by: Daniel Wood Diagnosed-by: Daniel Wood Reviewed-by: Yi Wen Wong, Michaël Paquier Discussion: https://postgr.es/m/E5711E62-8FDF-4DCA-A888-C200BF6B5742@amazon.com
1 parent c38575e commit 232c7cb

File tree

5 files changed

+187
-49
lines changed

5 files changed

+187
-49
lines changed

src/backend/access/heap/heapam.c

Lines changed: 50 additions & 41 deletions
Original file line numberDiff line numberDiff line change
@@ -5687,14 +5687,23 @@ FreezeMultiXactId(MultiXactId multi, uint16 t_infomask,
56875687
Assert(TransactionIdIsValid(xid));
56885688

56895689
/*
5690-
* If the xid is older than the cutoff, it has to have aborted,
5691-
* otherwise the tuple would have gotten pruned away.
5690+
* The updating transaction cannot possibly be still running, but
5691+
* verify whether it has committed, and request to set the
5692+
* COMMITTED flag if so. (We normally don't see any tuples in
5693+
* this state, because they are removed by page pruning before we
5694+
* try to freeze the page; but this can happen if the updating
5695+
* transaction commits after the page is pruned but before
5696+
* HeapTupleSatisfiesVacuum).
56925697
*/
56935698
if (TransactionIdPrecedes(xid, cutoff_xid))
56945699
{
5695-
Assert(!TransactionIdDidCommit(xid));
5696-
*flags |= FRM_INVALIDATE_XMAX;
5697-
xid = InvalidTransactionId; /* not strictly necessary */
5700+
if (TransactionIdDidCommit(xid))
5701+
*flags = FRM_MARK_COMMITTED | FRM_RETURN_IS_XID;
5702+
else
5703+
{
5704+
*flags |= FRM_INVALIDATE_XMAX;
5705+
xid = InvalidTransactionId; /* not strictly necessary */
5706+
}
56985707
}
56995708
else
57005709
{
@@ -5765,65 +5774,51 @@ FreezeMultiXactId(MultiXactId multi, uint16 t_infomask,
57655774

57665775
/*
57675776
* It's an update; should we keep it? If the transaction is known
5768-
* aborted then it's okay to ignore it, otherwise not. However,
5769-
* if the Xid is older than the cutoff_xid, we must remove it.
5770-
* Note that such an old updater cannot possibly be committed,
5771-
* because HeapTupleSatisfiesVacuum would have returned
5772-
* HEAPTUPLE_DEAD and we would not be trying to freeze the tuple.
5773-
*
5774-
* Note the TransactionIdDidAbort() test is just an optimization
5775-
* and not strictly necessary for correctness.
5777+
* aborted or crashed then it's okay to ignore it, otherwise not.
57765778
*
57775779
* As with all tuple visibility routines, it's critical to test
5778-
* TransactionIdIsInProgress before the transam.c routines,
5780+
* TransactionIdIsInProgress before TransactionIdDidCommit,
57795781
* because of race conditions explained in detail in tqual.c.
5782+
*
5783+
* We normally don't see committed updating transactions earlier
5784+
* than the cutoff xid, because they are removed by page pruning
5785+
* before we try to freeze the page; but it can happen if the
5786+
* updating transaction commits after the page is pruned but
5787+
* before HeapTupleSatisfiesVacuum.
57805788
*/
57815789
if (TransactionIdIsCurrentTransactionId(xid) ||
57825790
TransactionIdIsInProgress(xid))
57835791
{
57845792
Assert(!TransactionIdIsValid(update_xid));
57855793
update_xid = xid;
57865794
}
5787-
else if (!TransactionIdDidAbort(xid))
5795+
else if (TransactionIdDidCommit(xid))
57885796
{
57895797
/*
5790-
* Test whether to tell caller to set HEAP_XMAX_COMMITTED
5791-
* while we have the Xid still in cache. Note this can only
5792-
* be done if the transaction is known not running.
5798+
* The transaction committed, so we can tell caller to set
5799+
* HEAP_XMAX_COMMITTED. (We can only do this because we know
5800+
* the transaction is not running.)
57935801
*/
5794-
if (TransactionIdDidCommit(xid))
5795-
update_committed = true;
57965802
Assert(!TransactionIdIsValid(update_xid));
5803+
update_committed = true;
57975804
update_xid = xid;
57985805
}
57995806

5807+
/*
5808+
* Not in progress, not committed -- must be aborted or crashed;
5809+
* we can ignore it.
5810+
*/
5811+
58005812
/*
58015813
* If we determined that it's an Xid corresponding to an update
58025814
* that must be retained, additionally add it to the list of
5803-
* members of the new Multis, in case we end up using that. (We
5815+
* members of the new Multi, in case we end up using that. (We
58045816
* might still decide to use only an update Xid and not a multi,
58055817
* but it's easier to maintain the list as we walk the old members
58065818
* list.)
5807-
*
5808-
* It is possible to end up with a very old updater Xid that
5809-
* crashed and thus did not mark itself as aborted in pg_clog.
5810-
* That would manifest as a pre-cutoff Xid. Make sure to ignore
5811-
* it.
58125819
*/
58135820
if (TransactionIdIsValid(update_xid))
5814-
{
5815-
if (!TransactionIdPrecedes(update_xid, cutoff_xid))
5816-
{
5817-
newmembers[nnewmembers++] = members[i];
5818-
}
5819-
else
5820-
{
5821-
/* cannot have committed: would be HEAPTUPLE_DEAD */
5822-
Assert(!TransactionIdDidCommit(update_xid));
5823-
update_xid = InvalidTransactionId;
5824-
update_committed = false;
5825-
}
5826-
}
5821+
newmembers[nnewmembers++] = members[i];
58275822
}
58285823
else
58295824
{
@@ -5890,7 +5885,10 @@ FreezeMultiXactId(MultiXactId multi, uint16 t_infomask,
58905885
*
58915886
* It is assumed that the caller has checked the tuple with
58925887
* HeapTupleSatisfiesVacuum() and determined that it is not HEAPTUPLE_DEAD
5893-
* (else we should be removing the tuple, not freezing it).
5888+
* (else we should be removing the tuple, not freezing it). However, note
5889+
* that we don't remove HOT tuples even if they are dead, and it'd be incorrect
5890+
* to freeze them (because that would make them visible), so we mark them as
5891+
* update-committed, and needing further freezing later on.
58945892
*
58955893
* NB: cutoff_xid *must* be <= the current global xmin, to ensure that any
58965894
* XID older than it could neither be running nor seen as running by any
@@ -5995,7 +5993,18 @@ heap_prepare_freeze_tuple(HeapTupleHeader tuple, TransactionId cutoff_xid,
59955993
else if (TransactionIdIsNormal(xid) &&
59965994
TransactionIdPrecedes(xid, cutoff_xid))
59975995
{
5998-
freeze_xmax = true;
5996+
/*
5997+
* Must freeze regular XIDs older than the cutoff. We must not freeze
5998+
* a HOT-updated tuple, though; doing so would bring it back to life.
5999+
*/
6000+
if (HeapTupleHeaderIsHotUpdated(tuple))
6001+
{
6002+
frz->t_infomask |= HEAP_XMAX_COMMITTED;
6003+
changed = true;
6004+
/* must not freeze */
6005+
}
6006+
else
6007+
freeze_xmax = true;
59996008
}
60006009

60016010
if (freeze_xmax)

src/backend/commands/vacuumlazy.c

Lines changed: 8 additions & 8 deletions
Original file line numberDiff line numberDiff line change
@@ -1688,15 +1688,15 @@ lazy_record_dead_tuple(LVRelStats *vacrelstats,
16881688
ItemPointer itemptr)
16891689
{
16901690
/*
1691-
* The array shouldn't overflow under normal behavior, but perhaps it
1692-
* could if we are given a really small maintenance_work_mem. In that
1693-
* case, just forget the last few tuples (we'll get 'em next time).
1691+
* The array must never overflow, since we rely on all deletable tuples
1692+
* being removed; inability to remove a tuple might cause an old XID to
1693+
* persist beyond the freeze limit, which could be disastrous later on.
16941694
*/
1695-
if (vacrelstats->num_dead_tuples < vacrelstats->max_dead_tuples)
1696-
{
1697-
vacrelstats->dead_tuples[vacrelstats->num_dead_tuples] = *itemptr;
1698-
vacrelstats->num_dead_tuples++;
1699-
}
1695+
if (vacrelstats->num_dead_tuples >= vacrelstats->max_dead_tuples)
1696+
elog(ERROR, "dead tuple array overflow");
1697+
1698+
vacrelstats->dead_tuples[vacrelstats->num_dead_tuples] = *itemptr;
1699+
vacrelstats->num_dead_tuples++;
17001700
}
17011701

17021702
/*
Lines changed: 101 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,101 @@
1+
Parsed test spec with 2 sessions
2+
3+
starting permutation: s1_update s1_commit s1_vacuum s2_key_share s2_commit
4+
step s1_update: UPDATE tab_freeze SET x = x + 1 WHERE id = 3;
5+
step s1_commit: COMMIT;
6+
step s1_vacuum: VACUUM FREEZE tab_freeze;
7+
step s2_key_share: SELECT id FROM tab_freeze WHERE id = 3 FOR KEY SHARE;
8+
id
9+
10+
3
11+
step s2_commit: COMMIT;
12+
13+
starting permutation: s1_update s1_commit s2_key_share s1_vacuum s2_commit
14+
step s1_update: UPDATE tab_freeze SET x = x + 1 WHERE id = 3;
15+
step s1_commit: COMMIT;
16+
step s2_key_share: SELECT id FROM tab_freeze WHERE id = 3 FOR KEY SHARE;
17+
id
18+
19+
3
20+
step s1_vacuum: VACUUM FREEZE tab_freeze;
21+
step s2_commit: COMMIT;
22+
23+
starting permutation: s1_update s1_commit s2_key_share s2_commit s1_vacuum
24+
step s1_update: UPDATE tab_freeze SET x = x + 1 WHERE id = 3;
25+
step s1_commit: COMMIT;
26+
step s2_key_share: SELECT id FROM tab_freeze WHERE id = 3 FOR KEY SHARE;
27+
id
28+
29+
3
30+
step s2_commit: COMMIT;
31+
step s1_vacuum: VACUUM FREEZE tab_freeze;
32+
33+
starting permutation: s1_update s2_key_share s1_commit s1_vacuum s2_commit
34+
step s1_update: UPDATE tab_freeze SET x = x + 1 WHERE id = 3;
35+
step s2_key_share: SELECT id FROM tab_freeze WHERE id = 3 FOR KEY SHARE;
36+
id
37+
38+
3
39+
step s1_commit: COMMIT;
40+
step s1_vacuum: VACUUM FREEZE tab_freeze;
41+
step s2_commit: COMMIT;
42+
43+
starting permutation: s1_update s2_key_share s1_commit s2_commit s1_vacuum
44+
step s1_update: UPDATE tab_freeze SET x = x + 1 WHERE id = 3;
45+
step s2_key_share: SELECT id FROM tab_freeze WHERE id = 3 FOR KEY SHARE;
46+
id
47+
48+
3
49+
step s1_commit: COMMIT;
50+
step s2_commit: COMMIT;
51+
step s1_vacuum: VACUUM FREEZE tab_freeze;
52+
53+
starting permutation: s1_update s2_key_share s2_commit s1_commit s1_vacuum
54+
step s1_update: UPDATE tab_freeze SET x = x + 1 WHERE id = 3;
55+
step s2_key_share: SELECT id FROM tab_freeze WHERE id = 3 FOR KEY SHARE;
56+
id
57+
58+
3
59+
step s2_commit: COMMIT;
60+
step s1_commit: COMMIT;
61+
step s1_vacuum: VACUUM FREEZE tab_freeze;
62+
63+
starting permutation: s2_key_share s1_update s1_commit s1_vacuum s2_commit
64+
step s2_key_share: SELECT id FROM tab_freeze WHERE id = 3 FOR KEY SHARE;
65+
id
66+
67+
3
68+
step s1_update: UPDATE tab_freeze SET x = x + 1 WHERE id = 3;
69+
step s1_commit: COMMIT;
70+
step s1_vacuum: VACUUM FREEZE tab_freeze;
71+
step s2_commit: COMMIT;
72+
73+
starting permutation: s2_key_share s1_update s1_commit s2_commit s1_vacuum
74+
step s2_key_share: SELECT id FROM tab_freeze WHERE id = 3 FOR KEY SHARE;
75+
id
76+
77+
3
78+
step s1_update: UPDATE tab_freeze SET x = x + 1 WHERE id = 3;
79+
step s1_commit: COMMIT;
80+
step s2_commit: COMMIT;
81+
step s1_vacuum: VACUUM FREEZE tab_freeze;
82+
83+
starting permutation: s2_key_share s1_update s2_commit s1_commit s1_vacuum
84+
step s2_key_share: SELECT id FROM tab_freeze WHERE id = 3 FOR KEY SHARE;
85+
id
86+
87+
3
88+
step s1_update: UPDATE tab_freeze SET x = x + 1 WHERE id = 3;
89+
step s2_commit: COMMIT;
90+
step s1_commit: COMMIT;
91+
step s1_vacuum: VACUUM FREEZE tab_freeze;
92+
93+
starting permutation: s2_key_share s2_commit s1_update s1_commit s1_vacuum
94+
step s2_key_share: SELECT id FROM tab_freeze WHERE id = 3 FOR KEY SHARE;
95+
id
96+
97+
3
98+
step s2_commit: COMMIT;
99+
step s1_update: UPDATE tab_freeze SET x = x + 1 WHERE id = 3;
100+
step s1_commit: COMMIT;
101+
step s1_vacuum: VACUUM FREEZE tab_freeze;

src/test/isolation/isolation_schedule

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -30,6 +30,7 @@ test: lock-committed-keyupdate
3030
test: update-locked-tuple
3131
test: propagate-lock-delete
3232
test: tuplelock-conflict
33+
test: freeze-the-dead
3334
test: nowait
3435
test: nowait-2
3536
test: nowait-3
Lines changed: 27 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,27 @@
1+
# Test for interactions of tuple freezing with dead, as well as recently-dead
2+
# tuples using multixacts via FOR KEY SHARE.
3+
setup
4+
{
5+
CREATE TABLE tab_freeze (
6+
id int PRIMARY KEY,
7+
name char(3),
8+
x int);
9+
INSERT INTO tab_freeze VALUES (1, '111', 0);
10+
INSERT INTO tab_freeze VALUES (3, '333', 0);
11+
}
12+
13+
teardown
14+
{
15+
DROP TABLE tab_freeze;
16+
}
17+
18+
session "s1"
19+
setup { BEGIN; }
20+
step "s1_update" { UPDATE tab_freeze SET x = x + 1 WHERE id = 3; }
21+
step "s1_commit" { COMMIT; }
22+
step "s1_vacuum" { VACUUM FREEZE tab_freeze; }
23+
24+
session "s2"
25+
setup { BEGIN; }
26+
step "s2_key_share" { SELECT id FROM tab_freeze WHERE id = 3 FOR KEY SHARE; }
27+
step "s2_commit" { COMMIT; }

0 commit comments

Comments
 (0)