
Commit 48064a8

nbtree README: Add note about latestRemovedXid.
Point out that index tuple deletion generally needs a latestRemovedXid value for the deletion operation's WAL record. This is bound to be the most expensive part of the whole deletion operation now that it takes place up front, during original execution. This was arguably an oversight in commit 558a916, which moved the work required to generate these values from index deletion REDO routines to original execution of index deletion operations.
1 parent 73aa5e0 commit 48064a8
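To make the commit message's point concrete, here is a minimal C sketch of one piece of that work. It assumes latestRemovedXid is the newest XID that deleted any of the table tuples whose index entries are being removed. Every name in it (Xid, xid_follows, latest_removed_xid) is hypothetical. The comparison follows the same modulo-2^32 rule as PostgreSQL's TransactionIdFollows(), but the real table AM code applies further per-tuple rules that are omitted here. It must also first visit each table block to read those XIDs, which is where the cost this commit documents comes from.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for PostgreSQL's 32-bit TransactionId. */
typedef uint32_t Xid;

#define INVALID_XID      0u  /* "no XID recorded" */
#define FIRST_NORMAL_XID 3u  /* XIDs below this are special, not normal */

/*
 * Wraparound-aware "a is newer than b", modeled on TransactionIdFollows():
 * normal XIDs are compared modulo 2^32 via the signed difference, while
 * special XIDs are compared as plain integers.
 */
static int xid_follows(Xid a, Xid b)
{
    if (a < FIRST_NORMAL_XID || b < FIRST_NORMAL_XID)
        return a > b;
    return (int32_t) (a - b) > 0;
}

/*
 * Sketch only: return the newest deleting XID among the table tuples whose
 * index entries are about to be removed. Entries with no deleter recorded
 * (INVALID_XID) are skipped; returns INVALID_XID if none remain.
 */
static Xid latest_removed_xid(const Xid *xmax, size_t n)
{
    Xid latest = INVALID_XID;

    assert(xmax != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
    {
        if (xmax[i] == INVALID_XID)
            continue;
        if (latest == INVALID_XID || xid_follows(xmax[i], latest))
            latest = xmax[i];
    }
    return latest;
}
```

For example, with deleting XIDs {4294967290, 5, 100} the sketch returns 100: XID 4294967290 counts as older than 5 across the wraparound boundary, so a plain numeric maximum would pick the wrong value.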


1 file changed: +27 -18 lines changed

src/backend/access/nbtree/README

Lines changed: 27 additions & 18 deletions
@@ -490,24 +490,33 @@ lock on the leaf page).
 Once an index tuple has been marked LP_DEAD it can actually be deleted
 from the index immediately; since index scans only stop "between" pages,
 no scan can lose its place from such a deletion. We separate the steps
-because we allow LP_DEAD to be set with only a share lock (it's exactly
-like a hint bit for a heap tuple), but physically removing tuples requires
-exclusive lock. Also, delaying the deletion often allows us to pick up
-extra index tuples that weren't initially safe for index scans to mark
-LP_DEAD. We do this with index tuples whose TIDs point to the same table
-blocks as an LP_DEAD-marked tuple. They're practically free to check in
-passing, and have a pretty good chance of being safe to delete due to
-various locality effects.
-
-We only try to delete LP_DEAD tuples (and nearby tuples) when we are
-otherwise faced with having to split a page to do an insertion (and hence
-have exclusive lock on it already). Deduplication and bottom-up index
-deletion can also prevent a page split, but simple deletion is always our
-preferred approach. (Note that posting list tuples can only have their
-LP_DEAD bit set when every table TID within the posting list is known
-dead. This isn't much of a problem in practice because LP_DEAD bits are
-just a starting point for simple deletion -- we still manage to perform
-granular deletes of posting list TIDs quite often.)
+because we allow LP_DEAD to be set with only a share lock (it's like a
+hint bit for a heap tuple), but physically deleting tuples requires an
+exclusive lock. We also need to generate a latestRemovedXid value for
+each deletion operation's WAL record, which requires additional
+coordinating with the tableam when the deletion actually takes place.
+(This latestRemovedXid value may be used to generate a recovery conflict
+during subsequent REDO of the record by a standby.)
+
+Delaying and batching index tuple deletion like this enables a further
+optimization: opportunistic checking of "extra" nearby index tuples
+(tuples that are not LP_DEAD-set) when they happen to be very cheap to
+check in passing (because we already know that the tableam will be
+visiting their table block to generate a latestRemovedXid value). Any
+index tuples that turn out to be safe to delete will also be deleted.
+Simple deletion will behave as if the extra tuples that actually turn
+out to be delete-safe had their LP_DEAD bits set right from the start.
+
+Deduplication can also prevent a page split, but index tuple deletion is
+our preferred approach. Note that posting list tuples can only have
+their LP_DEAD bit set when every table TID within the posting list is
+known dead. This isn't much of a problem in practice because LP_DEAD
+bits are just a starting point for deletion. What really matters is
+that _some_ deletion operation that targets related nearby-in-table TIDs
+takes place at some point before the page finally splits. That's all
+that's required for the deletion process to perform granular removal of
+groups of dead TIDs from posting list tuples (without the situation ever
+being allowed to get out of hand).
 
 It's sufficient to have an exclusive lock on the index page, not a
 super-exclusive lock, to do deletion of LP_DEAD items. It might seem
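The "extra" tuple optimization described in the new README text can be pictured as a candidate-selection pass over one leaf page. This is a simplified illustration under stated assumptions, not nbtree's actual code: IndexEntry, block_has_lp_dead and select_candidates are hypothetical names, and the page is modeled as a plain array. Every LP_DEAD-marked entry becomes a candidate, and so does any unmarked entry whose TID points into a table block that some LP_DEAD entry already points into, since the tableam will be visiting that block anyway.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified view of one index tuple on a leaf page. */
typedef struct IndexEntry
{
    uint32_t heap_block;   /* table block the TID points to */
    uint16_t heap_offset;  /* line pointer within that block */
    bool     lp_dead;      /* hint set earlier by an index scan */
    bool     candidate;    /* output: ask the tableam to check this entry */
} IndexEntry;

/* True if any LP_DEAD-marked entry points into table block blk. */
static bool block_has_lp_dead(const IndexEntry *e, size_t n, uint32_t blk)
{
    for (size_t i = 0; i < n; i++)
        if (e[i].lp_dead && e[i].heap_block == blk)
            return true;
    return false;
}

/*
 * Mark every LP_DEAD entry as a candidate, plus every "extra" entry whose
 * TID points to a table block that will be visited anyway for an LP_DEAD
 * entry. Returns the number of candidates marked.
 */
static size_t select_candidates(IndexEntry *e, size_t n)
{
    size_t ncand = 0;

    for (size_t i = 0; i < n; i++)
    {
        e[i].candidate = e[i].lp_dead ||
            block_has_lp_dead(e, n, e[i].heap_block);
        if (e[i].candidate)
            ncand++;
    }
    return ncand;
}
```

With four entries pointing to blocks 10, 10, 42 and 10, where only the first is LP_DEAD, three entries become candidates and the one pointing to block 42 is left alone. Being a candidate only means the entry gets checked: per the README, only entries the tableam finds delete-safe are removed, as if their LP_DEAD bits had been set from the start. The quadratic scan keeps the sketch short; a page with many entries would favor grouping TIDs by block first.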