
Commit 7adc348

Backpatch nbtree page deletion hardening.

Postgres 14 commit 5b861ba taught nbtree VACUUM to tolerate buggy opclasses. VACUUM's inability to locate a to-be-deleted page's downlink in the parent page was logged instead of throwing an error. VACUUM could just press on with vacuuming the index, and vacuuming the table as a whole.

There are now anecdotal reports of this error causing problems that were much more disruptive than the underlying index corruption ever could be. Anything that makes VACUUM unable to make forward progress against one table/index ultimately risks making the system enter xidStopLimit mode. There is no good reason to take any chances here, so backpatch the hardening commit.

Author: Peter Geoghegan <pg@bowt.ie>
Discussion: https://postgr.es/m/CAH2-Wzm9HR6Pow=t-iQa57zT8qmX6_M4h14F-pTtb=xFDW5FBA@mail.gmail.com
Backpatch: 10-13 (all supported versions that lacked the hardening)

1 parent 844ac09 commit 7adc348

1 file changed: +19 -2 lines changed

src/backend/access/nbtree/nbtpage.c

Lines changed: 19 additions & 2 deletions
@@ -1198,8 +1198,25 @@ _bt_lock_branch_parent(Relation rel, BlockNumber child, BTStack stack,
     stack->bts_btentry = child;
     pbuf = _bt_getstackbuf(rel, stack);
     if (pbuf == InvalidBuffer)
-        elog(ERROR, "failed to re-find parent key in index \"%s\" for deletion target page %u",
-             RelationGetRelationName(rel), child);
+    {
+        /*
+         * Failed to "re-find" a pivot tuple whose downlink matched our child
+         * block number on the parent level -- the index must be corrupt.
+         * Don't even try to delete the leafbuf subtree.  Just report the
+         * issue and press on with vacuuming the index.
+         *
+         * Note: _bt_getstackbuf() recovers from concurrent page splits that
+         * take place on the parent level.  Its approach is a near-exhaustive
+         * linear search.  This also gives it a surprisingly good chance of
+         * recovering in the event of a buggy or inconsistent opclass.  But we
+         * don't rely on that here.
+         */
+        ereport(LOG,
+                (errcode(ERRCODE_INDEX_CORRUPTED),
+                 errmsg_internal("failed to re-find parent key in index \"%s\" for deletion target page %u",
+                                 RelationGetRelationName(rel), child)));
+        return false;
+    }
     parent = stack->bts_blkno;
     poffset = stack->bts_offset;
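
The control-flow change in this hunk can be illustrated outside of PostgreSQL with a small standalone sketch. It is not the nbtree code: `ToyParent`, `toy_refind_downlink`, `toy_lock_branch_parent`, and `toy_vacuum_index` are hypothetical names invented for this illustration, and a flat array of block numbers stands in for the parent page. The only point it tries to show is that a failed downlink lookup is reported and the one page is skipped, so the loop over deletion candidates still reaches the remaining pages.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for a parent page: an array of child block numbers. */
typedef struct ToyParent
{
    const unsigned *downlinks;
    size_t          ndownlinks;
} ToyParent;

/* Linear search for the downlink that points at "child". */
static bool
toy_refind_downlink(const ToyParent *parent, unsigned child, size_t *offset)
{
    for (size_t i = 0; i < parent->ndownlinks; i++)
    {
        if (parent->downlinks[i] == child)
        {
            *offset = i;
            return true;
        }
    }
    return false;
}

/*
 * Same shape as the hardened check: when the downlink cannot be found,
 * log the problem and return false instead of aborting the whole run.
 */
static bool
toy_lock_branch_parent(const ToyParent *parent, unsigned child, size_t *offset)
{
    if (!toy_refind_downlink(parent, child, offset))
    {
        fprintf(stderr,
                "LOG: failed to re-find parent key for deletion target page %u\n",
                child);
        return false;
    }
    return true;
}

/*
 * Visit every deletion candidate. A page whose downlink is missing is
 * counted in *nskipped and left alone; the loop keeps going.
 * Returns the number of pages that would have been deleted.
 */
static size_t
toy_vacuum_index(const ToyParent *parent, const unsigned *targets,
                 size_t ntargets, size_t *nskipped)
{
    size_t ndeleted = 0;

    assert(parent != NULL && nskipped != NULL);
    *nskipped = 0;

    for (size_t i = 0; i < ntargets; i++)
    {
        size_t offset;

        if (!toy_lock_branch_parent(parent, targets[i], &offset))
        {
            (*nskipped)++;
            continue;           /* press on with the remaining pages */
        }
        ndeleted++;             /* a real implementation would unlink the page here */
    }
    return ndeleted;
}
```

Called with a parent holding downlinks {10, 11, 13} and deletion targets {10, 12, 13}, this sketch logs one message for page 12 and still processes pages 10 and 13. Under the old behavior modeled by `elog(ERROR, ...)`, the run would have stopped at page 12.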
