Commit 5abff19

nbtree VACUUM: cope with right sibling link corruption.

Avoid "right sibling's left-link doesn't match" errors when vacuuming a corrupt nbtree index. Just LOG the issue and press on. That way VACUUM will have a decent chance of finishing off all required processing for the index (and for the table as a whole).

This error was seen in the field from time to time (it's more than a theoretical risk), so giving VACUUM the ability to press on like this has real value. Nothing short of a REINDEX is expected to fix the underlying index corruption, so giving up (by throwing an error) risks making a bad situation far worse. Anything that blocks forward progress by VACUUM like this might go unnoticed for a long time. This could eventually lead to a wraparound/xidStopLimit outage.

Note that _bt_unlink_halfdead_page() has always been able to bail on page deletion when the target page's left sibling page was in an inconsistent state. It now does the same thing (returns false to back out of the second phase of deletion) when it notices sibling link corruption in the target page's right sibling page.

This is similar to the work from commit 5b861ba (later backpatched as commit 43e409c), which taught nbtree to press on with vacuuming an index when page deletion fails to "re-find" a downlink in the target page's parent page. The "re-find" check seems to make VACUUM bail on page deletion more often in practice, but there is no reason to take any chances here.

Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Heikki Linnakangas <hlinnaka@iki.fi>
Discussion: https://postgr.es/m/CAH2-Wzko2q2kP1+UvgJyP9g0mF4hopK0NtQZcxwvMv9_ytGhkQ@mail.gmail.com
Backpatch: 11- (all supported versions).
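
The back-out behavior described in the commit message can be illustrated with a toy model. What follows is a minimal, self-contained sketch and not PostgreSQL code: `ToyPage`, `toy_unlink_page()`, `toy_vacuum()`, and `INVALID_BLOCK` are hypothetical names invented for this illustration, and a plain array stands in for index pages (buffer pins and locks are not modeled at all). It only shows the general idea of validating both sibling links before unlinking a page, and of logging and returning false instead of aborting when a link does not point back at the target.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical marker for "no sibling on this side" (not a PostgreSQL macro) */
#define INVALID_BLOCK 0xFFFFFFFFu

/* Toy page: only the sibling links and a deleted flag (hypothetical layout) */
typedef struct ToyPage
{
	unsigned	prev;		/* left-link: block number of left sibling */
	unsigned	next;		/* right-link: block number of right sibling */
	bool		deleted;	/* set once the page has been unlinked */
} ToyPage;

/*
 * Try to unlink "target" from the sibling chain.
 *
 * Returns false, after logging to stderr, when either neighbor fails to link
 * back to target.  In that case no link is modified, so the caller can simply
 * move on to the next page.
 */
static bool
toy_unlink_page(ToyPage *pages, unsigned npages, unsigned target)
{
	unsigned	leftsib;
	unsigned	rightsib;

	assert(target < npages);
	leftsib = pages[target].prev;
	rightsib = pages[target].next;

	/* Left sibling's right-link must point at target */
	if (leftsib != INVALID_BLOCK &&
		(leftsib >= npages || pages[leftsib].next != target))
	{
		fprintf(stderr, "LOG: left sibling %u of target %u does not link back\n",
				leftsib, target);
		return false;
	}

	/* Right sibling's left-link must point at target */
	if (rightsib != INVALID_BLOCK &&
		(rightsib >= npages || pages[rightsib].prev != target))
	{
		fprintf(stderr, "LOG: right sibling %u of target %u does not link back\n",
				rightsib, target);
		return false;
	}

	/* Both checks passed: splice target out of the chain */
	if (leftsib != INVALID_BLOCK)
		pages[leftsib].next = rightsib;
	if (rightsib != INVALID_BLOCK)
		pages[rightsib].prev = leftsib;
	pages[target].deleted = true;
	return true;
}

/*
 * Visit every block once.  A failed unlink is skipped rather than treated as
 * fatal, so the scan always reaches the last block.  Returns the number of
 * pages actually unlinked.
 */
static unsigned
toy_vacuum(ToyPage *pages, unsigned npages, const bool *deletable)
{
	unsigned	ndeleted = 0;

	for (unsigned blk = 0; blk < npages; blk++)
	{
		if (!deletable[blk] || pages[blk].deleted)
			continue;
		if (toy_unlink_page(pages, npages, blk))
			ndeleted++;
	}
	return ndeleted;
}
```

In this sketch, a chain whose page 2 has a wrong left-link causes the unlink of page 1 to be skipped with a log line, while a later deletable page with consistent neighbors is still processed. If the failure were fatal instead, the scan would stop at the first inconsistent page, which is the situation the commit message warns about.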
1 parent 991a3df commit 5abff19

2 files changed: +45 -20 lines changed

src/backend/access/nbtree/nbtpage.c

Lines changed: 44 additions & 18 deletions
@@ -2404,24 +2404,22 @@ _bt_unlink_halfdead_page(Relation rel, Buffer leafbuf, BlockNumber scanblkno,
 
 		if (!leftsibvalid)
 		{
-			if (target != leafblkno)
-			{
-				/* we have only a pin on target, but pin+lock on leafbuf */
-				ReleaseBuffer(buf);
-				_bt_relbuf(rel, leafbuf);
-			}
-			else
-			{
-				/* we have only a pin on leafbuf */
-				ReleaseBuffer(leafbuf);
-			}
-
+			/*
+			 * This is known to fail in the field; sibling link corruption
+			 * is relatively common. Press on with vacuuming rather than
+			 * just throwing an ERROR.
+			 */
 			ereport(LOG,
 					(errcode(ERRCODE_INDEX_CORRUPTED),
 					 errmsg_internal("valid left sibling for deletion target could not be located: "
-									 "left sibling %u of target %u with leafblkno %u and scanblkno %u in index \"%s\"",
+									 "left sibling %u of target %u with leafblkno %u and scanblkno %u on level %u of index \"%s\"",
 									 leftsib, target, leafblkno, scanblkno,
-									 RelationGetRelationName(rel))));
+									 targetlevel, RelationGetRelationName(rel))));
+
+			/* Must release all pins and locks on failure exit */
+			ReleaseBuffer(buf);
+			if (target != leafblkno)
+				_bt_relbuf(rel, leafbuf);
 
 			return false;
 		}
@@ -2496,13 +2494,40 @@ _bt_unlink_halfdead_page(Relation rel, Buffer leafbuf, BlockNumber scanblkno,
 	rbuf = _bt_getbuf(rel, vstate->info->heaprel, rightsib, BT_WRITE);
 	page = BufferGetPage(rbuf);
 	opaque = BTPageGetOpaque(page);
+
+	/*
+	 * Validate target's right sibling page. Its left link must point back to
+	 * the target page.
+	 */
 	if (opaque->btpo_prev != target)
-		ereport(ERROR,
+	{
+		/*
+		 * This is known to fail in the field; sibling link corruption is
+		 * relatively common. Press on with vacuuming rather than just
+		 * throwing an ERROR (same approach used for left-sibling's-right-link
+		 * validation check a moment ago).
+		 */
+		ereport(LOG,
 				(errcode(ERRCODE_INDEX_CORRUPTED),
 				 errmsg_internal("right sibling's left-link doesn't match: "
-								 "block %u links to %u instead of expected %u in index \"%s\"",
-								 rightsib, opaque->btpo_prev, target,
-								 RelationGetRelationName(rel))));
+								 "right sibling %u of target %u with leafblkno %u "
+								 "and scanblkno %u spuriously links to non-target %u "
+								 "on level %u of index \"%s\"",
+								 rightsib, target, leafblkno,
+								 scanblkno, opaque->btpo_prev,
+								 targetlevel, RelationGetRelationName(rel))));
+
+		/* Must release all pins and locks on failure exit */
+		if (BufferIsValid(lbuf))
+			_bt_relbuf(rel, lbuf);
+		_bt_relbuf(rel, rbuf);
+		_bt_relbuf(rel, buf);
+		if (target != leafblkno)
+			_bt_relbuf(rel, leafbuf);
+
+		return false;
+	}
+
 	rightsib_is_rightmost = P_RIGHTMOST(opaque);
 	*rightsib_empty = (P_FIRSTDATAKEY(opaque) > PageGetMaxOffsetNumber(page));
 
@@ -2727,6 +2752,7 @@ _bt_unlink_halfdead_page(Relation rel, Buffer leafbuf, BlockNumber scanblkno,
 	 */
 	_bt_pendingfsm_add(vstate, target, safexid);
 
+	/* Success - hold on to lock on leafbuf (might also have been target) */
 	return true;
 }
 
src/backend/access/nbtree/nbtree.c

Lines changed: 1 addition & 2 deletions
@@ -1093,8 +1093,7 @@ btvacuumpage(BTVacState *vstate, BlockNumber scanblkno)
 		 * can't be half-dead because only an interrupted VACUUM process can
 		 * leave pages in that state, so we'd definitely have dealt with it
 		 * back when the page was the scanblkno page (half-dead pages are
-		 * always marked fully deleted by _bt_pagedel()). This assumes that
-		 * there can be only one vacuum process running at a time.
+		 * always marked fully deleted by _bt_pagedel(), barring corruption).
 		 */
 		if (!opaque || !P_ISLEAF(opaque) || P_ISHALFDEAD(opaque))
 		{
