Skip to content

Commit 98d5f36

Browse files
committed
In B-tree page deletion, clean up properly after page deletion failure.
In _bt_unlink_halfdead_page(), we might fail to find an immediate left sibling of the target page, perhaps because of corruption of the page sibling links. The code intends to cope with this by just abandoning the deletion attempt; but what actually happens is that it fails outright due to releasing the same buffer lock twice. (And error recovery masks a second problem, which is possible leakage of a pin on another page.) Seems to have been introduced by careless refactoring in commit efada2b. Since there are multiple cases to consider, let's make releasing the buffer lock in the failure case the responsibility of _bt_unlink_halfdead_page() not its caller. Also, avoid fetching the leaf page's left-link again after we've dropped lock on the page. This is probably harmless, but it's not exactly good coding practice. Per report from Kyotaro Horiguchi. Back-patch to 9.4 where the faulty code was introduced. Discussion: <20160803.173116.111915228.horiguchi.kyotaro@lab.ntt.co.jp>
1 parent a3cd60e commit 98d5f36

File tree

1 file changed

+23
-4
lines changed

1 file changed

+23
-4
lines changed

src/backend/access/nbtree/nbtpage.c

Lines changed: 23 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -1318,7 +1318,7 @@ _bt_pagedel(Relation rel, Buffer buf)
13181318
{
13191319
if (!_bt_unlink_halfdead_page(rel, buf, &rightsib_empty))
13201320
{
1321-
_bt_relbuf(rel, buf);
1321+
/* _bt_unlink_halfdead_page already released buffer */
13221322
return ndeleted;
13231323
}
13241324
ndeleted++;
@@ -1542,6 +1542,11 @@ _bt_mark_page_halfdead(Relation rel, Buffer leafbuf, BTStack stack)
15421542
* Returns 'false' if the page could not be unlinked (shouldn't happen).
15431543
* If the (new) right sibling of the page is empty, *rightsib_empty is set
15441544
* to true.
1545+
*
1546+
* Must hold pin and lock on leafbuf at entry (read or write doesn't matter).
1547+
* On success exit, we'll be holding pin and write lock. On failure exit,
1548+
* we'll release both pin and lock before returning (we define it that way
1549+
* to avoid having to reacquire a lock we already released).
15451550
*/
15461551
static bool
15471552
_bt_unlink_halfdead_page(Relation rel, Buffer leafbuf, bool *rightsib_empty)
@@ -1584,11 +1589,13 @@ _bt_unlink_halfdead_page(Relation rel, Buffer leafbuf, bool *rightsib_empty)
15841589
/*
15851590
* If the leaf page still has a parent pointing to it (or a chain of
15861591
* parents), we don't unlink the leaf page yet, but the topmost remaining
1587-
* parent in the branch.
1592+
* parent in the branch. Set 'target' and 'buf' to reference the page
1593+
* actually being unlinked.
15881594
*/
15891595
if (ItemPointerIsValid(leafhikey))
15901596
{
15911597
target = ItemPointerGetBlockNumber(leafhikey);
1598+
Assert(target != leafblkno);
15921599

15931600
/* fetch the block number of the topmost parent's left sibling */
15941601
buf = _bt_getbuf(rel, target, BT_READ);
@@ -1608,7 +1615,7 @@ _bt_unlink_halfdead_page(Relation rel, Buffer leafbuf, bool *rightsib_empty)
16081615
target = leafblkno;
16091616

16101617
buf = leafbuf;
1611-
leftsib = opaque->btpo_prev;
1618+
leftsib = leafleftsib;
16121619
targetlevel = 0;
16131620
}
16141621

@@ -1639,8 +1646,20 @@ _bt_unlink_halfdead_page(Relation rel, Buffer leafbuf, bool *rightsib_empty)
16391646
_bt_relbuf(rel, lbuf);
16401647
if (leftsib == P_NONE)
16411648
{
1642-
elog(LOG, "no left sibling (concurrent deletion?) in \"%s\"",
1649+
elog(LOG, "no left sibling (concurrent deletion?) of block %u in \"%s\"",
1650+
target,
16431651
RelationGetRelationName(rel));
1652+
if (target != leafblkno)
1653+
{
1654+
/* we have only a pin on target, but pin+lock on leafbuf */
1655+
ReleaseBuffer(buf);
1656+
_bt_relbuf(rel, leafbuf);
1657+
}
1658+
else
1659+
{
1660+
/* we have only a pin on leafbuf */
1661+
ReleaseBuffer(leafbuf);
1662+
}
16441663
return false;
16451664
}
16461665
lbuf = _bt_getbuf(rel, leftsib, BT_WRITE);

0 commit comments

Comments
 (0)