Commit 074251d

Adjustments to the btree fastpath optimization.
This optimization was introduced in commit 2b27273. The changes include
some additional comments and documentation, and also these more
substantive changes:

. ensure the optimization is only applied on the leaf node of a tree
  whose root is on level 2 or more. It's of little value on small trees.

. Delay calling RelationSetTargetBlock() until after the critical
  section of _bt_insertonpg

. ensure the optimization is also applied to unlogged tables.

Pavan Deolasee and Peter Geoghegan with some very light editing from me.

Discussion: https://postgr.es/m/CABOikdO8jhRarNC60nZLktZYhxt+TK8z_V97+Ny499YQdyAfug@mail.gmail.com
1 parent 31f1f0b commit 074251d

File tree

2 files changed (+71, -11 lines)

src/backend/access/nbtree/README (+19)

@@ -375,6 +375,25 @@ positives, so long as it never gives a false negative. This makes it
 possible to implement the test with a small counter value stored on each
 index page.
 
+Fastpath For Index Insertion
+----------------------------
+
+We optimize for a common case of insertion of increasing index key
+values by caching the last page to which this backend inserted the last
+value, if this page was the rightmost leaf page. For the next insert, we
+can then quickly check if the cached page is still the rightmost leaf
+page and also the correct place to hold the current value. We can avoid
+the cost of walking down the tree in such common cases.
+
+The optimization works on the assumption that there can only be one
+non-ignorable leaf rightmost page, and so even a RecentGlobalXmin style
+interlock isn't required. We cannot fail to detect that our hint was
+invalidated, because there can only be one such page in the B-Tree at
+any time. It's possible that the page will be deleted and recycled
+without a backend's cached page also being detected as invalidated, but
+only when we happen to recycle a block that once again gets recycled as the
+rightmost leaf page.
+
 On-the-Fly Deletion Of Index Tuples
 -----------------------------------

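The README paragraphs above describe the eligibility test in prose. As a rough illustration only (not PostgreSQL code), the self-contained C sketch below models that decision with invented stand-ins: CachedPage and fastpath_ok are hypothetical names, and the fields merely correspond to what P_ISLEAF(), P_RIGHTMOST(), P_IGNORE(), PageGetFreeSpace() and _bt_compare() would report for the real cached page.

    #include <stdbool.h>
    #include <stddef.h>
    #include <stdio.h>

    /* Hypothetical snapshot of the cached rightmost leaf page (illustration only). */
    typedef struct CachedPage
    {
        bool    is_leaf;        /* what P_ISLEAF() would report */
        bool    is_rightmost;   /* what P_RIGHTMOST() would report */
        bool    is_ignorable;   /* what P_IGNORE() would report (deleted/half-dead) */
        size_t  free_space;     /* what PageGetFreeSpace() would report */
        int     nitems;         /* stand-in for PageGetMaxOffsetNumber() */
        int     first_key;      /* stand-in for the page's first data key */
    } CachedPage;

    /*
     * Can the new key go straight onto the cached page without walking down
     * from the root?  It must still be the non-ignorable rightmost leaf, have
     * room for the new tuple, hold at least one existing tuple, and the new
     * key must sort strictly after the page's first key (the _bt_compare() > 0
     * test in the patch).
     */
    static bool
    fastpath_ok(const CachedPage *p, int new_key, size_t new_item_size)
    {
        return p->is_leaf && p->is_rightmost && !p->is_ignorable &&
               p->free_space > new_item_size &&
               p->nitems >= 1 &&
               new_key > p->first_key;
    }

    int
    main(void)
    {
        CachedPage page = {true, true, false, 4096, 100, 42};

        /* Monotonically increasing keys keep hitting the cached rightmost leaf. */
        printf("key 43: %s\n", fastpath_ok(&page, 43, 32) ? "fastpath" : "full descent");
        /* An out-of-order key forces the normal root-to-leaf descent. */
        printf("key 7:  %s\n", fastpath_ok(&page, 7, 32) ? "fastpath" : "full descent");
        return 0;
    }
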
src/backend/access/nbtree/nbtinsert.c (+52, -11)

@@ -26,6 +26,8 @@
 #include "storage/smgr.h"
 #include "utils/tqual.h"
 
+/* Minimum tree height for application of fastpath optimization */
+#define BTREE_FASTPATH_MIN_LEVEL 2
 
 typedef struct
 {
@@ -125,7 +127,7 @@ _bt_doinsert(Relation rel, IndexTuple itup,
     /*
      * It's very common to have an index on an auto-incremented or
      * monotonically increasing value. In such cases, every insertion happens
-     * towards the end of the index. We try to optimise that case by caching
+     * towards the end of the index. We try to optimize that case by caching
      * the right-most leaf of the index. If our cached block is still the
      * rightmost leaf, has enough free space to accommodate a new entry and
      * the insertion key is strictly greater than the first key in this page,
@@ -176,13 +178,17 @@ _bt_doinsert(Relation rel, IndexTuple itup,
             * the first key on the page.
             */
            if (P_ISLEAF(lpageop) && P_RIGHTMOST(lpageop) &&
-               !P_INCOMPLETE_SPLIT(lpageop) &&
                !P_IGNORE(lpageop) &&
                (PageGetFreeSpace(page) > itemsz) &&
                PageGetMaxOffsetNumber(page) >= P_FIRSTDATAKEY(lpageop) &&
                _bt_compare(rel, indnkeyatts, itup_scankey, page,
                            P_FIRSTDATAKEY(lpageop)) > 0)
            {
+               /*
+                * The right-most block should never have incomplete split. But
+                * be paranoid and check for it anyway.
+                */
+               Assert(!P_INCOMPLETE_SPLIT(lpageop));
                fastpath = true;
            }
            else
@@ -868,6 +874,24 @@ _bt_insertonpg(Relation rel,
         bool        newitemonleft;
         Buffer      rbuf;
 
+        /*
+         * If we're here then a pagesplit is needed. We should never reach here
+         * if we're using the fastpath since we should have checked for all the
+         * required conditions, including the fact that this page has enough
+         * freespace. Note that this routine can in theory deal with the
+         * situation where a NULL stack pointer is passed (that's what would
+         * happen if the fastpath is taken), like it does during crash
+         * recovery. But that path is much slower, defeating the very purpose
+         * of the optimization. The following assertion should protect us from
+         * any future code changes that invalidate those assumptions.
+         *
+         * Note that whenever we fail to take the fastpath, we clear the
+         * cached block. Checking for a valid cached block at this point is
+         * enough to decide whether we're in a fastpath or not.
+         */
+        Assert(!(P_ISLEAF(lpageop) &&
+                 BlockNumberIsValid(RelationGetTargetBlock(rel))));
+
         /* Choose the split point */
         firstright = _bt_findsplitloc(rel, page,
                                       newitemoff, itemsz,
@@ -905,6 +929,7 @@ _bt_insertonpg(Relation rel,
         BTMetaPageData *metad = NULL;
         OffsetNumber itup_off;
         BlockNumber itup_blkno;
+        BlockNumber cachedBlock = InvalidBlockNumber;
 
         itup_off = newitemoff;
         itup_blkno = BufferGetBlockNumber(buf);
@@ -962,6 +987,15 @@ _bt_insertonpg(Relation rel,
             MarkBufferDirty(cbuf);
         }
 
+        /*
+         * Cache the block information if we just inserted into the rightmost
+         * leaf page of the index and it's not the root page. For very small
+         * index where root is also the leaf, there is no point trying for any
+         * optimization.
+         */
+        if (P_RIGHTMOST(lpageop) && P_ISLEAF(lpageop) && !P_ISROOT(lpageop))
+            cachedBlock = BufferGetBlockNumber(buf);
+
         /* XLOG stuff */
         if (RelationNeedsWAL(rel))
         {
@@ -977,16 +1011,7 @@ _bt_insertonpg(Relation rel,
             XLogRegisterData((char *) &xlrec, SizeOfBtreeInsert);
 
             if (P_ISLEAF(lpageop))
-            {
                 xlinfo = XLOG_BTREE_INSERT_LEAF;
-
-                /*
-                 * Cache the block information if we just inserted into the
-                 * rightmost leaf page of the index.
-                 */
-                if (P_RIGHTMOST(lpageop))
-                    RelationSetTargetBlock(rel, BufferGetBlockNumber(buf));
-            }
             else
             {
/*
@@ -1048,6 +1073,22 @@ _bt_insertonpg(Relation rel,
         if (BufferIsValid(cbuf))
             _bt_relbuf(rel, cbuf);
         _bt_relbuf(rel, buf);
+
+        /*
+         * If we decided to cache the insertion target block, then set it now.
+         * But before that, check for the height of the tree and don't go for
+         * the optimization for small indexes. We defer that check to this
+         * point to ensure that we don't call _bt_getrootheight while holding
+         * lock on any other block.
+         *
+         * We do this after dropping locks on all buffers. So the information
+         * about whether the insertion block is still the rightmost block or
+         * not may have changed in between. But we will deal with that during
+         * next insert operation. No special care is required while setting it.
+         */
+        if (BlockNumberIsValid(cachedBlock) &&
+            _bt_getrootheight(rel) >= BTREE_FASTPATH_MIN_LEVEL)
+            RelationSetTargetBlock(rel, cachedBlock);
     }
 }

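To tie the hunks above together, here is a similarly hedged sketch of the producer side of the optimization: after a successful insert, the block is remembered only for a rightmost, non-root leaf, and only once the tree's root sits at BTREE_FASTPATH_MIN_LEVEL or above (modeled here by a plain field rather than _bt_getrootheight()). ToyRelation and maybe_cache_target are invented names for illustration; the real code stores the hint with RelationSetTargetBlock() after all buffer locks have been released, and clears it in _bt_doinsert when a fastpath attempt fails.

    #include <stdbool.h>
    #include <stdio.h>

    #define BTREE_FASTPATH_MIN_LEVEL 2          /* same threshold as the patch */
    #define INVALID_BLOCK ((unsigned) 0xFFFFFFFF)

    /* Hypothetical per-relation state; the real hint lives in the relcache entry. */
    typedef struct ToyRelation
    {
        unsigned    target_block;   /* stand-in for RelationGetTargetBlock() */
        int         root_level;     /* stand-in for _bt_getrootheight() */
    } ToyRelation;

    /*
     * After inserting onto block 'blkno', remember it as the next insertion
     * target only if it is a rightmost, non-root leaf and the tree is tall
     * enough for the shortcut to pay off.  In the real code this runs only
     * after every buffer lock has been dropped, so the hint may already be
     * stale by the next insert; the fastpath check simply falls back then.
     */
    static void
    maybe_cache_target(ToyRelation *rel, unsigned blkno,
                       bool is_leaf, bool is_rightmost, bool is_root)
    {
        if (is_leaf && is_rightmost && !is_root &&
            rel->root_level >= BTREE_FASTPATH_MIN_LEVEL)
            rel->target_block = blkno;
    }

    int
    main(void)
    {
        ToyRelation small = {INVALID_BLOCK, 1}; /* root one level above the leaves */
        ToyRelation tall = {INVALID_BLOCK, 2};  /* root at level 2: gate is satisfied */

        maybe_cache_target(&small, 7, true, true, false);
        maybe_cache_target(&tall, 7, true, true, false);

        printf("small tree caches a target block: %s\n",
               small.target_block == INVALID_BLOCK ? "no" : "yes");
        printf("tall tree caches a target block:  %s\n",
               tall.target_block == INVALID_BLOCK ? "no" : "yes");
        return 0;
    }
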