Commit 28d3c2d

Fix another bug in parent page splitting during GiST index build.

Yet another bug in the ilk of commits a7ee7c8 and 741b884. In 741b884, we took care to clear the memorized location of the downlink when we split the parent page, because splitting the parent page can move the downlink. But we missed that even *updating* a tuple on the parent can move it, because updating a tuple on a gist page is implemented as a delete+insert, so the updated tuple gets moved to the end of the page.

This commit fixes the bug in two different ways (belt and suspenders):

1. Clear the memorized downlink location when we update a tuple on the parent page, even if it's not split. This is the same approach as in commits a7ee7c8 and 741b884.

   I also noticed that gistFindCorrectParent did not clear the 'downlinkoffnum' when it stepped to the right sibling. Fix that too, as it seems like a clear bug even though I haven't been able to find a test case to hit that.

2. Change gistFindCorrectParent so that it treats 'downlinkoffnum' merely as a hint. It now always first checks if the downlink is still at that location, and if not, it scans the page like before. That's more robust if there are still more cases where we fail to clear 'downlinkoffnum' that we haven't yet uncovered. With this, it's no longer necessary to meticulously clear 'downlinkoffnum', so this makes the previous fixes unnecessary, but I didn't revert them because it still seems nice to clear it when we know that the downlink has moved.

Also add the test case using the same test data that Alexander posted. I tried to reduce it to a smaller test, and I also tried to reproduce this with different test data, but I was not able to, so let's just include what we have.

Backpatch to v12, like the previous fixes.

Reported-by: Alexander Lakhin
Discussion: https://www.postgresql.org/message-id/18129-caca016eaf0c3702@postgresql.org

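The failure mode and the "hint" approach described above can be illustrated with a toy model. This is a minimal sketch, not PostgreSQL code: `ToyPage`, `toy_insert`, `toy_delete`, `toy_update` and `toy_find_downlink` are hypothetical names, a page is reduced to an array of child block numbers, and following right links to sibling pages is left out entirely.

```c
#include <assert.h>

/* Hypothetical, heavily simplified stand-in for a GiST internal page:
 * each slot holds only the child block number its downlink points to. */
#define TOY_MAX_SLOTS 16
#define TOY_INVALID_OFFSET 0	/* offsets are 1-based, like OffsetNumber */

typedef struct ToyPage
{
	unsigned	child_blkno[TOY_MAX_SLOTS + 1];	/* index 0 is unused */
	int			maxoff;
} ToyPage;

/* Append a downlink at the end of the page and return its offset. */
static int
toy_insert(ToyPage *p, unsigned blkno)
{
	assert(p->maxoff < TOY_MAX_SLOTS);
	p->child_blkno[++p->maxoff] = blkno;
	return p->maxoff;
}

/* Delete the slot at 'off', shifting all later slots down by one. */
static void
toy_delete(ToyPage *p, int off)
{
	for (int i = off; i < p->maxoff; i++)
		p->child_blkno[i] = p->child_blkno[i + 1];
	p->maxoff--;
}

/* An update is modeled as delete + insert, so the tuple ends up last. */
static int
toy_update(ToyPage *p, int off)
{
	unsigned	blkno = p->child_blkno[off];

	toy_delete(p, off);
	return toy_insert(p, blkno);
}

/* Treat 'hint' merely as a hint: verify it first, otherwise scan. */
static int
toy_find_downlink(const ToyPage *p, unsigned blkno, int hint)
{
	if (hint >= 1 && hint <= p->maxoff && p->child_blkno[hint] == blkno)
		return hint;			/* still there */

	for (int i = 1; i <= p->maxoff; i++)
	{
		if (p->child_blkno[i] == blkno)
			return i;
	}
	return TOY_INVALID_OFFSET;	/* not on this page */
}
```

In this model, updating the first of three downlinks moves it to offset 3 and shifts the other two down, so every offset memorized before the update is stale even though the page was never split. Because `toy_find_downlink` verifies the hint before trusting it, a stale hint costs only a page scan instead of returning the wrong tuple.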
1 parent 64b7876 commit 28d3c2d

3 files changed: +226 -80 lines changed

contrib/intarray/expected/_int.out

Lines changed: 91 additions & 0 deletions
@@ -876,4 +876,95 @@ SELECT count(*) from test__int WHERE a @@ '!20 & !21';
   6343
 (1 row)
 
+DROP INDEX text_idx;
+-- Repeat the same queries with an extended data set. The data set is the
+-- same that we used before, except that each element in the array is
+-- repeated three times, offset by 1000 and 2000. For example, {1, 5}
+-- becomes {1, 1001, 2001, 5, 1005, 2005}.
+--
+-- That has proven to be unreasonably effective at exercising codepaths in
+-- core GiST code related to splitting parent pages, which is not covered by
+-- other tests. This is a bit out-of-place as the point is to test core GiST
+-- code rather than this extension, but there is no suitable GiST opclass in
+-- core that would reach the same codepaths.
+CREATE TABLE more__int AS SELECT
+   -- Leave alone NULLs, empty arrays and the one row that we use to test
+   -- equality
+   CASE WHEN a IS NULL OR a = '{}' OR a = '{73,23,20}' THEN a ELSE
+     (select array_agg(u) || array_agg(u + 1000) || array_agg(u + 2000) from (select unnest(a) u) x)
+   END AS a, a as b
+   FROM test__int;
+CREATE INDEX ON more__int using gist (a gist__int_ops(numranges = 252));
+SELECT count(*) from more__int WHERE a && '{23,50}';
+ count 
+-------
+   403
+(1 row)
+
+SELECT count(*) from more__int WHERE a @@ '23|50';
+ count 
+-------
+   403
+(1 row)
+
+SELECT count(*) from more__int WHERE a @> '{23,50}';
+ count 
+-------
+    12
+(1 row)
+
+SELECT count(*) from more__int WHERE a @@ '23&50';
+ count 
+-------
+    12
+(1 row)
+
+SELECT count(*) from more__int WHERE a @> '{20,23}';
+ count 
+-------
+    12
+(1 row)
+
+SELECT count(*) from more__int WHERE a <@ '{73,23,20}';
+ count 
+-------
+    10
+(1 row)
+
+SELECT count(*) from more__int WHERE a = '{73,23,20}';
+ count 
+-------
+     1
+(1 row)
+
+SELECT count(*) from more__int WHERE a @@ '50&68';
+ count 
+-------
+     9
+(1 row)
+
+SELECT count(*) from more__int WHERE a @> '{20,23}' or a @> '{50,68}';
+ count 
+-------
+    21
+(1 row)
+
+SELECT count(*) from more__int WHERE a @@ '(20&23)|(50&68)';
+ count 
+-------
+    21
+(1 row)
+
+SELECT count(*) from more__int WHERE a @@ '20 | !21';
+ count 
+-------
+  6566
+(1 row)
+
+SELECT count(*) from more__int WHERE a @@ '!20 & !21';
+ count 
+-------
+  6343
+(1 row)
+
 RESET enable_seqscan;

contrib/intarray/sql/_int.sql

Lines changed: 35 additions & 0 deletions
@@ -195,4 +195,39 @@ SELECT count(*) from test__int WHERE a @@ '(20&23)|(50&68)';
 SELECT count(*) from test__int WHERE a @@ '20 | !21';
 SELECT count(*) from test__int WHERE a @@ '!20 & !21';
 
+DROP INDEX text_idx;
+
+-- Repeat the same queries with an extended data set. The data set is the
+-- same that we used before, except that each element in the array is
+-- repeated three times, offset by 1000 and 2000. For example, {1, 5}
+-- becomes {1, 1001, 2001, 5, 1005, 2005}.
+--
+-- That has proven to be unreasonably effective at exercising codepaths in
+-- core GiST code related to splitting parent pages, which is not covered by
+-- other tests. This is a bit out-of-place as the point is to test core GiST
+-- code rather than this extension, but there is no suitable GiST opclass in
+-- core that would reach the same codepaths.
+CREATE TABLE more__int AS SELECT
+   -- Leave alone NULLs, empty arrays and the one row that we use to test
+   -- equality
+   CASE WHEN a IS NULL OR a = '{}' OR a = '{73,23,20}' THEN a ELSE
+     (select array_agg(u) || array_agg(u + 1000) || array_agg(u + 2000) from (select unnest(a) u) x)
+   END AS a, a as b
+   FROM test__int;
+CREATE INDEX ON more__int using gist (a gist__int_ops(numranges = 252));
+
+SELECT count(*) from more__int WHERE a && '{23,50}';
+SELECT count(*) from more__int WHERE a @@ '23|50';
+SELECT count(*) from more__int WHERE a @> '{23,50}';
+SELECT count(*) from more__int WHERE a @@ '23&50';
+SELECT count(*) from more__int WHERE a @> '{20,23}';
+SELECT count(*) from more__int WHERE a <@ '{73,23,20}';
+SELECT count(*) from more__int WHERE a = '{73,23,20}';
+SELECT count(*) from more__int WHERE a @@ '50&68';
+SELECT count(*) from more__int WHERE a @> '{20,23}' or a @> '{50,68}';
+SELECT count(*) from more__int WHERE a @@ '(20&23)|(50&68)';
+SELECT count(*) from more__int WHERE a @@ '20 | !21';
+SELECT count(*) from more__int WHERE a @@ '!20 & !21';
+
+
 RESET enable_seqscan;

src/backend/access/gist/gist.c

Lines changed: 100 additions & 80 deletions
@@ -1018,95 +1018,114 @@ gistFindPath(Relation r, BlockNumber child, OffsetNumber *downlinkoffnum)
  * remain so at exit, but it might not be the same page anymore.
  */
 static void
-gistFindCorrectParent(Relation r, GISTInsertStack *child)
+gistFindCorrectParent(Relation r, GISTInsertStack *child, bool is_build)
 {
 	GISTInsertStack *parent = child->parent;
+	ItemId		iid;
+	IndexTuple	idxtuple;
+	OffsetNumber maxoff;
+	GISTInsertStack *ptr;
 
 	gistcheckpage(r, parent->buffer);
 	parent->page = (Page) BufferGetPage(parent->buffer);
+	maxoff = PageGetMaxOffsetNumber(parent->page);
 
-	/* here we don't need to distinguish between split and page update */
-	if (child->downlinkoffnum == InvalidOffsetNumber ||
-		parent->lsn != PageGetLSN(parent->page))
+	/* Check if the downlink is still where it was before */
+	if (child->downlinkoffnum != InvalidOffsetNumber && child->downlinkoffnum <= maxoff)
 	{
-		/* parent is changed, look child in right links until found */
-		OffsetNumber i,
-					maxoff;
-		ItemId		iid;
-		IndexTuple	idxtuple;
-		GISTInsertStack *ptr;
+		iid = PageGetItemId(parent->page, child->downlinkoffnum);
+		idxtuple = (IndexTuple) PageGetItem(parent->page, iid);
+		if (ItemPointerGetBlockNumber(&(idxtuple->t_tid)) == child->blkno)
+			return;				/* still there */
+	}
 
-		while (true)
-		{
-			maxoff = PageGetMaxOffsetNumber(parent->page);
-			for (i = FirstOffsetNumber; i <= maxoff; i = OffsetNumberNext(i))
-			{
-				iid = PageGetItemId(parent->page, i);
-				idxtuple = (IndexTuple) PageGetItem(parent->page, iid);
-				if (ItemPointerGetBlockNumber(&(idxtuple->t_tid)) == child->blkno)
-				{
-					/* yes!!, found */
-					child->downlinkoffnum = i;
-					return;
-				}
-			}
+	/*
+	 * The page has changed since we looked. During normal operation, every
+	 * update of a page changes its LSN, so the LSN we memorized should have
+	 * changed too. During index build, however, we don't WAL-log the changes
+	 * until we have built the index, so the LSN doesn't change. There is no
+	 * concurrent activity during index build, but we might have changed the
+	 * parent ourselves.
+	 */
+	Assert(parent->lsn != PageGetLSN(parent->page) || is_build);
+
+	/*
+	 * Scan the page to re-find the downlink. If the page was split, it might
+	 * have moved to a different page, so follow the right links until we find
+	 * it.
+	 */
+	while (true)
+	{
+		OffsetNumber i;
 
-			parent->blkno = GistPageGetOpaque(parent->page)->rightlink;
-			UnlockReleaseBuffer(parent->buffer);
-			if (parent->blkno == InvalidBlockNumber)
+		maxoff = PageGetMaxOffsetNumber(parent->page);
+		for (i = FirstOffsetNumber; i <= maxoff; i = OffsetNumberNext(i))
+		{
+			iid = PageGetItemId(parent->page, i);
+			idxtuple = (IndexTuple) PageGetItem(parent->page, iid);
+			if (ItemPointerGetBlockNumber(&(idxtuple->t_tid)) == child->blkno)
 			{
-				/*
-				 * End of chain and still didn't find parent. It's a very-very
-				 * rare situation when root splitted.
-				 */
-				break;
+				/* yes!!, found */
+				child->downlinkoffnum = i;
+				return;
 			}
-			parent->buffer = ReadBuffer(r, parent->blkno);
-			LockBuffer(parent->buffer, GIST_EXCLUSIVE);
-			gistcheckpage(r, parent->buffer);
-			parent->page = (Page) BufferGetPage(parent->buffer);
 		}
 
-		/*
-		 * awful!!, we need search tree to find parent ... , but before we
-		 * should release all old parent
-		 */
-
-		ptr = child->parent->parent;	/* child->parent already released
-										 * above */
-		while (ptr)
+		parent->blkno = GistPageGetOpaque(parent->page)->rightlink;
+		parent->downlinkoffnum = InvalidOffsetNumber;
+		UnlockReleaseBuffer(parent->buffer);
+		if (parent->blkno == InvalidBlockNumber)
 		{
-			ReleaseBuffer(ptr->buffer);
-			ptr = ptr->parent;
+			/*
+			 * End of chain and still didn't find parent. It's a very-very
+			 * rare situation when root splitted.
+			 */
+			break;
 		}
+		parent->buffer = ReadBuffer(r, parent->blkno);
+		LockBuffer(parent->buffer, GIST_EXCLUSIVE);
+		gistcheckpage(r, parent->buffer);
+		parent->page = (Page) BufferGetPage(parent->buffer);
+	}
 
-		/* ok, find new path */
-		ptr = parent = gistFindPath(r, child->blkno, &child->downlinkoffnum);
+	/*
+	 * awful!!, we need search tree to find parent ... , but before we should
+	 * release all old parent
+	 */
 
-		/* read all buffers as expected by caller */
-		/* note we don't lock them or gistcheckpage them here! */
-		while (ptr)
-		{
-			ptr->buffer = ReadBuffer(r, ptr->blkno);
-			ptr->page = (Page) BufferGetPage(ptr->buffer);
-			ptr = ptr->parent;
-		}
+	ptr = child->parent->parent;	/* child->parent already released above */
+	while (ptr)
+	{
+		ReleaseBuffer(ptr->buffer);
+		ptr = ptr->parent;
+	}
 
-		/* install new chain of parents to stack */
-		child->parent = parent;
+	/* ok, find new path */
+	ptr = parent = gistFindPath(r, child->blkno, &child->downlinkoffnum);
 
-		/* make recursive call to normal processing */
-		LockBuffer(child->parent->buffer, GIST_EXCLUSIVE);
-		gistFindCorrectParent(r, child);
+	/* read all buffers as expected by caller */
+	/* note we don't lock them or gistcheckpage them here! */
+	while (ptr)
+	{
+		ptr->buffer = ReadBuffer(r, ptr->blkno);
+		ptr->page = (Page) BufferGetPage(ptr->buffer);
+		ptr = ptr->parent;
 	}
+
+	/* install new chain of parents to stack */
+	child->parent = parent;
+
+	/* make recursive call to normal processing */
+	LockBuffer(child->parent->buffer, GIST_EXCLUSIVE);
+	gistFindCorrectParent(r, child, is_build);
 }
 
 /*
  * Form a downlink pointer for the page in 'buf'.
  */
 static IndexTuple
 gistformdownlink(Relation rel, Buffer buf, GISTSTATE *giststate,
-				 GISTInsertStack *stack)
+				 GISTInsertStack *stack, bool is_build)
 {
 	Page		page = BufferGetPage(buf);
 	OffsetNumber maxoff;
@@ -1147,7 +1166,7 @@ gistformdownlink(Relation rel, Buffer buf, GISTSTATE *giststate,
 		ItemId		iid;
 
 		LockBuffer(stack->parent->buffer, GIST_EXCLUSIVE);
-		gistFindCorrectParent(rel, stack);
+		gistFindCorrectParent(rel, stack, is_build);
 		iid = PageGetItemId(stack->parent->page, stack->downlinkoffnum);
 		downlink = (IndexTuple) PageGetItem(stack->parent->page, iid);
 		downlink = CopyIndexTuple(downlink);
@@ -1193,7 +1212,7 @@ gistfixsplit(GISTInsertState *state, GISTSTATE *giststate)
 		page = BufferGetPage(buf);
 
 		/* Form the new downlink tuples to insert to parent */
-		downlink = gistformdownlink(state->r, buf, giststate, stack);
+		downlink = gistformdownlink(state->r, buf, giststate, stack, state->is_build);
 
 		si->buf = buf;
 		si->downlink = downlink;
@@ -1347,7 +1366,7 @@ gistfinishsplit(GISTInsertState *state, GISTInsertStack *stack,
 		right = (GISTPageSplitInfo *) list_nth(splitinfo, pos);
 		left = (GISTPageSplitInfo *) list_nth(splitinfo, pos - 1);
 
-		gistFindCorrectParent(state->r, stack);
+		gistFindCorrectParent(state->r, stack, state->is_build);
 		if (gistinserttuples(state, stack->parent, giststate,
 							 &right->downlink, 1,
 							 InvalidOffsetNumber,
@@ -1372,21 +1391,22 @@ gistfinishsplit(GISTInsertState *state, GISTInsertStack *stack,
 	 */
 	tuples[0] = left->downlink;
 	tuples[1] = right->downlink;
-	gistFindCorrectParent(state->r, stack);
-	if (gistinserttuples(state, stack->parent, giststate,
-						 tuples, 2,
-						 stack->downlinkoffnum,
-						 left->buf, right->buf,
-						 true,	/* Unlock parent */
-						 unlockbuf	/* Unlock stack->buffer if caller wants
-									 * that */
-						 ))
-	{
-		/*
-		 * If the parent page was split, the downlink might have moved.
-		 */
-		stack->downlinkoffnum = InvalidOffsetNumber;
-	}
+	gistFindCorrectParent(state->r, stack, state->is_build);
+	(void) gistinserttuples(state, stack->parent, giststate,
+							tuples, 2,
+							stack->downlinkoffnum,
+							left->buf, right->buf,
+							true,	/* Unlock parent */
+							unlockbuf	/* Unlock stack->buffer if caller
+										 * wants that */
+		);
+
+	/*
+	 * The downlink might have moved when we updated it. Even if the page
+	 * wasn't split, because gistinserttuples() implements updating the old
+	 * tuple by removing and re-inserting it!
+	 */
+	stack->downlinkoffnum = InvalidOffsetNumber;
 
 	Assert(left->buf == stack->buffer);
 