Skip to content

Commit 639f5ad

Browse files
committed
Further cleanup of gistsplit.c.
After further reflection I was unconvinced that the existing coding is guaranteed to return valid union datums in every code path for multi-column indexes. Fix that by forcing a gistunionsubkey() call at the end of the recursion. Having done that, we can remove some clearly-redundant calls elsewhere. This should be a little faster for multi-column indexes (since the previous coding would uselessly do such a call for each column while unwinding the recursion), as well as much harder to break. Also, simplify the handling of cases where one side or the other of a primary split contains only don't-care tuples. The previous coding used a very ugly hack in removeDontCares() that essentially forced one random tuple to be treated as non-don't-care, providing a random initial choice of seed datum for the secondary split. It seems unlikely that that method will give better-than-random splits. Instead, treat such a split as degenerate and just let the next column determine the split, the same way that we handle fully degenerate cases where the two sides produce identical union datums.
1 parent 782365a commit 639f5ad

File tree

1 file changed

+93
-64
lines changed

1 file changed

+93
-64
lines changed

src/backend/access/gist/gistsplit.c

Lines changed: 93 additions & 64 deletions
Original file line numberDiff line numberDiff line change
@@ -163,23 +163,16 @@ findDontCares(Relation r, GISTSTATE *giststate, GISTENTRY *valvec,
163163
* Remove tuples that are marked don't-cares from the tuple index array a[]
164164
* of length *len. This is applied separately to the spl_left and spl_right
165165
* arrays.
166-
*
167-
* Corner case: we do not wish to reduce the index array to zero length.
168-
* (If we did, then the union key for this side would be null, and having just
169-
* one of spl_ldatum_exists and spl_rdatum_exists be TRUE might confuse
170-
* user-defined PickSplit methods.) To avoid that, we'll forcibly redefine
171-
* one tuple as non-don't-care if necessary. Hence, we must be able to adjust
172-
* caller's NumDontCare count.
173166
*/
174167
static void
175-
removeDontCares(OffsetNumber *a, int *len, bool *dontcare, int *NumDontCare)
168+
removeDontCares(OffsetNumber *a, int *len, const bool *dontcare)
176169
{
177170
int origlen,
178-
curlen,
171+
newlen,
179172
i;
180173
OffsetNumber *curwpos;
181174

182-
origlen = curlen = *len;
175+
origlen = newlen = *len;
183176
curwpos = a;
184177
for (i = 0; i < origlen; i++)
185178
{
@@ -191,18 +184,11 @@ removeDontCares(OffsetNumber *a, int *len, bool *dontcare, int *NumDontCare)
191184
*curwpos = ai;
192185
curwpos++;
193186
}
194-
else if (curlen == 1)
195-
{
196-
/* corner case: don't let array become empty */
197-
dontcare[ai] = FALSE; /* mark item as non-dont-care */
198-
*NumDontCare -= 1;
199-
i--; /* reprocess item on next iteration */
200-
}
201187
else
202-
curlen--;
188+
newlen--;
203189
}
204190

205-
*len = curlen;
191+
*len = newlen;
206192
}
207193

208194
/*
@@ -305,8 +291,14 @@ supportSecondarySplit(Relation r, GISTSTATE *giststate, int attno,
305291
penalty2;
306292

307293
/*
308-
* there is only one previously defined union, so we just choose swap
309-
* or not by lowest penalty
294+
* There is only one previously defined union, so we just choose swap
295+
* or not by lowest penalty for that side. We can only get here if a
296+
* secondary split happened to have all NULLs in its column in the
297+
* tuples that the outer recursion level had assigned to one side.
298+
* (Note that the null checks in gistSplitByKey don't prevent the
299+
* case, because they'll only be checking tuples that were considered
300+
* don't-cares at the outer recursion level, not the tuples that went
301+
* into determining the passed-down left and right union keys.)
310302
*/
311303
penalty1 = gistpenalty(giststate, attno, entry1, false, &entrySL, false);
312304
penalty2 = gistpenalty(giststate, attno, entry1, false, &entrySR, false);
@@ -404,13 +396,19 @@ genericPickSplit(GISTSTATE *giststate, GistEntryVector *entryvec, GIST_SPLITVEC
404396
* two vectors.
405397
*
406398
* Returns FALSE if split is complete (there are no more index columns, or
407-
* there is no need to consider them). Note that in this case the union
408-
* keys for all columns must be computed here.
409-
* Returns TRUE and v->spl_dontcare = NULL if left and right unions of attno
410-
* column are the same, so we should split on next column instead.
399+
* there is no need to consider them because split is optimal already).
400+
*
401+
* Returns TRUE and v->spl_dontcare = NULL if the picksplit result is
402+
* degenerate (all tuples seem to be don't-cares), so we should just
403+
* disregard this column and split on the next column(s) instead.
404+
*
411405
* Returns TRUE and v->spl_dontcare != NULL if there are don't-care tuples
412406
* that could be relocated based on the next column(s). The don't-care
413407
* tuples have been removed from the split and must be reinserted by caller.
408+
* There is at least one non-don't-care tuple on each side of the split,
409+
* and union keys for all columns are updated to include just those tuples.
410+
*
411+
* A TRUE result implies there is at least one more index column.
414412
*/
415413
static bool
416414
gistUserPicksplit(Relation r, GistEntryVector *entryvec, int attno, GistSplitVector *v,
@@ -481,49 +479,57 @@ gistUserPicksplit(Relation r, GistEntryVector *entryvec, int attno, GistSplitVec
481479

482480
/*
483481
* If index columns remain, then consider whether we can improve the split
484-
* by using them. Even if we can't, we must compute union keys for those
485-
* columns before we can return FALSE.
482+
* by using them.
486483
*/
487484
v->spl_dontcare = NULL;
488485

489486
if (attno + 1 < giststate->tupdesc->natts)
490487
{
491488
int NumDontCare;
492489

490+
/*
491+
* Make a quick check to see if left and right union keys are equal;
492+
* if so, the split is certainly degenerate, so tell caller to
493+
* re-split with the next column.
494+
*/
493495
if (gistKeyIsEQ(giststate, attno, sv->spl_ldatum, sv->spl_rdatum))
494-
{
495-
/*
496-
* Left and right union keys are equal, so we can get better split
497-
* by considering next column.
498-
*/
499496
return true;
500-
}
501497

502498
/*
503-
* Locate don't-care tuples, if any
499+
* Locate don't-care tuples, if any. If there are none, the split is
500+
* optimal, so just fall out and return false.
504501
*/
505502
v->spl_dontcare = (bool *) palloc0(sizeof(bool) * (entryvec->n + 1));
506503

507504
NumDontCare = findDontCares(r, giststate, entryvec->vector, v, attno);
508505

509-
if (NumDontCare == 0)
506+
if (NumDontCare > 0)
510507
{
511508
/*
512-
* There are no don't-cares, so just compute the union keys for
513-
* remaining columns and we're done.
509+
* Remove don't-cares from spl_left[] and spl_right[].
514510
*/
515-
gistunionsubkey(giststate, itup, v);
516-
}
517-
else
518-
{
511+
removeDontCares(sv->spl_left, &sv->spl_nleft, v->spl_dontcare);
512+
removeDontCares(sv->spl_right, &sv->spl_nright, v->spl_dontcare);
513+
519514
/*
520-
* Remove don't-cares from spl_left[] and spl_right[]. NOTE: this
521-
* could reduce NumDontCare to zero.
515+
* If all tuples on either side were don't-cares, the split is
516+
* degenerate, and we're best off to ignore it and split on the
517+
* next column. (We used to try to press on with a secondary
518+
* split by forcing a random tuple on each side to be treated as
519+
* non-don't-care, but it seems unlikely that that technique
520+
* really gives a better result. Note that we don't want to try a
521+
* secondary split with empty left or right primary split sides,
522+
* because then there is no union key on that side for the
523+
* PickSplit function to try to expand, so it can have no good
524+
* figure of merit for what it's doing. Also note that this check
525+
* ensures we can't produce a bogus one-side-only split in the
526+
* NumDontCare == 1 special case below.)
522527
*/
523-
removeDontCares(sv->spl_left, &sv->spl_nleft,
524-
v->spl_dontcare, &NumDontCare);
525-
removeDontCares(sv->spl_right, &sv->spl_nright,
526-
v->spl_dontcare, &NumDontCare);
528+
if (sv->spl_nleft == 0 || sv->spl_nright == 0)
529+
{
530+
v->spl_dontcare = NULL;
531+
return true;
532+
}
527533

528534
/*
529535
* Recompute union keys, considering only non-don't-care tuples.
@@ -539,7 +545,9 @@ gistUserPicksplit(Relation r, GistEntryVector *entryvec, int attno, GistSplitVec
539545
/*
540546
* If there's only one don't-care tuple then we can't do a
541547
* PickSplit on it, so just choose whether to send it left or
542-
* right by comparing penalties.
548+
* right by comparing penalties. We needed the
549+
* gistunionsubkey step anyway so that we have appropriate
550+
* union keys for figuring the penalties.
543551
*/
544552
OffsetNumber toMove;
545553

@@ -554,13 +562,14 @@ gistUserPicksplit(Relation r, GistEntryVector *entryvec, int attno, GistSplitVec
554562
/* ... and assign it to cheaper side */
555563
placeOne(r, giststate, v, itup[toMove - 1], toMove, attno + 1);
556564

557-
/* recompute the union keys including this tuple */
558-
v->spl_dontcare = NULL;
559-
gistunionsubkey(giststate, itup, v);
565+
/*
566+
* At this point the union keys are wrong, but we don't care
567+
* because we're done splitting. The outermost recursion
568+
* level of gistSplitByKey will fix things before returning.
569+
*/
560570
}
561-
else if (NumDontCare > 1)
571+
else
562572
return true;
563-
/* else NumDontCare is now zero; handle same as above */
564573
}
565574
}
566575

@@ -706,7 +715,7 @@ gistSplitByKey(Relation r, Page page, IndexTuple *itup, int len,
706715
*/
707716
v->spl_risnull[attno] = v->spl_lisnull[attno] = TRUE;
708717

709-
if (attno + 1 < r->rd_att->natts)
718+
if (attno + 1 < giststate->tupdesc->natts)
710719
gistSplitByKey(r, page, itup, len, giststate, v, attno + 1);
711720
else
712721
gistSplitHalf(&v->splitVector, len);
@@ -731,28 +740,31 @@ gistSplitByKey(Relation r, Page page, IndexTuple *itup, int len,
731740
else
732741
v->splitVector.spl_left[v->splitVector.spl_nleft++] = i;
733742

734-
/* Must compute union keys for this and any following columns */
735-
v->spl_dontcare = NULL;
736-
gistunionsubkey(giststate, itup, v);
743+
/* Compute union keys, unless outer recursion level will handle it */
744+
if (attno == 0 && giststate->tupdesc->natts == 1)
745+
{
746+
v->spl_dontcare = NULL;
747+
gistunionsubkey(giststate, itup, v);
748+
}
737749
}
738750
else
739751
{
740752
/*
741-
* all keys are not-null, so apply user-defined PickSplit method
753+
* All keys are not-null, so apply user-defined PickSplit method
742754
*/
743755
if (gistUserPicksplit(r, entryvec, attno, v, itup, len, giststate))
744756
{
745757
/*
746758
* Splitting on attno column is not optimal, so consider
747759
* redistributing don't-care tuples according to the next column
748760
*/
749-
Assert(attno + 1 < r->rd_att->natts);
761+
Assert(attno + 1 < giststate->tupdesc->natts);
750762

751763
if (v->spl_dontcare == NULL)
752764
{
753765
/*
754-
* Simple case: left and right keys for attno column are
755-
* equal, so just split according to the next column.
766+
* This split was actually degenerate, so ignore it altogether
767+
* and just split according to the next column.
756768
*/
757769
gistSplitByKey(r, page, itup, len, giststate, v, attno + 1);
758770
}
@@ -799,10 +811,27 @@ gistSplitByKey(Relation r, Page page, IndexTuple *itup, int len,
799811
backupSplit.spl_right[backupSplit.spl_nright++] = map[v->splitVector.spl_right[i] - 1];
800812

801813
v->splitVector = backupSplit;
802-
803-
/* recompute left and right union datums */
804-
gistunionsubkey(giststate, itup, v);
805814
}
806815
}
807816
}
817+
818+
/*
819+
* If we're handling a multicolumn index, at the end of the recursion
820+
* recompute the left and right union datums for all index columns. This
821+
* makes sure we hand back correct union datums in all corner cases,
822+
* including when we haven't processed all columns to start with, or when
823+
* a secondary split moved "don't care" tuples from one side to the other
824+
* (we really shouldn't assume that that didn't change the union datums).
825+
*
826+
* Note: when we're in an internal recursion (attno > 0), we do not worry
827+
* about whether the union datums we return with are sensible, since
828+
* calling levels won't care. Also, in a single-column index, we expect
829+
* that PickSplit (or the special cases above) produced correct union
830+
* datums.
831+
*/
832+
if (attno == 0 && giststate->tupdesc->natts > 1)
833+
{
834+
v->spl_dontcare = NULL;
835+
gistunionsubkey(giststate, itup, v);
836+
}
808837
}

0 commit comments

Comments
 (0)