Skip to content

Commit 54d2002

Browse files
committed
Fix some problems with selectivity estimation for partial indexes.
First, genericcostestimate() was being way too liberal about including partial-index conditions in its selectivity estimate, resulting in substantial underestimates for situations such as an indexqual "x = 42" used with an index on x "WHERE x >= 40 AND x < 50". While the code is intentionally set up to favor selecting partial indexes when available, this was too much... Second, choose_bitmap_and() was likewise easily fooled by cases of this type, since it would similarly think that the partial index had selectivity independent of the indexqual. Fixed by using predicate_implied_by() rather than simple equality checks to determine redundancy. This is a good deal more expensive but I don't see much alternative. At least the extra cost is only paid when there's actually a partial index under consideration. Per report from Jeff Davis. I'm not going to risk back-patching this, though.
1 parent 2b49e5d commit 54d2002

File tree

4 files changed

+120
-60
lines changed

4 files changed

+120
-60
lines changed

src/backend/optimizer/path/indxpath.c

Lines changed: 75 additions & 34 deletions
Original file line numberDiff line numberDiff line change
@@ -9,7 +9,7 @@
99
*
1010
*
1111
* IDENTIFICATION
12-
* $PostgreSQL: pgsql/src/backend/optimizer/path/indxpath.c,v 1.217 2007/03/17 00:11:04 tgl Exp $
12+
* $PostgreSQL: pgsql/src/backend/optimizer/path/indxpath.c,v 1.218 2007/03/21 22:18:12 tgl Exp $
1313
*
1414
*-------------------------------------------------------------------------
1515
*/
@@ -57,8 +57,8 @@ static Path *choose_bitmap_and(PlannerInfo *root, RelOptInfo *rel,
5757
static int bitmap_path_comparator(const void *a, const void *b);
5858
static Cost bitmap_and_cost_est(PlannerInfo *root, RelOptInfo *rel,
5959
List *paths, RelOptInfo *outer_rel);
60-
static List *pull_indexpath_quals(Path *bitmapqual);
61-
static bool lists_intersect_ptr(List *list1, List *list2);
60+
static void find_indexpath_quals(Path *bitmapqual, List **quals, List **preds);
61+
static bool lists_intersect(List *list1, List *list2);
6262
static bool match_clause_to_indexcol(IndexOptInfo *index,
6363
int indexcol, Oid opfamily,
6464
RestrictInfo *rinfo,
@@ -562,6 +562,7 @@ choose_bitmap_and(PlannerInfo *root, RelOptInfo *rel,
562562
Path **patharray;
563563
Cost costsofar;
564564
List *qualsofar;
565+
List *firstpred;
565566
ListCell *lastcell;
566567
int i;
567568
ListCell *l;
@@ -586,8 +587,7 @@ choose_bitmap_and(PlannerInfo *root, RelOptInfo *rel,
586587
* consider an index redundant if any of its index conditions were already
587588
* used by earlier indexes. (We could use predicate_implied_by to have a
588589
* more intelligent, but much more expensive, check --- but in most cases
589-
* simple pointer equality should suffice, since after all the index
590-
* conditions are all coming from the same RestrictInfo lists.)
590+
* simple equality should suffice.)
591591
*
592592
* You might think the condition for redundancy should be "all index
593593
* conditions already used", not "any", but this turns out to be wrong.
@@ -597,10 +597,27 @@ choose_bitmap_and(PlannerInfo *root, RelOptInfo *rel,
597597
* non-selective. In any case, we'd surely be drastically misestimating
598598
* the selectivity if we count the same condition twice.
599599
*
600-
* We include index predicate conditions in the redundancy test. Because
601-
* the test is just for pointer equality and not equal(), the effect is
602-
* that use of the same partial index in two different AND elements is
603-
* considered redundant. (XXX is this too strong?)
600+
* We must also consider index predicate conditions in checking for
601+
* redundancy, because the estimated selectivity of a partial index
602+
* includes its predicate even if the explicit index conditions don't.
603+
* Here we have to work harder than just checking expression equality:
604+
* we check to see if any of the predicate clauses are implied by
605+
* index conditions or predicate clauses of previous paths. This covers
606+
* cases such as a condition "x = 42" used with a plain index, followed
607+
* by a clauseless scan of a partial index "WHERE x >= 40 AND x < 50".
608+
* Also, we reject indexes that have a qual condition matching any
609+
* previously-used index's predicate (by including predicate conditions
610+
* into qualsofar). It should be sufficient to check equality in this
611+
* case, not implication, since we've sorted the paths by selectivity
612+
* and so tighter conditions are seen first --- only for exactly equal
613+
* cases might the partial index come first.
614+
*
615+
* XXX the reason we need all these redundancy checks is that costsize.c
616+
* and clausesel.c aren't very smart about redundant clauses: they will
617+
* usually double-count the redundant clauses, producing a too-small
618+
* selectivity that makes a redundant AND look like it reduces the total
619+
* cost. Perhaps someday that code will be smarter and we can remove
620+
* these heuristics.
604621
*
605622
* Note: outputting the selected sub-paths in selectivity order is a good
606623
* thing even if we weren't using that as part of the selection method,
@@ -619,26 +636,46 @@ choose_bitmap_and(PlannerInfo *root, RelOptInfo *rel,
619636

620637
paths = list_make1(patharray[0]);
621638
costsofar = bitmap_and_cost_est(root, rel, paths, outer_rel);
622-
qualsofar = pull_indexpath_quals(patharray[0]);
639+
find_indexpath_quals(patharray[0], &qualsofar, &firstpred);
640+
qualsofar = list_concat(qualsofar, firstpred);
623641
lastcell = list_head(paths); /* for quick deletions */
624642

625643
for (i = 1; i < npaths; i++)
626644
{
627645
Path *newpath = patharray[i];
628646
List *newqual;
647+
List *newpred;
629648
Cost newcost;
630649

631-
newqual = pull_indexpath_quals(newpath);
632-
if (lists_intersect_ptr(newqual, qualsofar))
650+
find_indexpath_quals(newpath, &newqual, &newpred);
651+
if (lists_intersect(newqual, qualsofar))
633652
continue; /* consider it redundant */
653+
if (newpred)
654+
{
655+
bool redundant = false;
656+
657+
/* we check each predicate clause separately */
658+
foreach(l, newpred)
659+
{
660+
Node *np = (Node *) lfirst(l);
661+
662+
if (predicate_implied_by(list_make1(np), qualsofar))
663+
{
664+
redundant = true;
665+
break; /* out of inner loop */
666+
}
667+
}
668+
if (redundant)
669+
continue;
670+
}
634671
/* tentatively add newpath to paths, so we can estimate cost */
635672
paths = lappend(paths, newpath);
636673
newcost = bitmap_and_cost_est(root, rel, paths, outer_rel);
637674
if (newcost < costsofar)
638675
{
639676
/* keep newpath in paths, update subsidiary variables */
640677
costsofar = newcost;
641-
qualsofar = list_concat(qualsofar, newqual);
678+
qualsofar = list_concat(list_concat(qualsofar, newqual), newpred);
642679
lastcell = lnext(lastcell);
643680
}
644681
else
@@ -710,35 +747,39 @@ bitmap_and_cost_est(PlannerInfo *root, RelOptInfo *rel,
710747
}
711748

712749
/*
713-
* pull_indexpath_quals
750+
* find_indexpath_quals
714751
*
715-
* Given the Path structure for a plain or bitmap indexscan, extract a list
752+
* Given the Path structure for a plain or bitmap indexscan, extract lists
716753
* of all the indexquals and index predicate conditions used in the Path.
717754
*
718755
* This is sort of a simplified version of make_restrictinfo_from_bitmapqual;
719756
* here, we are not trying to produce an accurate representation of the AND/OR
720757
* semantics of the Path, but just find out all the base conditions used.
721758
*
722-
* The result list contains pointers to the expressions used in the Path,
759+
* The result lists contain pointers to the expressions used in the Path,
723760
* but all the list cells are freshly built, so it's safe to destructively
724-
* modify the list (eg, by concat'ing it with other lists).
761+
* modify the lists (eg, by concat'ing with other lists).
725762
*/
726-
static List *
727-
pull_indexpath_quals(Path *bitmapqual)
763+
static void
764+
find_indexpath_quals(Path *bitmapqual, List **quals, List **preds)
728765
{
729-
List *result = NIL;
730766
ListCell *l;
731767

768+
*quals = NIL;
769+
*preds = NIL;
770+
732771
if (IsA(bitmapqual, BitmapAndPath))
733772
{
734773
BitmapAndPath *apath = (BitmapAndPath *) bitmapqual;
735774

736775
foreach(l, apath->bitmapquals)
737776
{
738-
List *sublist;
777+
List *subquals;
778+
List *subpreds;
739779

740-
sublist = pull_indexpath_quals((Path *) lfirst(l));
741-
result = list_concat(result, sublist);
780+
find_indexpath_quals((Path *) lfirst(l), &subquals, &subpreds);
781+
*quals = list_concat(*quals, subquals);
782+
*preds = list_concat(*preds, subpreds);
742783
}
743784
}
744785
else if (IsA(bitmapqual, BitmapOrPath))
@@ -747,36 +788,36 @@ pull_indexpath_quals(Path *bitmapqual)
747788

748789
foreach(l, opath->bitmapquals)
749790
{
750-
List *sublist;
791+
List *subquals;
792+
List *subpreds;
751793

752-
sublist = pull_indexpath_quals((Path *) lfirst(l));
753-
result = list_concat(result, sublist);
794+
find_indexpath_quals((Path *) lfirst(l), &subquals, &subpreds);
795+
*quals = list_concat(*quals, subquals);
796+
*preds = list_concat(*preds, subpreds);
754797
}
755798
}
756799
else if (IsA(bitmapqual, IndexPath))
757800
{
758801
IndexPath *ipath = (IndexPath *) bitmapqual;
759802

760-
result = get_actual_clauses(ipath->indexclauses);
761-
result = list_concat(result, list_copy(ipath->indexinfo->indpred));
803+
*quals = get_actual_clauses(ipath->indexclauses);
804+
*preds = list_copy(ipath->indexinfo->indpred);
762805
}
763806
else
764807
elog(ERROR, "unrecognized node type: %d", nodeTag(bitmapqual));
765-
766-
return result;
767808
}
768809

769810

770811
/*
771-
* lists_intersect_ptr
812+
* lists_intersect
772813
* Detect whether two lists have a nonempty intersection,
773-
* using pointer equality to compare members.
814+
* using equal() to compare members.
774815
*
775816
* This possibly should go into list.c, but it doesn't yet have any use
776817
* except in choose_bitmap_and.
777818
*/
778819
static bool
779-
lists_intersect_ptr(List *list1, List *list2)
820+
lists_intersect(List *list1, List *list2)
780821
{
781822
ListCell *cell1;
782823

@@ -787,7 +828,7 @@ lists_intersect_ptr(List *list1, List *list2)
787828

788829
foreach(cell2, list2)
789830
{
790-
if (lfirst(cell2) == datum1)
831+
if (equal(lfirst(cell2), datum1))
791832
return true;
792833
}
793834
}

src/backend/utils/adt/selfuncs.c

Lines changed: 27 additions & 24 deletions
Original file line numberDiff line numberDiff line change
@@ -15,7 +15,7 @@
1515
*
1616
*
1717
* IDENTIFICATION
18-
* $PostgreSQL: pgsql/src/backend/utils/adt/selfuncs.c,v 1.229 2007/03/17 00:11:05 tgl Exp $
18+
* $PostgreSQL: pgsql/src/backend/utils/adt/selfuncs.c,v 1.230 2007/03/21 22:18:12 tgl Exp $
1919
*
2020
*-------------------------------------------------------------------------
2121
*/
@@ -86,6 +86,7 @@
8686
#include "optimizer/pathnode.h"
8787
#include "optimizer/paths.h"
8888
#include "optimizer/plancat.h"
89+
#include "optimizer/predtest.h"
8990
#include "optimizer/restrictinfo.h"
9091
#include "optimizer/var.h"
9192
#include "parser/parse_coerce.h"
@@ -4775,36 +4776,38 @@ genericcostestimate(PlannerInfo *root,
47754776
List *selectivityQuals;
47764777
ListCell *l;
47774778

4778-
/*
4779+
/*----------
47794780
* If the index is partial, AND the index predicate with the explicitly
47804781
* given indexquals to produce a more accurate idea of the index
4781-
* selectivity. This may produce redundant clauses. We get rid of exact
4782-
* duplicates in the code below. We expect that most cases of partial
4783-
* redundancy (such as "x < 4" from the qual and "x < 5" from the
4784-
* predicate) will be recognized and handled correctly by
4785-
* clauselist_selectivity(). This assumption is somewhat fragile, since
4786-
* it depends on predicate_implied_by() and clauselist_selectivity()
4787-
* having similar capabilities, and there are certainly many cases where
4788-
* we will end up with a too-low selectivity estimate. This will bias the
4789-
* system in favor of using partial indexes where possible, which is not
4790-
* necessarily a bad thing. But it'd be nice to do better someday.
4782+
* selectivity. However, we need to be careful not to insert redundant
4783+
* clauses, because clauselist_selectivity() is easily fooled into
4784+
* computing a too-low selectivity estimate. Our approach is to add
4785+
* only the index predicate clause(s) that cannot be proven to be implied
4786+
* by the given indexquals. This successfully handles cases such as a
4787+
* qual "x = 42" used with a partial index "WHERE x >= 40 AND x < 50".
4788+
* There are many other cases where we won't detect redundancy, leading
4789+
* to a too-low selectivity estimate, which will bias the system in favor
4790+
* of using partial indexes where possible. That is not necessarily bad
4791+
* though.
47914792
*
4792-
* Note that index->indpred and indexQuals are both in implicit-AND form,
4793-
* so ANDing them together just takes merging the lists. However,
4794-
* eliminating duplicates is a bit trickier because indexQuals contains
4795-
* RestrictInfo nodes and the indpred does not. It is okay to pass a
4796-
* mixed list to clauselist_selectivity, but we have to work a bit to
4797-
* generate a list without logical duplicates. (We could just list_union
4798-
* indpred and strippedQuals, but then we'd not get caching of per-qual
4799-
* selectivity estimates.)
4793+
* Note that indexQuals contains RestrictInfo nodes while the indpred
4794+
* does not. This is OK for both predicate_implied_by() and
4795+
* clauselist_selectivity().
4796+
*----------
48004797
*/
48014798
if (index->indpred != NIL)
48024799
{
4803-
List *strippedQuals;
4804-
List *predExtraQuals;
4800+
List *predExtraQuals = NIL;
4801+
4802+
foreach(l, index->indpred)
4803+
{
4804+
Node *predQual = (Node *) lfirst(l);
4805+
List *oneQual = list_make1(predQual);
48054806

4806-
strippedQuals = get_actual_clauses(indexQuals);
4807-
predExtraQuals = list_difference(index->indpred, strippedQuals);
4807+
if (!predicate_implied_by(oneQual, indexQuals))
4808+
predExtraQuals = list_concat(predExtraQuals, oneQual);
4809+
}
4810+
/* list_concat avoids modifying the passed-in indexQuals list */
48084811
selectivityQuals = list_concat(predExtraQuals, indexQuals);
48094812
}
48104813
else

src/test/regress/expected/select.out

Lines changed: 8 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -209,9 +209,13 @@ SELECT onek.unique1, onek.string4 FROM onek
209209
-- test partial btree indexes
210210
--
211211
-- As of 7.2, planner probably won't pick an indexscan without stats,
212-
-- so ANALYZE first.
212+
-- so ANALYZE first. Also, we want to prevent it from picking a bitmapscan
213+
-- followed by sort, because that could hide index ordering problems.
213214
--
214215
ANALYZE onek2;
216+
SET enable_seqscan TO off;
217+
SET enable_bitmapscan TO off;
218+
SET enable_sort TO off;
215219
--
216220
-- awk '{if($1<10){print $0;}else{next;}}' onek.data | sort +0n -1
217221
--
@@ -288,6 +292,9 @@ SELECT onek2.unique1, onek2.stringu1 FROM onek2
288292
999 | LMAAAA
289293
(19 rows)
290294

295+
RESET enable_seqscan;
296+
RESET enable_bitmapscan;
297+
RESET enable_sort;
291298
SELECT two, stringu1, ten, string4
292299
INTO TABLE tmp
293300
FROM onek;

src/test/regress/sql/select.sql

Lines changed: 10 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -59,10 +59,15 @@ SELECT onek.unique1, onek.string4 FROM onek
5959
-- test partial btree indexes
6060
--
6161
-- As of 7.2, planner probably won't pick an indexscan without stats,
62-
-- so ANALYZE first.
62+
-- so ANALYZE first. Also, we want to prevent it from picking a bitmapscan
63+
-- followed by sort, because that could hide index ordering problems.
6364
--
6465
ANALYZE onek2;
6566

67+
SET enable_seqscan TO off;
68+
SET enable_bitmapscan TO off;
69+
SET enable_sort TO off;
70+
6671
--
6772
-- awk '{if($1<10){print $0;}else{next;}}' onek.data | sort +0n -1
6873
--
@@ -81,6 +86,10 @@ SELECT onek2.unique1, onek2.stringu1 FROM onek2
8186
SELECT onek2.unique1, onek2.stringu1 FROM onek2
8287
WHERE onek2.unique1 > 980;
8388

89+
RESET enable_seqscan;
90+
RESET enable_bitmapscan;
91+
RESET enable_sort;
92+
8493

8594
SELECT two, stringu1, ten, string4
8695
INTO TABLE tmp

0 commit comments

Comments
 (0)