Skip to content

Commit ad1c36b

Browse files
committed
Fix foreign-key selectivity estimation in the presence of constants.
get_foreign_key_join_selectivity() looks for join clauses that equate the two sides of the FK constraint. However, if we have a query like "WHERE fktab.a = pktab.a and fktab.a = 1", it won't find any such join clause, because equivclass.c replaces the given clauses with "fktab.a = 1 and pktab.a = 1", which can be enforced at the scan level, leaving nothing to be done for column "a" at the join level. We can fix that expectation without much trouble, but then a new problem arises: applying the foreign-key-based selectivity rule produces a rowcount underestimate, because we're effectively double-counting the selectivity of the "fktab.a = 1" clause. So we have to cancel that selectivity out of the estimate. To fix, refactor process_implied_equality() so that it can pass back the new RestrictInfo to its callers in equivclass.c, allowing the generated "fktab.a = 1" clause to be saved in the EquivalenceClass's ec_derives list. Then it's not much trouble to dig out the relevant RestrictInfo when we need to adjust an FK selectivity estimate. (While at it, we can also remove the expensive use of initialize_mergeclause_eclasses() to set up the new RestrictInfo's left_ec and right_ec pointers. The equivclass.c code can set those basically for free.) This seems like clearly a bug fix, but I'm hesitant to back-patch it, first because there's some API/ABI risk for extensions and second because we're usually loath to destabilize plan choices in stable branches. Per report from Sigrid Ehrenreich. Discussion: https://postgr.es/m/1019549.1603770457@sss.pgh.pa.us Discussion: https://postgr.es/m/AM6PR02MB5287A0ADD936C1FA80973E72AB190@AM6PR02MB5287.eurprd02.prod.outlook.com
1 parent ce7f772 commit ad1c36b

File tree

10 files changed

+366
-103
lines changed

10 files changed

+366
-103
lines changed

src/backend/nodes/outfuncs.c

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -2352,6 +2352,7 @@ _outForeignKeyOptInfo(StringInfo str, const ForeignKeyOptInfo *node)
23522352
WRITE_ATTRNUMBER_ARRAY(confkey, node->nkeys);
23532353
WRITE_OID_ARRAY(conpfeqop, node->nkeys);
23542354
WRITE_INT_FIELD(nmatched_ec);
2355+
WRITE_INT_FIELD(nconst_ec);
23552356
WRITE_INT_FIELD(nmatched_rcols);
23562357
WRITE_INT_FIELD(nmatched_ri);
23572358
/* for compactness, just print the number of matches per column: */

src/backend/optimizer/path/costsize.c

Lines changed: 52 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -5066,9 +5066,16 @@ get_foreign_key_join_selectivity(PlannerInfo *root,
50665066
* remove back into the worklist.
50675067
*
50685068
* Since the matching clauses are known not outerjoin-delayed, they
5069-
* should certainly have appeared in the initial joinclause list. If
5070-
* we didn't find them, they must have been matched to, and removed
5071-
* by, some other FK in a previous iteration of this loop. (A likely
5069+
* would normally have appeared in the initial joinclause list. If we
5070+
* didn't find them, there are two possibilities:
5071+
*
5072+
* 1. If the FK match is based on an EC that is ec_has_const, it won't
5073+
* have generated any join clauses at all. We discount such ECs while
5074+
* checking to see if we have "all" the clauses. (Below, we'll adjust
5075+
* the selectivity estimate for this case.)
5076+
*
5077+
* 2. The clauses were matched to some other FK in a previous
5078+
* iteration of this loop, and thus removed from worklist. (A likely
50725079
* case is that two FKs are matched to the same EC; there will be only
50735080
* one EC-derived clause in the initial list, so the first FK will
50745081
* consume it.) Applying both FKs' selectivity independently risks
@@ -5078,8 +5085,9 @@ get_foreign_key_join_selectivity(PlannerInfo *root,
50785085
* Later we might think of a reasonable way to combine the estimates,
50795086
* but for now, just punt, since this is a fairly uncommon situation.
50805087
*/
5081-
if (list_length(removedlist) !=
5082-
(fkinfo->nmatched_ec + fkinfo->nmatched_ri))
5088+
if (removedlist == NIL ||
5089+
list_length(removedlist) !=
5090+
(fkinfo->nmatched_ec - fkinfo->nconst_ec + fkinfo->nmatched_ri))
50835091
{
50845092
worklist = list_concat(worklist, removedlist);
50855093
continue;
@@ -5138,9 +5146,48 @@ get_foreign_key_join_selectivity(PlannerInfo *root,
51385146

51395147
fkselec *= 1.0 / ref_tuples;
51405148
}
5149+
5150+
/*
5151+
* If any of the FK columns participated in ec_has_const ECs, then
5152+
* equivclass.c will have generated "var = const" restrictions for
5153+
* each side of the join, thus reducing the sizes of both input
5154+
* relations. Taking the fkselec at face value would amount to
5155+
* double-counting the selectivity of the constant restriction for the
5156+
* referencing Var. Hence, look for the restriction clause(s) that
5157+
* were applied to the referencing Var(s), and divide out their
5158+
* selectivity to correct for this.
5159+
*/
5160+
if (fkinfo->nconst_ec > 0)
5161+
{
5162+
for (int i = 0; i < fkinfo->nkeys; i++)
5163+
{
5164+
EquivalenceClass *ec = fkinfo->eclass[i];
5165+
5166+
if (ec && ec->ec_has_const)
5167+
{
5168+
EquivalenceMember *em = fkinfo->fk_eclass_member[i];
5169+
RestrictInfo *rinfo = find_derived_clause_for_ec_member(ec,
5170+
em);
5171+
5172+
if (rinfo)
5173+
{
5174+
Selectivity s0;
5175+
5176+
s0 = clause_selectivity(root,
5177+
(Node *) rinfo,
5178+
0,
5179+
jointype,
5180+
sjinfo);
5181+
if (s0 > 0)
5182+
fkselec /= s0;
5183+
}
5184+
}
5185+
}
5186+
}
51415187
}
51425188

51435189
*restrictlist = worklist;
5190+
CLAMP_PROBABILITY(fkselec);
51445191
return fkselec;
51455192
}
51465193

src/backend/optimizer/path/equivclass.c

Lines changed: 96 additions & 25 deletions
Original file line numberDiff line numberDiff line change
@@ -840,10 +840,8 @@ find_em_expr_for_rel(EquivalenceClass *ec, RelOptInfo *rel)
840840
* scanning of the quals and before Path construction begins.
841841
*
842842
* We make no attempt to avoid generating duplicate RestrictInfos here: we
843-
* don't search ec_sources for matches, nor put the created RestrictInfos
844-
* into ec_derives. Doing so would require some slightly ugly changes in
845-
* initsplan.c's API, and there's no real advantage, because the clauses
846-
* generated here can't duplicate anything we will generate for joins anyway.
843+
* don't search ec_sources or ec_derives for matches. It doesn't really
844+
* seem worth the trouble to do so.
847845
*/
848846
void
849847
generate_base_implied_equalities(PlannerInfo *root)
@@ -969,6 +967,7 @@ generate_base_implied_equalities_const(PlannerInfo *root,
969967
{
970968
EquivalenceMember *cur_em = (EquivalenceMember *) lfirst(lc);
971969
Oid eq_op;
970+
RestrictInfo *rinfo;
972971

973972
Assert(!cur_em->em_is_child); /* no children yet */
974973
if (cur_em == const_em)
@@ -982,14 +981,31 @@ generate_base_implied_equalities_const(PlannerInfo *root,
982981
ec->ec_broken = true;
983982
break;
984983
}
985-
process_implied_equality(root, eq_op, ec->ec_collation,
986-
cur_em->em_expr, const_em->em_expr,
987-
bms_copy(ec->ec_relids),
988-
bms_union(cur_em->em_nullable_relids,
989-
const_em->em_nullable_relids),
990-
ec->ec_min_security,
991-
ec->ec_below_outer_join,
992-
cur_em->em_is_const);
984+
rinfo = process_implied_equality(root, eq_op, ec->ec_collation,
985+
cur_em->em_expr, const_em->em_expr,
986+
bms_copy(ec->ec_relids),
987+
bms_union(cur_em->em_nullable_relids,
988+
const_em->em_nullable_relids),
989+
ec->ec_min_security,
990+
ec->ec_below_outer_join,
991+
cur_em->em_is_const);
992+
993+
/*
994+
* If the clause didn't degenerate to a constant, fill in the correct
995+
* markings for a mergejoinable clause, and save it in ec_derives. (We
996+
* will not re-use such clauses directly, but selectivity estimation
997+
* may consult the list later. Note that this use of ec_derives does
998+
* not overlap with its use for join clauses, since we never generate
999+
* join clauses from an ec_has_const eclass.)
1000+
*/
1001+
if (rinfo && rinfo->mergeopfamilies)
1002+
{
1003+
/* it's not redundant, so don't set parent_ec */
1004+
rinfo->left_ec = rinfo->right_ec = ec;
1005+
rinfo->left_em = cur_em;
1006+
rinfo->right_em = const_em;
1007+
ec->ec_derives = lappend(ec->ec_derives, rinfo);
1008+
}
9931009
}
9941010
}
9951011

@@ -1028,6 +1044,7 @@ generate_base_implied_equalities_no_const(PlannerInfo *root,
10281044
{
10291045
EquivalenceMember *prev_em = prev_ems[relid];
10301046
Oid eq_op;
1047+
RestrictInfo *rinfo;
10311048

10321049
eq_op = select_equality_operator(ec,
10331050
prev_em->em_datatype,
@@ -1038,14 +1055,29 @@ generate_base_implied_equalities_no_const(PlannerInfo *root,
10381055
ec->ec_broken = true;
10391056
break;
10401057
}
1041-
process_implied_equality(root, eq_op, ec->ec_collation,
1042-
prev_em->em_expr, cur_em->em_expr,
1043-
bms_copy(ec->ec_relids),
1044-
bms_union(prev_em->em_nullable_relids,
1045-
cur_em->em_nullable_relids),
1046-
ec->ec_min_security,
1047-
ec->ec_below_outer_join,
1048-
false);
1058+
rinfo = process_implied_equality(root, eq_op, ec->ec_collation,
1059+
prev_em->em_expr, cur_em->em_expr,
1060+
bms_copy(ec->ec_relids),
1061+
bms_union(prev_em->em_nullable_relids,
1062+
cur_em->em_nullable_relids),
1063+
ec->ec_min_security,
1064+
ec->ec_below_outer_join,
1065+
false);
1066+
1067+
/*
1068+
* If the clause didn't degenerate to a constant, fill in the
1069+
* correct markings for a mergejoinable clause. We don't put it
1070+
* in ec_derives however; we don't currently need to re-find such
1071+
* clauses, and we don't want to clutter that list with non-join
1072+
* clauses.
1073+
*/
1074+
if (rinfo && rinfo->mergeopfamilies)
1075+
{
1076+
/* it's not redundant, so don't set parent_ec */
1077+
rinfo->left_ec = rinfo->right_ec = ec;
1078+
rinfo->left_em = prev_em;
1079+
rinfo->right_em = cur_em;
1080+
}
10491081
}
10501082
prev_ems[relid] = cur_em;
10511083
}
@@ -2151,6 +2183,10 @@ exprs_known_equal(PlannerInfo *root, Node *item1, Node *item2)
21512183
* we ignore that fine point here.) This is much like exprs_known_equal,
21522184
* except that we insist on the comparison operator matching the eclass, so
21532185
* that the result is definite not approximate.
2186+
*
2187+
* On success, we also set fkinfo->eclass[colno] to the matching eclass,
2188+
* and set fkinfo->fk_eclass_member[colno] to the eclass member for the
2189+
* referencing Var.
21542190
*/
21552191
EquivalenceClass *
21562192
match_eclasses_to_foreign_key_col(PlannerInfo *root,
@@ -2180,8 +2216,8 @@ match_eclasses_to_foreign_key_col(PlannerInfo *root,
21802216
{
21812217
EquivalenceClass *ec = (EquivalenceClass *) list_nth(root->eq_classes,
21822218
i);
2183-
bool item1member = false;
2184-
bool item2member = false;
2219+
EquivalenceMember *item1_em = NULL;
2220+
EquivalenceMember *item2_em = NULL;
21852221
ListCell *lc2;
21862222

21872223
/* Never match to a volatile EC */
@@ -2206,12 +2242,12 @@ match_eclasses_to_foreign_key_col(PlannerInfo *root,
22062242

22072243
/* Match? */
22082244
if (var->varno == var1varno && var->varattno == var1attno)
2209-
item1member = true;
2245+
item1_em = em;
22102246
else if (var->varno == var2varno && var->varattno == var2attno)
2211-
item2member = true;
2247+
item2_em = em;
22122248

22132249
/* Have we found both PK and FK column in this EC? */
2214-
if (item1member && item2member)
2250+
if (item1_em && item2_em)
22152251
{
22162252
/*
22172253
* Succeed if eqop matches EC's opfamilies. We could test
@@ -2221,7 +2257,11 @@ match_eclasses_to_foreign_key_col(PlannerInfo *root,
22212257
if (opfamilies == NIL) /* compute if we didn't already */
22222258
opfamilies = get_mergejoin_opfamilies(eqop);
22232259
if (equal(opfamilies, ec->ec_opfamilies))
2260+
{
2261+
fkinfo->eclass[colno] = ec;
2262+
fkinfo->fk_eclass_member[colno] = item2_em;
22242263
return ec;
2264+
}
22252265
/* Otherwise, done with this EC, move on to the next */
22262266
break;
22272267
}
@@ -2230,6 +2270,37 @@ match_eclasses_to_foreign_key_col(PlannerInfo *root,
22302270
return NULL;
22312271
}
22322272

2273+
/*
2274+
* find_derived_clause_for_ec_member
2275+
* Search for a previously-derived clause mentioning the given EM.
2276+
*
2277+
* The eclass should be an ec_has_const EC, of which the EM is a non-const
2278+
* member. This should ensure there is just one derived clause mentioning
2279+
* the EM (and equating it to a constant).
2280+
* Returns NULL if no such clause can be found.
2281+
*/
2282+
RestrictInfo *
2283+
find_derived_clause_for_ec_member(EquivalenceClass *ec,
2284+
EquivalenceMember *em)
2285+
{
2286+
ListCell *lc;
2287+
2288+
Assert(ec->ec_has_const);
2289+
Assert(!em->em_is_const);
2290+
foreach(lc, ec->ec_derives)
2291+
{
2292+
RestrictInfo *rinfo = (RestrictInfo *) lfirst(lc);
2293+
2294+
/*
2295+
* generate_base_implied_equalities_const will have put non-const
2296+
* members on the left side of derived clauses.
2297+
*/
2298+
if (rinfo->left_em == em)
2299+
return rinfo;
2300+
}
2301+
return NULL;
2302+
}
2303+
22332304

22342305
/*
22352306
* add_child_rel_equivalences

0 commit comments

Comments
 (0)