Skip to content

Commit e3ffd05

Browse files
committed
Weaken the planner's tests for relevant joinclauses.
We should be willing to cross-join two small relations if that allows us to use an inner indexscan on a large relation (that is, the potential indexqual for the large table requires both smaller relations). This worked in simple cases but fell apart as soon as there was a join clause to a fourth relation, because the existence of any two-relation join clause caused the planner to not consider clauseless joins between other base relations. The added regression test shows an example case adapted from a recent complaint from Benoit Delbosc. Adjust have_relevant_joinclause, have_relevant_eclass_joinclause, and has_relevant_eclass_joinclause to consider that a join clause mentioning three or more relations is sufficient grounds for joining any subset of those relations, even if we have to do so via a cartesian join. Since such clauses are relatively uncommon, this shouldn't affect planning speed on typical queries; in fact it should help a bit, because the latter two functions in particular get significantly simpler. Although this is arguably a bug fix, I'm not going to risk back-patching it, since it might have currently-unforeseen consequences.
1 parent c0cc526 commit e3ffd05

File tree

5 files changed

+110
-93
lines changed

5 files changed

+110
-93
lines changed

src/backend/optimizer/path/equivclass.c

Lines changed: 19 additions & 77 deletions
Original file line numberDiff line numberDiff line change
@@ -2035,7 +2035,7 @@ get_parent_relid(PlannerInfo *root, RelOptInfo *rel)
20352035
/*
20362036
* have_relevant_eclass_joinclause
20372037
* Detect whether there is an EquivalenceClass that could produce
2038-
* a joinclause between the two given relations.
2038+
* a joinclause involving the two given relations.
20392039
*
20402040
* This is essentially a very cut-down version of
20412041
* generate_join_implied_equalities(). Note it's OK to occasionally say "yes"
@@ -2051,9 +2051,6 @@ have_relevant_eclass_joinclause(PlannerInfo *root,
20512051
foreach(lc1, root->eq_classes)
20522052
{
20532053
EquivalenceClass *ec = (EquivalenceClass *) lfirst(lc1);
2054-
bool has_rel1;
2055-
bool has_rel2;
2056-
ListCell *lc2;
20572054

20582055
/*
20592056
* Won't generate joinclauses if single-member (this test covers the
@@ -2063,45 +2060,27 @@ have_relevant_eclass_joinclause(PlannerInfo *root,
20632060
continue;
20642061

20652062
/*
2063+
* We do not need to examine the individual members of the EC, because
2064+
* all that we care about is whether each rel overlaps the relids of
2065+
* at least one member, and a test on ec_relids is sufficient to prove
2066+
* that. (As with have_relevant_joinclause(), it is not necessary
2067+
* that the EC be able to form a joinclause relating exactly the two
2068+
* given rels, only that it be able to form a joinclause mentioning
2069+
* both, and this will surely be true if both of them overlap
2070+
* ec_relids.)
2071+
*
20662072
* Note we don't test ec_broken; if we did, we'd need a separate code
2067-
* path to look through ec_sources. Checking the members anyway is OK
2068-
* as a possibly-overoptimistic heuristic.
2073+
* path to look through ec_sources. Checking the membership anyway is
2074+
* OK as a possibly-overoptimistic heuristic.
20692075
*
20702076
* We don't test ec_has_const either, even though a const eclass won't
20712077
* generate real join clauses. This is because if we had "WHERE a.x =
20722078
* b.y and a.x = 42", it is worth considering a join between a and b,
20732079
* since the join result is likely to be small even though it'll end
20742080
* up being an unqualified nestloop.
20752081
*/
2076-
2077-
/* Needn't scan if it couldn't contain members from each rel */
2078-
if (!bms_overlap(rel1->relids, ec->ec_relids) ||
2079-
!bms_overlap(rel2->relids, ec->ec_relids))
2080-
continue;
2081-
2082-
/* Scan the EC to see if it has member(s) in each rel */
2083-
has_rel1 = has_rel2 = false;
2084-
foreach(lc2, ec->ec_members)
2085-
{
2086-
EquivalenceMember *cur_em = (EquivalenceMember *) lfirst(lc2);
2087-
2088-
if (cur_em->em_is_const || cur_em->em_is_child)
2089-
continue; /* ignore consts and children here */
2090-
if (bms_is_subset(cur_em->em_relids, rel1->relids))
2091-
{
2092-
has_rel1 = true;
2093-
if (has_rel2)
2094-
break;
2095-
}
2096-
if (bms_is_subset(cur_em->em_relids, rel2->relids))
2097-
{
2098-
has_rel2 = true;
2099-
if (has_rel1)
2100-
break;
2101-
}
2102-
}
2103-
2104-
if (has_rel1 && has_rel2)
2082+
if (bms_overlap(rel1->relids, ec->ec_relids) &&
2083+
bms_overlap(rel2->relids, ec->ec_relids))
21052084
return true;
21062085
}
21072086

@@ -2112,7 +2091,7 @@ have_relevant_eclass_joinclause(PlannerInfo *root,
21122091
/*
21132092
* has_relevant_eclass_joinclause
21142093
* Detect whether there is an EquivalenceClass that could produce
2115-
* a joinclause between the given relation and anything else.
2094+
* a joinclause involving the given relation and anything else.
21162095
*
21172096
* This is the same as have_relevant_eclass_joinclause with the other rel
21182097
* implicitly defined as "everything else in the query".
@@ -2125,9 +2104,6 @@ has_relevant_eclass_joinclause(PlannerInfo *root, RelOptInfo *rel1)
21252104
foreach(lc1, root->eq_classes)
21262105
{
21272106
EquivalenceClass *ec = (EquivalenceClass *) lfirst(lc1);
2128-
bool has_rel1;
2129-
bool has_rel2;
2130-
ListCell *lc2;
21312107

21322108
/*
21332109
* Won't generate joinclauses if single-member (this test covers the
@@ -2137,45 +2113,11 @@ has_relevant_eclass_joinclause(PlannerInfo *root, RelOptInfo *rel1)
21372113
continue;
21382114

21392115
/*
2140-
* Note we don't test ec_broken; if we did, we'd need a separate code
2141-
* path to look through ec_sources. Checking the members anyway is OK
2142-
* as a possibly-overoptimistic heuristic.
2143-
*
2144-
* We don't test ec_has_const either, even though a const eclass won't
2145-
* generate real join clauses. This is because if we had "WHERE a.x =
2146-
* b.y and a.x = 42", it is worth considering a join between a and b,
2147-
* since the join result is likely to be small even though it'll end
2148-
* up being an unqualified nestloop.
2116+
* Per the comment in have_relevant_eclass_joinclause, it's sufficient
2117+
* to find an EC that mentions both this rel and some other rel.
21492118
*/
2150-
2151-
/* Needn't scan if it couldn't contain members from each rel */
2152-
if (!bms_overlap(rel1->relids, ec->ec_relids) ||
2153-
bms_is_subset(ec->ec_relids, rel1->relids))
2154-
continue;
2155-
2156-
/* Scan the EC to see if it has member(s) in each rel */
2157-
has_rel1 = has_rel2 = false;
2158-
foreach(lc2, ec->ec_members)
2159-
{
2160-
EquivalenceMember *cur_em = (EquivalenceMember *) lfirst(lc2);
2161-
2162-
if (cur_em->em_is_const || cur_em->em_is_child)
2163-
continue; /* ignore consts and children here */
2164-
if (bms_is_subset(cur_em->em_relids, rel1->relids))
2165-
{
2166-
has_rel1 = true;
2167-
if (has_rel2)
2168-
break;
2169-
}
2170-
if (!bms_overlap(cur_em->em_relids, rel1->relids))
2171-
{
2172-
has_rel2 = true;
2173-
if (has_rel1)
2174-
break;
2175-
}
2176-
}
2177-
2178-
if (has_rel1 && has_rel2)
2119+
if (bms_overlap(rel1->relids, ec->ec_relids) &&
2120+
!bms_is_subset(ec->ec_relids, rel1->relids))
21792121
return true;
21802122
}
21812123

src/backend/optimizer/path/joinrels.c

Lines changed: 5 additions & 9 deletions
Original file line numberDiff line numberDiff line change
@@ -88,13 +88,9 @@ join_search_one_level(PlannerInfo *root, int level)
8888
has_join_restriction(root, old_rel))
8989
{
9090
/*
91-
* Note that if all available join clauses for this rel require
92-
* more than one other rel, we will fail to make any joins against
93-
* it here. In most cases that's OK; it'll be considered by
94-
* "bushy plan" join code in a higher-level pass where we have
95-
* those other rels collected into a join rel.
96-
*
97-
* See also the last-ditch case below.
91+
* There are relevant join clauses or join order restrictions,
92+
* so consider joins between this rel and (only) those rels it is
93+
* linked to by a clause or restriction.
9894
*/
9995
make_rels_by_clause_joins(root,
10096
old_rel,
@@ -160,8 +156,8 @@ join_search_one_level(PlannerInfo *root, int level)
160156
{
161157
/*
162158
* OK, we can build a rel of the right level from this
163-
* pair of rels. Do so if there is at least one usable
164-
* join clause or a relevant join restriction.
159+
* pair of rels. Do so if there is at least one relevant
160+
* join clause or join order restriction.
165161
*/
166162
if (have_relevant_joinclause(root, old_rel, new_rel) ||
167163
have_join_order_restriction(root, old_rel, new_rel))

src/backend/optimizer/util/joininfo.c

Lines changed: 17 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -21,34 +21,46 @@
2121

2222
/*
2323
* have_relevant_joinclause
24-
* Detect whether there is a joinclause that can be used to join
24+
* Detect whether there is a joinclause that involves
2525
* the two given relations.
26+
*
27+
* Note: the joinclause does not have to be evaluatable with only these two
28+
* relations. This is intentional. For example consider
29+
* SELECT * FROM a, b, c WHERE a.x = (b.y + c.z)
30+
* If a is much larger than the other tables, it may be worthwhile to
31+
* cross-join b and c and then use an inner indexscan on a.x. Therefore
32+
* we should consider this joinclause as reason to join b to c, even though
33+
* it can't be applied at that join step.
2634
*/
2735
bool
2836
have_relevant_joinclause(PlannerInfo *root,
2937
RelOptInfo *rel1, RelOptInfo *rel2)
3038
{
3139
bool result = false;
32-
Relids join_relids;
3340
List *joininfo;
41+
Relids other_relids;
3442
ListCell *l;
3543

36-
join_relids = bms_union(rel1->relids, rel2->relids);
37-
3844
/*
3945
* We could scan either relation's joininfo list; may as well use the
4046
* shorter one.
4147
*/
4248
if (list_length(rel1->joininfo) <= list_length(rel2->joininfo))
49+
{
4350
joininfo = rel1->joininfo;
51+
other_relids = rel2->relids;
52+
}
4453
else
54+
{
4555
joininfo = rel2->joininfo;
56+
other_relids = rel1->relids;
57+
}
4658

4759
foreach(l, joininfo)
4860
{
4961
RestrictInfo *rinfo = (RestrictInfo *) lfirst(l);
5062

51-
if (bms_is_subset(rinfo->required_relids, join_relids))
63+
if (bms_overlap(other_relids, rinfo->required_relids))
5264
{
5365
result = true;
5466
break;
@@ -62,8 +74,6 @@ have_relevant_joinclause(PlannerInfo *root,
6274
if (!result && rel1->has_eclass_joins && rel2->has_eclass_joins)
6375
result = have_relevant_eclass_joinclause(root, rel1, rel2);
6476

65-
bms_free(join_relids);
66-
6777
return result;
6878
}
6979

src/test/regress/expected/join.out

Lines changed: 51 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -2666,6 +2666,57 @@ select * from int4_tbl a full join int4_tbl b on false;
26662666
-2147483647 |
26672667
(10 rows)
26682668

2669+
--
2670+
-- test for ability to use a cartesian join when necessary
2671+
--
2672+
explain (costs off)
2673+
select * from
2674+
tenk1 join int4_tbl on f1 = twothousand,
2675+
int4(sin(1)) q1,
2676+
int4(sin(0)) q2
2677+
where q1 = thousand or q2 = thousand;
2678+
QUERY PLAN
2679+
-----------------------------------------------------------------------------
2680+
Hash Join
2681+
Hash Cond: (tenk1.twothousand = int4_tbl.f1)
2682+
-> Nested Loop
2683+
Join Filter: ((q1.q1 = tenk1.thousand) OR (q2.q2 = tenk1.thousand))
2684+
-> Nested Loop
2685+
-> Function Scan on q1
2686+
-> Function Scan on q2
2687+
-> Bitmap Heap Scan on tenk1
2688+
Recheck Cond: ((q1.q1 = thousand) OR (q2.q2 = thousand))
2689+
-> BitmapOr
2690+
-> Bitmap Index Scan on tenk1_thous_tenthous
2691+
Index Cond: (q1.q1 = thousand)
2692+
-> Bitmap Index Scan on tenk1_thous_tenthous
2693+
Index Cond: (q2.q2 = thousand)
2694+
-> Hash
2695+
-> Seq Scan on int4_tbl
2696+
(16 rows)
2697+
2698+
explain (costs off)
2699+
select * from
2700+
tenk1 join int4_tbl on f1 = twothousand,
2701+
int4(sin(1)) q1,
2702+
int4(sin(0)) q2
2703+
where thousand = (q1 + q2);
2704+
QUERY PLAN
2705+
--------------------------------------------------------------
2706+
Hash Join
2707+
Hash Cond: (tenk1.twothousand = int4_tbl.f1)
2708+
-> Nested Loop
2709+
-> Nested Loop
2710+
-> Function Scan on q1
2711+
-> Function Scan on q2
2712+
-> Bitmap Heap Scan on tenk1
2713+
Recheck Cond: (thousand = (q1.q1 + q2.q2))
2714+
-> Bitmap Index Scan on tenk1_thous_tenthous
2715+
Index Cond: (thousand = (q1.q1 + q2.q2))
2716+
-> Hash
2717+
-> Seq Scan on int4_tbl
2718+
(12 rows)
2719+
26692720
--
26702721
-- test join removal
26712722
--

src/test/regress/sql/join.sql

Lines changed: 18 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -689,6 +689,24 @@ order by 1,2;
689689
select * from int4_tbl a full join int4_tbl b on true;
690690
select * from int4_tbl a full join int4_tbl b on false;
691691

692+
--
693+
-- test for ability to use a cartesian join when necessary
694+
--
695+
696+
explain (costs off)
697+
select * from
698+
tenk1 join int4_tbl on f1 = twothousand,
699+
int4(sin(1)) q1,
700+
int4(sin(0)) q2
701+
where q1 = thousand or q2 = thousand;
702+
703+
explain (costs off)
704+
select * from
705+
tenk1 join int4_tbl on f1 = twothousand,
706+
int4(sin(1)) q1,
707+
int4(sin(0)) q2
708+
where thousand = (q1 + q2);
709+
692710
--
693711
-- test join removal
694712
--

0 commit comments

Comments
 (0)