Skip to content

Commit 9b282a9

Browse files
author
Richard Guo
committed
Fix partitionwise join with partially-redundant join clauses
To determine if the two relations being joined can use partitionwise join, we need to verify the existence of equi-join conditions involving pairs of matching partition keys for all partition keys. Currently we do that by looking through the join's restriction clauses. However, it has been discovered that this approach is insufficient, because there might be partition keys known equal by a specific EC, but they do not form a join clause because it happens that other members of the EC than the partition keys are constrained to become a join clause. To address this issue, in addition to examining the join's restriction clauses, we also check if any partition keys are known equal by ECs, by leveraging function exprs_known_equal(). To accomplish this, we enhance exprs_known_equal() to check equality per the semantics of the opfamily, if provided. It could be argued that exprs_known_equal() could be called O(N^2) times, where N is the number of partition key expressions, resulting in noticeable performance costs if there are a lot of partition key expressions. But I think this is not a problem. The number of a joinrel's partition key expressions would only be equal to the join degree, since each base relation within the join contributes only one partition key expression. That is to say, it does not scale with the number of partitions. A benchmark with a query involving 5-way joins of partitioned tables, each with 3 partition keys and 1000 partitions, shows that the planning time is not significantly affected by this patch (within the margin of error), particularly when compared to the impact caused by partitionwise join. Thanks to Tom Lane for the idea of leveraging exprs_known_equal() to check if partition keys are known equal by ECs. Author: Richard Guo, Tom Lane Reviewed-by: Tom Lane, Ashutosh Bapat, Robert Haas Discussion: https://postgr.es/m/CAN_9JTzo_2F5dKLqXVtDX5V6dwqB0Xk+ihstpKEt3a1LT6X78A@mail.gmail.com
1 parent 2309eff commit 9b282a9

File tree

6 files changed

+214
-21
lines changed

6 files changed

+214
-21
lines changed

src/backend/optimizer/path/equivclass.c

Lines changed: 19 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -2443,15 +2443,17 @@ find_join_domain(PlannerInfo *root, Relids relids)
24432443
* Detect whether two expressions are known equal due to equivalence
24442444
* relationships.
24452445
*
2446-
* Actually, this only shows that the expressions are equal according
2447-
* to some opfamily's notion of equality --- but we only use it for
2448-
* selectivity estimation, so a fuzzy idea of equality is OK.
2446+
* If opfamily is given, the expressions must be known equal per the semantics
2447+
* of that opfamily (note it has to be a btree opfamily, since those are the
2448+
* only opfamilies equivclass.c deals with). If opfamily is InvalidOid, we'll
2449+
* return true if they're equal according to any opfamily, which is fuzzy but
2450+
* OK for estimation purposes.
24492451
*
24502452
* Note: does not bother to check for "equal(item1, item2)"; caller must
24512453
* check that case if it's possible to pass identical items.
24522454
*/
24532455
bool
2454-
exprs_known_equal(PlannerInfo *root, Node *item1, Node *item2)
2456+
exprs_known_equal(PlannerInfo *root, Node *item1, Node *item2, Oid opfamily)
24552457
{
24562458
ListCell *lc1;
24572459

@@ -2466,6 +2468,17 @@ exprs_known_equal(PlannerInfo *root, Node *item1, Node *item2)
24662468
if (ec->ec_has_volatile)
24672469
continue;
24682470

2471+
/*
2472+
* It's okay to consider ec_broken ECs here. Brokenness just means we
2473+
* couldn't derive all the implied clauses we'd have liked to; it does
2474+
* not invalidate our knowledge that the members are equal.
2475+
*/
2476+
2477+
/* Ignore if this EC doesn't use specified opfamily */
2478+
if (OidIsValid(opfamily) &&
2479+
!list_member_oid(ec->ec_opfamilies, opfamily))
2480+
continue;
2481+
24692482
foreach(lc2, ec->ec_members)
24702483
{
24712484
EquivalenceMember *em = (EquivalenceMember *) lfirst(lc2);
@@ -2494,8 +2507,7 @@ exprs_known_equal(PlannerInfo *root, Node *item1, Node *item2)
24942507
* (In principle there might be more than one matching eclass if multiple
24952508
* collations are involved, but since collation doesn't matter for equality,
24962509
* we ignore that fine point here.) This is much like exprs_known_equal,
2497-
* except that we insist on the comparison operator matching the eclass, so
2498-
* that the result is definite not approximate.
2510+
* except for the format of the input.
24992511
*
25002512
* On success, we also set fkinfo->eclass[colno] to the matching eclass,
25012513
* and set fkinfo->fk_eclass_member[colno] to the eclass member for the
@@ -2536,7 +2548,7 @@ match_eclasses_to_foreign_key_col(PlannerInfo *root,
25362548
/* Never match to a volatile EC */
25372549
if (ec->ec_has_volatile)
25382550
continue;
2539-
/* Note: it seems okay to match to "broken" eclasses here */
2551+
/* It's okay to consider "broken" ECs here, see exprs_known_equal */
25402552

25412553
foreach(lc2, ec->ec_members)
25422554
{

src/backend/optimizer/util/relnode.c

Lines changed: 92 additions & 11 deletions
Original file line numberDiff line numberDiff line change
@@ -2080,10 +2080,9 @@ have_partkey_equi_join(PlannerInfo *root, RelOptInfo *joinrel,
20802080
JoinType jointype, List *restrictlist)
20812081
{
20822082
PartitionScheme part_scheme = rel1->part_scheme;
2083+
bool pk_known_equal[PARTITION_MAX_KEYS];
2084+
int num_equal_pks;
20832085
ListCell *lc;
2084-
int cnt_pks;
2085-
bool pk_has_clause[PARTITION_MAX_KEYS];
2086-
bool strict_op;
20872086

20882087
/*
20892088
* This function must only be called when the joined relations have same
@@ -2092,13 +2091,19 @@ have_partkey_equi_join(PlannerInfo *root, RelOptInfo *joinrel,
20922091
Assert(rel1->part_scheme == rel2->part_scheme);
20932092
Assert(part_scheme);
20942093

2095-
memset(pk_has_clause, 0, sizeof(pk_has_clause));
2094+
/* We use a bool array to track which partkey columns are known equal */
2095+
memset(pk_known_equal, 0, sizeof(pk_known_equal));
2096+
/* ... as well as a count of how many are known equal */
2097+
num_equal_pks = 0;
2098+
2099+
/* First, look through the join's restriction clauses */
20962100
foreach(lc, restrictlist)
20972101
{
20982102
RestrictInfo *rinfo = lfirst_node(RestrictInfo, lc);
20992103
OpExpr *opexpr;
21002104
Expr *expr1;
21012105
Expr *expr2;
2106+
bool strict_op;
21022107
int ipk1;
21032108
int ipk2;
21042109

@@ -2176,11 +2181,15 @@ have_partkey_equi_join(PlannerInfo *root, RelOptInfo *joinrel,
21762181
if (ipk1 != ipk2)
21772182
continue;
21782183

2184+
/* Ignore clause if we already proved these keys equal. */
2185+
if (pk_known_equal[ipk1])
2186+
continue;
2187+
21792188
/*
21802189
* The clause allows partitionwise join only if it uses the same
21812190
* operator family as that specified by the partition key.
21822191
*/
2183-
if (rel1->part_scheme->strategy == PARTITION_STRATEGY_HASH)
2192+
if (part_scheme->strategy == PARTITION_STRATEGY_HASH)
21842193
{
21852194
if (!OidIsValid(rinfo->hashjoinoperator) ||
21862195
!op_in_opfamily(rinfo->hashjoinoperator,
@@ -2192,17 +2201,89 @@ have_partkey_equi_join(PlannerInfo *root, RelOptInfo *joinrel,
21922201
continue;
21932202

21942203
/* Mark the partition key as having an equi-join clause. */
2195-
pk_has_clause[ipk1] = true;
2204+
pk_known_equal[ipk1] = true;
2205+
2206+
/* We can stop examining clauses once we prove all keys equal. */
2207+
if (++num_equal_pks == part_scheme->partnatts)
2208+
return true;
21962209
}
21972210

2198-
/* Check whether every partition key has an equi-join condition. */
2199-
for (cnt_pks = 0; cnt_pks < part_scheme->partnatts; cnt_pks++)
2211+
/*
2212+
* Also check to see if any keys are known equal by equivclass.c. In most
2213+
* cases there would have been a join restriction clause generated from
2214+
* any EC that had such knowledge, but there might be no such clause, or
2215+
* it might happen to constrain other members of the ECs than the ones we
2216+
* are looking for.
2217+
*/
2218+
for (int ipk = 0; ipk < part_scheme->partnatts; ipk++)
22002219
{
2201-
if (!pk_has_clause[cnt_pks])
2202-
return false;
2220+
Oid btree_opfamily;
2221+
2222+
/* Ignore if we already proved these keys equal. */
2223+
if (pk_known_equal[ipk])
2224+
continue;
2225+
2226+
/*
2227+
* We need a btree opfamily to ask equivclass.c about. If the
2228+
* partopfamily is a hash opfamily, look up its equality operator, and
2229+
* select some btree opfamily that that operator is part of. (Any
2230+
* such opfamily should be good enough, since equivclass.c will track
2231+
* multiple opfamilies as appropriate.)
2232+
*/
2233+
if (part_scheme->strategy == PARTITION_STRATEGY_HASH)
2234+
{
2235+
Oid eq_op;
2236+
List *eq_opfamilies;
2237+
2238+
eq_op = get_opfamily_member(part_scheme->partopfamily[ipk],
2239+
part_scheme->partopcintype[ipk],
2240+
part_scheme->partopcintype[ipk],
2241+
HTEqualStrategyNumber);
2242+
if (!OidIsValid(eq_op))
2243+
break; /* we're not going to succeed */
2244+
eq_opfamilies = get_mergejoin_opfamilies(eq_op);
2245+
if (eq_opfamilies == NIL)
2246+
break; /* we're not going to succeed */
2247+
btree_opfamily = linitial_oid(eq_opfamilies);
2248+
}
2249+
else
2250+
btree_opfamily = part_scheme->partopfamily[ipk];
2251+
2252+
/*
2253+
* We consider only non-nullable partition keys here; nullable ones
2254+
* would not be treated as part of the same equivalence classes as
2255+
* non-nullable ones.
2256+
*/
2257+
foreach(lc, rel1->partexprs[ipk])
2258+
{
2259+
Node *expr1 = (Node *) lfirst(lc);
2260+
ListCell *lc2;
2261+
2262+
foreach(lc2, rel2->partexprs[ipk])
2263+
{
2264+
Node *expr2 = (Node *) lfirst(lc2);
2265+
2266+
if (exprs_known_equal(root, expr1, expr2, btree_opfamily))
2267+
{
2268+
pk_known_equal[ipk] = true;
2269+
break;
2270+
}
2271+
}
2272+
if (pk_known_equal[ipk])
2273+
break;
2274+
}
2275+
2276+
if (pk_known_equal[ipk])
2277+
{
2278+
/* We can stop examining keys once we prove all keys equal. */
2279+
if (++num_equal_pks == part_scheme->partnatts)
2280+
return true;
2281+
}
2282+
else
2283+
break; /* no chance to succeed, give up */
22032284
}
22042285

2205-
return true;
2286+
return false;
22062287
}
22072288

22082289
/*

src/backend/utils/adt/selfuncs.c

Lines changed: 3 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -3313,10 +3313,11 @@ add_unique_group_var(PlannerInfo *root, List *varinfos,
33133313

33143314
/*
33153315
* Drop known-equal vars, but only if they belong to different
3316-
* relations (see comments for estimate_num_groups)
3316+
* relations (see comments for estimate_num_groups). We aren't too
3317+
* fussy about the semantics of "equal" here.
33173318
*/
33183319
if (vardata->rel != varinfo->rel &&
3319-
exprs_known_equal(root, var, varinfo->var))
3320+
exprs_known_equal(root, var, varinfo->var, InvalidOid))
33203321
{
33213322
if (varinfo->ndistinct <= ndistinct)
33223323
{

src/include/optimizer/paths.h

Lines changed: 2 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -158,7 +158,8 @@ extern List *generate_join_implied_equalities_for_ecs(PlannerInfo *root,
158158
Relids join_relids,
159159
Relids outer_relids,
160160
RelOptInfo *inner_rel);
161-
extern bool exprs_known_equal(PlannerInfo *root, Node *item1, Node *item2);
161+
extern bool exprs_known_equal(PlannerInfo *root, Node *item1, Node *item2,
162+
Oid opfamily);
162163
extern EquivalenceClass *match_eclasses_to_foreign_key_col(PlannerInfo *root,
163164
ForeignKeyOptInfo *fkinfo,
164165
int colno);

src/test/regress/expected/partition_join.out

Lines changed: 88 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -62,6 +62,45 @@ SELECT t1.a, t1.c, t2.b, t2.c FROM prt1 t1, prt2 t2 WHERE t1.a = t2.b AND t1.b =
6262
450 | 0450 | 450 | 0450
6363
(4 rows)
6464

65+
-- inner join with partially-redundant join clauses
66+
EXPLAIN (COSTS OFF)
67+
SELECT t1.a, t1.c, t2.b, t2.c FROM prt1 t1, prt2 t2 WHERE t1.a = t2.a AND t1.a = t2.b ORDER BY t1.a, t2.b;
68+
QUERY PLAN
69+
---------------------------------------------------------------
70+
Sort
71+
Sort Key: t1.a
72+
-> Append
73+
-> Merge Join
74+
Merge Cond: (t1_1.a = t2_1.a)
75+
-> Index Scan using iprt1_p1_a on prt1_p1 t1_1
76+
-> Sort
77+
Sort Key: t2_1.b
78+
-> Seq Scan on prt2_p1 t2_1
79+
Filter: (a = b)
80+
-> Hash Join
81+
Hash Cond: (t1_2.a = t2_2.a)
82+
-> Seq Scan on prt1_p2 t1_2
83+
-> Hash
84+
-> Seq Scan on prt2_p2 t2_2
85+
Filter: (a = b)
86+
-> Hash Join
87+
Hash Cond: (t1_3.a = t2_3.a)
88+
-> Seq Scan on prt1_p3 t1_3
89+
-> Hash
90+
-> Seq Scan on prt2_p3 t2_3
91+
Filter: (a = b)
92+
(22 rows)
93+
94+
SELECT t1.a, t1.c, t2.b, t2.c FROM prt1 t1, prt2 t2 WHERE t1.a = t2.a AND t1.a = t2.b ORDER BY t1.a, t2.b;
95+
a | c | b | c
96+
----+------+----+------
97+
0 | 0000 | 0 | 0000
98+
6 | 0006 | 6 | 0006
99+
12 | 0012 | 12 | 0012
100+
18 | 0018 | 18 | 0018
101+
24 | 0024 | 24 | 0024
102+
(5 rows)
103+
65104
-- left outer join, 3-way
66105
EXPLAIN (COSTS OFF)
67106
SELECT COUNT(*) FROM prt1 t1
@@ -1803,6 +1842,55 @@ SELECT t1.a, t1.c, t2.b, t2.c FROM prt1_l t1, prt2_l t2 WHERE t1.a = t2.b AND t1
18031842
450 | 0002 | 450 | 0002
18041843
(4 rows)
18051844

1845+
-- inner join with partially-redundant join clauses
1846+
EXPLAIN (COSTS OFF)
1847+
SELECT t1.a, t1.c, t2.b, t2.c FROM prt1_l t1, prt2_l t2 WHERE t1.a = t2.a AND t1.a = t2.b AND t1.c = t2.c ORDER BY t1.a, t2.b;
1848+
QUERY PLAN
1849+
------------------------------------------------------------------------------------
1850+
Sort
1851+
Sort Key: t1.a
1852+
-> Append
1853+
-> Hash Join
1854+
Hash Cond: ((t1_1.a = t2_1.a) AND ((t1_1.c)::text = (t2_1.c)::text))
1855+
-> Seq Scan on prt1_l_p1 t1_1
1856+
-> Hash
1857+
-> Seq Scan on prt2_l_p1 t2_1
1858+
Filter: (a = b)
1859+
-> Hash Join
1860+
Hash Cond: ((t1_2.a = t2_2.a) AND ((t1_2.c)::text = (t2_2.c)::text))
1861+
-> Seq Scan on prt1_l_p2_p1 t1_2
1862+
-> Hash
1863+
-> Seq Scan on prt2_l_p2_p1 t2_2
1864+
Filter: (a = b)
1865+
-> Hash Join
1866+
Hash Cond: ((t1_3.a = t2_3.a) AND ((t1_3.c)::text = (t2_3.c)::text))
1867+
-> Seq Scan on prt1_l_p2_p2 t1_3
1868+
-> Hash
1869+
-> Seq Scan on prt2_l_p2_p2 t2_3
1870+
Filter: (a = b)
1871+
-> Hash Join
1872+
Hash Cond: ((t1_5.a = t2_5.a) AND ((t1_5.c)::text = (t2_5.c)::text))
1873+
-> Append
1874+
-> Seq Scan on prt1_l_p3_p1 t1_5
1875+
-> Seq Scan on prt1_l_p3_p2 t1_6
1876+
-> Hash
1877+
-> Append
1878+
-> Seq Scan on prt2_l_p3_p1 t2_5
1879+
Filter: (a = b)
1880+
-> Seq Scan on prt2_l_p3_p2 t2_6
1881+
Filter: (a = b)
1882+
(32 rows)
1883+
1884+
SELECT t1.a, t1.c, t2.b, t2.c FROM prt1_l t1, prt2_l t2 WHERE t1.a = t2.a AND t1.a = t2.b AND t1.c = t2.c ORDER BY t1.a, t2.b;
1885+
a | c | b | c
1886+
----+------+----+------
1887+
0 | 0000 | 0 | 0000
1888+
6 | 0002 | 6 | 0002
1889+
12 | 0000 | 12 | 0000
1890+
18 | 0002 | 18 | 0002
1891+
24 | 0000 | 24 | 0000
1892+
(5 rows)
1893+
18061894
-- left join
18071895
EXPLAIN (COSTS OFF)
18081896
SELECT t1.a, t1.c, t2.b, t2.c FROM prt1_l t1 LEFT JOIN prt2_l t2 ON t1.a = t2.b AND t1.c = t2.c WHERE t1.b = 0 ORDER BY t1.a, t2.b;

src/test/regress/sql/partition_join.sql

Lines changed: 10 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -34,6 +34,11 @@ EXPLAIN (COSTS OFF)
3434
SELECT t1.a, t1.c, t2.b, t2.c FROM prt1 t1, prt2 t2 WHERE t1.a = t2.b AND t1.b = 0 ORDER BY t1.a, t2.b;
3535
SELECT t1.a, t1.c, t2.b, t2.c FROM prt1 t1, prt2 t2 WHERE t1.a = t2.b AND t1.b = 0 ORDER BY t1.a, t2.b;
3636

37+
-- inner join with partially-redundant join clauses
38+
EXPLAIN (COSTS OFF)
39+
SELECT t1.a, t1.c, t2.b, t2.c FROM prt1 t1, prt2 t2 WHERE t1.a = t2.a AND t1.a = t2.b ORDER BY t1.a, t2.b;
40+
SELECT t1.a, t1.c, t2.b, t2.c FROM prt1 t1, prt2 t2 WHERE t1.a = t2.a AND t1.a = t2.b ORDER BY t1.a, t2.b;
41+
3742
-- left outer join, 3-way
3843
EXPLAIN (COSTS OFF)
3944
SELECT COUNT(*) FROM prt1 t1
@@ -386,6 +391,11 @@ EXPLAIN (COSTS OFF)
386391
SELECT t1.a, t1.c, t2.b, t2.c FROM prt1_l t1, prt2_l t2 WHERE t1.a = t2.b AND t1.b = 0 ORDER BY t1.a, t2.b;
387392
SELECT t1.a, t1.c, t2.b, t2.c FROM prt1_l t1, prt2_l t2 WHERE t1.a = t2.b AND t1.b = 0 ORDER BY t1.a, t2.b;
388393

394+
-- inner join with partially-redundant join clauses
395+
EXPLAIN (COSTS OFF)
396+
SELECT t1.a, t1.c, t2.b, t2.c FROM prt1_l t1, prt2_l t2 WHERE t1.a = t2.a AND t1.a = t2.b AND t1.c = t2.c ORDER BY t1.a, t2.b;
397+
SELECT t1.a, t1.c, t2.b, t2.c FROM prt1_l t1, prt2_l t2 WHERE t1.a = t2.a AND t1.a = t2.b AND t1.c = t2.c ORDER BY t1.a, t2.b;
398+
389399
-- left join
390400
EXPLAIN (COSTS OFF)
391401
SELECT t1.a, t1.c, t2.b, t2.c FROM prt1_l t1 LEFT JOIN prt2_l t2 ON t1.a = t2.b AND t1.c = t2.c WHERE t1.b = 0 ORDER BY t1.a, t2.b;

0 commit comments

Comments
 (0)