Skip to content

Commit 7ad6498

Browse files
committed
Avoid crash in partitionwise join planning under GEQO.
While trying to plan a partitionwise join, we may be faced with cases where one or both input partitions for a particular segment of the join have been pruned away. In HEAD and v11, this is problematic because earlier processing didn't bother to make a pruned RelOptInfo fully valid. With an upcoming patch to make partition pruning more efficient, this'll be even more problematic because said RelOptInfo won't exist at all. The existing code attempts to deal with this by retroactively making the RelOptInfo fully valid, but that causes crashes under GEQO because join planning is done in a short-lived memory context. In v11 we could probably have fixed this by switching to the planner's main context while fixing up the RelOptInfo, but that idea doesn't scale well to the upcoming patch. It would be better not to mess with the base-relation data structures during join planning, anyway --- that's just a recipe for order-of-operations bugs. In many cases, though, we don't actually need the child RelOptInfo, because if the input is certainly empty then the join segment's result is certainly empty, so we can skip making a join plan altogether. (The existing code ultimately arrives at the same conclusion, but only after doing a lot more work.) This approach works except when the pruned-away partition is on the nullable side of a LEFT, ANTI, or FULL join, and the other side isn't pruned. But in those cases the existing code leaves a lot to be desired anyway --- the correct output is just the result of the unpruned side of the join, but we were emitting a useless outer join against a dummy Result. Pending somebody writing code to handle that more nicely, let's just abandon the partitionwise-join optimization in such cases. When the modified code skips making a join plan, it doesn't make a join RelOptInfo either; this requires some upper-level code to cope with nulls in part_rels[] arrays. We would have had to have that anyway after the upcoming patch. Back-patch to v11 since the crash is demonstrable there. Discussion: https://postgr.es/m/8305.1553884377@sss.pgh.pa.us
1 parent ef6576f commit 7ad6498

File tree

7 files changed

+157
-165
lines changed

7 files changed

+157
-165
lines changed

src/backend/optimizer/path/allpaths.c

+8-6
Original file line numberDiff line numberDiff line change
@@ -1112,11 +1112,11 @@ set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
11121112
* for partitioned child rels.
11131113
*
11141114
* Note: here we abuse the consider_partitionwise_join flag by setting
1115-
* it *even* for child rels that are not partitioned. In that case,
1116-
* we set it to tell try_partitionwise_join() that it doesn't need to
1117-
* generate their targetlists and EC entries as they have already been
1118-
* generated here, as opposed to the dummy child rels for which the
1119-
* flag is left set to false so that it will generate them.
1115+
* it for child rels that are not themselves partitioned. We do so to
1116+
* tell try_partitionwise_join() that the child rel is sufficiently
1117+
* valid to be used as a per-partition input, even if it later gets
1118+
* proven to be dummy. (It's not usable until we've set up the
1119+
* reltarget and EC entries, which we just did.)
11201120
*/
11211121
if (rel->consider_partitionwise_join)
11221122
childrel->consider_partitionwise_join = true;
@@ -3564,7 +3564,9 @@ generate_partitionwise_join_paths(PlannerInfo *root, RelOptInfo *rel)
35643564
{
35653565
RelOptInfo *child_rel = part_rels[cnt_parts];
35663566

3567-
Assert(child_rel != NULL);
3567+
/* If it's been pruned entirely, it's certainly dummy. */
3568+
if (child_rel == NULL)
3569+
continue;
35683570

35693571
/* Add partitionwise join paths for partitioned child-joins. */
35703572
generate_partitionwise_join_paths(root, child_rel);

src/backend/optimizer/path/joinrels.c

+56-33
Original file line numberDiff line numberDiff line change
@@ -15,9 +15,7 @@
1515
#include "postgres.h"
1616

1717
#include "miscadmin.h"
18-
#include "nodes/nodeFuncs.h"
1918
#include "optimizer/appendinfo.h"
20-
#include "optimizer/clauses.h"
2119
#include "optimizer/joininfo.h"
2220
#include "optimizer/pathnode.h"
2321
#include "optimizer/paths.h"
@@ -44,8 +42,6 @@ static void try_partitionwise_join(PlannerInfo *root, RelOptInfo *rel1,
4442
RelOptInfo *rel2, RelOptInfo *joinrel,
4543
SpecialJoinInfo *parent_sjinfo,
4644
List *parent_restrictlist);
47-
static void update_child_rel_info(PlannerInfo *root,
48-
RelOptInfo *rel, RelOptInfo *childrel);
4945
static SpecialJoinInfo *build_child_join_sjinfo(PlannerInfo *root,
5046
SpecialJoinInfo *parent_sjinfo,
5147
Relids left_relids, Relids right_relids);
@@ -1405,6 +1401,10 @@ try_partitionwise_join(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2,
14051401
{
14061402
RelOptInfo *child_rel1 = rel1->part_rels[cnt_parts];
14071403
RelOptInfo *child_rel2 = rel2->part_rels[cnt_parts];
1404+
bool rel1_empty = (child_rel1 == NULL ||
1405+
IS_DUMMY_REL(child_rel1));
1406+
bool rel2_empty = (child_rel2 == NULL ||
1407+
IS_DUMMY_REL(child_rel2));
14081408
SpecialJoinInfo *child_sjinfo;
14091409
List *child_restrictlist;
14101410
RelOptInfo *child_joinrel;
@@ -1413,24 +1413,69 @@ try_partitionwise_join(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2,
14131413
int nappinfos;
14141414

14151415
/*
1416-
* If a child table has consider_partitionwise_join=false, it means
1416+
* Check for cases where we can prove that this segment of the join
1417+
* returns no rows, due to one or both inputs being empty (including
1418+
* inputs that have been pruned away entirely). If so just ignore it.
1419+
* These rules are equivalent to populate_joinrel_with_paths's rules
1420+
* for dummy input relations.
1421+
*/
1422+
switch (parent_sjinfo->jointype)
1423+
{
1424+
case JOIN_INNER:
1425+
case JOIN_SEMI:
1426+
if (rel1_empty || rel2_empty)
1427+
continue; /* ignore this join segment */
1428+
break;
1429+
case JOIN_LEFT:
1430+
case JOIN_ANTI:
1431+
if (rel1_empty)
1432+
continue; /* ignore this join segment */
1433+
break;
1434+
case JOIN_FULL:
1435+
if (rel1_empty && rel2_empty)
1436+
continue; /* ignore this join segment */
1437+
break;
1438+
default:
1439+
/* other values not expected here */
1440+
elog(ERROR, "unrecognized join type: %d",
1441+
(int) parent_sjinfo->jointype);
1442+
break;
1443+
}
1444+
1445+
/*
1446+
* If a child has been pruned entirely then we can't generate paths
1447+
* for it, so we have to reject partitionwise joining unless we were
1448+
* able to eliminate this partition above.
1449+
*/
1450+
if (child_rel1 == NULL || child_rel2 == NULL)
1451+
{
1452+
/*
1453+
* Mark the joinrel as unpartitioned so that later functions treat
1454+
* it correctly.
1455+
*/
1456+
joinrel->nparts = 0;
1457+
return;
1458+
}
1459+
1460+
/*
1461+
* If a leaf relation has consider_partitionwise_join=false, it means
14171462
* that it's a dummy relation for which we skipped setting up tlist
1418-
* expressions and adding EC members in set_append_rel_size(), so do
1419-
* that now for use later.
1463+
* expressions and adding EC members in set_append_rel_size(), so
1464+
* again we have to fail here.
14201465
*/
14211466
if (rel1_is_simple && !child_rel1->consider_partitionwise_join)
14221467
{
14231468
Assert(child_rel1->reloptkind == RELOPT_OTHER_MEMBER_REL);
14241469
Assert(IS_DUMMY_REL(child_rel1));
1425-
update_child_rel_info(root, rel1, child_rel1);
1426-
child_rel1->consider_partitionwise_join = true;
1470+
joinrel->nparts = 0;
1471+
return;
14271472
}
14281473
if (rel2_is_simple && !child_rel2->consider_partitionwise_join)
14291474
{
14301475
Assert(child_rel2->reloptkind == RELOPT_OTHER_MEMBER_REL);
14311476
Assert(IS_DUMMY_REL(child_rel2));
1432-
update_child_rel_info(root, rel2, child_rel2);
1433-
child_rel2->consider_partitionwise_join = true;
1477+
joinrel->nparts = 0;
1478+
return;
14341479
}
14351480

14361481
/* We should never try to join two overlapping sets of rels. */
@@ -1474,28 +1519,6 @@ try_partitionwise_join(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2,
14741519
}
14751520
}
14761521

1477-
/*
1478-
* Set up tlist expressions for the childrel, and add EC members referencing
1479-
* the childrel.
1480-
*/
1481-
static void
1482-
update_child_rel_info(PlannerInfo *root,
1483-
RelOptInfo *rel, RelOptInfo *childrel)
1484-
{
1485-
AppendRelInfo *appinfo = root->append_rel_array[childrel->relid];
1486-
1487-
/* Make child tlist expressions */
1488-
childrel->reltarget->exprs = (List *)
1489-
adjust_appendrel_attrs(root,
1490-
(Node *) rel->reltarget->exprs,
1491-
1, &appinfo);
1492-
1493-
/* Make child entries in the EquivalenceClass as well */
1494-
if (rel->has_eclass_joins || has_useful_pathkeys(root, rel))
1495-
add_child_rel_equivalences(root, appinfo, rel, childrel);
1496-
childrel->has_eclass_joins = rel->has_eclass_joins;
1497-
}
1498-
14991522
/*
15001523
* Construct the SpecialJoinInfo for a child-join by translating
15011524
* SpecialJoinInfo for the join between parents. left_relids and right_relids

src/backend/optimizer/plan/planner.c

+7-10
Original file line numberDiff line numberDiff line change
@@ -6993,6 +6993,10 @@ apply_scanjoin_target_to_paths(PlannerInfo *root,
69936993
List *child_scanjoin_targets = NIL;
69946994
ListCell *lc;
69956995

6996+
/* Pruned or dummy children can be ignored. */
6997+
if (child_rel == NULL || IS_DUMMY_REL(child_rel))
6998+
continue;
6999+
69967000
/* Translate scan/join targets for this child. */
69977001
appinfos = find_appinfos_by_relids(root, child_rel->relids,
69987002
&nappinfos);
@@ -7093,8 +7097,9 @@ create_partitionwise_grouping_paths(PlannerInfo *root,
70937097
RelOptInfo *child_grouped_rel;
70947098
RelOptInfo *child_partially_grouped_rel;
70957099

7096-
/* Input child rel must have a path */
7097-
Assert(child_input_rel->pathlist != NIL);
7100+
/* Pruned or dummy children can be ignored. */
7101+
if (child_input_rel == NULL || IS_DUMMY_REL(child_input_rel))
7102+
continue;
70987103

70997104
/*
71007105
* Copy the given "extra" structure as is and then override the
@@ -7136,14 +7141,6 @@ create_partitionwise_grouping_paths(PlannerInfo *root,
71367141
extra->target_parallel_safe,
71377142
child_extra.havingQual);
71387143

7139-
/* Ignore empty children. They contribute nothing. */
7140-
if (IS_DUMMY_REL(child_input_rel))
7141-
{
7142-
mark_dummy_rel(child_grouped_rel);
7143-
7144-
continue;
7145-
}
7146-
71477144
/* Create grouping paths for this child relation. */
71487145
create_ordinary_grouping_paths(root, child_input_rel,
71497146
child_grouped_rel,

src/test/regress/expected/partition_aggregate.out

+39-59
Original file line numberDiff line numberDiff line change
@@ -716,37 +716,33 @@ SELECT a.x, sum(b.x) FROM pagg_tab1 a FULL OUTER JOIN pagg_tab2 b ON a.x = b.y G
716716
| 500
717717
(16 rows)
718718

719-
-- LEFT JOIN, with dummy relation on right side,
719+
-- LEFT JOIN, with dummy relation on right side, ideally
720720
-- should produce full partitionwise aggregation plan as GROUP BY is on
721-
-- non-nullable columns
721+
-- non-nullable columns.
722+
-- But right now we are unable to do partitionwise join in this case.
722723
EXPLAIN (COSTS OFF)
723724
SELECT a.x, b.y, count(*) FROM (SELECT * FROM pagg_tab1 WHERE x < 20) a LEFT JOIN (SELECT * FROM pagg_tab2 WHERE y > 10) b ON a.x = b.y WHERE a.x > 5 or b.y < 20 GROUP BY a.x, b.y ORDER BY 1, 2;
724-
QUERY PLAN
725-
-----------------------------------------------------------------------------
725+
QUERY PLAN
726+
-----------------------------------------------------------------------
726727
Sort
727-
Sort Key: pagg_tab1_p1.x, y
728-
-> Append
729-
-> HashAggregate
730-
Group Key: pagg_tab1_p1.x, y
731-
-> Hash Left Join
732-
Hash Cond: (pagg_tab1_p1.x = y)
733-
Filter: ((pagg_tab1_p1.x > 5) OR (y < 20))
728+
Sort Key: pagg_tab1_p1.x, pagg_tab2_p2.y
729+
-> HashAggregate
730+
Group Key: pagg_tab1_p1.x, pagg_tab2_p2.y
731+
-> Hash Left Join
732+
Hash Cond: (pagg_tab1_p1.x = pagg_tab2_p2.y)
733+
Filter: ((pagg_tab1_p1.x > 5) OR (pagg_tab2_p2.y < 20))
734+
-> Append
734735
-> Seq Scan on pagg_tab1_p1
735736
Filter: (x < 20)
736-
-> Hash
737-
-> Result
738-
One-Time Filter: false
739-
-> HashAggregate
740-
Group Key: pagg_tab1_p2.x, pagg_tab2_p2.y
741-
-> Hash Left Join
742-
Hash Cond: (pagg_tab1_p2.x = pagg_tab2_p2.y)
743-
Filter: ((pagg_tab1_p2.x > 5) OR (pagg_tab2_p2.y < 20))
744737
-> Seq Scan on pagg_tab1_p2
745738
Filter: (x < 20)
746-
-> Hash
739+
-> Hash
740+
-> Append
747741
-> Seq Scan on pagg_tab2_p2
748742
Filter: (y > 10)
749-
(23 rows)
743+
-> Seq Scan on pagg_tab2_p3
744+
Filter: (y > 10)
745+
(18 rows)
750746

751747
SELECT a.x, b.y, count(*) FROM (SELECT * FROM pagg_tab1 WHERE x < 20) a LEFT JOIN (SELECT * FROM pagg_tab2 WHERE y > 10) b ON a.x = b.y WHERE a.x > 5 or b.y < 20 GROUP BY a.x, b.y ORDER BY 1, 2;
752748
x | y | count
@@ -760,49 +756,33 @@ SELECT a.x, b.y, count(*) FROM (SELECT * FROM pagg_tab1 WHERE x < 20) a LEFT JOI
760756
18 | 18 | 100
761757
(7 rows)
762758

763-
-- FULL JOIN, with dummy relations on both sides,
759+
-- FULL JOIN, with dummy relations on both sides, ideally
764760
-- should produce partial partitionwise aggregation plan as GROUP BY is on
765-
-- nullable columns
761+
-- nullable columns.
762+
-- But right now we are unable to do partitionwise join in this case.
766763
EXPLAIN (COSTS OFF)
767764
SELECT a.x, b.y, count(*) FROM (SELECT * FROM pagg_tab1 WHERE x < 20) a FULL JOIN (SELECT * FROM pagg_tab2 WHERE y > 10) b ON a.x = b.y WHERE a.x > 5 or b.y < 20 GROUP BY a.x, b.y ORDER BY 1, 2;
768-
QUERY PLAN
769-
-----------------------------------------------------------------------------------
770-
Finalize GroupAggregate
771-
Group Key: pagg_tab1_p1.x, y
772-
-> Sort
773-
Sort Key: pagg_tab1_p1.x, y
774-
-> Append
775-
-> Partial HashAggregate
776-
Group Key: pagg_tab1_p1.x, y
777-
-> Hash Full Join
778-
Hash Cond: (pagg_tab1_p1.x = y)
779-
Filter: ((pagg_tab1_p1.x > 5) OR (y < 20))
780-
-> Seq Scan on pagg_tab1_p1
781-
Filter: (x < 20)
782-
-> Hash
783-
-> Result
784-
One-Time Filter: false
785-
-> Partial HashAggregate
786-
Group Key: pagg_tab1_p2.x, pagg_tab2_p2.y
787-
-> Hash Full Join
788-
Hash Cond: (pagg_tab1_p2.x = pagg_tab2_p2.y)
789-
Filter: ((pagg_tab1_p2.x > 5) OR (pagg_tab2_p2.y < 20))
790-
-> Seq Scan on pagg_tab1_p2
791-
Filter: (x < 20)
792-
-> Hash
793-
-> Seq Scan on pagg_tab2_p2
794-
Filter: (y > 10)
795-
-> Partial HashAggregate
796-
Group Key: x, pagg_tab2_p3.y
797-
-> Hash Full Join
798-
Hash Cond: (pagg_tab2_p3.y = x)
799-
Filter: ((x > 5) OR (pagg_tab2_p3.y < 20))
765+
QUERY PLAN
766+
-----------------------------------------------------------------------
767+
Sort
768+
Sort Key: pagg_tab1_p1.x, pagg_tab2_p2.y
769+
-> HashAggregate
770+
Group Key: pagg_tab1_p1.x, pagg_tab2_p2.y
771+
-> Hash Full Join
772+
Hash Cond: (pagg_tab1_p1.x = pagg_tab2_p2.y)
773+
Filter: ((pagg_tab1_p1.x > 5) OR (pagg_tab2_p2.y < 20))
774+
-> Append
775+
-> Seq Scan on pagg_tab1_p1
776+
Filter: (x < 20)
777+
-> Seq Scan on pagg_tab1_p2
778+
Filter: (x < 20)
779+
-> Hash
780+
-> Append
781+
-> Seq Scan on pagg_tab2_p2
782+
Filter: (y > 10)
800783
-> Seq Scan on pagg_tab2_p3
801784
Filter: (y > 10)
802-
-> Hash
803-
-> Result
804-
One-Time Filter: false
805-
(35 rows)
785+
(18 rows)
806786

807787
SELECT a.x, b.y, count(*) FROM (SELECT * FROM pagg_tab1 WHERE x < 20) a FULL JOIN (SELECT * FROM pagg_tab2 WHERE y > 10) b ON a.x = b.y WHERE a.x > 5 or b.y < 20 GROUP BY a.x, b.y ORDER BY 1, 2;
808788
x | y | count

0 commit comments

Comments
 (0)