Skip to content

Commit 85e5e22

Browse files
committed
Fix a PlaceHolderVar-related oversight in star-schema planning patch.
In commit b514a74, I changed the planner so that it would allow nestloop paths to remain partially parameterized, ie the inner relation might need parameters from both the current outer relation and some upper-level outer relation. That's fine so long as we're talking about distinct parameters; but the patch also allowed creation of nestloop paths for cases where the inner relation's parameter was a PlaceHolderVar whose eval_at set included the current outer relation and some upper-level one. That does *not* work. In principle we could allow such a PlaceHolderVar to be evaluated at the lower join node using values passed down from the upper relation along with values from the join's own outer relation. However, nodeNestloop.c only supports simple Vars not arbitrary expressions as nestloop parameters. createplan.c is also a few bricks shy of being able to handle such cases; it misplaces the PlaceHolderVar parameters in the plan tree, which is why the visible symptoms of this bug are "plan should not reference subplan's variable" and "failed to assign all NestLoopParams to plan nodes" planner errors. Adding the necessary complexity to make this work doesn't seem like it would be repaid in significantly better plans, because in cases where such a PHV exists, there is probably a corresponding join order constraint that would allow a good plan to be found without using the star-schema exception. Furthermore, adding complexity to nodeNestloop.c would create a run-time penalty even for plans where this whole consideration is irrelevant. So let's just reject such paths instead. Per fuzz testing by Andreas Seltenreich; the added regression test is based on his example query. Back-patch to 9.2, like the previous patch.
1 parent 369342c commit 85e5e22

File tree

3 files changed

+158
-26
lines changed

3 files changed

+158
-26
lines changed

src/backend/optimizer/path/joinpath.c

+76-26
Original file line numberDiff line numberDiff line change
@@ -118,7 +118,7 @@ add_paths_to_joinrel(PlannerInfo *root,
118118
* is usually no need to create a parameterized result path unless there
119119
* is a join order restriction that prevents joining one of our input rels
120120
* directly to the parameter source rel instead of joining to the other
121-
* input rel. (But see exception in try_nestloop_path.) This restriction
121+
* input rel. (But see allow_star_schema_join().) This restriction
122122
* reduces the number of parameterized paths we have to deal with at
123123
* higher join levels, without compromising the quality of the resulting
124124
* plan. We express the restriction as a Relids set that must overlap the
@@ -270,6 +270,74 @@ add_paths_to_joinrel(PlannerInfo *root,
270270
jointype, &extra);
271271
}
272272

273+
/*
274+
* We override the param_source_rels heuristic to accept nestloop paths in
275+
* which the outer rel satisfies some but not all of the inner path's
276+
* parameterization. This is necessary to get good plans for star-schema
277+
* scenarios, in which a parameterized path for a large table may require
278+
* parameters from multiple small tables that will not get joined directly to
279+
* each other. We can handle that by stacking nestloops that have the small
280+
* tables on the outside; but this breaks the rule the param_source_rels
281+
* heuristic is based on, namely that parameters should not be passed down
282+
* across joins unless there's a join-order-constraint-based reason to do so.
283+
* So we ignore the param_source_rels restriction when this case applies.
284+
*
285+
* However, there's a pitfall: suppose the inner rel (call it A) has a
286+
* parameter that is a PlaceHolderVar, and that PHV's minimum eval_at set
287+
* includes the outer rel (B) and some third rel (C). If we treat this as a
288+
* star-schema case and create a B/A nestloop join that's parameterized by C,
289+
* we would end up with a plan in which the PHV's expression has to be
290+
* evaluated as a nestloop parameter at the B/A join; and the executor is only
291+
* set up to handle simple Vars as NestLoopParams. Rather than add complexity
292+
* and overhead to the executor for such corner cases, it seems better to
293+
* forbid the join. (Note that existence of such a PHV probably means there
294+
* is a join order constraint that will cause us to consider joining B and C
295+
* directly; so we can still make use of A's parameterized path, and there is
296+
* no need for the star-schema exception.) To implement this exception to the
297+
* exception, we check whether any PHVs used in the query could pose such a
298+
* hazard. We don't have any simple way of checking whether a risky PHV would
299+
* actually be used in the inner plan, and the case is so unusual that it
300+
* doesn't seem worth working very hard on it.
301+
*
302+
* allow_star_schema_join() returns TRUE if the param_source_rels restriction
303+
* should be overridden, ie, it's okay to perform this join.
304+
*/
305+
static bool
306+
allow_star_schema_join(PlannerInfo *root,
307+
Path *outer_path,
308+
Path *inner_path)
309+
{
310+
Relids innerparams = PATH_REQ_OUTER(inner_path);
311+
Relids outerrelids = outer_path->parent->relids;
312+
ListCell *lc;
313+
314+
/*
315+
* It's not a star-schema case unless the outer rel provides some but not
316+
* all of the inner rel's parameterization.
317+
*/
318+
if (!(bms_overlap(innerparams, outerrelids) &&
319+
bms_nonempty_difference(innerparams, outerrelids)))
320+
return false;
321+
322+
/* Check for hazardous PHVs */
323+
foreach(lc, root->placeholder_list)
324+
{
325+
PlaceHolderInfo *phinfo = (PlaceHolderInfo *) lfirst(lc);
326+
327+
if (!bms_is_subset(phinfo->ph_eval_at, innerparams))
328+
continue; /* ignore, could not be a nestloop param */
329+
if (!bms_overlap(phinfo->ph_eval_at, outerrelids))
330+
continue; /* ignore, not relevant to this join */
331+
if (bms_is_subset(phinfo->ph_eval_at, outerrelids))
332+
continue; /* safe, it can be eval'd within outerrel */
333+
/* Otherwise, it's potentially unsafe, so reject the join */
334+
return false;
335+
}
336+
337+
/* OK to perform the join */
338+
return true;
339+
}
340+
273341
/*
274342
* try_nestloop_path
275343
* Consider a nestloop join path; if it appears useful, push it into
@@ -289,36 +357,18 @@ try_nestloop_path(PlannerInfo *root,
289357

290358
/*
291359
* Check to see if proposed path is still parameterized, and reject if the
292-
* parameterization wouldn't be sensible.
360+
* parameterization wouldn't be sensible --- unless allow_star_schema_join
361+
* says to allow it anyway.
293362
*/
294363
required_outer = calc_nestloop_required_outer(outer_path,
295364
inner_path);
296365
if (required_outer &&
297-
!bms_overlap(required_outer, extra->param_source_rels))
366+
!bms_overlap(required_outer, extra->param_source_rels) &&
367+
!allow_star_schema_join(root, outer_path, inner_path))
298368
{
299-
/*
300-
* We override the param_source_rels heuristic to accept nestloop
301-
* paths in which the outer rel satisfies some but not all of the
302-
* inner path's parameterization. This is necessary to get good plans
303-
* for star-schema scenarios, in which a parameterized path for a
304-
* large table may require parameters from multiple small tables that
305-
* will not get joined directly to each other. We can handle that by
306-
* stacking nestloops that have the small tables on the outside; but
307-
* this breaks the rule the param_source_rels heuristic is based on,
308-
* namely that parameters should not be passed down across joins
309-
* unless there's a join-order-constraint-based reason to do so. So
310-
* ignore the param_source_rels restriction when this case applies.
311-
*/
312-
Relids outerrelids = outer_path->parent->relids;
313-
Relids innerparams = PATH_REQ_OUTER(inner_path);
314-
315-
if (!(bms_overlap(innerparams, outerrelids) &&
316-
bms_nonempty_difference(innerparams, outerrelids)))
317-
{
318-
/* Waste no memory when we reject a path here */
319-
bms_free(required_outer);
320-
return;
321-
}
369+
/* Waste no memory when we reject a path here */
370+
bms_free(required_outer);
371+
return;
322372
}
323373

324374
/*

src/test/regress/expected/join.out

+51
Original file line numberDiff line numberDiff line change
@@ -2949,6 +2949,57 @@ where thousand = a.q1 and tenthous = b.q1 and a.q2 = 1 and b.q2 = 2;
29492949
Index Cond: ((thousand = a.q1) AND (tenthous = b.q1))
29502950
(8 rows)
29512951

2952+
--
2953+
-- test a corner case in which we shouldn't apply the star-schema optimization
2954+
--
2955+
explain (costs off)
2956+
select t1.unique2, t1.stringu1, t2.unique1, t2.stringu2 from
2957+
tenk1 t1
2958+
inner join int4_tbl i1
2959+
left join (select v1.x2, v2.y1, 11 AS d1
2960+
from (values(1,0)) v1(x1,x2)
2961+
left join (values(3,1)) v2(y1,y2)
2962+
on v1.x1 = v2.y2) subq1
2963+
on (i1.f1 = subq1.x2)
2964+
on (t1.unique2 = subq1.d1)
2965+
left join tenk1 t2
2966+
on (subq1.y1 = t2.unique1)
2967+
where t1.unique2 < 42 and t1.stringu1 > t2.stringu2;
2968+
QUERY PLAN
2969+
-----------------------------------------------------------------------
2970+
Nested Loop
2971+
Join Filter: (t1.stringu1 > t2.stringu2)
2972+
-> Nested Loop
2973+
Join Filter: ((0) = i1.f1)
2974+
-> Nested Loop
2975+
-> Nested Loop
2976+
Join Filter: ((1) = (1))
2977+
-> Result
2978+
-> Result
2979+
-> Index Scan using tenk1_unique2 on tenk1 t1
2980+
Index Cond: ((unique2 = (11)) AND (unique2 < 42))
2981+
-> Seq Scan on int4_tbl i1
2982+
-> Index Scan using tenk1_unique1 on tenk1 t2
2983+
Index Cond: (unique1 = (3))
2984+
(14 rows)
2985+
2986+
select t1.unique2, t1.stringu1, t2.unique1, t2.stringu2 from
2987+
tenk1 t1
2988+
inner join int4_tbl i1
2989+
left join (select v1.x2, v2.y1, 11 AS d1
2990+
from (values(1,0)) v1(x1,x2)
2991+
left join (values(3,1)) v2(y1,y2)
2992+
on v1.x1 = v2.y2) subq1
2993+
on (i1.f1 = subq1.x2)
2994+
on (t1.unique2 = subq1.d1)
2995+
left join tenk1 t2
2996+
on (subq1.y1 = t2.unique1)
2997+
where t1.unique2 < 42 and t1.stringu1 > t2.stringu2;
2998+
unique2 | stringu1 | unique1 | stringu2
2999+
---------+----------+---------+----------
3000+
11 | WFAAAA | 3 | LKIAAA
3001+
(1 row)
3002+
29523003
--
29533004
-- test extraction of restriction OR clauses from join OR clause
29543005
-- (we used to only do this for indexable clauses)

src/test/regress/sql/join.sql

+31
Original file line numberDiff line numberDiff line change
@@ -843,6 +843,37 @@ select * from
843843
tenk1, int8_tbl a, int8_tbl b
844844
where thousand = a.q1 and tenthous = b.q1 and a.q2 = 1 and b.q2 = 2;
845845

846+
--
847+
-- test a corner case in which we shouldn't apply the star-schema optimization
848+
--
849+
850+
explain (costs off)
851+
select t1.unique2, t1.stringu1, t2.unique1, t2.stringu2 from
852+
tenk1 t1
853+
inner join int4_tbl i1
854+
left join (select v1.x2, v2.y1, 11 AS d1
855+
from (values(1,0)) v1(x1,x2)
856+
left join (values(3,1)) v2(y1,y2)
857+
on v1.x1 = v2.y2) subq1
858+
on (i1.f1 = subq1.x2)
859+
on (t1.unique2 = subq1.d1)
860+
left join tenk1 t2
861+
on (subq1.y1 = t2.unique1)
862+
where t1.unique2 < 42 and t1.stringu1 > t2.stringu2;
863+
864+
select t1.unique2, t1.stringu1, t2.unique1, t2.stringu2 from
865+
tenk1 t1
866+
inner join int4_tbl i1
867+
left join (select v1.x2, v2.y1, 11 AS d1
868+
from (values(1,0)) v1(x1,x2)
869+
left join (values(3,1)) v2(y1,y2)
870+
on v1.x1 = v2.y2) subq1
871+
on (i1.f1 = subq1.x2)
872+
on (t1.unique2 = subq1.d1)
873+
left join tenk1 t2
874+
on (subq1.y1 = t2.unique1)
875+
where t1.unique2 < 42 and t1.stringu1 > t2.stringu2;
876+
846877
--
847878
-- test extraction of restriction OR clauses from join OR clause
848879
-- (we used to only do this for indexable clauses)

0 commit comments

Comments
 (0)