Skip to content

Commit b514a74

Browse files
committed
Fix planning of star-schema-style queries.
Part of the intent of the parameterized-path mechanism was to handle star-schema queries efficiently, but some overly-restrictive search limiting logic added in commit e2fa76d prevented such cases from working as desired. Fix that and add a regression test about it. Per gripe from Marc Cousin. This is arguably a bug rather than a new feature, so back-patch to 9.2 where parameterized paths were introduced.
1 parent c4f4c7c commit b514a74

File tree

4 files changed

+86
-17
lines changed

4 files changed

+86
-17
lines changed

src/backend/optimizer/README

+27-7
Original file line numberDiff line numberDiff line change
@@ -702,17 +702,37 @@ intermediate layers of joins, for example:
702702
-> Index Scan using C_Z_IDX on C
703703
Index Condition: C.Z = A.X
704704

705-
If all joins are plain inner joins then this is unnecessary, because
706-
it's always possible to reorder the joins so that a parameter is used
705+
If all joins are plain inner joins then this is usually unnecessary,
706+
because it's possible to reorder the joins so that a parameter is used
707707
immediately below the nestloop node that provides it. But in the
708-
presence of outer joins, join reordering may not be possible, and then
709-
this option can be critical. Before version 9.2, Postgres used ad-hoc
710-
methods for planning and executing such queries, and those methods could
711-
not handle passing parameters down through multiple join levels.
708+
presence of outer joins, such join reordering may not be possible.
709+
710+
Also, the bottom-level scan might require parameters from more than one
711+
other relation. In principle we could join the other relations first
712+
so that all the parameters are supplied from a single nestloop level.
713+
But if those other relations have no join clause in common (which is
714+
common in star-schema queries for instance), the planner won't consider
715+
joining them directly to each other. In such a case we need to be able
716+
to create a plan like
717+
718+
NestLoop
719+
-> Seq Scan on SmallTable1 A
720+
NestLoop
721+
-> Seq Scan on SmallTable2 B
722+
NestLoop
723+
-> Index Scan using XYIndex on LargeTable C
724+
Index Condition: C.X = A.AID and C.Y = B.BID
725+
726+
so we should be willing to pass down A.AID through a join even though
727+
there is no join order constraint forcing the plan to look like this.
728+
729+
Before version 9.2, Postgres used ad-hoc methods for planning and
730+
executing nestloop queries of this kind, and those methods could not
731+
handle passing parameters down through multiple join levels.
712732

713733
To plan such queries, we now use a notion of a "parameterized path",
714734
which is a path that makes use of a join clause to a relation that's not
715-
scanned by the path. In the example just above, we would construct a
735+
scanned by the path. In the example two above, we would construct a
716736
path representing the possibility of doing this:
717737

718738
-> Index Scan using C_Z_IDX on C

src/backend/optimizer/path/joinpath.c

+31-10
Original file line numberDiff line numberDiff line change
@@ -117,13 +117,14 @@ add_paths_to_joinrel(PlannerInfo *root,
117117
/*
118118
* Decide whether it's sensible to generate parameterized paths for this
119119
* joinrel, and if so, which relations such paths should require. There
120-
* is no need to create a parameterized result path unless there is a join
121-
* order restriction that prevents joining one of our input rels directly
122-
* to the parameter source rel instead of joining to the other input rel.
123-
* This restriction reduces the number of parameterized paths we have to
124-
* deal with at higher join levels, without compromising the quality of
125-
* the resulting plan. We express the restriction as a Relids set that
126-
* must overlap the parameterization of any proposed join path.
120+
* is usually no need to create a parameterized result path unless there
121+
* is a join order restriction that prevents joining one of our input rels
122+
* directly to the parameter source rel instead of joining to the other
123+
* input rel. (But see exception in try_nestloop_path.) This restriction
124+
* reduces the number of parameterized paths we have to deal with at
125+
* higher join levels, without compromising the quality of the resulting
126+
* plan. We express the restriction as a Relids set that must overlap the
127+
* parameterization of any proposed join path.
127128
*/
128129
foreach(lc, root->join_info_list)
129130
{
@@ -291,9 +292,29 @@ try_nestloop_path(PlannerInfo *root,
291292
if (required_outer &&
292293
!bms_overlap(required_outer, param_source_rels))
293294
{
294-
/* Waste no memory when we reject a path here */
295-
bms_free(required_outer);
296-
return;
295+
/*
296+
* We override the param_source_rels heuristic to accept nestloop
297+
* paths in which the outer rel satisfies some but not all of the
298+
* inner path's parameterization. This is necessary to get good plans
299+
* for star-schema scenarios, in which a parameterized path for a
300+
* large table may require parameters from multiple small tables that
301+
* will not get joined directly to each other. We can handle that by
302+
* stacking nestloops that have the small tables on the outside; but
303+
* this breaks the rule the param_source_rels heuristic is based on,
304+
* namely that parameters should not be passed down across joins
305+
* unless there's a join-order-constraint-based reason to do so. So
306+
* ignore the param_source_rels restriction when this case applies.
307+
*/
308+
Relids outerrelids = outer_path->parent->relids;
309+
Relids innerparams = PATH_REQ_OUTER(inner_path);
310+
311+
if (!(bms_overlap(innerparams, outerrelids) &&
312+
bms_nonempty_difference(innerparams, outerrelids)))
313+
{
314+
/* Waste no memory when we reject a path here */
315+
bms_free(required_outer);
316+
return;
317+
}
297318
}
298319

299320
/*

src/test/regress/expected/join.out

+19
Original file line numberDiff line numberDiff line change
@@ -2780,6 +2780,25 @@ where thousand = (q1 + q2);
27802780
-> Seq Scan on int4_tbl
27812781
(12 rows)
27822782

2783+
--
2784+
-- test ability to generate a suitable plan for a star-schema query
2785+
--
2786+
explain (costs off)
2787+
select * from
2788+
tenk1, int8_tbl a, int8_tbl b
2789+
where thousand = a.q1 and tenthous = b.q1 and a.q2 = 1 and b.q2 = 2;
2790+
QUERY PLAN
2791+
---------------------------------------------------------------------
2792+
Nested Loop
2793+
-> Seq Scan on int8_tbl b
2794+
Filter: (q2 = 2)
2795+
-> Nested Loop
2796+
-> Seq Scan on int8_tbl a
2797+
Filter: (q2 = 1)
2798+
-> Index Scan using tenk1_thous_tenthous on tenk1
2799+
Index Cond: ((thousand = a.q1) AND (tenthous = b.q1))
2800+
(8 rows)
2801+
27832802
--
27842803
-- test extraction of restriction OR clauses from join OR clause
27852804
-- (we used to only do this for indexable clauses)

src/test/regress/sql/join.sql

+9
Original file line numberDiff line numberDiff line change
@@ -763,6 +763,15 @@ select * from
763763
int4(sin(0)) q2
764764
where thousand = (q1 + q2);
765765

766+
--
767+
-- test ability to generate a suitable plan for a star-schema query
768+
--
769+
770+
explain (costs off)
771+
select * from
772+
tenk1, int8_tbl a, int8_tbl b
773+
where thousand = a.q1 and tenthous = b.q1 and a.q2 = 1 and b.q2 = 2;
774+
766775
--
767776
-- test extraction of restriction OR clauses from join OR clause
768777
-- (we used to only do this for indexable clauses)

0 commit comments

Comments
 (0)