Skip to content

Commit 6f41995

Browse files
committed
Fix planning of star-schema-style queries.
Part of the intent of the parameterized-path mechanism was to handle star-schema queries efficiently, but some overly-restrictive search limiting logic added in commit e2fa76d prevented such cases from working as desired. Fix that and add a regression test about it. Per gripe from Marc Cousin. This is arguably a bug rather than a new feature, so back-patch to 9.2 where parameterized paths were introduced.
1 parent d670765 commit 6f41995

File tree

4 files changed

+86
-17
lines changed

4 files changed

+86
-17
lines changed

src/backend/optimizer/README

Lines changed: 27 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -705,17 +705,37 @@ intermediate layers of joins, for example:
705705
-> Index Scan using C_Z_IDX on C
706706
Index Condition: C.Z = A.X
707707

708-
If all joins are plain inner joins then this is unnecessary, because
709-
it's always possible to reorder the joins so that a parameter is used
708+
If all joins are plain inner joins then this is usually unnecessary,
709+
because it's possible to reorder the joins so that a parameter is used
710710
immediately below the nestloop node that provides it. But in the
711-
presence of outer joins, join reordering may not be possible, and then
712-
this option can be critical. Before version 9.2, Postgres used ad-hoc
713-
methods for planning and executing such queries, and those methods could
714-
not handle passing parameters down through multiple join levels.
711+
presence of outer joins, such join reordering may not be possible.
712+
713+
Also, the bottom-level scan might require parameters from more than one
714+
other relation. In principle we could join the other relations first
715+
so that all the parameters are supplied from a single nestloop level.
716+
But if those other relations have no join clause in common (which is
717+
common in star-schema queries for instance), the planner won't consider
718+
joining them directly to each other. In such a case we need to be able
719+
to create a plan like
720+
721+
NestLoop
722+
-> Seq Scan on SmallTable1 A
723+
NestLoop
724+
-> Seq Scan on SmallTable2 B
725+
NestLoop
726+
-> Index Scan using XYIndex on LargeTable C
727+
Index Condition: C.X = A.AID and C.Y = B.BID
728+
729+
so we should be willing to pass down A.AID through a join even though
730+
there is no join order constraint forcing the plan to look like this.
731+
732+
Before version 9.2, Postgres used ad-hoc methods for planning and
733+
executing nestloop queries of this kind, and those methods could not
734+
handle passing parameters down through multiple join levels.
715735

716736
To plan such queries, we now use a notion of a "parameterized path",
717737
which is a path that makes use of a join clause to a relation that's not
718-
scanned by the path. In the example just above, we would construct a
738+
scanned by the path. In the example two above, we would construct a
719739
path representing the possibility of doing this:
720740

721741
-> Index Scan using C_Z_IDX on C

src/backend/optimizer/path/joinpath.c

Lines changed: 31 additions & 10 deletions
Original file line numberDiff line numberDiff line change
@@ -113,13 +113,14 @@ add_paths_to_joinrel(PlannerInfo *root,
113113
/*
114114
* Decide whether it's sensible to generate parameterized paths for this
115115
* joinrel, and if so, which relations such paths should require. There
116-
* is no need to create a parameterized result path unless there is a join
117-
* order restriction that prevents joining one of our input rels directly
118-
* to the parameter source rel instead of joining to the other input rel.
119-
* This restriction reduces the number of parameterized paths we have to
120-
* deal with at higher join levels, without compromising the quality of
121-
* the resulting plan. We express the restriction as a Relids set that
122-
* must overlap the parameterization of any proposed join path.
116+
* is usually no need to create a parameterized result path unless there
117+
* is a join order restriction that prevents joining one of our input rels
118+
* directly to the parameter source rel instead of joining to the other
119+
* input rel. (But see exception in try_nestloop_path.) This restriction
120+
* reduces the number of parameterized paths we have to deal with at
121+
* higher join levels, without compromising the quality of the resulting
122+
* plan. We express the restriction as a Relids set that must overlap the
123+
* parameterization of any proposed join path.
123124
*/
124125
foreach(lc, root->join_info_list)
125126
{
@@ -227,9 +228,29 @@ try_nestloop_path(PlannerInfo *root,
227228
if (required_outer &&
228229
!bms_overlap(required_outer, param_source_rels))
229230
{
230-
/* Waste no memory when we reject a path here */
231-
bms_free(required_outer);
232-
return;
231+
/*
232+
* We override the param_source_rels heuristic to accept nestloop
233+
* paths in which the outer rel satisfies some but not all of the
234+
* inner path's parameterization. This is necessary to get good plans
235+
* for star-schema scenarios, in which a parameterized path for a
236+
* large table may require parameters from multiple small tables that
237+
* will not get joined directly to each other. We can handle that by
238+
* stacking nestloops that have the small tables on the outside; but
239+
* this breaks the rule the param_source_rels heuristic is based on,
240+
* namely that parameters should not be passed down across joins
241+
* unless there's a join-order-constraint-based reason to do so. So
242+
* ignore the param_source_rels restriction when this case applies.
243+
*/
244+
Relids outerrelids = outer_path->parent->relids;
245+
Relids innerparams = PATH_REQ_OUTER(inner_path);
246+
247+
if (!(bms_overlap(innerparams, outerrelids) &&
248+
bms_nonempty_difference(innerparams, outerrelids)))
249+
{
250+
/* Waste no memory when we reject a path here */
251+
bms_free(required_outer);
252+
return;
253+
}
233254
}
234255

235256
/*

src/test/regress/expected/join.out

Lines changed: 19 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -2789,6 +2789,25 @@ where thousand = (q1 + q2);
27892789
-> Seq Scan on int4_tbl
27902790
(12 rows)
27912791

2792+
--
2793+
-- test ability to generate a suitable plan for a star-schema query
2794+
--
2795+
explain (costs off)
2796+
select * from
2797+
tenk1, int8_tbl a, int8_tbl b
2798+
where thousand = a.q1 and tenthous = b.q1 and a.q2 = 1 and b.q2 = 2;
2799+
QUERY PLAN
2800+
---------------------------------------------------------------------
2801+
Nested Loop
2802+
-> Seq Scan on int8_tbl b
2803+
Filter: (q2 = 2)
2804+
-> Nested Loop
2805+
-> Seq Scan on int8_tbl a
2806+
Filter: (q2 = 1)
2807+
-> Index Scan using tenk1_thous_tenthous on tenk1
2808+
Index Cond: ((thousand = a.q1) AND (tenthous = b.q1))
2809+
(8 rows)
2810+
27922811
--
27932812
-- test placement of movable quals in a parameterized join tree
27942813
--

src/test/regress/sql/join.sql

Lines changed: 9 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -763,6 +763,15 @@ select * from
763763
int4(sin(0)) q2
764764
where thousand = (q1 + q2);
765765

766+
--
767+
-- test ability to generate a suitable plan for a star-schema query
768+
--
769+
770+
explain (costs off)
771+
select * from
772+
tenk1, int8_tbl a, int8_tbl b
773+
where thousand = a.q1 and tenthous = b.q1 and a.q2 = 1 and b.q2 = 2;
774+
766775
--
767776
-- test placement of movable quals in a parameterized join tree
768777
--

0 commit comments

Comments
 (0)