Skip to content

Commit 9f5f212

Browse files
committed
Allow the planner to collapse explicit inner JOINs together, rather than
necessarily following the JOIN syntax to develop the query plan. The old behavior is still available by setting GUC variable JOIN_COLLAPSE_LIMIT to 1. Also create a GUC variable FROM_COLLAPSE_LIMIT to control the similar decision about when to collapse sub-SELECT lists into their parent lists. (This behavior existed already, but the limit was always GEQO_THRESHOLD/2; now it's separately adjustable.)
1 parent 15ab7a8 commit 9f5f212

File tree

12 files changed

+1040
-878
lines changed

12 files changed

+1040
-878
lines changed

doc/src/sgml/perform.sgml

Lines changed: 62 additions & 22 deletions
Original file line numberDiff line numberDiff line change
@@ -1,5 +1,5 @@
11
<!--
2-
$Header: /cvsroot/pgsql/doc/src/sgml/perform.sgml,v 1.23 2003/01/12 18:42:59 tgl Exp $
2+
$Header: /cvsroot/pgsql/doc/src/sgml/perform.sgml,v 1.24 2003/01/25 23:10:27 tgl Exp $
33
-->
44

55
<chapter id="performance-tips">
@@ -591,53 +591,93 @@ SELECT * FROM a LEFT JOIN (b JOIN c ON (b.ref = c.id)) ON (a.id = b.id);
591591
</para>
592592

593593
<para>
594-
The <productname>PostgreSQL</productname> query planner treats all
595-
explicit <literal>JOIN</> syntaxes as constraining the join order, even though
596-
it is not logically necessary to make such a constraint for inner
597-
joins. Therefore, although all of these queries give the same result:
594+
Explicit inner join syntax (<literal>INNER JOIN</>, <literal>CROSS
595+
JOIN</>, or unadorned <literal>JOIN</>) is semantically the same as
596+
listing the input relations in <literal>FROM</>, so it does not need to
597+
constrain the join order. But it is possible to instruct the
598+
<productname>PostgreSQL</productname> query planner to treat
599+
explicit inner <literal>JOIN</>s as constraining the join order anyway.
600+
For example, these three queries are logically equivalent:
598601
<programlisting>
599602
SELECT * FROM a, b, c WHERE a.id = b.id AND b.ref = c.id;
600603
SELECT * FROM a CROSS JOIN b CROSS JOIN c WHERE a.id = b.id AND b.ref = c.id;
601604
SELECT * FROM a JOIN (b JOIN c ON (b.ref = c.id)) ON (a.id = b.id);
602605
</programlisting>
606+
But if we tell the planner to honor the <literal>JOIN</> order,
603607
the second and third take less time to plan than the first. This effect
604608
is not worth worrying about for only three tables, but it can be a
605609
lifesaver with many tables.
606610
</para>
607611

612+
<para>
613+
To force the planner to follow the <literal>JOIN</> order for inner joins,
614+
set the <varname>JOIN_COLLAPSE_LIMIT</> run-time parameter to 1.
615+
(Other possible values are discussed below.)
616+
</para>
617+
608618
<para>
609619
You do not need to constrain the join order completely in order to
610-
cut search time, because it's OK to use <literal>JOIN</> operators in a plain
611-
<literal>FROM</> list. For example,
620+
cut search time, because it's OK to use <literal>JOIN</> operators
621+
within items of a plain <literal>FROM</> list. For example, consider
612622
<programlisting>
613623
SELECT * FROM a CROSS JOIN b, c, d, e WHERE ...;
614624
</programlisting>
625+
With <varname>JOIN_COLLAPSE_LIMIT</> = 1, this
615626
forces the planner to join A to B before joining them to other tables,
616627
but doesn't constrain its choices otherwise. In this example, the
617628
number of possible join orders is reduced by a factor of 5.
618629
</para>
619630

620631
<para>
621-
If you have a mix of outer and inner joins in a complex query, you
622-
might not want to constrain the planner's search for a good ordering
623-
of inner joins inside an outer join. You can't do that directly in the
624-
<literal>JOIN</> syntax, but you can get around the syntactic limitation by using
625-
subselects. For example,
632+
Constraining the planner's search in this way is a useful technique
633+
both for reducing planning time and for directing the planner to a
634+
good query plan. If the planner chooses a bad join order by default,
635+
you can force it to choose a better order via <literal>JOIN</> syntax
636+
--- assuming that you know of a better order, that is. Experimentation
637+
is recommended.
638+
</para>
639+
640+
<para>
641+
A closely related issue that affects planning time is collapsing of
642+
sub-SELECTs into their parent query. For example, consider
643+
<programlisting>
644+
SELECT *
645+
FROM x, y,
646+
(SELECT * FROM a, b, c WHERE something) AS ss
647+
WHERE somethingelse
648+
</programlisting>
649+
This situation might arise from use of a view that contains a join;
650+
the view's SELECT rule will be inserted in place of the view reference,
651+
yielding a query much like the above. Normally, the planner will try
652+
to collapse the sub-query into the parent, yielding
626653
<programlisting>
627-
SELECT * FROM d LEFT JOIN
628-
(SELECT * FROM a, b, c WHERE ...) AS ss
629-
ON (...);
654+
SELECT * FROM x, y, a, b, c WHERE something AND somethingelse
630655
</programlisting>
631-
Here, joining to D must be the last step in the query plan, but the
632-
planner is free to consider various join orders for A, B, and C.
656+
This usually results in a better plan than planning the sub-query
657+
separately. (For example, the outer WHERE conditions might be such that
658+
joining X to A first eliminates many rows of A, thus avoiding the need to
659+
form the full logical output of the sub-select.) But at the same time,
660+
we have increased the planning time; here, we have a five-way join
661+
problem replacing two separate three-way join problems. Because of the
662+
exponential growth of the number of possibilities, this makes a big
663+
difference. The planner tries to avoid getting stuck in huge join search
664+
problems by not collapsing a sub-query if more than
665+
<varname>FROM_COLLAPSE_LIMIT</> FROM-items would result in the parent
666+
query. You can trade off planning time against quality of plan by
667+
adjusting this run-time parameter up or down.
633668
</para>
634669

635670
<para>
636-
Constraining the planner's search in this way is a useful technique
637-
both for reducing planning time and for directing the planner to a
638-
good query plan. If the planner chooses a bad join order by default,
639-
you can force it to choose a better order via <literal>JOIN</> syntax --- assuming
640-
that you know of a better order, that is. Experimentation is recommended.
671+
<varname>FROM_COLLAPSE_LIMIT</> and <varname>JOIN_COLLAPSE_LIMIT</>
672+
are similarly named because they do almost the same thing: one controls
673+
when the planner will <quote>flatten out</> sub-SELECTs, and the
674+
other controls when it will flatten out explicit inner JOINs. Typically
675+
you would either set <varname>JOIN_COLLAPSE_LIMIT</> equal to
676+
<varname>FROM_COLLAPSE_LIMIT</> (so that explicit JOINs and sub-SELECTs
677+
act similarly) or set <varname>JOIN_COLLAPSE_LIMIT</> to 1 (if you want
678+
to control join order with explicit JOINs). But you might set them
679+
differently if you are trying to fine-tune the tradeoff between planning
680+
time and run time.
641681
</para>
642682
</sect1>
643683

doc/src/sgml/release.sgml

Lines changed: 2 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1,5 +1,5 @@
11
<!--
2-
$Header: /cvsroot/pgsql/doc/src/sgml/release.sgml,v 1.180 2003/01/23 23:38:51 petere Exp $
2+
$Header: /cvsroot/pgsql/doc/src/sgml/release.sgml,v 1.181 2003/01/25 23:10:27 tgl Exp $
33
-->
44

55
<appendix id="release">
@@ -24,6 +24,7 @@ CDATA means the content is "SGML-free", so you can write without
2424
worries about funny characters.
2525
-->
2626
<literallayout><![CDATA[
27+
Explicit JOINs no longer constrain query plan, unless JOIN_COLLAPSE_LIMIT = 1
2728
Performance of "foo IN (SELECT ...)" queries has been considerably improved
2829
FETCH 0 now re-fetches cursor's current row, per SQL spec
2930
Revised executor state representation; plan trees are read-only to executor now

doc/src/sgml/runtime.sgml

Lines changed: 41 additions & 11 deletions
Original file line numberDiff line numberDiff line change
@@ -1,5 +1,5 @@
11
<!--
2-
$Header: /cvsroot/pgsql/doc/src/sgml/runtime.sgml,v 1.166 2003/01/11 05:04:14 momjian Exp $
2+
$Header: /cvsroot/pgsql/doc/src/sgml/runtime.sgml,v 1.167 2003/01/25 23:10:27 tgl Exp $
33
-->
44

55
<Chapter Id="runtime">
@@ -773,6 +773,19 @@ env PGOPTIONS='-c geqo=off' psql
773773
</listitem>
774774
</varlistentry>
775775

776+
<varlistentry>
777+
<term><varname>FROM_COLLAPSE_LIMIT</varname> (<type>integer</type>)</term>
778+
<listitem>
779+
<para>
780+
The planner will merge sub-queries into upper queries if the resulting
781+
FROM list would have no more than this many items. Smaller values
782+
reduce planning time but may yield inferior query plans.
783+
The default is 8. It is usually wise to keep this less than
784+
<literal>GEQO_THRESHOLD</>.
785+
</para>
786+
</listitem>
787+
</varlistentry>
788+
776789
<varlistentry>
777790
<indexterm>
778791
<primary>genetic query optimization</primary>
@@ -826,12 +839,27 @@ env PGOPTIONS='-c geqo=off' psql
826839
<listitem>
827840
<para>
828841
Use genetic query optimization to plan queries with at least
829-
this many <literal>FROM</> items involved. (Note that a
842+
this many <literal>FROM</> items involved. (Note that an outer
830843
<literal>JOIN</> construct counts as only one <literal>FROM</>
831844
item.) The default is 11. For simpler queries it is usually best
832-
to use the deterministic, exhaustive planner. This parameter
833-
also controls how hard the optimizer will try to merge subquery
834-
<literal>FROM</literal> clauses into the upper query.
845+
to use the deterministic, exhaustive planner, but for queries with
846+
many tables the deterministic planner takes too long.
847+
</para>
848+
</listitem>
849+
</varlistentry>
850+
851+
<varlistentry>
852+
<term><varname>JOIN_COLLAPSE_LIMIT</varname> (<type>integer</type>)</term>
853+
<listitem>
854+
<para>
855+
The planner will flatten explicit inner <literal>JOIN</> constructs
856+
into lists of <literal>FROM</> items whenever a list of no more than
857+
this many items would result. Usually this is set the same as
858+
<literal>FROM_COLLAPSE_LIMIT</>. Setting it to 1 prevents any
859+
flattening of inner <literal>JOIN</>s, allowing explicit
860+
<literal>JOIN</> syntax to be used to control the join order.
861+
Intermediate values might be useful to trade off planning time
862+
against quality of plan.
835863
</para>
836864
</listitem>
837865
</varlistentry>
@@ -1842,8 +1870,8 @@ dynamic_library_path = '/usr/local/lib/postgresql:/home/my_project/lib:$libdir'
18421870
server. The default is 64. Each buffer is typically 8192
18431871
bytes. This must be greater than 16, as well as at least twice
18441872
the value of <varname>MAX_CONNECTIONS</varname>; however, a
1845-
higher value can often improve performance on modern
1846-
machines. Values of at least a few thousand are recommended
1873+
higher value can often improve performance.
1874+
Values of a few thousand are recommended
18471875
for production installations. This option can only be set at
18481876
server start.
18491877
</para>
@@ -1878,15 +1906,17 @@ dynamic_library_path = '/usr/local/lib/postgresql:/home/my_project/lib:$libdir'
18781906
<listitem>
18791907
<para>
18801908
Specifies the amount of memory to be used by internal sorts and
1881-
hashes before switching to temporary disk files. The value is
1909+
hash tables before switching to temporary disk files. The value is
18821910
specified in kilobytes, and defaults to 1024 kilobytes (1 MB).
1883-
Note that for a complex query, several sorts might be running in
1884-
parallel, and each one will be allowed to use as much memory as
1885-
this value specifies before it starts to put data into temporary
1911+
Note that for a complex query, several sorts or hashes might be
1912+
running in parallel; each one will be allowed to use as much memory
1913+
as this value specifies before it starts to put data into temporary
18861914
files. Also, each running backend could be doing one or more
18871915
sorts simultaneously, so the total memory used could be many
18881916
times the value of <varname>SORT_MEM</varname>. Sorts are used
18891917
by <literal>ORDER BY</>, merge joins, and <command>CREATE INDEX</>.
1918+
Hash tables are used in hash joins, hash-based aggregation, and
1919+
hash-based processing of <literal>IN</> sub-selects.
18901920
</para>
18911921
</listitem>
18921922
</varlistentry>

src/backend/optimizer/path/allpaths.c

Lines changed: 5 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -8,7 +8,7 @@
88
*
99
*
1010
* IDENTIFICATION
11-
* $Header: /cvsroot/pgsql/src/backend/optimizer/path/allpaths.c,v 1.94 2003/01/20 18:54:49 tgl Exp $
11+
* $Header: /cvsroot/pgsql/src/backend/optimizer/path/allpaths.c,v 1.95 2003/01/25 23:10:27 tgl Exp $
1212
*
1313
*-------------------------------------------------------------------------
1414
*/
@@ -30,8 +30,9 @@
3030
#include "rewrite/rewriteManip.h"
3131

3232

33-
bool enable_geqo = true;
34-
int geqo_rels = DEFAULT_GEQO_RELS;
33+
/* These parameters are set by GUC */
34+
bool enable_geqo = false; /* just in case GUC doesn't set it */
35+
int geqo_threshold;
3536

3637

3738
static void set_base_rel_pathlists(Query *root);
@@ -422,7 +423,7 @@ make_fromexpr_rel(Query *root, FromExpr *from)
422423
* Consider the different orders in which we could join the rels,
423424
* using either GEQO or regular optimizer.
424425
*/
425-
if (enable_geqo && levels_needed >= geqo_rels)
426+
if (enable_geqo && levels_needed >= geqo_threshold)
426427
return geqo(root, levels_needed, initial_rels);
427428
else
428429
return make_one_rel_by_joins(root, levels_needed, initial_rels);

src/backend/optimizer/plan/planner.c

Lines changed: 11 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -8,7 +8,7 @@
88
*
99
*
1010
* IDENTIFICATION
11-
* $Header: /cvsroot/pgsql/src/backend/optimizer/plan/planner.c,v 1.141 2003/01/20 18:54:52 tgl Exp $
11+
* $Header: /cvsroot/pgsql/src/backend/optimizer/plan/planner.c,v 1.142 2003/01/25 23:10:27 tgl Exp $
1212
*
1313
*-------------------------------------------------------------------------
1414
*/
@@ -166,12 +166,6 @@ subquery_planner(Query *parse, double tuple_fraction)
166166
parse->jointree = (FromExpr *)
167167
pull_up_subqueries(parse, (Node *) parse->jointree, false);
168168

169-
/*
170-
* If so, we may have created opportunities to simplify the jointree.
171-
*/
172-
parse->jointree = (FromExpr *)
173-
preprocess_jointree(parse, (Node *) parse->jointree);
174-
175169
/*
176170
* Detect whether any rangetable entries are RTE_JOIN kind; if not,
177171
* we can avoid the expense of doing flatten_join_alias_vars().
@@ -246,6 +240,16 @@ subquery_planner(Query *parse, double tuple_fraction)
246240
}
247241
parse->havingQual = (Node *) newHaving;
248242

243+
/*
244+
* See if we can simplify the jointree; opportunities for this may come
245+
* from having pulled up subqueries, or from flattening explicit JOIN
246+
* syntax. We must do this after flattening JOIN alias variables, since
247+
* eliminating explicit JOIN nodes from the jointree will cause
248+
* get_relids_for_join() to fail.
249+
*/
250+
parse->jointree = (FromExpr *)
251+
preprocess_jointree(parse, (Node *) parse->jointree);
252+
249253
/*
250254
* Do the main planning. If we have an inherited target relation,
251255
* that needs special processing, else go straight to

0 commit comments

Comments
 (0)