Skip to content

Commit fc069a3

Browse files
akorotkovAlena0704
andcommitted
Implement Self-Join Elimination
The Self-Join Elimination (SJE) feature removes an inner join of a plain table to itself in the query tree if it is proven that the join can be replaced with a scan without impacting the query result. Self-join and inner relation get replaced with the outer in query, equivalence classes, and planner info structures. Also, the inner restrictlist moves to the outer one with the removal of duplicated clauses. Thus, this optimization reduces the length of the range table list (this especially makes sense for partitioned relations), reduces the number of restriction clauses and, in turn, selectivity estimations, and potentially improves total planner prediction for the query. This feature is dedicated to avoiding redundancy, which can appear after pull-up transformations or the creation of an EquivalenceClass-derived clause like the below. SELECT * FROM t1 WHERE x IN (SELECT t3.x FROM t1 t3); SELECT * FROM t1 WHERE EXISTS (SELECT t3.x FROM t1 t3 WHERE t3.x = t1.x); SELECT * FROM t1,t2, t1 t3 WHERE t1.x = t2.x AND t2.x = t3.x; In the future, we could also reduce redundancy caused by subquery pull-up after unnecessary outer join removal in cases like the one below. SELECT * FROM t1 WHERE x IN (SELECT t3.x FROM t1 t3 LEFT JOIN t2 ON t2.x = t1.x); Also, it can drastically help to join partitioned tables, removing entries even before their expansion. The SJE proof is based on innerrel_is_unique() machinery. We can remove a self-join when for each outer row: 1. At most, one inner row matches the join clause; 2. Each matched inner row must be (physically) the same as the outer one; 3. Inner and outer rows have the same row mark. In this patch, we use the next approach to identify a self-join: 1. Collect all merge-joinable join quals which look like a.x = b.x; 2. Add to the list above the baseretrictinfo of the inner table; 3. Check innerrel_is_unique() for the qual list. If it returns false, skip this pair of joining tables; 4. Check uniqueness, proved by the baserestrictinfo clauses. To prove the possibility of self-join elimination, the inner and outer clauses must match exactly. The relation replacement procedure is not trivial and is partly combined with the one used to remove useless left joins. Tests covering this feature were added to join.sql. Some of the existing regression tests changed due to self-join removal logic. Discussion: https://postgr.es/m/flat/64486b0b-0404-e39e-322d-0801154901f3%40postgrespro.ru Author: Andrey Lepikhov <a.lepikhov@postgrespro.ru> Author: Alexander Kuzmenkov <a.kuzmenkov@postgrespro.ru> Co-authored-by: Alexander Korotkov <aekorotkov@gmail.com> Co-authored-by: Alena Rybakina <lena.ribackina@yandex.ru> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: Robert Haas <robertmhaas@gmail.com> Reviewed-by: Andres Freund <andres@anarazel.de> Reviewed-by: Simon Riggs <simon@2ndquadrant.com> Reviewed-by: Jonathan S. Katz <jkatz@postgresql.org> Reviewed-by: David Rowley <david.rowley@2ndquadrant.com> Reviewed-by: Thomas Munro <thomas.munro@enterprisedb.com> Reviewed-by: Konstantin Knizhnik <k.knizhnik@postgrespro.ru> Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi> Reviewed-by: Hywel Carver <hywel@skillerwhale.com> Reviewed-by: Laurenz Albe <laurenz.albe@cybertec.at> Reviewed-by: Ronan Dunklau <ronan.dunklau@aiven.io> Reviewed-by: vignesh C <vignesh21@gmail.com> Reviewed-by: Zhihong Yu <zyu@yugabyte.com> Reviewed-by: Greg Stark <stark@mit.edu> Reviewed-by: Jaime Casanova <jcasanov@systemguards.com.ec> Reviewed-by: Michał Kłeczek <michal@kleczek.org> Reviewed-by: Alena Rybakina <lena.ribackina@yandex.ru> Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>
1 parent 3fb5862 commit fc069a3

File tree

19 files changed

+2983
-148
lines changed

19 files changed

+2983
-148
lines changed

doc/src/sgml/config.sgml

Lines changed: 16 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -5544,6 +5544,22 @@ ANY <replaceable class="parameter">num_sync</replaceable> ( <replaceable class="
55445544
</listitem>
55455545
</varlistentry>
55465546

5547+
<varlistentry id="guc-enable_self_join_elimination" xreflabel="enable_self_join_elimination">
5548+
<term><varname>enable_self_join_elimination</varname> (<type>boolean</type>)
5549+
<indexterm>
5550+
<primary><varname>enable_self_join_elimination</varname> configuration parameter</primary>
5551+
</indexterm>
5552+
</term>
5553+
<listitem>
5554+
<para>
5555+
Enables or disables the query planner's optimization which analyses
5556+
the query tree and replaces self joins with semantically equivalent
5557+
single scans. Takes into consideration only plain tables.
5558+
The default is <literal>on</literal>.
5559+
</para>
5560+
</listitem>
5561+
</varlistentry>
5562+
55475563
<varlistentry id="guc-enable-seqscan" xreflabel="enable_seqscan">
55485564
<term><varname>enable_seqscan</varname> (<type>boolean</type>)
55495565
<indexterm>

src/backend/optimizer/path/equivclass.c

Lines changed: 2 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -852,7 +852,8 @@ find_computable_ec_member(PlannerInfo *root,
852852
exprvars = pull_var_clause((Node *) exprs,
853853
PVC_INCLUDE_AGGREGATES |
854854
PVC_INCLUDE_WINDOWFUNCS |
855-
PVC_INCLUDE_PLACEHOLDERS);
855+
PVC_INCLUDE_PLACEHOLDERS |
856+
PVC_INCLUDE_CONVERTROWTYPES);
856857

857858
foreach(lc, ec->ec_members)
858859
{

src/backend/optimizer/path/indxpath.c

Lines changed: 39 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -4162,6 +4162,22 @@ bool
41624162
relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
41634163
List *restrictlist,
41644164
List *exprlist, List *oprlist)
4165+
{
4166+
return relation_has_unique_index_ext(root, rel, restrictlist,
4167+
exprlist, oprlist, NULL);
4168+
}
4169+
4170+
/*
4171+
* relation_has_unique_index_ext
4172+
* Same as relation_has_unique_index_for(), but supports extra_clauses
4173+
* parameter. If extra_clauses isn't NULL, return baserestrictinfo clauses
4174+
* which were used to derive uniqueness.
4175+
*/
4176+
bool
4177+
relation_has_unique_index_ext(PlannerInfo *root, RelOptInfo *rel,
4178+
List *restrictlist,
4179+
List *exprlist, List *oprlist,
4180+
List **extra_clauses)
41654181
{
41664182
ListCell *ic;
41674183

@@ -4217,6 +4233,7 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
42174233
{
42184234
IndexOptInfo *ind = (IndexOptInfo *) lfirst(ic);
42194235
int c;
4236+
List *exprs = NIL;
42204237

42214238
/*
42224239
* If the index is not unique, or not immediately enforced, or if it's
@@ -4268,6 +4285,24 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
42684285
if (match_index_to_operand(rexpr, c, ind))
42694286
{
42704287
matched = true; /* column is unique */
4288+
4289+
if (bms_membership(rinfo->clause_relids) == BMS_SINGLETON)
4290+
{
4291+
MemoryContext oldMemCtx =
4292+
MemoryContextSwitchTo(root->planner_cxt);
4293+
4294+
/*
4295+
* Add filter clause into a list allowing caller to
4296+
* know if uniqueness have made not only by join
4297+
* clauses.
4298+
*/
4299+
Assert(bms_is_empty(rinfo->left_relids) ||
4300+
bms_is_empty(rinfo->right_relids));
4301+
if (extra_clauses)
4302+
exprs = lappend(exprs, rinfo);
4303+
MemoryContextSwitchTo(oldMemCtx);
4304+
}
4305+
42714306
break;
42724307
}
42734308
}
@@ -4310,7 +4345,11 @@ relation_has_unique_index_for(PlannerInfo *root, RelOptInfo *rel,
43104345

43114346
/* Matched all key columns of this index? */
43124347
if (c == ind->nkeycolumns)
4348+
{
4349+
if (extra_clauses)
4350+
*extra_clauses = exprs;
43134351
return true;
4352+
}
43144353
}
43154354

43164355
return false;

0 commit comments

Comments
 (0)