Skip to content

Commit 0d115dd

Browse files
committed
Extend CTE patch to support recursive UNION (ie, without ALL). The
implementation uses an in-memory hash table, so it will poop out for very large recursive results ... but the performance characteristics of a sort-based implementation would be pretty unpleasant too.
1 parent 059349b commit 0d115dd

File tree

13 files changed

+345
-74
lines changed

13 files changed

+345
-74
lines changed

doc/src/sgml/queries.sgml

Lines changed: 20 additions & 13 deletions
Original file line numberDiff line numberDiff line change
@@ -1,4 +1,4 @@
1-
<!-- $PostgreSQL: pgsql/doc/src/sgml/queries.sgml,v 1.46 2008/10/04 21:56:52 tgl Exp $ -->
1+
<!-- $PostgreSQL: pgsql/doc/src/sgml/queries.sgml,v 1.47 2008/10/07 19:27:03 tgl Exp $ -->
22

33
<chapter id="queries">
44
<title>Queries</title>
@@ -1519,7 +1519,8 @@ SELECT sum(n) FROM t;
15191519
</programlisting>
15201520

15211521
The general form of a recursive <literal>WITH</> query is always a
1522-
<firstterm>non-recursive term</>, then <literal>UNION ALL</>, then a
1522+
<firstterm>non-recursive term</>, then <literal>UNION</> (or
1523+
<literal>UNION ALL</>), then a
15231524
<firstterm>recursive term</>, where only the recursive term can contain
15241525
a reference to the query's own output. Such a query is executed as
15251526
follows:
@@ -1530,9 +1531,10 @@ SELECT sum(n) FROM t;
15301531

15311532
<step performance="required">
15321533
<para>
1533-
Evaluate the non-recursive term. Include all its output rows in the
1534-
result of the recursive query, and also place them in a temporary
1535-
<firstterm>working table</>.
1534+
Evaluate the non-recursive term. For <literal>UNION</> (but not
1535+
<literal>UNION ALL</>), discard duplicate rows. Include all remaining
1536+
rows in the result of the recursive query, and also place them in a
1537+
temporary <firstterm>working table</>.
15361538
</para>
15371539
</step>
15381540

@@ -1544,9 +1546,11 @@ SELECT sum(n) FROM t;
15441546
<step performance="required">
15451547
<para>
15461548
Evaluate the recursive term, substituting the current contents of
1547-
the working table for the recursive self-reference. Include all its
1548-
output rows in the result of the recursive query, and also place them
1549-
in a temporary <firstterm>intermediate table</>.
1549+
the working table for the recursive self-reference.
1550+
For <literal>UNION</> (but not <literal>UNION ALL</>), discard
1551+
duplicate rows and rows that duplicate any previous result row.
1552+
Include all remaining rows in the result of the recursive query, and
1553+
also place them in a temporary <firstterm>intermediate table</>.
15501554
</para>
15511555
</step>
15521556

@@ -1598,10 +1602,13 @@ GROUP BY sub_part
15981602
<para>
15991603
When working with recursive queries it is important to be sure that
16001604
the recursive part of the query will eventually return no tuples,
1601-
or else the query will loop indefinitely. A useful trick for
1602-
development purposes is to place a <literal>LIMIT</> in the parent
1603-
query. For example, this query would loop forever without the
1604-
<literal>LIMIT</>:
1605+
or else the query will loop indefinitely. Sometimes, using
1606+
<literal>UNION</> instead of <literal>UNION ALL</> can accomplish this
1607+
by discarding rows that duplicate previous output rows; this catches
1608+
cycles that would otherwise repeat. A useful trick for testing queries
1609+
when you are not certain if they might loop is to place a <literal>LIMIT</>
1610+
in the parent query. For example, this query would loop forever without
1611+
the <literal>LIMIT</>:
16051612

16061613
<programlisting>
16071614
WITH RECURSIVE t(n) AS (
@@ -1614,7 +1621,7 @@ SELECT n FROM t LIMIT 100;
16141621

16151622
This works because <productname>PostgreSQL</productname>'s implementation
16161623
evaluates only as many rows of a <literal>WITH</> query as are actually
1617-
demanded by the parent query. Using this trick in production is not
1624+
fetched by the parent query. Using this trick in production is not
16181625
recommended, because other systems might work differently.
16191626
</para>
16201627

doc/src/sgml/ref/select.sgml

Lines changed: 4 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -1,5 +1,5 @@
11
<!--
2-
$PostgreSQL: pgsql/doc/src/sgml/ref/select.sgml,v 1.105 2008/10/04 21:56:52 tgl Exp $
2+
$PostgreSQL: pgsql/doc/src/sgml/ref/select.sgml,v 1.106 2008/10/07 19:27:04 tgl Exp $
33
PostgreSQL documentation
44
-->
55

@@ -202,10 +202,10 @@ and <replaceable class="parameter">with_query</replaceable> is:
202202
subquery to reference itself by name. Such a subquery must have
203203
the form
204204
<synopsis>
205-
<replaceable class="parameter">non_recursive_term</replaceable> UNION ALL <replaceable class="parameter">recursive_term</replaceable>
205+
<replaceable class="parameter">non_recursive_term</replaceable> UNION [ ALL ] <replaceable class="parameter">recursive_term</replaceable>
206206
</synopsis>
207207
where the recursive self-reference must appear on the right-hand
208-
side of <literal>UNION ALL</>. Only one recursive self-reference
208+
side of the <literal>UNION</>. Only one recursive self-reference
209209
is permitted per query.
210210
</para>
211211

@@ -1234,7 +1234,7 @@ SELECT distance, employee_name FROM employee_recursive;
12341234
</programlisting>
12351235

12361236
Notice the typical form of recursive queries:
1237-
an initial condition, followed by <literal>UNION ALL</literal>,
1237+
an initial condition, followed by <literal>UNION</literal>,
12381238
followed by the recursive part of the query. Be sure that the
12391239
recursive part of the query will eventually return no tuples, or
12401240
else the query will loop indefinitely. (See <xref linkend="queries-with">

src/backend/executor/nodeRecursiveunion.c

Lines changed: 150 additions & 24 deletions
Original file line numberDiff line numberDiff line change
@@ -8,7 +8,7 @@
88
*
99
*
1010
* IDENTIFICATION
11-
* $PostgreSQL: pgsql/src/backend/executor/nodeRecursiveunion.c,v 1.1 2008/10/04 21:56:53 tgl Exp $
11+
* $PostgreSQL: pgsql/src/backend/executor/nodeRecursiveunion.c,v 1.2 2008/10/07 19:27:04 tgl Exp $
1212
*
1313
*-------------------------------------------------------------------------
1414
*/
@@ -17,6 +17,41 @@
1717
#include "executor/execdebug.h"
1818
#include "executor/nodeRecursiveunion.h"
1919
#include "miscadmin.h"
20+
#include "utils/memutils.h"
21+
22+
23+
/*
24+
* To implement UNION (without ALL), we need a hashtable that stores tuples
25+
* already seen. The hash key is computed from the grouping columns.
26+
*/
27+
typedef struct RUHashEntryData *RUHashEntry;
28+
29+
typedef struct RUHashEntryData
30+
{
31+
TupleHashEntryData shared; /* common header for hash table entries */
32+
} RUHashEntryData;
33+
34+
35+
/*
36+
* Initialize the hash table to empty.
37+
*/
38+
static void
39+
build_hash_table(RecursiveUnionState *rustate)
40+
{
41+
RecursiveUnion *node = (RecursiveUnion *) rustate->ps.plan;
42+
43+
Assert(node->numCols > 0);
44+
Assert(node->numGroups > 0);
45+
46+
rustate->hashtable = BuildTupleHashTable(node->numCols,
47+
node->dupColIdx,
48+
rustate->eqfunctions,
49+
rustate->hashfunctions,
50+
node->numGroups,
51+
sizeof(RUHashEntryData),
52+
rustate->tableContext,
53+
rustate->tempContext);
54+
}
2055

2156

2257
/* ----------------------------------------------------------------
@@ -44,49 +79,85 @@ ExecRecursiveUnion(RecursiveUnionState *node)
4479
PlanState *innerPlan = innerPlanState(node);
4580
RecursiveUnion *plan = (RecursiveUnion *) node->ps.plan;
4681
TupleTableSlot *slot;
82+
RUHashEntry entry;
83+
bool isnew;
4784

4885
/* 1. Evaluate non-recursive term */
4986
if (!node->recursing)
5087
{
51-
slot = ExecProcNode(outerPlan);
52-
if (!TupIsNull(slot))
88+
for (;;)
5389
{
90+
slot = ExecProcNode(outerPlan);
91+
if (TupIsNull(slot))
92+
break;
93+
if (plan->numCols > 0)
94+
{
95+
/* Find or build hashtable entry for this tuple's group */
96+
entry = (RUHashEntry)
97+
LookupTupleHashEntry(node->hashtable, slot, &isnew);
98+
/* Must reset temp context after each hashtable lookup */
99+
MemoryContextReset(node->tempContext);
100+
/* Ignore tuple if already seen */
101+
if (!isnew)
102+
continue;
103+
}
104+
/* Each non-duplicate tuple goes to the working table ... */
54105
tuplestore_puttupleslot(node->working_table, slot);
106+
/* ... and to the caller */
55107
return slot;
56108
}
57109
node->recursing = true;
58110
}
59111

60-
retry:
61112
/* 2. Execute recursive term */
62-
slot = ExecProcNode(innerPlan);
63-
if (TupIsNull(slot))
113+
for (;;)
64114
{
65-
if (node->intermediate_empty)
66-
return NULL;
115+
slot = ExecProcNode(innerPlan);
116+
if (TupIsNull(slot))
117+
{
118+
/* Done if there's nothing in the intermediate table */
119+
if (node->intermediate_empty)
120+
break;
67121

68-
/* done with old working table ... */
69-
tuplestore_end(node->working_table);
122+
/* done with old working table ... */
123+
tuplestore_end(node->working_table);
70124

71-
/* intermediate table becomes working table */
72-
node->working_table = node->intermediate_table;
125+
/* intermediate table becomes working table */
126+
node->working_table = node->intermediate_table;
73127

74-
/* create new empty intermediate table */
75-
node->intermediate_table = tuplestore_begin_heap(false, false, work_mem);
76-
node->intermediate_empty = true;
128+
/* create new empty intermediate table */
129+
node->intermediate_table = tuplestore_begin_heap(false, false,
130+
work_mem);
131+
node->intermediate_empty = true;
77132

78-
/* and reset the inner plan */
79-
innerPlan->chgParam = bms_add_member(innerPlan->chgParam,
80-
plan->wtParam);
81-
goto retry;
82-
}
83-
else
84-
{
133+
/* reset the recursive term */
134+
innerPlan->chgParam = bms_add_member(innerPlan->chgParam,
135+
plan->wtParam);
136+
137+
/* and continue fetching from recursive term */
138+
continue;
139+
}
140+
141+
if (plan->numCols > 0)
142+
{
143+
/* Find or build hashtable entry for this tuple's group */
144+
entry = (RUHashEntry)
145+
LookupTupleHashEntry(node->hashtable, slot, &isnew);
146+
/* Must reset temp context after each hashtable lookup */
147+
MemoryContextReset(node->tempContext);
148+
/* Ignore tuple if already seen */
149+
if (!isnew)
150+
continue;
151+
}
152+
153+
/* Else, tuple is good; stash it in intermediate table ... */
85154
node->intermediate_empty = false;
86155
tuplestore_puttupleslot(node->intermediate_table, slot);
87-
}
156+
/* ... and return it */
157+
return slot;
158+
}
88159

89-
return slot;
160+
return NULL;
90161
}
91162

92163
/* ----------------------------------------------------------------
@@ -109,12 +180,40 @@ ExecInitRecursiveUnion(RecursiveUnion *node, EState *estate, int eflags)
109180
rustate->ps.plan = (Plan *) node;
110181
rustate->ps.state = estate;
111182

183+
rustate->eqfunctions = NULL;
184+
rustate->hashfunctions = NULL;
185+
rustate->hashtable = NULL;
186+
rustate->tempContext = NULL;
187+
rustate->tableContext = NULL;
188+
112189
/* initialize processing state */
113190
rustate->recursing = false;
114191
rustate->intermediate_empty = true;
115192
rustate->working_table = tuplestore_begin_heap(false, false, work_mem);
116193
rustate->intermediate_table = tuplestore_begin_heap(false, false, work_mem);
117194

195+
/*
196+
* If hashing, we need a per-tuple memory context for comparisons, and a
197+
* longer-lived context to store the hash table. The table can't just be
198+
* kept in the per-query context because we want to be able to throw it
199+
* away when rescanning.
200+
*/
201+
if (node->numCols > 0)
202+
{
203+
rustate->tempContext =
204+
AllocSetContextCreate(CurrentMemoryContext,
205+
"RecursiveUnion",
206+
ALLOCSET_DEFAULT_MINSIZE,
207+
ALLOCSET_DEFAULT_INITSIZE,
208+
ALLOCSET_DEFAULT_MAXSIZE);
209+
rustate->tableContext =
210+
AllocSetContextCreate(CurrentMemoryContext,
211+
"RecursiveUnion hash table",
212+
ALLOCSET_DEFAULT_MINSIZE,
213+
ALLOCSET_DEFAULT_INITSIZE,
214+
ALLOCSET_DEFAULT_MAXSIZE);
215+
}
216+
118217
/*
119218
* Make the state structure available to descendant WorkTableScan nodes
120219
* via the Param slot reserved for it.
@@ -154,6 +253,19 @@ ExecInitRecursiveUnion(RecursiveUnion *node, EState *estate, int eflags)
154253
outerPlanState(rustate) = ExecInitNode(outerPlan(node), estate, eflags);
155254
innerPlanState(rustate) = ExecInitNode(innerPlan(node), estate, eflags);
156255

256+
/*
257+
* If hashing, precompute fmgr lookup data for inner loop, and create
258+
* the hash table.
259+
*/
260+
if (node->numCols > 0)
261+
{
262+
execTuplesHashPrepare(node->numCols,
263+
node->dupOperators,
264+
&rustate->eqfunctions,
265+
&rustate->hashfunctions);
266+
build_hash_table(rustate);
267+
}
268+
157269
return rustate;
158270
}
159271

@@ -178,6 +290,12 @@ ExecEndRecursiveUnion(RecursiveUnionState *node)
178290
tuplestore_end(node->working_table);
179291
tuplestore_end(node->intermediate_table);
180292

293+
/* free subsidiary stuff including hashtable */
294+
if (node->tempContext)
295+
MemoryContextDelete(node->tempContext);
296+
if (node->tableContext)
297+
MemoryContextDelete(node->tableContext);
298+
181299
/*
182300
* clean out the upper tuple table
183301
*/
@@ -217,6 +335,14 @@ ExecRecursiveUnionReScan(RecursiveUnionState *node, ExprContext *exprCtxt)
217335
if (outerPlan->chgParam == NULL)
218336
ExecReScan(outerPlan, exprCtxt);
219337

338+
/* Release any hashtable storage */
339+
if (node->tableContext)
340+
MemoryContextResetAndDeleteChildren(node->tableContext);
341+
342+
/* And rebuild empty hashtable if needed */
343+
if (plan->numCols > 0)
344+
build_hash_table(node);
345+
220346
/* reset processing state */
221347
node->recursing = false;
222348
node->intermediate_empty = true;

src/backend/nodes/copyfuncs.c

Lines changed: 8 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -15,7 +15,7 @@
1515
* Portions Copyright (c) 1994, Regents of the University of California
1616
*
1717
* IDENTIFICATION
18-
* $PostgreSQL: pgsql/src/backend/nodes/copyfuncs.c,v 1.407 2008/10/06 17:39:25 tgl Exp $
18+
* $PostgreSQL: pgsql/src/backend/nodes/copyfuncs.c,v 1.408 2008/10/07 19:27:04 tgl Exp $
1919
*
2020
*-------------------------------------------------------------------------
2121
*/
@@ -193,6 +193,13 @@ _copyRecursiveUnion(RecursiveUnion *from)
193193
* copy remainder of node
194194
*/
195195
COPY_SCALAR_FIELD(wtParam);
196+
COPY_SCALAR_FIELD(numCols);
197+
if (from->numCols > 0)
198+
{
199+
COPY_POINTER_FIELD(dupColIdx, from->numCols * sizeof(AttrNumber));
200+
COPY_POINTER_FIELD(dupOperators, from->numCols * sizeof(Oid));
201+
}
202+
COPY_SCALAR_FIELD(numGroups);
196203

197204
return newnode;
198205
}

0 commit comments

Comments
 (0)