Skip to content

Commit 323ae00

Browse files
committed
doc: Expand recursive query documentation
Break the section up with subsection headings. Add examples for depth- and breadth-first search ordering. For consistency with the SQL search clause, start the depth counting at 0 instead of 1 in the examples. Discussion: https://www.postgresql.org/message-id/c5603982-0088-7f14-0caa-fdcd0c837b57@2ndquadrant.com
1 parent 8fccf75 commit 323ae00

File tree

1 file changed

+142
-21
lines changed

1 file changed

+142
-21
lines changed

doc/src/sgml/queries.sgml

Lines changed: 142 additions & 21 deletions
Original file line numberDiff line numberDiff line change
@@ -2011,6 +2011,10 @@ GROUP BY region, product;
20112011
but we'd have needed two levels of nested sub-<command>SELECT</command>s. It's a bit
20122012
easier to follow this way.
20132013
</para>
2014+
</sect2>
2015+
2016+
<sect2 id="queries-with-recursive">
2017+
<title>Recursive Queries</title>
20142018

20152019
<para>
20162020
<indexterm>
@@ -2114,6 +2118,120 @@ GROUP BY sub_part
21142118
</programlisting>
21152119
</para>
21162120

2121+
<sect3 id="queries-with-search">
2122+
<title>Search Order</title>
2123+
2124+
<para>
2125+
When computing a tree traversal using a recursive query, you might want to
2126+
order the results in either depth-first or breadth-first order. This can
2127+
be done by computing an ordering column alongside the other data columns
2128+
and using that to sort the results at the end. Note that this does not
2129+
actually control in which order the query evaluation visits the rows; that
2130+
is as always in SQL implementation-dependent. This approach merely
2131+
provides a convenient way to order the results afterwards.
2132+
</para>
2133+
2134+
<para>
2135+
To create a depth-first order, we compute for each result row an array of
2136+
rows that we have visited so far. For example, consider the following
2137+
query that searches a table <structname>tree</structname> using a
2138+
<structfield>link</structfield> field:
2139+
2140+
<programlisting>
2141+
WITH RECURSIVE search_tree(id, link, data) AS (
2142+
SELECT t.id, t.link, t.data
2143+
FROM tree t
2144+
UNION ALL
2145+
SELECT t.id, t.link, t.data
2146+
FROM tree t, search_tree st
2147+
WHERE t.id = st.link
2148+
)
2149+
SELECT * FROM search_tree;
2150+
</programlisting>
2151+
2152+
To add depth-first ordering information, you can write this:
2153+
2154+
<programlisting>
2155+
WITH RECURSIVE search_tree(id, link, data, <emphasis>path</emphasis>) AS (
2156+
SELECT t.id, t.link, t.data, <emphasis>ARRAY[t.id]</emphasis>
2157+
FROM tree t
2158+
UNION ALL
2159+
SELECT t.id, t.link, t.data, <emphasis>path || t.id</emphasis>
2160+
FROM tree t, search_tree st
2161+
WHERE t.id = st.link
2162+
)
2163+
SELECT * FROM search_tree <emphasis>ORDER BY path</emphasis>;
2164+
</programlisting>
2165+
</para>
2166+
2167+
<para>
2168+
In the general case where more than one field needs to be used to identify
2169+
a row, use an array of rows. For example, if we needed to track fields
2170+
<structfield>f1</structfield> and <structfield>f2</structfield>:
2171+
2172+
<programlisting>
2173+
WITH RECURSIVE search_tree(id, link, data, <emphasis>path</emphasis>) AS (
2174+
SELECT t.id, t.link, t.data, <emphasis>ARRAY[ROW(t.f1, t.f2)]</emphasis>
2175+
FROM tree t
2176+
UNION ALL
2177+
SELECT t.id, t.link, t.data, <emphasis>path || ROW(t.f1, t.f2)</emphasis>
2178+
FROM tree t, search_tree st
2179+
WHERE t.id = st.link
2180+
)
2181+
SELECT * FROM search_tree <emphasis>ORDER BY path</emphasis>;
2182+
</programlisting>
2183+
</para>
2184+
2185+
<note>
2186+
<para>
2187+
The queries shown in this and the following section involving
2188+
<literal>ROW</literal> constructors in the target list only support
2189+
<literal>UNION ALL</literal> (not plain <literal>UNION</literal>) in the
2190+
current implementation.
2191+
</para>
2192+
</note>
2193+
2194+
<tip>
2195+
<para>
2196+
Omit the <literal>ROW()</literal> syntax in the common case where only one
2197+
field needs to be tracked. This allows a simple array rather than a
2198+
composite-type array to be used, gaining efficiency.
2199+
</para>
2200+
</tip>
2201+
2202+
<para>
2203+
To create a breadth-first order, you can add a column that tracks the depth
2204+
of the search, for example:
2205+
2206+
<programlisting>
2207+
WITH RECURSIVE search_tree(id, link, data, <emphasis>depth</emphasis>) AS (
2208+
SELECT t.id, t.link, t.data, <emphasis>0</emphasis>
2209+
FROM tree t
2210+
UNION ALL
2211+
SELECT t.id, t.link, t.data, <emphasis>depth + 1</emphasis>
2212+
FROM tree t, search_tree st
2213+
WHERE t.id = st.link
2214+
)
2215+
SELECT * FROM search_tree <emphasis>ORDER BY depth</emphasis>;
2216+
</programlisting>
2217+
2218+
To get a stable sort, add data columns as secondary sorting columns.
2219+
</para>
2220+
2221+
<tip>
2222+
<para>
2223+
The recursive query evaluation algorithm produces its output in
2224+
breadth-first search order. However, this is an implementation detail and
2225+
it is perhaps unsound to rely on it. The order of the rows within each
2226+
level is certainly undefined, so some explicit ordering might be desired
2227+
in any case.
2228+
</para>
2229+
</tip>
2230+
</sect3>
2231+
2232+
<sect3 id="queries-with-cycle">
2233+
<title>Cycle Detection</title>
2234+
21172235
<para>
21182236
When working with recursive queries it is important to be sure that
21192237
the recursive part of the query will eventually return no tuples,
@@ -2123,13 +2241,13 @@ GROUP BY sub_part
21232241
cycle does not involve output rows that are completely duplicate: it may be
21242242
necessary to check just one or a few fields to see if the same point has
21252243
been reached before. The standard method for handling such situations is
2126-
to compute an array of the already-visited values. For example, consider
2244+
to compute an array of the already-visited values. For example, consider again
21272245
the following query that searches a table <structname>graph</structname> using a
21282246
<structfield>link</structfield> field:
21292247

21302248
<programlisting>
21312249
WITH RECURSIVE search_graph(id, link, data, depth) AS (
2132-
SELECT g.id, g.link, g.data, 1
2250+
SELECT g.id, g.link, g.data, 0
21332251
FROM graph g
21342252
UNION ALL
21352253
SELECT g.id, g.link, g.data, sg.depth + 1
@@ -2147,17 +2265,17 @@ SELECT * FROM search_graph;
21472265
<structfield>is_cycle</structfield> and <structfield>path</structfield> to the loop-prone query:
21482266

21492267
<programlisting>
2150-
WITH RECURSIVE search_graph(id, link, data, depth, is_cycle, path) AS (
2151-
SELECT g.id, g.link, g.data, 1,
2152-
false,
2153-
ARRAY[g.id]
2268+
WITH RECURSIVE search_graph(id, link, data, depth, <emphasis>is_cycle, path</emphasis>) AS (
2269+
SELECT g.id, g.link, g.data, 0,
2270+
<emphasis>false,
2271+
ARRAY[g.id]</emphasis>
21542272
FROM graph g
21552273
UNION ALL
21562274
SELECT g.id, g.link, g.data, sg.depth + 1,
2157-
g.id = ANY(path),
2158-
path || g.id
2275+
<emphasis>g.id = ANY(path),
2276+
path || g.id</emphasis>
21592277
FROM graph g, search_graph sg
2160-
WHERE g.id = sg.link AND NOT is_cycle
2278+
WHERE g.id = sg.link <emphasis>AND NOT is_cycle</emphasis>
21612279
)
21622280
SELECT * FROM search_graph;
21632281
</programlisting>
@@ -2172,17 +2290,17 @@ SELECT * FROM search_graph;
21722290
compare fields <structfield>f1</structfield> and <structfield>f2</structfield>:
21732291

21742292
<programlisting>
2175-
WITH RECURSIVE search_graph(id, link, data, depth, is_cycle, path) AS (
2176-
SELECT g.id, g.link, g.data, 1,
2177-
false,
2178-
ARRAY[ROW(g.f1, g.f2)]
2293+
WITH RECURSIVE search_graph(id, link, data, depth, <emphasis>is_cycle, path</emphasis>) AS (
2294+
SELECT g.id, g.link, g.data, 0,
2295+
<emphasis>false,
2296+
ARRAY[ROW(g.f1, g.f2)]</emphasis>
21792297
FROM graph g
21802298
UNION ALL
21812299
SELECT g.id, g.link, g.data, sg.depth + 1,
2182-
ROW(g.f1, g.f2) = ANY(path),
2183-
path || ROW(g.f1, g.f2)
2300+
<emphasis>ROW(g.f1, g.f2) = ANY(path),
2301+
path || ROW(g.f1, g.f2)</emphasis>
21842302
FROM graph g, search_graph sg
2185-
WHERE g.id = sg.link AND NOT is_cycle
2303+
WHERE g.id = sg.link <emphasis>AND NOT is_cycle</emphasis>
21862304
)
21872305
SELECT * FROM search_graph;
21882306
</programlisting>
@@ -2198,10 +2316,8 @@ SELECT * FROM search_graph;
21982316

21992317
<tip>
22002318
<para>
2201-
The recursive query evaluation algorithm produces its output in
2202-
breadth-first search order. You can display the results in depth-first
2203-
search order by making the outer query <literal>ORDER BY</literal> a
2204-
<quote>path</quote> column constructed in this way.
2319+
The cycle path column is computed in the same way as the depth-first
2320+
ordering column show in the previous section.
22052321
</para>
22062322
</tip>
22072323

@@ -2217,7 +2333,7 @@ WITH RECURSIVE t(n) AS (
22172333
UNION ALL
22182334
SELECT n+1 FROM t
22192335
)
2220-
SELECT n FROM t LIMIT 100;
2336+
SELECT n FROM t <emphasis>LIMIT 100</emphasis>;
22212337
</programlisting>
22222338

22232339
This works because <productname>PostgreSQL</productname>'s implementation
@@ -2229,6 +2345,11 @@ SELECT n FROM t LIMIT 100;
22292345
outer query will usually try to fetch all of the <literal>WITH</literal> query's
22302346
output anyway.
22312347
</para>
2348+
</sect3>
2349+
</sect2>
2350+
2351+
<sect2>
2352+
<title>Common Table Expression Materialization</title>
22322353

22332354
<para>
22342355
A useful property of <literal>WITH</literal> queries is that they are

0 commit comments

Comments
 (0)