aboutsummaryrefslogtreecommitdiff
path: root/doc/src
diff options
context:
space:
mode:
Diffstat (limited to 'doc/src')
-rw-r--r--doc/src/sgml/queries.sgml163
1 files changed, 142 insertions, 21 deletions
diff --git a/doc/src/sgml/queries.sgml b/doc/src/sgml/queries.sgml
index bad97a75b27..f06afe2c3fb 100644
--- a/doc/src/sgml/queries.sgml
+++ b/doc/src/sgml/queries.sgml
@@ -2011,6 +2011,10 @@ GROUP BY region, product;
but we'd have needed two levels of nested sub-<command>SELECT</command>s. It's a bit
easier to follow this way.
</para>
+ </sect2>
+
+ <sect2 id="queries-with-recursive">
+ <title>Recursive Queries</title>
<para>
<indexterm>
@@ -2114,6 +2118,120 @@ GROUP BY sub_part
</programlisting>
</para>
+ <sect3 id="queries-with-search">
+ <title>Search Order</title>
+
+ <para>
+ When computing a tree traversal using a recursive query, you might want to
+ order the results in either depth-first or breadth-first order. This can
+ be done by computing an ordering column alongside the other data columns
+ and using that to sort the results at the end. Note that this does not
+ actually control in which order the query evaluation visits the rows; that
+ is as always in SQL implementation-dependent. This approach merely
+ provides a convenient way to order the results afterwards.
+ </para>
+
+ <para>
+ To create a depth-first order, we compute for each result row an array of
+ rows that we have visited so far. For example, consider the following
+ query that searches a table <structname>tree</structname> using a
+ <structfield>link</structfield> field:
+
+<programlisting>
+WITH RECURSIVE search_tree(id, link, data) AS (
+ SELECT t.id, t.link, t.data
+ FROM tree t
+ UNION ALL
+ SELECT t.id, t.link, t.data
+ FROM tree t, search_tree st
+ WHERE t.id = st.link
+)
+SELECT * FROM search_tree;
+</programlisting>
+
+ To add depth-first ordering information, you can write this:
+
+<programlisting>
+WITH RECURSIVE search_tree(id, link, data, <emphasis>path</emphasis>) AS (
+ SELECT t.id, t.link, t.data, <emphasis>ARRAY[t.id]</emphasis>
+ FROM tree t
+ UNION ALL
+ SELECT t.id, t.link, t.data, <emphasis>path || t.id</emphasis>
+ FROM tree t, search_tree st
+ WHERE t.id = st.link
+)
+SELECT * FROM search_tree <emphasis>ORDER BY path</emphasis>;
+</programlisting>
+ </para>
+
+ <para>
+ In the general case where more than one field needs to be used to identify
+ a row, use an array of rows. For example, if we needed to track fields
+ <structfield>f1</structfield> and <structfield>f2</structfield>:
+
+<programlisting>
+WITH RECURSIVE search_tree(id, link, data, <emphasis>path</emphasis>) AS (
+ SELECT t.id, t.link, t.data, <emphasis>ARRAY[ROW(t.f1, t.f2)]</emphasis>
+ FROM tree t
+ UNION ALL
+ SELECT t.id, t.link, t.data, <emphasis>path || ROW(t.f1, t.f2)</emphasis>
+ FROM tree t, search_tree st
+ WHERE t.id = st.link
+)
+SELECT * FROM search_tree <emphasis>ORDER BY path</emphasis>;
+</programlisting>
+ </para>
+
+ <note>
+ <para>
+ The queries shown in this and the following section involving
+ <literal>ROW</literal> constructors in the target list only support
+ <literal>UNION ALL</literal> (not plain <literal>UNION</literal>) in the
+ current implementation.
+ </para>
+ </note>
+
+ <tip>
+ <para>
+ Omit the <literal>ROW()</literal> syntax in the common case where only one
+ field needs to be tracked. This allows a simple array rather than a
+ composite-type array to be used, gaining efficiency.
+ </para>
+ </tip>
+
+ <para>
+ To create a breadth-first order, you can add a column that tracks the depth
+ of the search, for example:
+
+<programlisting>
+WITH RECURSIVE search_tree(id, link, data, <emphasis>depth</emphasis>) AS (
+ SELECT t.id, t.link, t.data, <emphasis>0</emphasis>
+ FROM tree t
+ UNION ALL
+ SELECT t.id, t.link, t.data, <emphasis>depth + 1</emphasis>
+ FROM tree t, search_tree st
+ WHERE t.id = st.link
+)
+SELECT * FROM search_tree <emphasis>ORDER BY depth</emphasis>;
+</programlisting>
+
+ To get a stable sort, add data columns as secondary sorting columns.
+ </para>
+
+ <tip>
+ <para>
+ The recursive query evaluation algorithm produces its output in
+ breadth-first search order. However, this is an implementation detail and
+ it is perhaps unsound to rely on it. The order of the rows within each
+ level is certainly undefined, so some explicit ordering might be desired
+ in any case.
+ </para>
+ </tip>
+ </sect3>
+
+ <sect3 id="queries-with-cycle">
+ <title>Cycle Detection</title>
+
<para>
When working with recursive queries it is important to be sure that
the recursive part of the query will eventually return no tuples,
@@ -2123,13 +2241,13 @@ GROUP BY sub_part
cycle does not involve output rows that are completely duplicate: it may be
necessary to check just one or a few fields to see if the same point has
been reached before. The standard method for handling such situations is
- to compute an array of the already-visited values. For example, consider
+ to compute an array of the already-visited values. For example, consider again
the following query that searches a table <structname>graph</structname> using a
<structfield>link</structfield> field:
<programlisting>
WITH RECURSIVE search_graph(id, link, data, depth) AS (
- SELECT g.id, g.link, g.data, 1
+ SELECT g.id, g.link, g.data, 0
FROM graph g
UNION ALL
SELECT g.id, g.link, g.data, sg.depth + 1
@@ -2147,17 +2265,17 @@ SELECT * FROM search_graph;
<structfield>is_cycle</structfield> and <structfield>path</structfield> to the loop-prone query:
<programlisting>
-WITH RECURSIVE search_graph(id, link, data, depth, is_cycle, path) AS (
- SELECT g.id, g.link, g.data, 1,
- false,
- ARRAY[g.id]
+WITH RECURSIVE search_graph(id, link, data, depth, <emphasis>is_cycle, path</emphasis>) AS (
+ SELECT g.id, g.link, g.data, 0,
+ <emphasis>false,
+ ARRAY[g.id]</emphasis>
FROM graph g
UNION ALL
SELECT g.id, g.link, g.data, sg.depth + 1,
- g.id = ANY(path),
- path || g.id
+ <emphasis>g.id = ANY(path),
+ path || g.id</emphasis>
FROM graph g, search_graph sg
- WHERE g.id = sg.link AND NOT is_cycle
+ WHERE g.id = sg.link <emphasis>AND NOT is_cycle</emphasis>
)
SELECT * FROM search_graph;
</programlisting>
@@ -2172,17 +2290,17 @@ SELECT * FROM search_graph;
compare fields <structfield>f1</structfield> and <structfield>f2</structfield>:
<programlisting>
-WITH RECURSIVE search_graph(id, link, data, depth, is_cycle, path) AS (
- SELECT g.id, g.link, g.data, 1,
- false,
- ARRAY[ROW(g.f1, g.f2)]
+WITH RECURSIVE search_graph(id, link, data, depth, <emphasis>is_cycle, path</emphasis>) AS (
+ SELECT g.id, g.link, g.data, 0,
+ <emphasis>false,
+ ARRAY[ROW(g.f1, g.f2)]</emphasis>
FROM graph g
UNION ALL
SELECT g.id, g.link, g.data, sg.depth + 1,
- ROW(g.f1, g.f2) = ANY(path),
- path || ROW(g.f1, g.f2)
+ <emphasis>ROW(g.f1, g.f2) = ANY(path),
+ path || ROW(g.f1, g.f2)</emphasis>
FROM graph g, search_graph sg
- WHERE g.id = sg.link AND NOT is_cycle
+ WHERE g.id = sg.link <emphasis>AND NOT is_cycle</emphasis>
)
SELECT * FROM search_graph;
</programlisting>
@@ -2198,10 +2316,8 @@ SELECT * FROM search_graph;
<tip>
<para>
- The recursive query evaluation algorithm produces its output in
- breadth-first search order. You can display the results in depth-first
- search order by making the outer query <literal>ORDER BY</literal> a
- <quote>path</quote> column constructed in this way.
+ The cycle path column is computed in the same way as the depth-first
+ ordering column show in the previous section.
</para>
</tip>
@@ -2217,7 +2333,7 @@ WITH RECURSIVE t(n) AS (
UNION ALL
SELECT n+1 FROM t
)
-SELECT n FROM t LIMIT 100;
+SELECT n FROM t <emphasis>LIMIT 100</emphasis>;
</programlisting>
This works because <productname>PostgreSQL</productname>'s implementation
@@ -2229,6 +2345,11 @@ SELECT n FROM t LIMIT 100;
outer query will usually try to fetch all of the <literal>WITH</literal> query's
output anyway.
</para>
+ </sect3>
+ </sect2>
+
+ <sect2>
+ <title>Common Table Expression Materialization</title>
<para>
A useful property of <literal>WITH</literal> queries is that they are