author: Tom Lane <tgl@sss.pgh.pa.us> 2012-03-09 12:48:48 -0500
committer: Tom Lane <tgl@sss.pgh.pa.us> 2012-03-09 12:49:25 -0500
commit: b14953932dfdda7d915b9e276a09df8458efeec8
tree: b21dbb22e7dc71bed2d3ca3df1e5374f678cf6be /doc/src
parent: 342baf4ce61f06ad3898490dc5125579d9e6bd18
Revise FDW planning API, again.

Further reflection shows that a single callback isn't very workable if we desire to let FDWs generate multiple Paths, because that forces the FDW to do all work necessary to generate a valid Plan node for each Path. Instead split the former PlanForeignScan API into three steps: GetForeignRelSize, GetForeignPaths, GetForeignPlan. We had already bit the bullet of breaking the 9.1 FDW API for 9.2, so this shouldn't cause very much additional pain, and it's substantially more flexible for complex FDWs.

Add an fdw_private field to RelOptInfo so that the new functions can save state there rather than possibly having to recalculate information two or three times.

In addition, we'd not thought through what would be needed to allow an FDW to set up subexpressions of its choice for runtime execution. We could treat ForeignScan.fdw_private as an executable expression but that seems likely to break existing FDWs unnecessarily (in particular, it would restrict the set of node types allowable in fdw_private to those supported by expression_tree_walker). Instead, invent a separate field fdw_exprs which will receive the postprocessing appropriate for expression trees. (One field is enough since it can be a list of expressions; also, we assume the corresponding expression state tree(s) will be held within fdw_state, so we don't need to add anything to ForeignScanState.)

Per review of Hanada Shigeru's pgsql_fdw patch. We may need to tweak this further as we continue to work on that patch, but to me it feels a lot closer to being right now.
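
The three-step flow described in the commit message can be illustrated with a small stand-alone sketch. It does not use the PostgreSQL headers: every type and function prefixed with Demo or demo_ is a hypothetical stand-in invented for illustration, the row count and per-row cost are made-up constants, and malloc stands in for palloc. It only shows how state computed in the size-estimation step can be parked in an fdw_private pointer on the relation and reused by the later steps instead of being recalculated.

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-ins only: these are NOT the real PostgreSQL structs. */
typedef struct DemoRelOptInfo {
    double rows;          /* planner's row estimate for the table */
    void  *fdw_private;   /* FDW scratch space, NULL when the rel is created */
} DemoRelOptInfo;

typedef struct DemoForeignPath {
    double rows;
    double total_cost;
} DemoForeignPath;

typedef struct DemoForeignScan {
    double plan_rows;
    double total_cost;
} DemoForeignScan;

/* Hypothetical per-table planning state cached between the three steps. */
typedef struct DemoPlanState {
    double remote_rows;   /* pretend this was fetched from the remote side */
    double per_row_cost;  /* assumed cost of transferring one row */
} DemoPlanState;

/* Step 1 (cf. GetForeignRelSize): estimate size and cache the state. */
static void demo_get_rel_size(DemoRelOptInfo *baserel)
{
    DemoPlanState *st = malloc(sizeof(DemoPlanState)); /* palloc in a real FDW */
    if (st == NULL)
        abort();
    st->remote_rows = 1000.0;
    st->per_row_cost = 0.25;
    baserel->rows = st->remote_rows;   /* replace the default estimate */
    baserel->fdw_private = st;         /* pass state forward */
}

/* Step 2 (cf. GetForeignPaths): build a costed path from the cached state. */
static DemoForeignPath demo_get_paths(const DemoRelOptInfo *baserel)
{
    const DemoPlanState *st = baserel->fdw_private;
    DemoForeignPath path;
    path.rows = baserel->rows;
    path.total_cost = st->remote_rows * st->per_row_cost;
    return path;
}

/* Step 3 (cf. GetForeignPlan): turn the chosen path into a plan node. */
static DemoForeignScan demo_get_plan(const DemoRelOptInfo *baserel,
                                     const DemoForeignPath *best_path)
{
    DemoForeignScan scan;
    (void) baserel;                    /* cached state would be consulted here */
    scan.plan_rows = best_path->rows;
    scan.total_cost = best_path->total_cost;
    return scan;
}
```

A caller would run the steps in order on one DemoRelOptInfo initialized with fdw_private set to NULL: demo_get_rel_size, then demo_get_paths, then demo_get_plan with the path it chose. Only the first step does the (pretend) expensive lookup.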
Diffstat (limited to 'doc/src')
-rw-r--r-- doc/src/sgml/fdwhandler.sgml 230
1 files changed, 195 insertions, 35 deletions
diff --git a/doc/src/sgml/fdwhandler.sgml b/doc/src/sgml/fdwhandler.sgml
index dbfcbbc2b36..f7bf3d8a395 100644
--- a/doc/src/sgml/fdwhandler.sgml
+++ b/doc/src/sgml/fdwhandler.sgml
@@ -89,52 +89,92 @@
<para>
<programlisting>
void
-PlanForeignScan (Oid foreigntableid,
- PlannerInfo *root,
- RelOptInfo *baserel);
+GetForeignRelSize (PlannerInfo *root,
+ RelOptInfo *baserel,
+ Oid foreigntableid);
</programlisting>
- Create possible access paths for a scan on a foreign table. This is
- called when a query is planned.
+ Obtain relation size estimates for a foreign table. This is called
+ at the beginning of planning for a query involving a foreign table.
+ <literal>root</> is the planner's global information about the query;
+ <literal>baserel</> is the planner's information about this table; and
<literal>foreigntableid</> is the <structname>pg_class</> OID of the
- foreign table. <literal>root</> is the planner's global information
- about the query, and <literal>baserel</> is the planner's information
- about this table.
+ foreign table. (<literal>foreigntableid</> could be obtained from the
+ planner data structures, but it's passed explicitly to save effort.)
</para>
<para>
- The function must generate at least one access path (ForeignPath node)
- for a scan on the foreign table and must call <function>add_path</> to
- add the path to <literal>baserel-&gt;pathlist</>. It's recommended to
- use <function>create_foreignscan_path</> to build the ForeignPath node.
- The function may generate multiple access paths, e.g., a path which has
- valid <literal>pathkeys</> to represent a pre-sorted result. Each access
- path must contain cost estimates, and can contain any FDW-private
- information that is needed to execute the foreign scan at a later time.
- (Note that the private information must be represented in a form that
- <function>copyObject</> knows how to copy.)
+ This function should update <literal>baserel-&gt;rows</> to be the
+ expected number of rows returned by the table scan, after accounting for
+ the filtering done by the restriction quals. The initial value of
+ <literal>baserel-&gt;rows</> is just a constant default estimate, which
+ should be replaced if at all possible. The function may also choose to
+ update <literal>baserel-&gt;width</> if it can compute a better estimate
+ of the average result row width.
</para>
<para>
- The information in <literal>root</> and <literal>baserel</> can be used
- to reduce the amount of information that has to be fetched from the
- foreign table (and therefore reduce the cost estimate).
- <literal>baserel-&gt;baserestrictinfo</> is particularly interesting, as
- it contains restriction quals (<literal>WHERE</> clauses) that can be
- used to filter the rows to be fetched. (The FDW is not required to
- enforce these quals, as the finished plan will recheck them anyway.)
- <literal>baserel-&gt;reltargetlist</> can be used to determine which
- columns need to be fetched.
+ See <xref linkend="fdw-planning"> for additional information.
+ </para>
+
+ <para>
+<programlisting>
+void
+GetForeignPaths (PlannerInfo *root,
+ RelOptInfo *baserel,
+ Oid foreigntableid);
+</programlisting>
+
+ Create possible access paths for a scan on a foreign table.
+ This is called during query planning.
+ The parameters are the same as for <function>GetForeignRelSize</>,
+ which has already been called.
+ </para>
+
+ <para>
+ This function must generate at least one access path
+ (<structname>ForeignPath</> node) for a scan on the foreign table and
+ must call <function>add_path</> to add each such path to
+ <literal>baserel-&gt;pathlist</>. It's recommended to use
+ <function>create_foreignscan_path</> to build the
+ <structname>ForeignPath</> nodes. The function can generate multiple
+ access paths, e.g., a path which has valid <literal>pathkeys</> to
+ represent a pre-sorted result. Each access path must contain cost
+ estimates, and can contain any FDW-private information that is needed to
+ identify the specific scan method intended.
+ </para>
+
+ <para>
+ See <xref linkend="fdw-planning"> for additional information.
+ </para>
+
+ <para>
+<programlisting>
+ForeignScan *
+GetForeignPlan (PlannerInfo *root,
+ RelOptInfo *baserel,
+ Oid foreigntableid,
+ ForeignPath *best_path,
+ List *tlist,
+ List *scan_clauses);
+</programlisting>
+
+ Create a <structname>ForeignScan</> plan node from the selected foreign
+ access path. This is called at the end of query planning.
+ The parameters are as for <function>GetForeignRelSize</>, plus
+ the selected <structname>ForeignPath</> (previously produced by
+ <function>GetForeignPaths</>), the target list to be emitted by the
+ plan node, and the restriction clauses to be enforced by the plan node.
</para>
<para>
- In addition to returning cost estimates, the function should update
- <literal>baserel-&gt;rows</> to be the expected number of rows returned
- by the scan, after accounting for the filtering done by the restriction
- quals. The initial value of <literal>baserel-&gt;rows</> is just a
- constant default estimate, which should be replaced if at all possible.
- The function may also choose to update <literal>baserel-&gt;width</> if
- it can compute a better estimate of the average result row width.
+ This function must create and return a <structname>ForeignScan</> plan
+ node; it's recommended to use <function>make_foreignscan</> to build the
+ <structname>ForeignScan</> node.
+ </para>
+
+ <para>
+ See <xref linkend="fdw-planning"> for additional information.
</para>
<para>
@@ -170,7 +210,7 @@ BeginForeignScan (ForeignScanState *node,
the table to scan is accessible through the
<structname>ForeignScanState</> node (in particular, from the underlying
<structname>ForeignScan</> plan node, which contains any FDW-private
- information provided by <function>PlanForeignScan</>).
+ information provided by <function>GetForeignPlan</>).
</para>
<para>
@@ -347,6 +387,126 @@ GetForeignServerByName(const char *name, bool missing_ok);
return NULL if missing_ok is true, otherwise raise an error.
</para>
+ </sect1>
+
+ <sect1 id="fdw-planning">
+ <title>Foreign Data Wrapper Query Planning</title>
+
+ <para>
+ The FDW callback functions <function>GetForeignRelSize</>,
+ <function>GetForeignPaths</>, and <function>GetForeignPlan</> must fit
+ into the workings of the <productname>PostgreSQL</> planner. Here are
+ some notes about what they must do.
+ </para>
+
+ <para>
+ The information in <literal>root</> and <literal>baserel</> can be used
+ to reduce the amount of information that has to be fetched from the
+ foreign table (and therefore reduce the cost).
+ <literal>baserel-&gt;baserestrictinfo</> is particularly interesting, as
+ it contains restriction quals (<literal>WHERE</> clauses) that should be
+ used to filter the rows to be fetched. (The FDW itself is not required
+ to enforce these quals, as the core executor can check them instead.)
+ <literal>baserel-&gt;reltargetlist</> can be used to determine which
+ columns need to be fetched; but note that it only lists columns that
+ have to be emitted by the <structname>ForeignScan</> plan node, not
+ columns that are used in qual evaluation but not output by the query.
+ </para>
+
+ <para>
+ Various private fields are available for the FDW planning functions to
+ keep information in. Generally, whatever you store in FDW private fields
+ should be palloc'd, so that it will be reclaimed at the end of planning.
+ </para>
+
+ <para>
+ <literal>baserel-&gt;fdw_private</> is a <type>void</> pointer that is
+ available for FDW planning functions to store information relevant to
+ the particular foreign table. The core planner does not touch it except
+ to initialize it to NULL when the <literal>baserel</> node is created.
+ It is useful for passing information forward from
+ <function>GetForeignRelSize</> to <function>GetForeignPaths</> and/or
+ <function>GetForeignPaths</> to <function>GetForeignPlan</>, thereby
+ avoiding recalculation.
+ </para>
+
+ <para>
+ <function>GetForeignPaths</> can identify the meaning of different
+ access paths by storing private information in the
+ <structfield>fdw_private</> field of <structname>ForeignPath</> nodes.
+ <structfield>fdw_private</> is declared as a <type>List</> pointer, but
+ could actually contain anything since the core planner does not touch
+ it. However, best practice is to use a representation that's dumpable
+ by <function>nodeToString</>, for use with debugging support available
+ in the backend.
+ </para>
+
+ <para>
+ <function>GetForeignPlan</> can examine the <structfield>fdw_private</>
+ field of the selected <structname>ForeignPath</> node, and can generate
+ <structfield>fdw_exprs</> and <structfield>fdw_private</> lists to be
+ placed in the <structname>ForeignScan</> plan node, where they will be
+ available at execution time. Both of these lists must be
+ represented in a form that <function>copyObject</> knows how to copy.
+ The <structfield>fdw_private</> list has no other restrictions and is
+ not interpreted by the core backend in any way. The
+ <structfield>fdw_exprs</> list, if not NIL, is expected to contain
+ expression trees that are intended to be executed at runtime. These
+ trees will undergo post-processing by the planner to make them fully
+ executable.
+ </para>
+
+ <para>
+ In <function>GetForeignPlan</>, generally the passed-in targetlist can
+ be copied into the plan node as-is. The passed scan_clauses list
+ contains the same clauses as <literal>baserel-&gt;baserestrictinfo</>,
+ but may be re-ordered for better execution efficiency. In simple cases
+ the FDW can just strip <structname>RestrictInfo</> nodes from the
+ scan_clauses list (using <function>extract_actual_clauses</>) and put
+ all the clauses into the plan node's qual list, which means that all the
+ clauses will be checked by the executor at runtime. More complex FDWs
+ may be able to check some of the clauses internally, in which case those
+ clauses can be removed from the plan node's qual list so that the
+ executor doesn't waste time rechecking them.
+ </para>
+
+ <para>
+ As an example, the FDW might identify some restriction clauses of the
+ form <replaceable>foreign_variable</> <literal>=</>
+ <replaceable>sub_expression</>, which it determines can be executed on
+ the remote server given the locally-evaluated value of the
+ <replaceable>sub_expression</>. The actual identification of such a
+ clause should happen during <function>GetForeignPaths</>, since it would
+ affect the cost estimate for the path. The path's
+ <structfield>fdw_private</> field would probably include a pointer to
+ the identified clause's <structname>RestrictInfo</> node. Then
+ <function>GetForeignPlan</> would remove that clause from scan_clauses,
+ but add the <replaceable>sub_expression</> to <structfield>fdw_exprs</>
+ to ensure that it gets massaged into executable form. It would probably
+ also put control information into the plan node's
+ <structfield>fdw_private</> field to tell the execution functions what
+ to do at runtime. The query transmitted to the remote server would
+ involve something like <literal>WHERE <replaceable>foreign_variable</> =
+ $1</literal>, with the parameter value obtained at runtime from
+ evaluation of the <structfield>fdw_exprs</> expression tree.
+ </para>
+
+ <para>
+ The FDW should always construct at least one path that depends only on
+ the table's restriction clauses. In join queries, it might also choose
+ to construct path(s) that depend on join clauses, for example
+ <replaceable>foreign_variable</> <literal>=</>
+ <replaceable>local_variable</>. Such clauses will not be found in
+ <literal>baserel-&gt;baserestrictinfo</> but must be sought in the
+ relation's join lists. A path using such a clause is called a
+ <quote>parameterized path</>. It must show the other relation(s) as
+ <literal>required_outer</> and list the specific join clause(s) in
+ <literal>param_clauses</>. In <function>GetForeignPlan</>, the
+ <replaceable>local_variable</> portion of the join clause would be added
+ to <structfield>fdw_exprs</>, and then at runtime the case works the
+ same as for an ordinary restriction clause.
+ </para>
+
</sect1>
</chapter>
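
The clause handling that the new fdw-planning section describes for GetForeignPlan can be sketched in isolation. The following is a rough illustration under stated assumptions, not code from this commit or from any FDW: DemoClause, DemoScanPlan, and demo_split_clauses are hypothetical names, clauses are represented as plain strings rather than expression trees, and the remote_ok flag stands in for whatever analysis the FDW performed during path generation. Clauses the remote server can check are dropped from the local qual list, their sub-expressions are collected (the role fdw_exprs plays), and the remote condition refers to them as $1, $2, and so on.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define DEMO_MAX_CLAUSES 8

/* Hypothetical stand-in for a restriction clause "foreign_var = sub_expr". */
typedef struct DemoClause {
    const char *foreign_var;  /* column of the foreign table */
    const char *sub_expr;     /* expression evaluated locally at runtime */
    int         remote_ok;    /* 1 if the FDW decided the remote side can check it */
} DemoClause;

/* Hypothetical stand-in for the parts of a plan node the FDW fills in. */
typedef struct DemoScanPlan {
    const DemoClause *local_quals[DEMO_MAX_CLAUSES]; /* checked by the executor */
    int               n_local;
    const char       *fdw_exprs[DEMO_MAX_CLAUSES];   /* evaluated at runtime */
    int               n_exprs;
    char              remote_where[256];             /* condition sent to the remote server */
} DemoScanPlan;

/* Split clauses into locally checked quals and remotely checked parameters. */
static void demo_split_clauses(const DemoClause *clauses, int n, DemoScanPlan *plan)
{
    size_t used = 0;
    const size_t cap = sizeof(plan->remote_where);

    plan->n_local = 0;
    plan->n_exprs = 0;
    plan->remote_where[0] = '\0';

    for (int i = 0; i < n && i < DEMO_MAX_CLAUSES; i++) {
        const DemoClause *c = &clauses[i];

        if (!c->remote_ok) {
            /* Keep the clause in the qual list so the executor checks it. */
            plan->local_quals[plan->n_local++] = c;
            continue;
        }

        /* Remote-checkable: remember the sub-expression, emit "$k" for it. */
        plan->fdw_exprs[plan->n_exprs++] = c->sub_expr;
        int w = snprintf(plan->remote_where + used, cap - used, "%s%s = $%d",
                         plan->n_exprs > 1 ? " AND " : "",
                         c->foreign_var, plan->n_exprs);
        if (w < 0)
            break;                                   /* encoding error: stop appending */
        used = ((size_t) w >= cap - used) ? cap - 1  /* output was truncated */
                                          : used + (size_t) w;
    }
}
```

With three clauses of which the first and third are marked remote_ok, the sketch leaves one clause in local_quals, collects two entries in fdw_exprs, and produces a remote condition of the form "a = $1 AND b = $2". In a real FDW the parameter values would come from evaluating the fdw_exprs expression trees at execution time.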