Refactor planning of projection steps that don't need a Result plan node.

The original upper-planner-pathification design (commit 3fc6e2d7f5b652b4) assumed that we could always determine during Path formation whether or not we would need a Result plan node to perform projection of a targetlist. That turns out not to work very well, though, because createplan.c still has some responsibilities for choosing the specific target list associated with sorting/grouping nodes (in particular it might choose to add resjunk columns for sorting). We might not ever refactor that --- doing so would push more work into Path formation, which isn't attractive --- and we certainly won't do so for 9.6. So, while create_projection_path and apply_projection_to_path can tell for sure what will happen if the subpath is projection-capable, they can't tell for sure when it isn't. This is at least a latent bug in apply_projection_to_path, which might think it can apply a target to a non-projecting node when the node will end up computing something different. Also, I'd tied the creation of a ProjectionPath node to whether or not a Result is needed, but it turns out that we sometimes need a ProjectionPath node anyway to avoid modifying a possibly-shared subpath node. Callers had to use create_projection_path for such cases, and we added code to them that knew about the potential omission of a Result node and attempted to adjust the cost estimates for that. That was uncertainly correct and definitely ugly/unmaintainable. To fix, have create_projection_path explicitly check whether a Result is needed and adjust its cost estimate accordingly, though it creates a ProjectionPath in either case. apply_projection_to_path is now mostly just an optimized version that can avoid creating an extra Path node when the input is known to not be shared with any other live path. (There is one case that create_projection_path doesn't handle, which is pushing parallel-safe expressions below a Gather node. We could make it do that by duplicating the GatherPath, but there seems no need as yet.) create_projection_plan still has to recheck the tlist-match condition, which means that if the matching situation does get changed by createplan.c then we'll have made a slightly incorrect cost estimate. But there seems no help for that in the near term, and I doubt it occurs often enough, let alone would change planning decisions often enough, to be worth stressing about. I added a "dummypp" field to ProjectionPath to track whether create_projection_path thinks a Result is needed. This is not really necessary as-committed because create_projection_plan doesn't look at the flag; but it seems like a good idea to remember what we thought when forming the cost estimate, if only for debugging purposes. In passing, get rid of the target_parallel parameter added to apply_projection_to_path by commit 54f5c5150. I don't think that's a good idea because it involves callers in what should be an internal decision, and opens us up to missing optimization opportunities if callers think they don't need to provide a valid flag, as most don't. For the moment, this just costs us an extra has_parallel_hazard call when planning a Gather. If that starts to look expensive, I think a better solution would be to teach PathTarget to carry/cache knowledge of parallel-safety of its contents.
author: Tom Lane <tgl@sss.pgh.pa.us> 2016-06-21 18:38:20 -0400
committer: Tom Lane <tgl@sss.pgh.pa.us> 2016-06-21 18:38:20 -0400
commit: 8b9d323cb9810109e3e5aab1ead427cbbb7aa77e (patch)
tree: 522428b6759a078fba63be994ae392fbf79c34e9 /src/backend/optimizer/plan/createplan.c
parent: 936b62ddf247c26e8cc4fca34bd8a4c2e65c09fd (diff)
download: postgresql-8b9d323cb9810109e3e5aab1ead427cbbb7aa77e.tar.gz
postgresql-8b9d323cb9810109e3e5aab1ead427cbbb7aa77e.zip
1 files changed, 21 insertions, 14 deletions
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index ab8df76a6ed..b2db6e8d035 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -1409,8 +1409,9 @@ create_gather_plan(PlannerInfo *root, GatherPath *best_path)
 /*
  * create_projection_plan
  *
- *	  Create a Result node to do a projection step and (recursively) plans
- *	  for its subpaths.
+ *	  Create a plan tree to do a projection step and (recursively) plans
+ *	  for its subpaths.  We may need a Result node for the projection,
+ *	  but sometimes we can just let the subplan do the work.
  */
 static Plan *
 create_projection_plan(PlannerInfo *root, ProjectionPath *best_path)
@@ -1425,31 +1426,37 @@ create_projection_plan(PlannerInfo *root, ProjectionPath *best_path)
 	tlist = build_path_tlist(root, &best_path->path);
 
 	/*
-	 * We might not really need a Result node here.  There are several ways
-	 * that this can happen.  For example, MergeAppend doesn't project, so we
-	 * would have thought that we needed a projection to attach resjunk sort
-	 * columns to its output ... but create_merge_append_plan might have added
-	 * those same resjunk sort columns to both MergeAppend and its children.
-	 * Alternatively, apply_projection_to_path might have created a projection
-	 * path as the subpath of a Gather node even though the subpath was
-	 * projection-capable.  So, if the subpath is capable of projection or the
-	 * desired tlist is the same expression-wise as the subplan's, just jam it
-	 * in there.  We'll have charged for a Result that doesn't actually appear
-	 * in the plan, but that's better than having a Result we don't need.
+	 * We might not really need a Result node here, either because the subplan
+	 * can project or because it's returning the right list of expressions
+	 * anyway.  Usually create_projection_path will have detected that and set
+	 * dummypp if we don't need a Result; but its decision can't be final,
+	 * because some createplan.c routines change the tlists of their nodes.
+	 * (An example is that create_merge_append_plan might add resjunk sort
+	 * columns to a MergeAppend.)  So we have to recheck here.  If we do
+	 * arrive at a different answer than create_projection_path did, we'll
+	 * have made slightly wrong cost estimates; but label the plan with the
+	 * cost estimates we actually used, not "corrected" ones.  (XXX this could
+	 * be cleaned up if we moved more of the sortcolumn setup logic into Path
+	 * creation, but that would add expense to creating Paths we might end up
+	 * not using.)
 	 */
 	if (is_projection_capable_path(best_path->subpath) ||
 		tlist_same_exprs(tlist, subplan->targetlist))
 	{
+		/* Don't need a separate Result, just assign tlist to subplan */
 		plan = subplan;
 		plan->targetlist = tlist;
 
-		/* Adjust cost to match what we thought during planning */
+		/* Label plan with the estimated costs we actually used */
 		plan->startup_cost = best_path->path.startup_cost;
 		plan->total_cost = best_path->path.total_cost;
+		plan->plan_rows = best_path->path.rows;
+		plan->plan_width = best_path->path.pathtarget->width;
 		/* ... but be careful not to munge subplan's parallel-aware flag */
 	}
 	else
 	{
+		/* We need a Result node */
 		plan = (Plan *) make_result(tlist, NULL, subplan);
 
 		copy_generic_path_info(plan, (Path *) best_path);
author	Tom Lane <tgl@sss.pgh.pa.us>	2016-06-21 18:38:20 -0400
committer	Tom Lane <tgl@sss.pgh.pa.us>	2016-06-21 18:38:20 -0400
commit	8b9d323cb9810109e3e5aab1ead427cbbb7aa77e (patch)
tree	522428b6759a078fba63be994ae392fbf79c34e9 /src/backend/optimizer/plan/createplan.c
parent	936b62ddf247c26e8cc4fca34bd8a4c2e65c09fd (diff)
download	postgresql-8b9d323cb9810109e3e5aab1ead427cbbb7aa77e.tar.gz postgresql-8b9d323cb9810109e3e5aab1ead427cbbb7aa77e.zip