Move targetlist SRF handling from expression evaluation to new executor node.

Evaluation of set returning functions (SRFs_ in the targetlist (like SELECT generate_series(1,5)) so far was done in the expression evaluation (i.e. ExecEvalExpr()) and projection (i.e. ExecProject/ExecTargetList) code. This meant that most executor nodes performing projection, and most expression evaluation functions, had to deal with the possibility that an evaluated expression could return a set of return values. That's bad because it leads to repeated code in a lot of places. It also, and that's my (Andres's) motivation, made it a lot harder to implement a more efficient way of doing expression evaluation. To fix this, introduce a new executor node (ProjectSet) that can evaluate targetlists containing one or more SRFs. To avoid the complexity of the old way of handling nested expressions returning sets (e.g. having to pass up ExprDoneCond, and dealing with arguments to functions returning sets etc.), those SRFs can only be at the top level of the node's targetlist. The planner makes sure (via split_pathtarget_at_srfs()) that SRF evaluation is only necessary in ProjectSet nodes and that SRFs are only present at the top level of the node's targetlist. If there are nested SRFs the planner creates multiple stacked ProjectSet nodes. The ProjectSet nodes always get input from an underlying node. We also discussed and prototyped evaluating targetlist SRFs using ROWS FROM(), but that turned out to be more complicated than we'd hoped. While moving SRF evaluation to ProjectSet would allow to retain the old "least common multiple" behavior when multiple SRFs are present in one targetlist (i.e. continue returning rows until all SRFs are at the end of their input at the same time), we decided to instead only return rows till all SRFs are exhausted, returning NULL for already exhausted ones. We deemed the previous behavior to be too confusing, unexpected and actually not particularly useful. As a side effect, the previously prohibited case of multiple set returning arguments to a function, is now allowed. Not because it's particularly desirable, but because it ends up working and there seems to be no argument for adding code to prohibit it. Currently the behavior for COALESCE and CASE containing SRFs has changed, returning multiple rows from the expression, even when the SRF containing "arm" of the expression is not evaluated. That's because the SRFs are evaluated in a separate ProjectSet node. As that's quite confusing, we're likely to instead prohibit SRFs in those places. But that's still being discussed, and the code would reside in places not touched here, so that's a task for later. There's a lot of, now superfluous, code dealing with set return expressions around. But as the changes to get rid of those are verbose largely boring, it seems better for readability to keep the cleanup as a separate commit. Author: Tom Lane and Andres Freund Discussion: https://postgr.es/m/20160822214023.aaxz5l4igypowyri@alap3.anarazel.de
author: Andres Freund <andres@anarazel.de> 2017-01-18 12:46:50 -0800
committer: Andres Freund <andres@anarazel.de> 2017-01-18 13:40:27 -0800
commit: 69f4b9c85f168ae006929eec44fc44d569e846b9 (patch)
tree: 0a7fa27d1be6e341d14f067d4942c1c0e0964729 /src/backend/optimizer
parent: e37360d5df240443bb6e997d26d54f59146283fc (diff)
download: postgresql-69f4b9c85f168ae006929eec44fc44d569e846b9.tar.gz
postgresql-69f4b9c85f168ae006929eec44fc44d569e846b9.zip
9 files changed, 538 insertions, 128 deletions
diff --git a/src/backend/optimizer/README b/src/backend/optimizer/README
index 19987397028..7ae2b74b2c2 100644
--- a/src/backend/optimizer/README
+++ b/src/backend/optimizer/README
@@ -375,6 +375,7 @@ RelOptInfo      - a relation or joined relations
   UniquePath    - remove duplicate rows (either by hashing or sorting)
   GatherPath    - collect the results of parallel workers
   ProjectionPath - a Result plan node with child (used for projection)
+  ProjectSetPath - a ProjectSet plan node applied to some sub-path
   SortPath      - a Sort plan node applied to some sub-path
   GroupPath     - a Group plan node applied to some sub-path
   UpperUniquePath - a Unique plan node applied to some sub-path
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 96ca7d3bb07..7c017fe1e4a 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -3051,6 +3051,10 @@ print_path(PlannerInfo *root, Path *path, int indent)
 			ptype = "Projection";
 			subpath = ((ProjectionPath *) path)->subpath;
 			break;
+		case T_ProjectSetPath:
+			ptype = "ProjectSet";
+			subpath = ((ProjectSetPath *) path)->subpath;
+			break;
 		case T_SortPath:
 			ptype = "Sort";
 			subpath = ((SortPath *) path)->subpath;
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index c4ada214ed2..fae1f67b9c0 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -81,6 +81,7 @@ static Plan *create_join_plan(PlannerInfo *root, JoinPath *best_path);
 static Plan *create_append_plan(PlannerInfo *root, AppendPath *best_path);
 static Plan *create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_path);
 static Result *create_result_plan(PlannerInfo *root, ResultPath *best_path);
+static ProjectSet *create_project_set_plan(PlannerInfo *root, ProjectSetPath *best_path);
 static Material *create_material_plan(PlannerInfo *root, MaterialPath *best_path,
 					 int flags);
 static Plan *create_unique_plan(PlannerInfo *root, UniquePath *best_path,
@@ -264,6 +265,7 @@ static SetOp *make_setop(SetOpCmd cmd, SetOpStrategy strategy, Plan *lefttree,
 		   long numGroups);
 static LockRows *make_lockrows(Plan *lefttree, List *rowMarks, int epqParam);
 static Result *make_result(List *tlist, Node *resconstantqual, Plan *subplan);
+static ProjectSet *make_project_set(List *tlist, Plan *subplan);
 static ModifyTable *make_modifytable(PlannerInfo *root,
 				 CmdType operation, bool canSetTag,
 				 Index nominalRelation,
@@ -392,6 +394,10 @@ create_plan_recurse(PlannerInfo *root, Path *best_path, int flags)
 												   (ResultPath *) best_path);
 			}
 			break;
+		case T_ProjectSet:
+			plan = (Plan *) create_project_set_plan(root,
+											   (ProjectSetPath *) best_path);
+			break;
 		case T_Material:
 			plan = (Plan *) create_material_plan(root,
 												 (MaterialPath *) best_path,
@@ -1142,6 +1148,31 @@ create_result_plan(PlannerInfo *root, ResultPath *best_path)
 }
 
 /*
+ * create_project_set_plan
+ *	  Create a ProjectSet plan for 'best_path'.
+ *
+ *	  Returns a Plan node.
+ */
+static ProjectSet *
+create_project_set_plan(PlannerInfo *root, ProjectSetPath *best_path)
+{
+	ProjectSet *plan;
+	Plan	   *subplan;
+	List	   *tlist;
+
+	/* Since we intend to project, we don't need to constrain child tlist */
+	subplan = create_plan_recurse(root, best_path->subpath, 0);
+
+	tlist = build_path_tlist(root, &best_path->path);
+
+	plan = make_project_set(tlist, subplan);
+
+	copy_generic_path_info(&plan->plan, (Path *) best_path);
+
+	return plan;
+}
+
+/*
  * create_material_plan
  *	  Create a Material plan for 'best_path' and (recursively) plans
  *	  for its subpaths.
@@ -6064,6 +6095,25 @@ make_result(List *tlist,
 }
 
 /*
+ * make_project_set
+ *	  Build a ProjectSet plan node
+ */
+static ProjectSet *
+make_project_set(List *tlist,
+				 Plan *subplan)
+{
+	ProjectSet *node = makeNode(ProjectSet);
+	Plan	   *plan = &node->plan;
+
+	plan->targetlist = tlist;
+	plan->qual = NIL;
+	plan->lefttree = subplan;
+	plan->righttree = NULL;
+
+	return node;
+}
+
+/*
  * make_modifytable
  *	  Build a ModifyTable plan node
  */
@@ -6229,6 +6279,15 @@ is_projection_capable_path(Path *path)
 			 * projection to its dummy path.
 			 */
 			return IS_DUMMY_PATH(path);
+		case T_ProjectSet:
+
+			/*
+			 * Although ProjectSet certainly projects, say "no" because we
+			 * don't want the planner to randomly replace its tlist with
+			 * something else; the SRFs have to stay at top level.  This might
+			 * get relaxed later.
+			 */
+			return false;
 		default:
 			break;
 	}
@@ -6257,6 +6316,15 @@ is_projection_capable_plan(Plan *plan)
 		case T_MergeAppend:
 		case T_RecursiveUnion:
 			return false;
+		case T_ProjectSet:
+
+			/*
+			 * Although ProjectSet certainly projects, say "no" because we
+			 * don't want the planner to randomly replace its tlist with
+			 * something else; the SRFs have to stay at top level.  This might
+			 * get relaxed later.
+			 */
+			return false;
 		default:
 			break;
 	}
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 25f2c5a6147..4b5902fc3ec 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -153,6 +153,8 @@ static List *make_pathkeys_for_window(PlannerInfo *root, WindowClause *wc,
 static PathTarget *make_sort_input_target(PlannerInfo *root,
 					   PathTarget *final_target,
 					   bool *have_postponed_srfs);
+static void adjust_paths_for_srfs(PlannerInfo *root, RelOptInfo *rel,
+					  List *targets, List *targets_contain_srfs);
 
 
 /*****************************************************************************
@@ -1400,8 +1402,9 @@ grouping_planner(PlannerInfo *root, bool inheritance_update,
 	int64		count_est = 0;
 	double		limit_tuples = -1.0;
 	bool		have_postponed_srfs = false;
-	double		tlist_rows;
 	PathTarget *final_target;
+	List	   *final_targets;
+	List	   *final_targets_contain_srfs;
 	RelOptInfo *current_rel;
 	RelOptInfo *final_rel;
 	ListCell   *lc;
@@ -1464,6 +1467,10 @@ grouping_planner(PlannerInfo *root, bool inheritance_update,
 		/* Also extract the PathTarget form of the setop result tlist */
 		final_target = current_rel->cheapest_total_path->pathtarget;
 
+		/* The setop result tlist couldn't contain any SRFs */
+		Assert(!parse->hasTargetSRFs);
+		final_targets = final_targets_contain_srfs = NIL;
+
 		/*
 		 * Can't handle FOR [KEY] UPDATE/SHARE here (parser should have
 		 * checked already, but let's make sure).
@@ -1489,8 +1496,14 @@ grouping_planner(PlannerInfo *root, bool inheritance_update,
 	{
 		/* No set operations, do regular planning */
 		PathTarget *sort_input_target;
+		List	   *sort_input_targets;
+		List	   *sort_input_targets_contain_srfs;
 		PathTarget *grouping_target;
+		List	   *grouping_targets;
+		List	   *grouping_targets_contain_srfs;
 		PathTarget *scanjoin_target;
+		List	   *scanjoin_targets;
+		List	   *scanjoin_targets_contain_srfs;
 		bool		have_grouping;
 		AggClauseCosts agg_costs;
 		WindowFuncLists *wflists = NULL;
@@ -1735,8 +1748,50 @@ grouping_planner(PlannerInfo *root, bool inheritance_update,
 			scanjoin_target = grouping_target;
 
 		/*
-		 * Forcibly apply scan/join target to all the Paths for the scan/join
-		 * rel.
+		 * If there are any SRFs in the targetlist, we must separate each of
+		 * these PathTargets into SRF-computing and SRF-free targets.  Replace
+		 * each of the named targets with a SRF-free version, and remember the
+		 * list of additional projection steps we need to add afterwards.
+		 */
+		if (parse->hasTargetSRFs)
+		{
+			/* final_target doesn't recompute any SRFs in sort_input_target */
+			split_pathtarget_at_srfs(root, final_target, sort_input_target,
+									 &final_targets,
+									 &final_targets_contain_srfs);
+			final_target = (PathTarget *) linitial(final_targets);
+			Assert(!linitial_int(final_targets_contain_srfs));
+			/* likewise for sort_input_target vs. grouping_target */
+			split_pathtarget_at_srfs(root, sort_input_target, grouping_target,
+									 &sort_input_targets,
+									 &sort_input_targets_contain_srfs);
+			sort_input_target = (PathTarget *) linitial(sort_input_targets);
+			Assert(!linitial_int(sort_input_targets_contain_srfs));
+			/* likewise for grouping_target vs. scanjoin_target */
+			split_pathtarget_at_srfs(root, grouping_target, scanjoin_target,
+									 &grouping_targets,
+									 &grouping_targets_contain_srfs);
+			grouping_target = (PathTarget *) linitial(grouping_targets);
+			Assert(!linitial_int(grouping_targets_contain_srfs));
+			/* scanjoin_target will not have any SRFs precomputed for it */
+			split_pathtarget_at_srfs(root, scanjoin_target, NULL,
+									 &scanjoin_targets,
+									 &scanjoin_targets_contain_srfs);
+			scanjoin_target = (PathTarget *) linitial(scanjoin_targets);
+			Assert(!linitial_int(scanjoin_targets_contain_srfs));
+		}
+		else
+		{
+			/* initialize lists, just to keep compiler quiet */
+			final_targets = final_targets_contain_srfs = NIL;
+			sort_input_targets = sort_input_targets_contain_srfs = NIL;
+			grouping_targets = grouping_targets_contain_srfs = NIL;
+			scanjoin_targets = scanjoin_targets_contain_srfs = NIL;
+		}
+
+		/*
+		 * Forcibly apply SRF-free scan/join target to all the Paths for the
+		 * scan/join rel.
 		 *
 		 * In principle we should re-run set_cheapest() here to identify the
 		 * cheapest path, but it seems unlikely that adding the same tlist
@@ -1807,6 +1862,12 @@ grouping_planner(PlannerInfo *root, bool inheritance_update,
 			current_rel->partial_pathlist = NIL;
 		}
 
+		/* Now fix things up if scan/join target contains SRFs */
+		if (parse->hasTargetSRFs)
+			adjust_paths_for_srfs(root, current_rel,
+								  scanjoin_targets,
+								  scanjoin_targets_contain_srfs);
+
 		/*
 		 * Save the various upper-rel PathTargets we just computed into
 		 * root->upper_targets[].  The core code doesn't use this, but it
@@ -1831,6 +1892,11 @@ grouping_planner(PlannerInfo *root, bool inheritance_update,
 												&agg_costs,
 												rollup_lists,
 												rollup_groupclauses);
+			/* Fix things up if grouping_target contains SRFs */
+			if (parse->hasTargetSRFs)
+				adjust_paths_for_srfs(root, current_rel,
+									  grouping_targets,
+									  grouping_targets_contain_srfs);
 		}
 
 		/*
@@ -1846,6 +1912,11 @@ grouping_planner(PlannerInfo *root, bool inheritance_update,
 											  tlist,
 											  wflists,
 											  activeWindows);
+			/* Fix things up if sort_input_target contains SRFs */
+			if (parse->hasTargetSRFs)
+				adjust_paths_for_srfs(root, current_rel,
+									  sort_input_targets,
+									  sort_input_targets_contain_srfs);
 		}
 
 		/*
@@ -1874,40 +1945,11 @@ grouping_planner(PlannerInfo *root, bool inheritance_update,
 										   final_target,
 										   have_postponed_srfs ? -1.0 :
 										   limit_tuples);
-	}
-
-	/*
-	 * If there are set-returning functions in the tlist, scale up the output
-	 * rowcounts of all surviving Paths to account for that.  Note that if any
-	 * SRFs appear in sorting or grouping columns, we'll have underestimated
-	 * the numbers of rows passing through earlier steps; but that's such a
-	 * weird usage that it doesn't seem worth greatly complicating matters to
-	 * account for it.
-	 */
-	if (parse->hasTargetSRFs)
-		tlist_rows = tlist_returns_set_rows(tlist);
-	else
-		tlist_rows = 1;
-
-	if (tlist_rows > 1)
-	{
-		foreach(lc, current_rel->pathlist)
-		{
-			Path	   *path = (Path *) lfirst(lc);
-
-			/*
-			 * We assume that execution costs of the tlist as such were
-			 * already accounted for.  However, it still seems appropriate to
-			 * charge something more for the executor's general costs of
-			 * processing the added tuples.  The cost is probably less than
-			 * cpu_tuple_cost, though, so we arbitrarily use half of that.
-			 */
-			path->total_cost += path->rows * (tlist_rows - 1) *
-				cpu_tuple_cost / 2;
-
-			path->rows *= tlist_rows;
-		}
-		/* No need to run set_cheapest; we're keeping all paths anyway. */
+		/* Fix things up if final_target contains SRFs */
+		if (parse->hasTargetSRFs)
+			adjust_paths_for_srfs(root, current_rel,
+								  final_targets,
+								  final_targets_contain_srfs);
 	}
 
 	/*
@@ -5102,6 +5144,109 @@ get_cheapest_fractional_path(RelOptInfo *rel, double tuple_fraction)
 }
 
 /*
+ * adjust_paths_for_srfs
+ *		Fix up the Paths of the given upperrel to handle tSRFs properly.
+ *
+ * The executor can only handle set-returning functions that appear at the
+ * top level of the targetlist of a ProjectSet plan node.  If we have any SRFs
+ * that are not at top level, we need to split up the evaluation into multiple
+ * plan levels in which each level satisfies this constraint.  This function
+ * modifies each Path of an upperrel that (might) compute any SRFs in its
+ * output tlist to insert appropriate projection steps.
+ *
+ * The given targets and targets_contain_srfs lists are from
+ * split_pathtarget_at_srfs().  We assume the existing Paths emit the first
+ * target in targets.
+ */
+static void
+adjust_paths_for_srfs(PlannerInfo *root, RelOptInfo *rel,
+					  List *targets, List *targets_contain_srfs)
+{
+	ListCell   *lc;
+
+	Assert(list_length(targets) == list_length(targets_contain_srfs));
+	Assert(!linitial_int(targets_contain_srfs));
+
+	/* If no SRFs appear at this plan level, nothing to do */
+	if (list_length(targets) == 1)
+		return;
+
+	/*
+	 * Stack SRF-evaluation nodes atop each path for the rel.
+	 *
+	 * In principle we should re-run set_cheapest() here to identify the
+	 * cheapest path, but it seems unlikely that adding the same tlist eval
+	 * costs to all the paths would change that, so we don't bother. Instead,
+	 * just assume that the cheapest-startup and cheapest-total paths remain
+	 * so.  (There should be no parameterized paths anymore, so we needn't
+	 * worry about updating cheapest_parameterized_paths.)
+	 */
+	foreach(lc, rel->pathlist)
+	{
+		Path	   *subpath = (Path *) lfirst(lc);
+		Path	   *newpath = subpath;
+		ListCell   *lc1,
+				   *lc2;
+
+		Assert(subpath->param_info == NULL);
+		forboth(lc1, targets, lc2, targets_contain_srfs)
+		{
+			PathTarget *thistarget = (PathTarget *) lfirst(lc1);
+			bool		contains_srfs = (bool) lfirst_int(lc2);
+
+			/* If this level doesn't contain SRFs, do regular projection */
+			if (contains_srfs)
+				newpath = (Path *) create_set_projection_path(root,
+															  rel,
+															  newpath,
+															  thistarget);
+			else
+				newpath = (Path *) apply_projection_to_path(root,
+															rel,
+															newpath,
+															thistarget);
+		}
+		lfirst(lc) = newpath;
+		if (subpath == rel->cheapest_startup_path)
+			rel->cheapest_startup_path = newpath;
+		if (subpath == rel->cheapest_total_path)
+			rel->cheapest_total_path = newpath;
+	}
+
+	/* Likewise for partial paths, if any */
+	foreach(lc, rel->partial_pathlist)
+	{
+		Path	   *subpath = (Path *) lfirst(lc);
+		Path	   *newpath = subpath;
+		ListCell   *lc1,
+				   *lc2;
+
+		Assert(subpath->param_info == NULL);
+		forboth(lc1, targets, lc2, targets_contain_srfs)
+		{
+			PathTarget *thistarget = (PathTarget *) lfirst(lc1);
+			bool		contains_srfs = (bool) lfirst_int(lc2);
+
+			/* If this level doesn't contain SRFs, do regular projection */
+			if (contains_srfs)
+				newpath = (Path *) create_set_projection_path(root,
+															  rel,
+															  newpath,
+															  thistarget);
+			else
+			{
+				/* avoid apply_projection_to_path, in case of multiple refs */
+				newpath = (Path *) create_projection_path(root,
+														  rel,
+														  newpath,
+														  thistarget);
+			}
+		}
+		lfirst(lc) = newpath;
+	}
+}
+
+/*
  * expression_planner
  *		Perform planner's transformations on a standalone expression.
  *
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index 413a0d9da27..be267b9da74 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -733,6 +733,9 @@ set_plan_refs(PlannerInfo *root, Plan *plan, int rtoffset)
 					fix_scan_expr(root, splan->resconstantqual, rtoffset);
 			}
 			break;
+		case T_ProjectSet:
+			set_upper_references(root, plan, rtoffset);
+			break;
 		case T_ModifyTable:
 			{
 				ModifyTable *splan = (ModifyTable *) plan;
diff --git a/src/backend/optimizer/plan/subselect.c b/src/backend/optimizer/plan/subselect.c
index aad0b684ed3..9fc748973e7 100644
--- a/src/backend/optimizer/plan/subselect.c
+++ b/src/backend/optimizer/plan/subselect.c
@@ -2680,6 +2680,7 @@ finalize_plan(PlannerInfo *root, Plan *plan, Bitmapset *valid_params,
 							  &context);
 			break;
 
+		case T_ProjectSet:
 		case T_Hash:
 		case T_Material:
 		case T_Sort:
@@ -2687,6 +2688,7 @@ finalize_plan(PlannerInfo *root, Plan *plan, Bitmapset *valid_params,
 		case T_Gather:
 		case T_SetOp:
 		case T_Group:
+			/* no node-type-specific fields need fixing */
 			break;
 
 		default:
diff --git a/src/backend/optimizer/util/clauses.c b/src/backend/optimizer/util/clauses.c
index 9e122e383d8..85ffa3afc7c 100644
--- a/src/backend/optimizer/util/clauses.c
+++ b/src/backend/optimizer/util/clauses.c
@@ -99,7 +99,6 @@ static bool contain_agg_clause_walker(Node *node, void *context);
 static bool get_agg_clause_costs_walker(Node *node,
 							get_agg_clause_costs_context *context);
 static bool find_window_functions_walker(Node *node, WindowFuncLists *lists);
-static bool expression_returns_set_rows_walker(Node *node, double *count);
 static bool contain_subplans_walker(Node *node, void *context);
 static bool contain_mutable_functions_walker(Node *node, void *context);
 static bool contain_volatile_functions_walker(Node *node, void *context);
@@ -790,114 +789,37 @@ find_window_functions_walker(Node *node, WindowFuncLists *lists)
 /*
  * expression_returns_set_rows
  *	  Estimate the number of rows returned by a set-returning expression.
- *	  The result is 1 if there are no set-returning functions.
+ *	  The result is 1 if it's not a set-returning expression.
  *
- * We use the product of the rowcount estimates of all the functions in
- * the given tree (this corresponds to the behavior of ExecMakeFunctionResult
- * for nested set-returning functions).
+ * We should only examine the top-level function or operator; it used to be
+ * appropriate to recurse, but not anymore.  (Even if there are more SRFs in
+ * the function's inputs, their multipliers are accounted for separately.)
  *
  * Note: keep this in sync with expression_returns_set() in nodes/nodeFuncs.c.
  */
 double
 expression_returns_set_rows(Node *clause)
 {
-	double		result = 1;
-
-	(void) expression_returns_set_rows_walker(clause, &result);
-	return clamp_row_est(result);
-}
-
-static bool
-expression_returns_set_rows_walker(Node *node, double *count)
-{
-	if (node == NULL)
-		return false;
-	if (IsA(node, FuncExpr))
+	if (clause == NULL)
+		return 1.0;
+	if (IsA(clause, FuncExpr))
 	{
-		FuncExpr   *expr = (FuncExpr *) node;
+		FuncExpr   *expr = (FuncExpr *) clause;
 
 		if (expr->funcretset)
-			*count *= get_func_rows(expr->funcid);
+			return clamp_row_est(get_func_rows(expr->funcid));
 	}
-	if (IsA(node, OpExpr))
+	if (IsA(clause, OpExpr))
 	{
-		OpExpr	   *expr = (OpExpr *) node;
+		OpExpr	   *expr = (OpExpr *) clause;
 
 		if (expr->opretset)
 		{
 			set_opfuncid(expr);
-			*count *= get_func_rows(expr->opfuncid);
+			return clamp_row_est(get_func_rows(expr->opfuncid));
 		}
 	}
-
-	/* Avoid recursion for some cases that can't return a set */
-	if (IsA(node, Aggref))
-		return false;
-	if (IsA(node, WindowFunc))
-		return false;
-	if (IsA(node, DistinctExpr))
-		return false;
-	if (IsA(node, NullIfExpr))
-		return false;
-	if (IsA(node, ScalarArrayOpExpr))
-		return false;
-	if (IsA(node, BoolExpr))
-		return false;
-	if (IsA(node, SubLink))
-		return false;
-	if (IsA(node, SubPlan))
-		return false;
-	if (IsA(node, AlternativeSubPlan))
-		return false;
-	if (IsA(node, ArrayExpr))
-		return false;
-	if (IsA(node, RowExpr))
-		return false;
-	if (IsA(node, RowCompareExpr))
-		return false;
-	if (IsA(node, CoalesceExpr))
-		return false;
-	if (IsA(node, MinMaxExpr))
-		return false;
-	if (IsA(node, XmlExpr))
-		return false;
-
-	return expression_tree_walker(node, expression_returns_set_rows_walker,
-								  (void *) count);
-}
-
-/*
- * tlist_returns_set_rows
- *	  Estimate the number of rows returned by a set-returning targetlist.
- *	  The result is 1 if there are no set-returning functions.
- *
- * Here, the result is the largest rowcount estimate of any of the tlist's
- * expressions, not the product as you would get from naively applying
- * expression_returns_set_rows() to the whole tlist.  The behavior actually
- * implemented by ExecTargetList produces a number of rows equal to the least
- * common multiple of the expression rowcounts, so that the product would be
- * a worst-case estimate that is typically not realistic.  Taking the max as
- * we do here is a best-case estimate that might not be realistic either,
- * but it's probably closer for typical usages.  We don't try to compute the
- * actual LCM because we're working with very approximate estimates, so their
- * LCM would be unduly noisy.
- */
-double
-tlist_returns_set_rows(List *tlist)
-{
-	double		result = 1;
-	ListCell   *lc;
-
-	foreach(lc, tlist)
-	{
-		TargetEntry *tle = (TargetEntry *) lfirst(lc);
-		double		colresult;
-
-		colresult = expression_returns_set_rows((Node *) tle->expr);
-		if (result < colresult)
-			result = colresult;
-	}
-	return result;
+	return 1.0;
 }
 
 
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index 3b7c56d3c7d..f440875ceb1 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -2320,6 +2320,72 @@ apply_projection_to_path(PlannerInfo *root,
 }
 
 /*
+ * create_set_projection_path
+ *	  Creates a pathnode that represents performing a projection that
+ *	  includes set-returning functions.
+ *
+ * 'rel' is the parent relation associated with the result
+ * 'subpath' is the path representing the source of data
+ * 'target' is the PathTarget to be computed
+ */
+ProjectSetPath *
+create_set_projection_path(PlannerInfo *root,
+						   RelOptInfo *rel,
+						   Path *subpath,
+						   PathTarget *target)
+{
+	ProjectSetPath *pathnode = makeNode(ProjectSetPath);
+	double		tlist_rows;
+	ListCell   *lc;
+
+	pathnode->path.pathtype = T_ProjectSet;
+	pathnode->path.parent = rel;
+	pathnode->path.pathtarget = target;
+	/* For now, assume we are above any joins, so no parameterization */
+	pathnode->path.param_info = NULL;
+	pathnode->path.parallel_aware = false;
+	pathnode->path.parallel_safe = rel->consider_parallel &&
+		subpath->parallel_safe &&
+		is_parallel_safe(root, (Node *) target->exprs);
+	pathnode->path.parallel_workers = subpath->parallel_workers;
+	/* Projection does not change the sort order XXX? */
+	pathnode->path.pathkeys = subpath->pathkeys;
+
+	pathnode->subpath = subpath;
+
+	/*
+	 * Estimate number of rows produced by SRFs for each row of input; if
+	 * there's more than one in this node, use the maximum.
+	 */
+	tlist_rows = 1;
+	foreach(lc, target->exprs)
+	{
+		Node	   *node = (Node *) lfirst(lc);
+		double		itemrows;
+
+		itemrows = expression_returns_set_rows(node);
+		if (tlist_rows < itemrows)
+			tlist_rows = itemrows;
+	}
+
+	/*
+	 * In addition to the cost of evaluating the tlist, charge cpu_tuple_cost
+	 * per input row, and half of cpu_tuple_cost for each added output row.
+	 * This is slightly bizarre maybe, but it's what 9.6 did; we may revisit
+	 * this estimate later.
+	 */
+	pathnode->path.rows = subpath->rows * tlist_rows;
+	pathnode->path.startup_cost = subpath->startup_cost +
+		target->cost.startup;
+	pathnode->path.total_cost = subpath->total_cost +
+		target->cost.startup +
+		(cpu_tuple_cost + target->cost.per_tuple) * subpath->rows +
+		(pathnode->path.rows - subpath->rows) * cpu_tuple_cost / 2;
+
+	return pathnode;
+}
+
+/*
  * create_sort_path
  *	  Creates a pathnode that represents performing an explicit sort.
  *
diff --git a/src/backend/optimizer/util/tlist.c b/src/backend/optimizer/util/tlist.c
index 45205a830f1..cca5db88e2f 100644
--- a/src/backend/optimizer/util/tlist.c
+++ b/src/backend/optimizer/util/tlist.c
@@ -16,9 +16,20 @@
 
 #include "nodes/makefuncs.h"
 #include "nodes/nodeFuncs.h"
+#include "optimizer/cost.h"
 #include "optimizer/tlist.h"
 
 
+typedef struct
+{
+	List	   *nextlevel_tlist;
+	bool		nextlevel_contains_srfs;
+} split_pathtarget_context;
+
+static bool split_pathtarget_walker(Node *node,
+						split_pathtarget_context *context);
+
+
 /*****************************************************************************
  *		Target list creation and searching utilities
  *****************************************************************************/
@@ -759,3 +770,191 @@ apply_pathtarget_labeling_to_tlist(List *tlist, PathTarget *target)
 		i++;
 	}
 }
+
+/*
+ * split_pathtarget_at_srfs
+ *		Split given PathTarget into multiple levels to position SRFs safely
+ *
+ * The executor can only handle set-returning functions that appear at the
+ * top level of the targetlist of a ProjectSet plan node.  If we have any SRFs
+ * that are not at top level, we need to split up the evaluation into multiple
+ * plan levels in which each level satisfies this constraint.  This function
+ * creates appropriate PathTarget(s) for each level.
+ *
+ * As an example, consider the tlist expression
+ *		x + srf1(srf2(y + z))
+ * This expression should appear as-is in the top PathTarget, but below that
+ * we must have a PathTarget containing
+ *		x, srf1(srf2(y + z))
+ * and below that, another PathTarget containing
+ *		x, srf2(y + z)
+ * and below that, another PathTarget containing
+ *		x, y, z
+ * When these tlists are processed by setrefs.c, subexpressions that match
+ * output expressions of the next lower tlist will be replaced by Vars,
+ * so that what the executor gets are tlists looking like
+ *		Var1 + Var2
+ *		Var1, srf1(Var2)
+ *		Var1, srf2(Var2 + Var3)
+ *		x, y, z
+ * which satisfy the desired property.
+ *
+ * In some cases, a SRF has already been evaluated in some previous plan level
+ * and we shouldn't expand it again (that is, what we see in the target is
+ * already meant as a reference to a lower subexpression).  So, don't expand
+ * any tlist expressions that appear in input_target, if that's not NULL.
+ * In principle we might need to consider matching subexpressions to
+ * input_target, but for now it's not necessary because only ORDER BY and
+ * GROUP BY expressions are at issue and those will look the same at both
+ * plan levels.
+ *
+ * The outputs of this function are two parallel lists, one a list of
+ * PathTargets and the other an integer list of bool flags indicating
+ * whether the corresponding PathTarget contains any top-level SRFs.
+ * The lists are given in the order they'd need to be evaluated in, with
+ * the "lowest" PathTarget first.  So the last list entry is always the
+ * originally given PathTarget, and any entries before it indicate evaluation
+ * levels that must be inserted below it.  The first list entry must not
+ * contain any SRFs, since it will typically be attached to a plan node
+ * that cannot evaluate SRFs.
+ *
+ * Note: using a list for the flags may seem like overkill, since there
+ * are only a few possible patterns for which levels contain SRFs.
+ * But this representation decouples callers from that knowledge.
+ */
+void
+split_pathtarget_at_srfs(PlannerInfo *root,
+						 PathTarget *target, PathTarget *input_target,
+						 List **targets, List **targets_contain_srfs)
+{
+	/* Initialize output lists to empty; we prepend to them within loop */
+	*targets = *targets_contain_srfs = NIL;
+
+	/* Loop to consider each level of PathTarget we need */
+	for (;;)
+	{
+		bool		target_contains_srfs = false;
+		split_pathtarget_context context;
+		ListCell   *lc;
+
+		context.nextlevel_tlist = NIL;
+		context.nextlevel_contains_srfs = false;
+
+		/*
+		 * Scan the PathTarget looking for SRFs.  Top-level SRFs are handled
+		 * in this loop, ones lower down are found by split_pathtarget_walker.
+		 */
+		foreach(lc, target->exprs)
+		{
+			Node	   *node = (Node *) lfirst(lc);
+
+			/*
+			 * A tlist item that is just a reference to an expression already
+			 * computed in input_target need not be evaluated here, so just
+			 * make sure it's included in the next PathTarget.
+			 */
+			if (input_target && list_member(input_target->exprs, node))
+			{
+				context.nextlevel_tlist = lappend(context.nextlevel_tlist, node);
+				continue;
+			}
+
+			/* Else, we need to compute this expression. */
+			if (IsA(node, FuncExpr) &&
+				((FuncExpr *) node)->funcretset)
+			{
+				/* Top-level SRF: it can be evaluated here */
+				target_contains_srfs = true;
+				/* Recursively examine SRF's inputs */
+				split_pathtarget_walker((Node *) ((FuncExpr *) node)->args,
+										&context);
+			}
+			else if (IsA(node, OpExpr) &&
+					 ((OpExpr *) node)->opretset)
+			{
+				/* Same as above, but for set-returning operator */
+				target_contains_srfs = true;
+				split_pathtarget_walker((Node *) ((OpExpr *) node)->args,
+										&context);
+			}
+			else
+			{
+				/* Not a top-level SRF, so recursively examine expression */
+				split_pathtarget_walker(node, &context);
+			}
+		}
+
+		/*
+		 * Prepend current target and associated flag to output lists.
+		 */
+		*targets = lcons(target, *targets);
+		*targets_contain_srfs = lcons_int(target_contains_srfs,
+										  *targets_contain_srfs);
+
+		/*
+		 * Done if we found no SRFs anywhere in this target; the tentative
+		 * tlist we built for the next level can be discarded.
+		 */
+		if (!target_contains_srfs && !context.nextlevel_contains_srfs)
+			break;
+
+		/*
+		 * Else build the next PathTarget down, and loop back to process it.
+		 * Copy the subexpressions to make sure PathTargets don't share
+		 * substructure (might be unnecessary, but be safe); and drop any
+		 * duplicate entries in the sub-targetlist.
+		 */
+		target = create_empty_pathtarget();
+		add_new_columns_to_pathtarget(target,
+							   (List *) copyObject(context.nextlevel_tlist));
+		set_pathtarget_cost_width(root, target);
+	}
+}
+
+/* Recursively examine expressions for split_pathtarget_at_srfs */
+static bool
+split_pathtarget_walker(Node *node, split_pathtarget_context *context)
+{
+	if (node == NULL)
+		return false;
+	if (IsA(node, Var) ||
+		IsA(node, PlaceHolderVar) ||
+		IsA(node, Aggref) ||
+		IsA(node, GroupingFunc) ||
+		IsA(node, WindowFunc))
+	{
+		/*
+		 * Pass these items down to the child plan level for evaluation.
+		 *
+		 * We assume that these constructs cannot contain any SRFs (if one
+		 * does, there will be an executor failure from a misplaced SRF).
+		 */
+		context->nextlevel_tlist = lappend(context->nextlevel_tlist, node);
+
+		/* Having done that, we need not examine their sub-structure */
+		return false;
+	}
+	else if ((IsA(node, FuncExpr) &&
+			  ((FuncExpr *) node)->funcretset) ||
+			 (IsA(node, OpExpr) &&
+			  ((OpExpr *) node)->opretset))
+	{
+		/*
+		 * Pass SRFs down to the child plan level for evaluation, and mark
+		 * that it contains SRFs.  (We are not at top level of our own tlist,
+		 * else this would have been picked up by split_pathtarget_at_srfs.)
+		 */
+		context->nextlevel_tlist = lappend(context->nextlevel_tlist, node);
+		context->nextlevel_contains_srfs = true;
+
+		/* Inputs to the SRF need not be considered here, so we're done */
+		return false;
+	}
+
+	/*
+	 * Otherwise, the node is evaluatable within the current PathTarget, so
+	 * recurse to examine its inputs.
+	 */
+	return expression_tree_walker(node, split_pathtarget_walker,
+								  (void *) context);
+}
author	Andres Freund <andres@anarazel.de>	2017-01-18 12:46:50 -0800
committer	Andres Freund <andres@anarazel.de>	2017-01-18 13:40:27 -0800
commit	69f4b9c85f168ae006929eec44fc44d569e846b9 (patch)
tree	0a7fa27d1be6e341d14f067d4942c1c0e0964729 /src/backend/optimizer
parent	e37360d5df240443bb6e997d26d54f59146283fc (diff)
download	postgresql-69f4b9c85f168ae006929eec44fc44d569e846b9.tar.gz postgresql-69f4b9c85f168ae006929eec44fc44d569e846b9.zip