author | Tom Lane <tgl@sss.pgh.pa.us> | 2016-06-21 18:38:20 -0400 |
---|---|---|
committer | Tom Lane <tgl@sss.pgh.pa.us> | 2016-06-21 18:38:20 -0400 |
commit | 8b9d323cb9810109e3e5aab1ead427cbbb7aa77e (patch) | |
tree | 522428b6759a078fba63be994ae392fbf79c34e9 /src/backend/optimizer/util/pathnode.c | |
parent | 936b62ddf247c26e8cc4fca34bd8a4c2e65c09fd (diff) | |
Refactor planning of projection steps that don't need a Result plan node.

The original upper-planner-pathification design (commit 3fc6e2d7f5b652b4)
assumed that we could always determine during Path formation whether or not
we would need a Result plan node to perform projection of a targetlist.
That turns out not to work very well, though, because createplan.c still
has some responsibilities for choosing the specific target list associated
with sorting/grouping nodes (in particular it might choose to add resjunk
columns for sorting). We might not ever refactor that --- doing so would
push more work into Path formation, which isn't attractive --- and we
certainly won't do so for 9.6. So, while create_projection_path and
apply_projection_to_path can tell for sure what will happen if the subpath
is projection-capable, they can't tell for sure when it isn't. This is at
least a latent bug in apply_projection_to_path, which might think it can
apply a target to a non-projecting node when the node will end up computing
something different.

Also, I'd tied the creation of a ProjectionPath node to whether or not a
Result is needed, but it turns out that we sometimes need a ProjectionPath
node anyway to avoid modifying a possibly-shared subpath node. Callers had
to use create_projection_path for such cases, and we added code to them
that knew about the potential omission of a Result node and attempted to
adjust the cost estimates for that. That was uncertainly correct and
definitely ugly/unmaintainable.
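
The shared-subpath concern above can be illustrated with a toy model. This is only a sketch: `PathSketch`, `apply_in_place_sketch` and `wrap_projection_sketch` are hypothetical stand-ins, not planner code, and target lists are reduced to strings.

```c
#include <stddef.h>

/* Hypothetical, heavily simplified stand-in for a planner Path. */
typedef struct PathSketch
{
    const char *target;            /* what this path emits */
    struct PathSketch *subpath;    /* input path, or NULL for a leaf */
} PathSketch;

/*
 * In-place variant: overwrite the path's target.  Every other path that
 * points at this node sees the change, so this is only safe when the
 * caller knows the node is not referenced elsewhere.
 */
static PathSketch *
apply_in_place_sketch(PathSketch *path, const char *target)
{
    path->target = target;
    return path;
}

/*
 * Wrapping variant: leave the input untouched and put a new node on top.
 * The caller supplies storage for the wrapper to keep the sketch free of
 * memory management.
 */
static PathSketch *
wrap_projection_sketch(PathSketch *wrapper, PathSketch *subpath,
                       const char *target)
{
    wrapper->target = target;
    wrapper->subpath = subpath;
    return wrapper;
}
```

In this model, wrapping a node that two parents share leaves the node's `target` as it was, while the in-place variant silently changes what both parents see.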

To fix, have create_projection_path explicitly check whether a Result
is needed and adjust its cost estimate accordingly, though it creates
a ProjectionPath in either case. apply_projection_to_path is now mostly
just an optimized version that can avoid creating an extra Path node when
the input is known to not be shared with any other live path. (There
is one case that create_projection_path doesn't handle, which is pushing
parallel-safe expressions below a Gather node. We could make it do that
by duplicating the GatherPath, but there seems no need as yet.)
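
As a rough illustration of the two-way cost estimate described above, here is a minimal standalone sketch. The struct and function names (`CostSketch`, `PathSketch`, `ProjectionSketch`, `projection_cost_sketch`) are hypothetical simplifications, the `can_project` flag stands in for is_projection_capable_path(), the `same_exprs` argument stands in for the expression-list comparison, and `cpu_tuple_cost` is hard-coded to PostgreSQL's default of 0.01.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the startup/per-tuple cost of a target list. */
typedef struct
{
    double      startup;
    double      per_tuple;
} CostSketch;

/* Hypothetical stand-in for the Path fields that matter here. */
typedef struct
{
    double      rows;
    double      startup_cost;
    double      total_cost;
    CostSketch  target_cost;    /* cost of the target list it computes now */
    bool        can_project;    /* stand-in for is_projection_capable_path() */
} PathSketch;

typedef struct
{
    PathSketch  path;
    bool        dummypp;        /* true: no separate Result node expected */
} ProjectionSketch;

static const double cpu_tuple_cost = 0.01;    /* PostgreSQL's default value */

static ProjectionSketch
projection_cost_sketch(const PathSketch *sub, CostSketch newcost,
                       bool same_exprs)
{
    ProjectionSketch p;

    assert(sub != NULL);
    p.path.rows = sub->rows;
    p.path.target_cost = newcost;
    p.path.can_project = true;  /* assumption made for this sketch only */

    if (sub->can_project || same_exprs)
    {
        /* No Result node: charge only the difference between the tlists. */
        p.dummypp = true;
        p.path.startup_cost = sub->startup_cost +
            (newcost.startup - sub->target_cost.startup);
        p.path.total_cost = sub->total_cost +
            (newcost.startup - sub->target_cost.startup) +
            (newcost.per_tuple - sub->target_cost.per_tuple) * sub->rows;
    }
    else
    {
        /* Result node: cpu_tuple_cost per row plus the new tlist's cost. */
        p.dummypp = false;
        p.path.startup_cost = sub->startup_cost + newcost.startup;
        p.path.total_cost = sub->total_cost + newcost.startup +
            (cpu_tuple_cost + newcost.per_tuple) * sub->rows;
    }
    return p;
}
```

Either way a projection node comes back; only the flag and the arithmetic differ, which is the point of the change.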

create_projection_plan still has to recheck the tlist-match condition,
which means that if the matching situation does get changed by createplan.c
then we'll have made a slightly incorrect cost estimate. But there seems
no help for that in the near term, and I doubt it occurs often enough,
let alone would change planning decisions often enough, to be worth
stressing about.

I added a "dummypp" field to ProjectionPath to track whether
create_projection_path thinks a Result is needed. This is not really
necessary as-committed because create_projection_plan doesn't look at the
flag; but it seems like a good idea to remember what we thought when
forming the cost estimate, if only for debugging purposes.

In passing, get rid of the target_parallel parameter added to
apply_projection_to_path by commit 54f5c5150. I don't think that's a good
idea because it involves callers in what should be an internal decision,
and opens us up to missing optimization opportunities if callers think they
don't need to provide a valid flag, as most don't. For the moment, this
just costs us an extra has_parallel_hazard call when planning a Gather.
If that starts to look expensive, I think a better solution would be to
teach PathTarget to carry/cache knowledge of parallel-safety of its
contents.
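
The caching idea floated above is not part of this commit. Purely as an illustration of what "carry/cache knowledge of parallel-safety" could look like, here is a toy sketch; every name in it (`ExprSketch`, `TargetSketch`, `sketch_has_hazard`, `target_is_parallel_safe`, `hazard_scans`) is hypothetical, and the hazard check is reduced to a per-expression flag.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical expression: the flag replaces a real tree walk. */
typedef struct
{
    const char *text;
    bool        parallel_unsafe;
} ExprSketch;

typedef enum
{
    PSAFE_UNKNOWN = 0,          /* not computed yet */
    PSAFE_YES,
    PSAFE_NO
} SafetySketch;

/* Hypothetical target that remembers the answer once it is known. */
typedef struct
{
    const ExprSketch *exprs;
    size_t      nexprs;
    SafetySketch parallel_safety;
} TargetSketch;

/* Counts full scans, only so the effect of caching can be observed. */
static int  hazard_scans = 0;

/* Toy stand-in for an expensive hazard check over all expressions. */
static bool
sketch_has_hazard(const TargetSketch *t)
{
    hazard_scans++;
    for (size_t i = 0; i < t->nexprs; i++)
    {
        if (t->exprs[i].parallel_unsafe)
            return true;
    }
    return false;
}

/* Scan on first use, then answer from the cached state. */
static bool
target_is_parallel_safe(TargetSketch *t)
{
    if (t->parallel_safety == PSAFE_UNKNOWN)
        t->parallel_safety = sketch_has_hazard(t) ? PSAFE_NO : PSAFE_YES;
    return t->parallel_safety == PSAFE_YES;
}
```

A real version would also have to reset the cached state whenever the target's expression list is modified; the sketch ignores that.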

Diffstat (limited to 'src/backend/optimizer/util/pathnode.c')
-rw-r--r-- | src/backend/optimizer/util/pathnode.c | 94 |
1 files changed, 63 insertions, 31 deletions
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index 0ff353fa11d..8fd933fd6bd 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -2168,6 +2168,7 @@ create_projection_path(PlannerInfo *root,
 					   PathTarget *target)
 {
 	ProjectionPath *pathnode = makeNode(ProjectionPath);
+	PathTarget *oldtarget = subpath->pathtarget;
 
 	pathnode->path.pathtype = T_Result;
 	pathnode->path.parent = rel;
@@ -2184,13 +2185,46 @@ create_projection_path(PlannerInfo *root,
 	pathnode->subpath = subpath;
 
 	/*
-	 * The Result node's cost is cpu_tuple_cost per row, plus the cost of
-	 * evaluating the tlist. There is no qual to worry about.
+	 * We might not need a separate Result node. If the input plan node type
+	 * can project, we can just tell it to project something else. Or, if it
+	 * can't project but the desired target has the same expression list as
+	 * what the input will produce anyway, we can still give it the desired
+	 * tlist (possibly changing its ressortgroupref labels, but nothing else).
+	 * Note: in the latter case, create_projection_plan has to recheck our
+	 * conclusion; see comments therein.
 	 */
-	pathnode->path.rows = subpath->rows;
-	pathnode->path.startup_cost = subpath->startup_cost + target->cost.startup;
-	pathnode->path.total_cost = subpath->total_cost + target->cost.startup +
-		(cpu_tuple_cost + target->cost.per_tuple) * subpath->rows;
+	if (is_projection_capable_path(subpath) ||
+		equal(oldtarget->exprs, target->exprs))
+	{
+		/* No separate Result node needed */
+		pathnode->dummypp = true;
+
+		/*
+		 * Set cost of plan as subpath's cost, adjusted for tlist replacement.
+		 */
+		pathnode->path.rows = subpath->rows;
+		pathnode->path.startup_cost = subpath->startup_cost +
+			(target->cost.startup - oldtarget->cost.startup);
+		pathnode->path.total_cost = subpath->total_cost +
+			(target->cost.startup - oldtarget->cost.startup) +
+			(target->cost.per_tuple - oldtarget->cost.per_tuple) * subpath->rows;
+	}
+	else
+	{
+		/* We really do need the Result node */
+		pathnode->dummypp = false;
+
+		/*
+		 * The Result node's cost is cpu_tuple_cost per row, plus the cost of
+		 * evaluating the tlist. There is no qual to worry about.
+		 */
+		pathnode->path.rows = subpath->rows;
+		pathnode->path.startup_cost = subpath->startup_cost +
+			target->cost.startup;
+		pathnode->path.total_cost = subpath->total_cost +
+			target->cost.startup +
+			(cpu_tuple_cost + target->cost.per_tuple) * subpath->rows;
+	}
 
 	return pathnode;
 }
@@ -2199,38 +2233,37 @@ create_projection_path(PlannerInfo *root,
  * apply_projection_to_path
  *	  Add a projection step, or just apply the target directly to given path.
  *
- * Most plan types include ExecProject, so we can implement a new projection
- * without an extra plan node: just replace the given path's pathtarget with
- * the desired one. If the given path can't project, add a ProjectionPath.
+ * This has the same net effect as create_projection_path(), except that if
+ * a separate Result plan node isn't needed, we just replace the given path's
+ * pathtarget with the desired one. This must be used only when the caller
+ * knows that the given path isn't referenced elsewhere and so can be modified
+ * in-place.
  *
- * We can also short-circuit cases where the targetlist expressions are
- * actually equal; this is not an uncommon case, since it may arise from
- * trying to apply a PathTarget with sortgroupref labeling to a derived
- * path without such labeling.
+ * If the input path is a GatherPath, we try to push the new target down to
+ * its input as well; this is a yet more invasive modification of the input
+ * path, which create_projection_path() can't do.
  *
- * This requires knowing that the source path won't be referenced for other
- * purposes (e.g., other possible paths), since we modify it in-place. Note
- * also that we mustn't change the source path's parent link; so when it is
+ * Note that we mustn't change the source path's parent link; so when it is
  * add_path'd to "rel" things will be a bit inconsistent. So far that has
  * not caused any trouble.
  *
  * 'rel' is the parent relation associated with the result
  * 'path' is the path representing the source of data
  * 'target' is the PathTarget to be computed
- * 'target_parallel' indicates that target expressions are all parallel-safe
  */
 Path *
 apply_projection_to_path(PlannerInfo *root,
 						 RelOptInfo *rel,
 						 Path *path,
-						 PathTarget *target,
-						 bool target_parallel)
+						 PathTarget *target)
 {
 	QualCost	oldcost;
 
-	/* Make a separate ProjectionPath if needed */
-	if (!is_projection_capable_path(path) &&
-		!equal(path->pathtarget->exprs, target->exprs))
+	/*
+	 * If given path can't project, we might need a Result node, so make a
+	 * separate ProjectionPath.
+	 */
+	if (!is_projection_capable_path(path))
 		return (Path *) create_projection_path(root, rel, path, target);
 
 	/*
@@ -2247,10 +2280,11 @@ apply_projection_to_path(PlannerInfo *root,
 	/*
 	 * If the path happens to be a Gather path, we'd like to arrange for the
 	 * subpath to return the required target list so that workers can help
-	 * project.	But if there is something that is not parallel-safe in the
+	 * project.  But if there is something that is not parallel-safe in the
 	 * target expressions, then we can't.
 	 */
-	if (IsA(path, GatherPath) &&target_parallel)
+	if (IsA(path, GatherPath) &&
+		!has_parallel_hazard((Node *) target->exprs, false))
 	{
 		GatherPath *gpath = (GatherPath *) path;
 
@@ -2258,14 +2292,12 @@ apply_projection_to_path(PlannerInfo *root,
 		 * We always use create_projection_path here, even if the subpath is
 		 * projection-capable, so as to avoid modifying the subpath in place.
 		 * It seems unlikely at present that there could be any other
-		 * references to the subpath anyway, but better safe than sorry.
-		 * (create_projection_plan will only insert a Result node if the
-		 * subpath is not projection-capable, so we only include the cost of
-		 * that node if it will actually be inserted. This is a bit grotty
-		 * but we can improve it later if it seems important.)
+		 * references to the subpath, but better safe than sorry.
+		 *
+		 * Note that we don't change the GatherPath's cost estimates; it might
+		 * be appropriate to do so, to reflect the fact that the bulk of the
+		 * target evaluation will happen in workers.
 		 */
-		if (!is_projection_capable_path(gpath->subpath))
-			gpath->path.total_cost += cpu_tuple_cost * gpath->subpath->rows;
 		gpath->subpath = (Path *)
 			create_projection_path(root,
 								   gpath->subpath->parent,