path: root/src/backend/optimizer/plan/planner.c
...
* Fix problems when a plain-inheritance parent table is excluded. (Tom Lane, 2023-10-24)
  When an UPDATE/DELETE/MERGE's target table is an old-style inheritance tree, it's possible for the parent to get excluded from the plan while some children are not. (I believe this is only possible if we can prove that a CHECK ... NO INHERIT constraint on the parent contradicts the query WHERE clause, so it's a very unusual case.) In such a case, ExecInitModifyTable mistakenly concluded that the first surviving child is the target table, leading to at least two bugs:
  1. The wrong table's statement-level triggers would get fired.
  2. In v16 and up, it was possible to fail with "invalid perminfoindex 0 in RTE with relid nnnn" due to the child RTE not having permissions data included in the query plan. This was hard to reproduce reliably because it did not occur unless the update triggered some non-HOT index updates.
  In v14 and up, this is easy to fix by defining ModifyTable.rootRelation to be the parent RTE in plain inheritance as well as partitioned cases. While the wrong-triggers bug also appears in older branches, the relevant code in both the planner and executor is quite a bit different, so it would take a good deal of effort to develop and test a suitable patch. Given the lack of field complaints about the trigger issue, I'll desist for now. (Patching v11 for this seems unwise anyway, given that it will have no more releases after next month.)
  Per bug #18147 from Hans Buschmann.
  Amit Langote and Tom Lane
  Discussion: https://postgr.es/m/18147-6fc796538913ee88@postgresql.org
* Fix EvalPlanQual rechecking during MERGE. (Dean Rasheed, 2023-09-30)
  Under some circumstances, concurrent MERGE operations could lead to inconsistent results that varied according to the plan chosen. This was caused by a lack of rowmarks on the source relation, which meant that EvalPlanQual rechecking was not guaranteed to return the same source tuples when re-running the join query.
  Fix by ensuring that preprocess_rowmarks() sets up PlanRowMarks for all non-target relations used in MERGE, in the same way that it does for UPDATE and DELETE.
  Per bug #18103. Back-patch to v15, where MERGE was introduced.
  Dean Rasheed, reviewed by Richard Guo.
  Discussion: https://postgr.es/m/18103-c4386baab8e355e3%40postgresql.org
* Allow plan nodes with initPlans to be considered parallel-safe. (Tom Lane, 2023-07-14)
  If the plan itself is parallel-safe, and the initPlans are too, there's no reason anymore to prevent the plan from being marked parallel-safe. That restriction (dating to commit ab77a5a45) was really a special case of the fact that we couldn't transmit subplans to parallel workers at all. We fixed that in commit 5e6d8d2bb and follow-ons, but this case never got addressed.
  We still forbid attaching initPlans to a Gather node that's inserted pursuant to debug_parallel_query = regress. That's because, when we hide the Gather from EXPLAIN output, we'd hide the initPlans too, causing cosmetic regression diffs. It seems inadvisable to kluge EXPLAIN to the extent required to make the output look the same, so just don't do it in that case.
  Along the way, this also takes care of some sloppiness about updating path costs to match when we move initplans from one place to another during createplan.c and setrefs.c. Since all the planning decisions are already made by that point, this is just cosmetic; but it seems good to keep EXPLAIN output consistent with where the initplans are.
  The diff in query_planner() might be worth remarking on. I found that one because after fixing things to allow parallel-safe initplans, one partition_prune test case changed plans (as shown in the patch) --- but only when debug_parallel_query was active. The reason proved to be that we only bothered to mark Result nodes as potentially parallel-safe when debug_parallel_query is on. This neglects the fact that parallel-safety may be of interest for a sub-query even though the Result itself doesn't parallelize.
  Discussion: https://postgr.es/m/1129530.1681317832@sss.pgh.pa.us
* Fix code indentation violation introduced in commit cc32ec24fd. (Amit Kapila, 2023-07-06)
  Per buildfarm member koel
  Discussion: https://postgr.es/m/CAA4eK1K2Y_iegdXRUNbsghY9+b-4cSOrxYt9T8TtwXkkdWMVJA@mail.gmail.com
* Revert the commits related to allowing page lock to conflict among parallel group members. (Amit Kapila, 2023-07-06)
  This commit reverts the work done by commits 3ba59ccc89 and 72e78d831a. Those commits were incorrect in asserting that we never acquire any other heavy-weight lock after acquiring page lock other than relation extension lock. We can acquire a lock on catalogs while doing catalog lookup after acquiring page lock.
  This won't impact any existing feature but we need to think of some other way to achieve this before parallelizing other write operations or even improving the parallelism in vacuum (like allowing multiple workers for an index).
  Reported-by: Jaime Casanova
  Author: Amit Kapila
  Backpatch-through: 13
  Discussion: https://postgr.es/m/CAJKUy5jffnRKNvRHKQ0LynRb0RJC-o4P8Ku3x9vGAVLwDBWumQ@mail.gmail.com
* Remove redundant PARTITION BY columns from WindowClauses (David Rowley, 2023-07-03)
  Here we adjust the query planner to have it remove items from a window clause's PARTITION BY clause in cases where the pathkey for a column in the PARTITION BY clause is redundant.
  Doing this allows the optimization added in 9d9c02ccd to stop window aggregation early rather than going into "pass-through" mode to find tuples belonging to the next partition. Also, when we manage to remove all PARTITION BY columns, we now no longer needlessly check that the current tuple belongs to the same partition as the last tuple in nodeWindowAgg.c. If the pathkey was redundant then all tuples must contain the same value for the given redundant column, so there's no point in checking that during execution.
  Author: David Rowley
  Reviewed-by: Richard Guo
  Discussion: https://postgr.es/m/CAApHDvo2ji+hdxrxfXtRtsfSVw3to2o1nCO20qimw0dUGK8hcQ@mail.gmail.com
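  An illustrative sketch (hypothetical table t and columns a, b, c; not taken from the commit itself) of a PARTITION BY column becoming redundant because it is constrained to a constant:
    SELECT a, b, row_number() OVER (PARTITION BY a, b ORDER BY c) AS rn
    FROM t
    WHERE b = 10;  -- b is constant here, so the pathkey for b is redundant and the planner can drop b from PARTITION BY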
* Revert "Move PartitionPruneInfo out of plan nodes into PlannedStmt"Alvaro Herrera2023-05-04
| | | | | | | | | | | | | This reverts commit ec386948948c and its fixup 589bb816499e. This change was intended to support query planning avoiding acquisition of locks on partitions that were going to be pruned; however, the overall project took a different direction at [1] and this bit is no longer needed. Put things back the way they were as agreed in [2], to avoid unnecessary complexity. Discussion: [1] https://postgr.es/m/4191508.1674157166@sss.pgh.pa.us Discussion: [2] https://postgr.es/m/20230502175409.kcoirxczpdha26wt@alvherre.pgsql
* Fix parallel-safety marking when moving initplans to another node. (Tom Lane, 2023-04-12)
  Our policy since commit ab77a5a45 has been that a plan node having any initplans is automatically not parallel-safe. (This could be relaxed, but not today.) clean_up_removed_plan_level neglected this, and could attach initplans to a parallel-safe child plan node without clearing the plan's parallel-safe flag. That could lead to "subplan was not initialized" errors at runtime, in case an initplan referenced another one and only the referencing one got transmitted to parallel workers.
  The fix in clean_up_removed_plan_level is trivial enough. materialize_finished_plan also moves initplans from one node to another, but it's okay because it already copies the source node's parallel_safe flag. The other place that does this kind of thing is standard_planner's hack to inject a top-level Gather when debug_parallel_query is active. But that's actually dead code given that we're correctly enforcing the "initplans aren't parallel safe" rule, so just replace it with an Assert that there are no initplans.
  Also improve some related comments.
  Normally we'd add a regression test case for this sort of bug. The mistake itself is already reached by existing tests, but there is accidentally no visible problem. The only known test case that creates an actual failure seems too indirect and fragile to justify keeping it as a regression test (not least because it fails to fail in v11, though the bug is clearly present there too).
  Per report from Justin Pryzby. Back-patch to all supported branches.
  Discussion: https://postgr.es/m/ZDVt6MaNWkRDO1LQ@telsasoft.com
* Rename force_parallel_mode to debug_parallel_query (David Rowley, 2023-02-15)
  force_parallel_mode is meant to be used to allow us to exercise the parallel query infrastructure to ensure that it's working as we expect. It seems some users think this GUC is for forcing the query planner into picking a parallel plan regardless of the costs. A quick look at the documentation would have made them realize that they were wrong, but the GUC is likely too conveniently named, which evidently often results in users expecting that it forces the planner into usefully parallelizing queries.
  Here we rename the GUC to something which casual users are less likely to mistakenly think is what they need to make their query run more quickly.
  For now, the old name can still be used. We'll revisit whether the old name mapping can be removed once the buildfarm configs are all updated.
  Reviewed-by: John Naylor
  Discussion: https://postgr.es/m/CAApHDvrsOi92_uA7PEaHZMH-S4Xv+MGhQWA+GrP8b1kjpS1HjQ@mail.gmail.com
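  A minimal usage sketch; per the commit message, the old spelling continues to work for now:
    SET debug_parallel_query = regress;  -- previously: SET force_parallel_mode = regress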
* Invent "join domains" to replace the below_outer_join hack.Tom Lane2023-01-30
| | | | | | | | | | | | | | | | | | | | | | | | | | EquivalenceClasses are now understood as applying within a "join domain", which is a set of inner-joined relations (possibly underneath an outer join). We no longer need to treat an EC from below an outer join as a second-class citizen. I have hopes of eventually being able to treat outer-join clauses via EquivalenceClasses, by means of only applying deductions within the EC's join domain. There are still problems in the way of that, though, so for now the reconsider_outer_join_clause logic is still here. I haven't been able to get rid of RestrictInfo.is_pushed_down either, but I wonder if that could be recast using JoinDomains. I had to hack one test case in postgres_fdw.sql to make it still test what it was meant to, because postgres_fdw is inconsistent about how it deals with quals containing non-shippable expressions; see https://postgr.es/m/1691374.1671659838@sss.pgh.pa.us. That should be improved, but I don't think it's within the scope of this patch series. Patch by me; thanks to Richard Guo for review. Discussion: https://postgr.es/m/830269.1656693747@sss.pgh.pa.us
* Do assorted mop-up in the planner. (Tom Lane, 2023-01-30)
  Remove RestrictInfo.nullable_relids, along with a good deal of infrastructure that calculated it. One use-case for it was in join_clause_is_movable_to, but we can now replace that usage with a check to see if the clause's relids include any outer join that can null the target relation. The other use-case was in join_clause_is_movable_into, but that test can just be dropped entirely now that the clause's relids include outer joins. Furthermore, join_clause_is_movable_into should now be accurate enough that it will accept anything returned by generate_join_implied_equalities, so we can restore the Assert that was diked out in commit 95f4e59c3.
  Remove the outerjoin_delayed mechanism. We needed this before to prevent quals from getting evaluated below outer joins that should null some of their vars. Now that we consider varnullingrels while placing quals, that's taken care of automatically, so throw the whole thing away.
  Teach remove_useless_result_rtes to also remove useless FromExprs. Having done that, the delay_upper_joins flag serves no purpose any more and we can remove it, largely reverting 11086f2f2.
  Use constant TRUE for "dummy" clauses when throwing back outer joins. This improves on a hack I introduced in commit 6a6522529. If we have a left-join clause l.x = r.y, and a WHERE clause l.x = constant, we generate r.y = constant and then don't really have a need for the join clause. But we must throw the join clause back anyway after marking it redundant, so that the join search heuristics won't think this is a clauseless join and avoid it. That was a kluge introduced under time pressure, and after looking at it I thought of a better way: let's just introduce constant-TRUE "join clauses" instead, and get rid of them at the end. This improves the generated plans for such cases by not having to test a redundant join clause. We can also get rid of the ugly hack used to mark such clauses as redundant for selectivity estimation.
  Patch by me; thanks to Richard Guo for review.
  Discussion: https://postgr.es/m/830269.1656693747@sss.pgh.pa.us
* Make Vars be outer-join-aware. (Tom Lane, 2023-01-30)
  Traditionally we used the same Var struct to represent the value of a table column everywhere in parse and plan trees. This choice predates our support for SQL outer joins, and it's really a pretty bad idea with outer joins, because the Var's value can depend on where it is in the tree: it might go to NULL above an outer join. So expression nodes that are equal() per equalfuncs.c might not represent the same value, which is a huge correctness hazard for the planner.
  To improve this, decorate Var nodes with a bitmapset showing which outer joins (identified by RTE indexes) may have nulled them at the point in the parse tree where the Var appears. This allows us to trust that equal() Vars represent the same value. A certain amount of klugery is still needed to cope with cases where we re-order two outer joins, but it's possible to make it work without sacrificing that core principle. PlaceHolderVars receive similar decoration for the same reason.
  In the planner, we include these outer join bitmapsets into the relids that an expression is considered to depend on, and in consequence also add outer-join relids to the relids of join RelOptInfos. This allows us to correctly perceive whether an expression can be calculated above or below a particular outer join.
  This change affects FDWs that want to plan foreign joins. They *must* follow suit when labeling foreign joins in order to match with the core planner, but for many purposes (if postgres_fdw is any guide) they'd prefer to consider only base relations within the join. To support both requirements, redefine ForeignScan.fs_relids as base+OJ relids, and add a new field fs_base_relids that's set up by the core planner.
  Large though it is, this commit just does the minimum necessary to install the new mechanisms and get check-world passing again. Follow-up patches will perform some cleanup. (The README additions and comments mention some stuff that will appear in the follow-up.)
  Patch by me; thanks to Richard Guo for review.
  Discussion: https://postgr.es/m/830269.1656693747@sss.pgh.pa.us
* Remove redundant grouping and DISTINCT columns. (Tom Lane, 2023-01-18)
  Avoid explicitly grouping by columns that we know are redundant for sorting, for example we need group by only one of x and y in
    SELECT ... WHERE x = y GROUP BY x, y
  This comes up more often than you might think, as shown by the changes in the regression tests. It's nearly free to detect too, since we are just piggybacking on the existing logic that detects redundant pathkeys. (In some of the existing plans that change, it's visible that a sort step preceding the grouping step already didn't bother to sort by the redundant column, making the old plan a bit silly-looking.)
  To do this, build processed_groupClause and processed_distinctClause lists that omit any provably-redundant sort items, and consult those, not the originals, where relevant. This means that within the planner, one should usually consult root->processed_groupClause or root->processed_distinctClause if one wants to know which columns are to be grouped on; but to check whether grouping or distinct-ing is happening at all, check non-NIL-ness of parse->groupClause or parse->distinctClause. This is comparable to longstanding rules about handling the HAVING clause, so I don't think it'll be a huge maintenance problem.
  nodeAgg.c also needs minor mods, because it's now possible to generate AGG_PLAIN and AGG_SORTED Agg nodes with zero grouping columns.
  Patch by me; thanks to Richard Guo and David Rowley for review.
  Discussion: https://postgr.es/m/185315.1672179489@sss.pgh.pa.us
* Don't presort ORDER BY/DISTINCT Aggrefs with volatile functions (David Rowley, 2023-01-17)
  In 1349d2790, we gave the planner the ability to provide ORDER BY/DISTINCT Aggrefs with presorted input so that nodeAgg would not have to perform sorts during execution. That commit failed to properly consider the implications of the Aggref having a volatile function in its ORDER BY/DISTINCT clause. As it happened, this resulted in an ERROR about the volatile function being missing from the targetlist.
  Here, instead of adding the volatile function to the targetlist, we just never consider an Aggref with a volatile function in its ORDER BY/DISTINCT clause when choosing which Aggrefs we should sort by. We do this because, if we were to choose a plan which provided these aggregates with presorted input and there were many such aggregates which could all share the same sort order, then it may be surprising if they all shared the same sort sometimes and didn't at other times when some other set of aggregates were given presorted results. We can avoid this inconsistency by just never providing these volatile function aggregates with presorted input.
  Reported-by: Dean Rasheed
  Discussion: https://postgr.es/m/CAEZATCWETioXs5kY8vT6BVguY41_wD962VDk=u_Nvd7S1UXzuQ@mail.gmail.com
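  An illustrative sketch (hypothetical table t): random() is volatile, so an aggregate like this is now simply never considered for presorted input:
    SELECT array_agg(a ORDER BY random()) FROM t;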
* Have the planner consider Incremental Sort for DISTINCT (David Rowley, 2023-01-11)
  Prior to this, we only considered a full sort on the cheapest input path and uniquifying any path which was already sorted in the required sort order. Here we adjust create_final_distinct_paths() so that it also adds an Incremental Sort path on any path which has presorted keys.
  Additionally, this adjusts the parallel distinct code so that we now consider sorting the cheapest partial path and incrementally sorting any partial paths with presorted keys. Previously we didn't consider any sorting for parallel distinct and only added a unique path atop any path which had the required pathkeys already.
  Author: David Rowley
  Reviewed-by: Richard Guo
  Discussion: https://postgr.es/m/CAApHDvo8Lz2H=42urBbfP65LTcEUOh288MT7DsG2_EWtW1AXHQ@mail.gmail.com
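  A sketch of the kind of case this helps, assuming a hypothetical table t with an index on (a): an index path already provides presorted keys on a, so only an incremental sort on b is needed before uniquifying:
    SELECT DISTINCT a, b FROM t;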
* Add additional regression tests for select_active_windows (David Rowley, 2023-01-07)
  During the development of 728202b63, which was aimed at reducing the number of sorts required to evaluate multiple window functions with different WindowClause definitions, the code written sorted the WindowClauses in reverse tleSortGroupRef order. There appears to be no discussion in the thread which was opened to discuss the development of this patch and no comments mentioning the fact that having the WindowClauses in reverse tleSortGroupRef order makes it more likely that the final WindowClause to be evaluated will provide presorted input to the query's DISTINCT or ORDER BY clause. The reason for this is that the tleSortGroupRef indexes are assigned for the DISTINCT and ORDER BY clauses before they are for the WindowClauses' PARTITION BY and ORDER BY clauses. Putting the WindowClause with the lowest tleSortGroupRef last means that it's more likely that no additional sorting is required for the query's DISTINCT or ORDER BY clause.
  All we're doing here is adding some tests and a comment to help ensure that remains true and that we don't accidentally forget to consider this again should we ever rewrite that code.
  Author: Ankit Kumar Pandey, David Rowley
  Discussion: https://postgr.es/m/CAApHDvq=g2=ny59f1bvwRVvupsgPHK-KjLPBsSL25fVuGZ4idQ@mail.gmail.com
* Update copyright for 2023 (Bruce Momjian, 2023-01-02)
  Backpatch-through: 11
* Allow window functions to adjust their frameOptions (David Rowley, 2022-12-23)
  WindowFuncs such as row_number() don't care whether they're called with ROWS UNBOUNDED PRECEDING AND CURRENT ROW or with RANGE UNBOUNDED PRECEDING AND CURRENT ROW. The latter is less efficient as the RANGE option requires that the executor check for peer rows, so using the ROWS option instead would cause less overhead. Because RANGE is part of the default frame options for WindowClauses, it means WindowAgg is, by default, working much harder than it needs to for window functions where the ROWS / RANGE option has no effect on the window function's result.
  On a test query from the discussion thread, a performance improvement of 344% was seen by using ROWS instead of RANGE.
  Here we add a new support function node type to allow support functions to be called for window functions so that the most optimal version of the frame options can be set. The planner has been adjusted so that the frame options are changed only if all window functions sharing the same window clause agree on what the optimized frame options are.
  Here we give the ability for row_number(), rank(), dense_rank(), percent_rank(), cume_dist() and ntile() to alter their WindowClause's frameOptions.
  Reviewed-by: Vik Fearing, Erwin Brandstetter, Zhihong Yu
  Discussion: https://postgr.es/m/CAGHENJ7LBBszxS+SkWWFVnBmOT2oVsBhDMB1DFrgerCeYa_DyA@mail.gmail.com
  Discussion: https://postgr.es/m/CAApHDvohAKEtTXxq7Pc-ic2dKT8oZfbRKeEJP64M0B6+S88z+A@mail.gmail.com
* Add enable_presorted_aggregate GUC (David Rowley, 2022-12-20)
  1349d279 added query planner support to allow more efficient execution of aggregate functions which have an ORDER BY or a DISTINCT clause. Prior to that commit, the planner would only request that the lower planner produce a plan with the order required for the GROUP BY clause and it would be left up to nodeAgg.c to perform the final sort of records within each group so that the aggregate transition functions were called in the correct order. Now that the planner requests the lower planner produce a plan with the GROUP BY and the ORDER BY / DISTINCT aggregates in mind, there is the possibility that the planner chooses a plan which could be less efficient than what would have been produced before 1349d279.
  While developing 1349d279, I had in mind that Incremental Sort would help us in cases where an index exists only on the GROUP BY column(s). Incremental Sort would just replace the implicit tuplesorts which are being performed in nodeAgg.c. However, because the planner has the flexibility to instead choose a plan which just performs a full sort on both the GROUP BY and ORDER BY / DISTINCT aggregate columns, there is potential for the planner to make a bad choice. The costing for Incremental Sort is not perfect as it assumes an even distribution of rows to sort within each sort group.
  Here we add an escape hatch in the form of the enable_presorted_aggregate GUC. This will allow users to get the pre-PG16 behavior in cases where they have no other means to convince the query planner to produce a plan which only sorts on the GROUP BY column(s).
  Discussion: https://postgr.es/m/CAApHDvr1Sm+g9hbv4REOVuvQKeDWXcKUAhmbK5K+dfun0s9CvA@mail.gmail.com
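  A minimal sketch of the escape hatch (hypothetical table t and columns a, b):
    SET enable_presorted_aggregate = off;  -- fall back to the pre-PG16 behavior: let nodeAgg.c sort within each group
    SELECT a, string_agg(b, ',' ORDER BY b) FROM t GROUP BY a;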
* Remove pessimistic cost penalization from Incremental Sort (David Rowley, 2022-12-16)
  When incremental sorts were added in v13 a 1.5x pessimism factor was added to the cost model. Seemingly this was done because the cost model only has an estimate of the total number of input rows and the number of presorted groups. It assumes that the input rows will be evenly distributed throughout the presorted groups. The 1.5x pessimism factor was added to slightly reduce the likelihood of incremental sorts being used in the hope to avoid performance regressions where an incremental sort plan was picked and turned out slower due to a large skew in the number of rows in the presorted groups.
  An additional quirk with the path generation code meant that we could consider both a sort and an incremental sort on paths with presorted keys. This meant that with the pessimism factor, it was possible that we opted to perform a sort rather than an incremental sort when the given path had presorted keys.
  Here we remove the 1.5x pessimism factor to allow incremental sorts to have a fairer chance at being chosen against a full sort.
  Previously we would generally create a sort path on the cheapest input path (if that wasn't sorted already) and incremental sort paths on any path which had presorted keys. This meant that if the cheapest input path wasn't completely sorted but happened to have presorted keys, we would create a full sort path *and* an incremental sort path on that input path. Here we change this logic so that if there are presorted keys, we only create an incremental sort path, and create sort paths only when a full sort is required.
  Both the removal of the cost pessimism factor and the changes made to the path generation make it more likely that incremental sorts will now be chosen. That, of course, as with teaching the planner any new tricks, means an increased likelihood that the planner will perform an incremental sort when it's not the best method. Our standard escape hatch for these cases is an enable_* GUC. enable_incremental_sort already exists for this.
  This came out of a report by Pavel Luzanov where he mentioned that the master branch was choosing to perform a Seq Scan -> Sort -> Group Aggregate for his query with an ORDER BY aggregate function. The v15 plan for his query performed an Index Scan -> Group Aggregate, of course, the aggregate performed the final sort internally in nodeAgg.c for the aggregate's ORDER BY. The ideal plan would have been to use the index, which provided partially sorted input, then use an incremental sort to provide the aggregate with the sorted input. This was not being chosen due to the pessimism in the incremental sort cost model, so here we remove that and rationalize the path generation so that sort and incremental sort plans don't have to needlessly compete. We assume that it's senseless to ever use a full sort on a given input path where an incremental sort can be performed.
  Reported-by: Pavel Luzanov
  Reviewed-by: Richard Guo
  Discussion: https://postgr.es/m/9f61ddbf-2989-1536-b31e-6459370a6baa%40postgrespro.ru
* Add subquery pullup handling for WindowClause runCondition (David Rowley, 2022-12-10)
  9d9c02ccd added code to allow WindowAgg to take some shortcuts when a monotonic WindowFunc reached some value that it could never come back from due to the function's monotonic nature. That commit added a runCondition field to WindowClause to store the condition which, once it becomes false, lets us start taking shortcuts in nodeWindowAgg.c.
  Here we fix an issue where subquery pullups didn't properly update the runCondition to update the Vars to properly reference the new query level.
  Here we also add a missing call to preprocess_expression() for the WindowClause's runCondition. The WindowFuncs in the targetlist will have had this process done, so we must also do it for the WindowFuncs in the runCondition so that they can be correctly found in the targetlist during setrefs.c
  Bug: #17709
  Reported-by: Alexey Makhmutov
  Author: Richard Guo, David Rowley
  Discussion: https://postgr.es/m/17709-4f557160e3e8ee9a@postgresql.org
  Backpatch-through: 15, where 9d9c02ccd was introduced
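  For illustration (hypothetical table t and column c), the affected pattern is a pulled-up subquery whose monotonic WindowFunc is referenced in the outer WHERE clause; the rn <= 10 qual becomes the WindowAgg's runCondition and its Vars must be adjusted to the new query level:
    SELECT * FROM (
      SELECT *, row_number() OVER (ORDER BY c) AS rn FROM t
    ) sub
    WHERE rn <= 10;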
* Rework query relation permission checking (Alvaro Herrera, 2022-12-06)
  Currently, information about the permissions to be checked on relations mentioned in a query is stored in their range table entries. So the executor must scan the entire range table looking for relations that need to have permissions checked. This can make the permission checking part of the executor initialization needlessly expensive when many inheritance children are present in the range table. While the permissions need not be checked on the individual child relations, the executor still must visit every range table entry to filter them out.
  This commit moves the permission checking information out of the range table entries into a new plan node called RTEPermissionInfo. Every top-level (inheritance "root") RTE_RELATION entry in the range table gets one and a list of those is maintained alongside the range table. This new list is initialized by the parser when initializing the range table. The rewriter can add more entries to it as rules/views are expanded. Finally, the planner combines the lists of the individual subqueries into one flat list that is passed to the executor for checking.
  To make it quick to find the RTEPermissionInfo entry belonging to a given relation, RangeTblEntry gets a new Index field 'perminfoindex' that stores the corresponding RTEPermissionInfo's index in the query's list of the latter.
  ExecutorCheckPerms_hook has gained another List * argument; the signature is now:
    typedef bool (*ExecutorCheckPerms_hook_type) (List *rangeTable,
                                                  List *rtePermInfos,
                                                  bool ereport_on_violation);
  The first argument is no longer used by any in-core uses of the hook, but we leave it in place because there may be other implementations that do. Implementations should likely scan the rtePermInfos list to determine which operations to allow or deny.
  Author: Amit Langote <amitlangote09@gmail.com>
  Discussion: https://postgr.es/m/CA+HiwqGjJDmUhDSfv-U2qhKJjt9ST7Xh9JXC_irsAQ1TAUsJYg@mail.gmail.com
* Move PartitionPruneInfo out of plan nodes into PlannedStmt (Alvaro Herrera, 2022-12-01)
  The planner will now add a given PartitionPruneInfo to PlannedStmt.partPruneInfos instead of directly to the Append/MergeAppend plan node. What gets set instead in the latter is an index field which points to the list element of PlannedStmt.partPruneInfos containing the PartitionPruneInfo belonging to the plan node.
  A later commit will make AcquireExecutorLocks() do the initial partition pruning to determine a minimal set of partitions to be locked when validating a plan tree and it will need to consult the PartitionPruneInfos referenced therein to do so. It would be better for the PartitionPruneInfos to be accessible directly than requiring a walk of the plan tree to find them, which is easier when it can be done by simply iterating over PlannedStmt.partPruneInfos.
  Author: Amit Langote <amitlangote09@gmail.com>
  Discussion: https://postgr.es/m/CA+HiwqFGkMSge6TgC9KQzde0ohpAycLQuV7ooitEEpbKB0O_mg@mail.gmail.com
* Use Limit instead of Unique to implement DISTINCT, when possible (David Rowley, 2022-10-28)
  When all of the query's DISTINCT pathkeys have been marked as redundant due to EquivalenceClasses existing which contain constants, we can implement the DISTINCT operation on a query by just limiting the number of returned rows to 1 instead of performing a Unique on all of the matching (duplicate) rows.
  This applies in cases such as:
    SELECT DISTINCT col,col2 FROM tab WHERE col = 1 AND col2 = 10;
  If there are any matching rows, then they must all be {1,10}. There's no point in fetching all of those and running a Unique operator on them to leave only a single row. Here we effectively just find the first row and then stop. We are obviously unable to apply this optimization if either the col = 1 or col2 = 10 were missing from the WHERE clause or if there were any additional columns in the SELECT clause.
  Such queries are probably not all that common, but detecting when we can apply this optimization amounts to checking if the distinct_pathkeys are NULL, which is very cheap indeed.
  Nothing is done here to check if the query already has a LIMIT clause. If it does then the plan may end up with 2 Limit nodes. There's no harm in that and it's probably not worth the complexity to unify them into a single Limit node.
  Author: David Rowley
  Reviewed-by: Richard Guo
  Discussion: https://postgr.es/m/CAApHDvqS0j8RUWRUSgCAXxOqnYjHUXmKwspRj4GzVfOO25ByHA@mail.gmail.com
  Discussion: https://postgr.es/m/MEYPR01MB7101CD5DA0A07C9DE2B74850A4239@MEYPR01MB7101.ausprd01.prod.outlook.com
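  A hedged way to observe this, using the same table and columns as the commit's own example; the resulting plan should now have roughly the shape Limit -> Seq Scan rather than Unique -> Sort -> Seq Scan:
    EXPLAIN (COSTS OFF)
    SELECT DISTINCT col, col2 FROM tab WHERE col = 1 AND col2 = 10;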
* Update some comments that should've covered MERGE (Alvaro Herrera, 2022-10-24)
  Oversight in 7103ebb7aae8. Backpatch to 15.
  Author: Richard Guo <guofenglinux@gmail.com>
  Discussion: https://postgr.es/m/CAMbWs48gnDjZXq3-b56dVpQCNUJ5hD9kdtWN4QFwKCEapspNsA@mail.gmail.com
* Rename shadowed local variables (David Rowley, 2022-10-05)
  In a similar effort to f01592f91, here we mostly rename shadowed local variables to remove the warnings produced when compiling with -Wshadow=compatible-local.
  This fixes 63 warnings and leaves just 5.
  Author: Justin Pryzby, David Rowley
  Reviewed-by: Justin Pryzby
  Discussion: https://postgr.es/m/20220817145434.GC26426%40telsasoft.com
* Revert "Optimize order of GROUP BY keys".Tom Lane2022-10-03
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This reverts commit db0d67db2401eb6238ccc04c6407a4fd4f985832 and several follow-on fixes. The idea of making a cost-based choice of the order of the sorting columns is not fundamentally unsound, but it requires cost information and data statistics that we don't really have. For example, relying on procost to distinguish the relative costs of different sort comparators is pretty pointless so long as most such comparator functions are labeled with cost 1.0. Moreover, estimating the number of comparisons done by Quicksort requires more than just an estimate of the number of distinct values in the input: you also need some idea of the sizes of the larger groups, if you want an estimate that's good to better than a factor of three or so. That's data that's often unknown or not very reliable. Worse, to arrive at estimates of the number of calls made to the lower-order-column comparison functions, the code needs to make estimates of the numbers of distinct values of multiple columns, which are necessarily even less trustworthy than per-column stats. Even if all the inputs are perfectly reliable, the cost algorithm as-implemented cannot offer useful information about how to order sorting columns beyond the point at which the average group size is estimated to drop to 1. Close inspection of the code added by db0d67db2 shows that there are also multiple small bugs. These could have been fixed, but there's not much point if we don't trust the estimates to be accurate in-principle. Finally, the changes in cost_sort's behavior made for very large changes (often a factor of 2 or so) in the cost estimates for all sorting operations, not only those for multi-column GROUP BY. That naturally changes plan choices in many situations, and there's precious little evidence to show that the changes are for the better. Given the above doubts about whether the new estimates are really trustworthy, it's hard to summon much confidence that these changes are better on the average. Since we're hard up against the release deadline for v15, let's revert these changes for now. We can always try again later. Note: in v15, I left T_PathKeyInfo in place in nodes.h even though it's unreferenced. Removing it would be an ABI break, and it seems a bit late in the release cycle for that. Discussion: https://postgr.es/m/TYAPR01MB586665EB5FB2C3807E893941F5579@TYAPR01MB5866.jpnprd01.prod.outlook.com
* Harmonize more parameter names in bulk. (Peter Geoghegan, 2022-09-20)
  Make sure that function declarations use names that exactly match the corresponding names from function definitions in optimizer, parser, utility, libpq, and "commands" code, as well as in remaining library code. Do the same for all code related to frontend programs (with the exception of pg_dump/pg_dumpall related code).
  Like other recent commits that cleaned up function parameter names, this commit was written with help from clang-tidy. Later commits will handle ecpg and pg_dump/pg_dumpall.
  Author: Peter Geoghegan <pg@bowt.ie>
  Reviewed-By: David Rowley <dgrowleyml@gmail.com>
  Discussion: https://postgr.es/m/CAH2-WznJt9CMM9KJTMjJh_zbL5hD9oX44qdJ4aqZtjFi-zA3Tg@mail.gmail.com
* Fix outdated comment in preprocess_groupclause() (David Rowley, 2022-09-20)
  The comment claimed we don't consider other orders of the GROUP BY clause, but this is no longer true as of db0d67db2.
  Discussion: https://postgr.es/m/CAApHDvq65=9Ro+hLX1i9ugWEiNDvHrBibAO7ARcTnf38_JE+UQ@mail.gmail.com
  Backpatch-through: 15, where db0d67db2 was introduced.
* More -Wshadow=compatible-local warning fixes (David Rowley, 2022-08-26)
  In a similar effort to f01592f91, here we're targeting the warnings where we've deemed the shadowing variable to serve a close enough purpose to the shadowed variable that we can just reuse the shadowed version and not declare the shadowing variable at all.
  By my count, this takes the warning count from 106 down to 71.
  Author: Justin Pryzby
  Discussion: https://postgr.es/m/20220825020839.GT2342@telsasoft.com
* Further -Wshadow=compatible-local warning fixes (David Rowley, 2022-08-24)
  These should have been included in 421892a19 as these shadowed variable warnings can also be fixed by adjusting the scope of the shadowed variable to put the declaration for it in an inner scope.
  This is part of the same effort as f01592f91.
  By my count, this takes the warning count from 114 down to 106.
  Author: David Rowley and Justin Pryzby
  Discussion: https://postgr.es/m/CAApHDvrwLGBP%2BYw9vriayyf%3DXR4uPWP5jr6cQhP9au_kaDUhbA%40mail.gmail.com
* Improve performance of adjust_appendrel_attrs_multilevel. (Tom Lane, 2022-08-18)
  The present implementations of adjust_appendrel_attrs_multilevel and its sibling adjust_child_relids_multilevel are very messy, because they work by reconstructing the relids of the child's immediate parent and then seeing if that's bms_equal to the relids of the target parent. Aside from being quite inefficient, this will not work with planned future changes to make joinrels' relid sets contain outer-join relids in addition to baserels.
  The whole thing can be solved at a stroke by adding explicit parent and top_parent links to child RelOptInfos, and making these functions work with RelOptInfo pointers instead of relids. Doing that is simpler for most callers, too.
  In my original version of this patch, I got rid of RelOptInfo.top_parent_relids on the grounds that it was now redundant. However, that adds a lot of code churn in places that otherwise would not need changing, and arguably the extra indirection needed to fetch top_parent->relids in those places costs something. So this version leaves that field in place.
  Discussion: https://postgr.es/m/553080.1657481916@sss.pgh.pa.us
* Fix hypothetical problem passing the wrong GROUP BY pathkeys (David Rowley, 2022-08-18)
  1349d2790 changed things to make the planner request that the query_pathkeys contain pathkeys for any ORDER BY / DISTINCT aggregates. Some code added prior to that commit in db0d67db2 made it so the order that the pathkeys appear in the group_pathkeys could be changed so that the GROUP BY could be executed in a more optimal order which minimized sort comparisons. 1349d2790 had to make sure that the pathkeys for any ORDER BY / DISTINCT aggregates remained at the end of the groupby_pathkeys and weren't reordered, so some code was added to add_paths_to_grouping_rel() to first strip off any pathkeys belonging to ORDER BY / DISTINCT aggregates before passing to the function to optimize the order of the group_pathkeys.
  It seems I dropped the ball in 1349d2790 and mistakenly used the untouched PlannerInfo.group_pathkeys to pass to get_useful_group_keys_orderings() instead of the version that had the aggregate pathkeys removed. It was only the code path that was handling creating paths for partially_grouped_rel which made this mistake.
  In practice, we'll never have any extra pathkeys to strip off when processing partially_grouped_rel as that's only used when considering partial paths, which we never do when there are ORDER BY / DISTINCT aggregates. So this is just a hypothetical bug, not a live bug. We already have the correct pathkeys determined, so it's of no extra cost to pass the correct variable.
  Reported-by: Justin Pryzby
  Discussion: https://postgr.es/m/20220817015755.GB26426@telsasoft.com
* Use an explicit state flag to control PlaceHolderInfo creation. (Tom Lane, 2022-08-17)
  Up to now, callers of find_placeholder_info() were required to pass a flag indicating if it's OK to make a new PlaceHolderInfo. That'd be fine if the callers had free choice, but they do not. Once we begin deconstruct_jointree() it's no longer OK to make more PHIs; while callers before that always want to create a PHI if it's not there already. So there's no freedom of action, only the opportunity to cause bugs by creating PHIs too late. Let's get rid of that in favor of adding a state flag PlannerInfo.placeholdersFrozen, which we can set at the point where it's no longer OK to make more PHIs.
  This patch also simplifies a couple of call sites that were using complicated logic to avoid calling find_placeholder_info() as much as possible. Now that that lookup is O(1) thanks to the previous commit, the extra bitmap manipulations are probably a net negative.
  Discussion: https://postgr.es/m/1405792.1660677844@sss.pgh.pa.us
* Avoid using list_length() to test for empty list. (Tom Lane, 2022-08-17)
  The standard way to check for list emptiness is to compare the List pointer to NIL; our list code goes out of its way to ensure that that is the only representation of an empty list. (An acceptable alternative is a plain boolean test for non-null pointer, but explicit mention of NIL is usually preferable.)
  Various places didn't get that memo and expressed the condition with list_length(), which might not be so bad except that there were such a variety of ways to check it exactly: equal to zero, less than or equal to zero, less than one, yadda yadda. In the name of code readability, let's standardize all those spellings as "list == NIL" or "list != NIL". (There's probably some microscopic efficiency gain too, though few of these look to be at all performance-critical.)
  A very small number of cases were left as-is because they seemed more consistent with other adjacent list_length tests that way.
  Peter Smith, with bikeshedding from a number of us
  Discussion: https://postgr.es/m/CAHut+PtQYe+ENX5KrONMfugf0q6NHg4hR5dAhqEXEc2eefFeig@mail.gmail.com
* Improve performance of ORDER BY / DISTINCT aggregates (David Rowley, 2022-08-02)
  ORDER BY / DISTINCT aggregates have, since they were implemented in Postgres, been executed by always performing a sort in nodeAgg.c to sort the tuples in the current group into the correct order before calling the transition function on the sorted tuples. This was not great as often there might be an index that could have provided pre-sorted input and allowed the transition functions to be called as the rows come in, rather than having to store them in a tuplestore in order to sort them once all the tuples for the group have arrived.
  Here we change the planner so it requests a path with a sort order which supports the largest number of ORDER BY / DISTINCT aggregate functions and add new code to the executor to allow it to support the processing of ORDER BY / DISTINCT aggregates where the tuples are already sorted in the correct order.
  Since there can be many ORDER BY / DISTINCT aggregates in any given query level, it's very possible that we can't find an order that suits all of these aggregates. The sort order that the planner chooses is simply the one that suits the most aggregate functions. We take the most strictly sorted variation of each order and see how many aggregate functions can use that, then we try again with the order of the remaining aggregates to see if another order would suit more aggregate functions. For example:
    SELECT agg(a ORDER BY a),agg2(a ORDER BY a,b) ...
  would request the sort order to be {a, b} because {a} is a subset of the sort order of {a,b}, but;
    SELECT agg(a ORDER BY a),agg2(a ORDER BY c) ...
  would just pick a plan ordered by {a} (we give precedence to aggregates which are earlier in the targetlist).
    SELECT agg(a ORDER BY a),agg2(a ORDER BY b),agg3(a ORDER BY b) ...
  would choose to order by {b} since two aggregates suit that vs just one that requires input ordered by {a}.
  Author: David Rowley
  Reviewed-by: Ronan Dunklau, James Coleman, Ranier Vilela, Richard Guo, Tom Lane
  Discussion: https://postgr.es/m/CAApHDvpHzfo92%3DR4W0%2BxVua3BUYCKMckWAmo-2t_KiXN-wYH%3Dw%40mail.gmail.com
* Remove no-longer-used parameter for create_groupingsets_path(). (Tom Lane, 2022-07-01)
  numGroups is unused since commit b5635948a; let's get rid of it.
  XueJing Zhao, reviewed by Richard Guo
  Discussion: https://postgr.es/m/DM6PR05MB64923CC8B63A2CAF3B2E5D47B7AD9@DM6PR05MB6492.namprd05.prod.outlook.com
* Pre-beta mechanical code beautification. (Tom Lane, 2022-05-12)
  Run pgindent, pgperltidy, and reformat-dat-files. I manually fixed a couple of comments that pgindent uglified.
* Remove extraneous blank lines before block-closing braces (Alvaro Herrera, 2022-04-13)
  These are useless and distracting. We wouldn't have written the code with them to begin with, so there's no reason to keep them.
  Author: Justin Pryzby <pryzby@telsasoft.com>
  Discussion: https://postgr.es/m/20220411020336.GB26620@telsasoft.com
  Discussion: https://postgr.es/m/attachment/133167/0016-Extraneous-blank-lines.patch
* Teach planner and executor about monotonic window funcs (David Rowley, 2022-04-08)
  Window functions such as row_number() always return a value higher than the previously returned value for tuples in any given window partition.
  Traditionally queries such as:
    SELECT * FROM (
      SELECT *, row_number() over (order by c) rn FROM t
    ) t
    WHERE rn <= 10;
  were executed fairly inefficiently. Neither the query planner nor the executor knew that once rn made it to 11 that nothing further would match the outer query's WHERE clause. It would blindly continue until all tuples were exhausted from the subquery. Here we implement means to make the above execute more efficiently.
  This is done by way of adding a pg_proc.prosupport function to various of the built-in window functions and adding supporting code to allow the support function to inform the planner if the window function is monotonically increasing, monotonically decreasing, both or neither. The planner is then able to make use of that information and possibly allow the executor to short-circuit execution by way of adding a "run condition" to the WindowAgg to allow it to determine if some of its execution work can be skipped.
  This "run condition" is not like a normal filter. These run conditions are only built using quals comparing values to monotonic window functions. For monotonically increasing functions, quals making use of the btree operators for <, <= and = can be used (assuming the window function column is on the left). You can see here that once such a condition becomes false, a monotonically increasing function could never make it subsequently true again. For monotonically decreasing functions the >, >= and = btree operators for the given type can be used for run conditions.
  The best-case situation for this is when there is a single WindowAgg node without a PARTITION BY clause. Here when the run condition becomes false the WindowAgg node can simply return NULL. No more tuples will ever match the run condition. It's a little more complex when there is a PARTITION BY clause. In this case, we cannot return NULL as we must still process other partitions. To speed this case up we pull tuples from the outer plan to check if they're from the same partition and simply discard them if they are. When we find a tuple belonging to another partition we start processing as normal again until the run condition becomes false or we run out of tuples to process.
  When there are multiple WindowAgg nodes to evaluate then this complicates the situation. For intermediate WindowAggs we must ensure we always return all tuples to the calling node. Any filtering done could lead to incorrect results in WindowAgg nodes above. For all intermediate nodes, we can still save some work when the run condition becomes false. We've no need to evaluate the WindowFuncs anymore. Other WindowAgg nodes cannot reference the value of these and these tuples will not appear in the final result anyway. The savings here are small in comparison to what can be saved in the top-level WindowAgg, but still worthwhile.
  Intermediate WindowAgg nodes never filter out tuples, but here we change WindowAgg so that the top-level WindowAgg filters out tuples that don't match the intermediate WindowAgg node's run condition. Such filters appear in the "Filter" clause in EXPLAIN for the top-level WindowAgg node.
  Here we add prosupport functions to allow the above to work for row_number(), rank(), dense_rank(), count(*) and count(expr). It appears technically possible to do the same for min() and max(), however, it seems unlikely to be useful enough, so that's not done here.
  Bump catversion
  Author: David Rowley
  Reviewed-by: Andy Fan, Zhihong Yu
  Discussion: https://postgr.es/m/CAApHDvqvp3At8++yF8ij06sdcoo1S_b2YoaT9D4Nf+MObzsrLQ@mail.gmail.com
* Optimize order of GROUP BY keys (Tomas Vondra, 2022-03-31)
  When evaluating a query with a multi-column GROUP BY clause using sort, the cost may be heavily dependent on the order in which the keys are compared when building the groups. Grouping does not imply any ordering, so we're allowed to compare the keys in arbitrary order, and a Hash Agg leverages this. But for Group Agg, we simply compared keys in the order as specified in the query. This commit explores alternative ordering of the keys, trying to find a cheaper one.
  In principle, we might generate grouping paths for all permutations of the keys, and leave the rest to the optimizer. But that might get very expensive, so we try to pick only a couple interesting orderings based on both local and global information.
  When planning the grouping path, we explore statistics (number of distinct values, cost of the comparison function) for the keys and reorder them to minimize comparison costs. Intuitively, it may be better to perform more expensive comparisons (for complex data types etc.) last, because maybe the cheaper comparisons will be enough. Similarly, the higher the cardinality of a key, the lower the probability we'll need to compare more keys. The patch generates and costs various orderings, picking the cheapest ones.
  The ordering of group keys may interact with other parts of the query, some of which may not be known while planning the grouping. E.g. there may be an explicit ORDER BY clause, or some other ordering-dependent operation, higher up in the query, and using the same ordering may allow using either incremental sort or even eliminate the sort entirely. The patch generates orderings and picks those minimizing the comparison cost (for various pathkeys), and then adds orderings that might be useful for operations higher up in the plan (ORDER BY, etc.). Finally, it always keeps the ordering specified in the query, on the assumption the user might have additional insights.
  This introduces a new GUC enable_group_by_reordering, so that the optimization may be disabled if needed.
  The original patch was proposed by Teodor Sigaev, and later improved and reworked by Dmitry Dolgov. Reviews by a number of people, including me, Andrey Lepikhov, Claudio Freire, Ibrar Ahmed and Zhihong Yu.
  Author: Dmitry Dolgov, Teodor Sigaev, Tomas Vondra
  Reviewed-by: Tomas Vondra, Andrey Lepikhov, Claudio Freire, Ibrar Ahmed, Zhihong Yu
  Discussion: https://postgr.es/m/7c79e6a5-8597-74e8-0671-1c39d124c9d6%40sigaev.ru
  Discussion: https://postgr.es/m/CA%2Bq6zcW_4o2NC0zutLkOJPsFt80megSpX_dVRo6GK9PC-Jx_Ag%40mail.gmail.com
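  A rough sketch (hypothetical table and column names; note that this feature was later reverted, see the 2022-10-03 entry above):
    SET enable_group_by_reordering = on;  -- disable with off to keep the GROUP BY order as written
    SELECT count(*) FROM t GROUP BY expensive_text_col, cheap_int_col;
    -- for a sort-based Group Agg, comparing the cheap, high-cardinality int key first can avoid many expensive text comparisons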
* Add support for MERGE SQL command (Alvaro Herrera, 2022-03-28)
  MERGE performs actions that modify rows in the target table using a source table or query. MERGE provides a single SQL statement that can conditionally INSERT/UPDATE/DELETE rows -- a task that would otherwise require multiple PL statements. For example,
    MERGE INTO target AS t
    USING source AS s
    ON t.tid = s.sid
    WHEN MATCHED AND t.balance > s.delta THEN
      UPDATE SET balance = t.balance - s.delta
    WHEN MATCHED THEN
      DELETE
    WHEN NOT MATCHED AND s.delta > 0 THEN
      INSERT VALUES (s.sid, s.delta)
    WHEN NOT MATCHED THEN
      DO NOTHING;
  MERGE works with regular tables, partitioned tables and inheritance hierarchies, including column and row security enforcement, as well as support for row and statement triggers and transition tables therein.
  MERGE is optimized for OLTP and is parameterizable, though also useful for large scale ETL/ELT. MERGE is not intended to be used in preference to existing single SQL commands for INSERT, UPDATE or DELETE since there is some overhead. MERGE can be used from PL/pgSQL.
  MERGE does not support targeting updatable views or foreign tables, and RETURNING clauses are not allowed either. These limitations are likely fixable with sufficient effort. Rewrite rules are also not supported, but it's not clear that we'd want to support them.
  Author: Pavan Deolasee <pavan.deolasee@gmail.com>
  Author: Álvaro Herrera <alvherre@alvh.no-ip.org>
  Author: Amit Langote <amitlangote09@gmail.com>
  Author: Simon Riggs <simon.riggs@enterprisedb.com>
  Reviewed-by: Peter Eisentraut <peter.eisentraut@enterprisedb.com>
  Reviewed-by: Andres Freund <andres@anarazel.de> (earlier versions)
  Reviewed-by: Peter Geoghegan <pg@bowt.ie> (earlier versions)
  Reviewed-by: Robert Haas <robertmhaas@gmail.com> (earlier versions)
  Reviewed-by: Japin Li <japinli@hotmail.com>
  Reviewed-by: Justin Pryzby <pryzby@telsasoft.com>
  Reviewed-by: Tomas Vondra <tomas.vondra@enterprisedb.com>
  Reviewed-by: Zhihong Yu <zyu@yugabyte.com>
  Discussion: https://postgr.es/m/CANP8+jKitBSrB7oTgT9CY2i1ObfOt36z0XMraQc+Xrz8QB0nXA@mail.gmail.com
  Discussion: https://postgr.es/m/CAH2-WzkJdBuxj9PO=2QaO9-3h3xGbQPZ34kJH=HukRekwM-GZg@mail.gmail.com
  Discussion: https://postgr.es/m/20201231134736.GA25392@alvherre.pgsql
* Update copyright for 2022 (Bruce Momjian, 2022-01-07)
  Backpatch-through: 10
* Fix planner error with pulling up subquery expressions into function RTEs. (Tom Lane, 2021-10-14)
  If a function-in-FROM laterally references the output of some sub-SELECT earlier in the FROM clause, and we are able to flatten that sub-SELECT into the outer query, the expression(s) copied into the function RTE missed being processed by eval_const_expressions. This'd lead to trouble and probable crashes at execution if such expressions contained named-argument function call syntax or functions with defaulted arguments. The bug is masked if the query contains any explicit JOIN syntax, which may help explain why we'd not noticed.
  Per bug #17227 from Bernd Dorn. This is an oversight in commit 7266d0997, so back-patch to v13 where that came in.
  Discussion: https://postgr.es/m/17227-5a28ed1512189fa4@postgresql.org
* Allow parallel DISTINCT (David Rowley, 2021-08-22)
  We've supported parallel aggregation since e06a38965. At the time, we didn't quite get around to also adding parallel DISTINCT. So, let's do that now.
  This is implemented by introducing a two-phase DISTINCT. Phase 1 is performed on parallel workers, rows are made distinct there either by hashing or by sort/unique. The results from the parallel workers are combined and the final distinct phase is performed serially to get rid of any duplicate rows that appear due to combining rows for each of the parallel workers.
  Author: David Rowley
  Reviewed-by: Zhihong Yu
  Discussion: https://postgr.es/m/CAApHDvrjRxVKwQN0he79xS+9wyotFXL=RmoWqGGO2N45Farpgw@mail.gmail.com
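  An illustrative sketch (hypothetical table big_tab with a column category):
    SELECT DISTINCT category FROM big_tab;
    -- phase 1: each parallel worker de-duplicates its own rows (hash or sort/unique);
    -- phase 2: a serial final distinct removes duplicates left over from combining the workers' results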
* Track a Bitmapset of non-pruned partitions in RelOptInfoDavid Rowley2021-08-03
| | | | | | | | | | | | | | | | | | | | For partitioned tables with large numbers of partitions where queries are
able to prune all but a very small number of partitions, the time spent in
the planner looping over RelOptInfo.part_rels checking for non-NULL
RelOptInfos could become a large portion of the overall planning time.

Here we add a Bitmapset that records the non-pruned partitions. This
allows us to more efficiently skip the pruned partitions by looping over
the Bitmapset.

This will cause a very slight slowdown in cases where no or only a few
partitions can be pruned; however, those cases are already slow to plan
anyway, and the overhead of looping over the Bitmapset is unmeasurable
when compared with other tasks such as path creation for a large number
of partitions.

Reviewed-by: Amit Langote, Zhihong Yu
Discussion: https://postgr.es/m/CAApHDvqnPx6JnUuPwaf5ao38zczrAb9mxt9gj4U1EKFfd4AqLA@mail.gmail.com
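For context, a hedged sketch of the kind of query this targets; the
partitioned table and column are hypothetical. Planning time matters most
when plan-time pruning eliminates nearly all partitions:

-- Hypothetical: 'measurements' is range-partitioned into thousands of
-- partitions; the WHERE clause lets plan-time pruning keep only one,
-- which is the case the new Bitmapset speeds up.
EXPLAIN (COSTS OFF)
SELECT * FROM measurements
WHERE logdate = DATE '2021-08-03';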
* Get rid of artificial restriction on hash table sizes on Windows.Tom Lane2021-07-25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The point of introducing the hash_mem_multiplier GUC was to let users
reproduce the old behavior of hash aggregation, i.e. that it could use
more than work_mem at need. However, the implementation failed to get the
job done on Win64, where work_mem is clamped to 2GB to protect various
places that calculate memory sizes using "long int". As written, the same
clamp was applied to hash_mem. This resulted in severe performance
regressions for queries requiring a bit more than 2GB for hash
aggregation, as they now spill to disk and there's no way to stop that.

Getting rid of the work_mem restriction seems like a good idea, but it's
a big job and could not conceivably be back-patched. However, there's
only a fairly small number of places that are concerned with the hash_mem
value, and it turns out to be possible to remove the restriction there
without too much code churn or any ABI breaks. So, let's do that for now
to fix the regression, and leave the larger task for another day.

This patch does introduce a bit more infrastructure that should help with
the larger task, namely pg_bitutils.h support for working with size_t
values.

Per gripe from Laurent Hasson. Back-patch to v13 where the behavior
change came in.

Discussion: https://postgr.es/m/997817.1627074924@sss.pgh.pa.us
Discussion: https://postgr.es/m/MN2PR15MB25601E80A9B6D1BA6F592B1985E39@MN2PR15MB2560.namprd15.prod.outlook.com
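As a hedged usage sketch (the values are illustrative, not recommendations),
the settings involved interact like this:

-- hash_mem_multiplier scales the memory budget of hash-based nodes
-- relative to work_mem, so hash aggregation can exceed the (clamped)
-- work_mem limit before spilling.  Illustrative values only.
SET work_mem = '1GB';
SET hash_mem_multiplier = 4.0;  -- hash nodes may use roughly 4GB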
* Speedup ScalarArrayOpExpr evaluationDavid Rowley2021-04-08
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ScalarArrayOpExprs with "useOr=true" and a set of Consts on the righthand
side have traditionally been evaluated by using a linear search over the
array. When these arrays contain large numbers of elements then this
linear search could become a significant part of execution time.

Here we add a new method of evaluating ScalarArrayOpExpr expressions to
allow them to be evaluated by first building a hash table containing each
element, then on subsequent evaluations, we just probe that hash table to
determine if there is a match.

The planner is in charge of determining when this optimization is
possible and it enables it by setting hashfuncid in the ScalarArrayOpExpr.
The executor will only perform the hash table evaluation when the
hashfuncid is set.

This means that not all cases are optimized. For example, CHECK
constraints containing an IN clause won't go through the planner, so they
won't get the hashfuncid set. We could maybe do something about that at
some later date. The reason we're not doing it now is out of fear that we
may slow down cases where the expression is evaluated only once. Those
cases can be common, for example, a single-row INSERT into a table with a
CHECK constraint containing an IN clause.

In the planner, we enable this when there are suitable hash functions for
the ScalarArrayOpExpr's operator and only when there are at least
MIN_ARRAY_SIZE_FOR_HASHED_SAOP elements in the array. The threshold is
currently set to 9.

Author: James Coleman, David Rowley
Reviewed-by: David Rowley, Tomas Vondra, Heikki Linnakangas
Discussion: https://postgr.es/m/CAAaqYe8x62+=wn0zvNKCj55tPpg-JBHzhZFFc6ANovdqFw7-dA@mail.gmail.com
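As a minimal sketch (the table, column and values are hypothetical), an
expression qualifies once it uses a hashable equality operator and the array
reaches the 9-element threshold:

-- With at least 9 Consts in the IN list and a hashable '=' operator,
-- the planner can set hashfuncid, and the executor then probes a hash
-- table instead of scanning the array linearly on each evaluation.
SELECT count(*)
FROM events
WHERE status IN ('a','b','c','d','e','f','g','h','i');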
* Rework planning and execution of UPDATE and DELETE.Tom Lane2021-03-31
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch makes two closely related sets of changes:

1. For UPDATE, the subplan of the ModifyTable node now only delivers the
new values of the changed columns (i.e., the expressions computed in the
query's SET clause) plus row identity information such as CTID.
ModifyTable must re-fetch the original tuple to merge in the old values
of any unchanged columns. The core advantage of this is that the changed
columns are uniform across all tables of an inherited or partitioned
target relation, whereas the other columns might not be. A secondary
advantage, when the UPDATE involves joins, is that less data needs to
pass through the plan tree. The disadvantage of course is an extra fetch
of each tuple to be updated. However, that seems to be very nearly free
in context; even worst-case tests don't show it to add more than a couple
percent to the total query cost. At some point it might be interesting to
combine the re-fetch with the tuple access that ModifyTable must do
anyway to mark the old tuple dead; but that would require a good deal of
refactoring and it seems it wouldn't buy all that much, so this patch
doesn't attempt it.

2. For inherited UPDATE/DELETE, instead of generating a separate subplan
for each target relation, we now generate a single subplan that is just
exactly like a SELECT's plan, then stick ModifyTable on top of that. To
let ModifyTable know which target relation a given incoming row refers
to, a tableoid junk column is added to the row identity information. This
gets rid of the horrid hack that was inheritance_planner(), eliminating
O(N^2) planning cost and memory consumption in cases where there were
many unprunable target relations.

Point 2 of course requires point 1, so that there is a uniform definition
of the non-junk columns to be returned by the subplan. We can't insist on
uniform definition of the row identity junk columns however, if we want
to keep the ability to have both plain and foreign tables in a
partitioning hierarchy. Since it wouldn't scale very far to have every
child table have its own row identity column, this patch includes
provisions to merge similar row identity columns into one column of the
subplan result. In particular, we can merge the whole-row Vars typically
used as row identity by FDWs into one column by pretending they are type
RECORD. (It's still okay for the actual composite Datums to be labeled
with the table's rowtype OID, though.)

There is more that can be done to file down residual inefficiencies in
this patch, but it seems to be committable now.

FDW authors should note several API changes:

* The argument list for AddForeignUpdateTargets() has changed, and so has
the method it must use for adding junk columns to the query. Call
add_row_identity_var() instead of manipulating the parse tree directly.
You might want to reconsider exactly what you're adding, too.

* PlanDirectModify() must now work a little harder to find the
ForeignScan plan node; if the foreign table is part of a partitioning
hierarchy then the ForeignScan might not be the direct child of
ModifyTable. See postgres_fdw for sample code.

* To check whether a relation is a target relation, it's no longer
sufficient to compare its relid to root->parse->resultRelation. Instead,
check it against all_result_relids or leaf_result_relids, as appropriate.

Amit Langote and Tom Lane
Discussion: https://postgr.es/m/CA+HiwqHpHdqdDn48yCEhynnniahH78rwcrv1rEX65-fsZGBOLQ@mail.gmail.com
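To illustrate point 2 in a hedged way (the partitioned table is
hypothetical, and exact EXPLAIN output varies with version and settings),
an inherited UPDATE now carries a single SELECT-like subplan under
ModifyTable instead of one subplan per target relation:

EXPLAIN (COSTS OFF)
UPDATE prt SET payload = payload || '!' WHERE key < 10;  -- prt is hypothetical
-- One possible shape after this change:
--  Update on prt
--    Update on prt_p1 prt_1
--    Update on prt_p2 prt_2
--    ->  Append                            -- single SELECT-like subplan
--          ->  Seq Scan on prt_p1 prt_1
--                Filter: (key < 10)
--          ->  Seq Scan on prt_p2 prt_2
--                Filter: (key < 10)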
* Allow estimate_num_groups() to pass back further details about the estimationDavid Rowley2021-03-30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Here we add a new output parameter to estimate_num_groups() to allow it
to inform the caller of additional, possibly useful information about the
estimation.

The new output parameter is a struct that currently contains just a
single field with a set of flags. This was done rather than having the
flags as an output parameter to allow future fields to be added without
having to change the signature of the function at a later date when we
want to pass back further information that might not be suitable to store
in the flags field.

It seems reasonable that one day in the future the planner would want to
know more about the estimation. For example, how many individual sets of
statistics was the estimation generated from? The planner may want to
take that into account if we ever want to consider risks as well as costs
when generating plans.

For now, there's only one flag we set in the flags field. This is to
indicate if the estimation fell back on using the hard-coded constants in
any part of the estimation. Callers may like to change their behavior if
this is set, and this gives them the ability to do so. Callers may pass
the flag pointer as NULL if they have no interest in obtaining any
additional information about the estimate.

We're not adding any actual usages of these flags here. Some follow-up
commits will make use of this feature. Additionally, we're also not
making any changes to add support for clauselist_selectivity() and
clauselist_selectivity_ext(). However, if this is required in the future
then the same struct being added here should be fine to use as a new
output argument for those functions too.

Author: David Rowley
Discussion: https://postgr.es/m/CAApHDvqQqpk=1W-G_ds7A9CsXX3BggWj_7okinzkLVhDubQzjA@mail.gmail.com