aboutsummaryrefslogtreecommitdiff
path: root/src/backend/optimizer/prep/prepunion.c
Commit message (Collapse)AuthorAge
* Treat number of disabled nodes in a path as a separate cost metric.Robert Haas2024-08-21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Previously, when a path type was disabled by e.g. enable_seqscan=false, we either avoided generating that path type in the first place, or more commonly, we added a large constant, called disable_cost, to the estimated startup cost of that path. This latter approach can distort planning. For instance, an extremely expensive non-disabled path could seem to be worse than a disabled path, especially if the full cost of that path node need not be paid (e.g. due to a Limit). Or, as in the regression test whose expected output changes with this commit, the addition of disable_cost can make two paths that would normally be distinguishible in cost seem to have fuzzily the same cost. To fix that, we now count the number of disabled path nodes and consider that a high-order component of both the startup cost and the total cost. Hence, the path list is now sorted by disabled_nodes and then by total_cost, instead of just by the latter, and likewise for the partial path list. It is important that this number is a count and not simply a Boolean; else, as soon as we're unable to respect disabled path types in all portions of the path, we stop trying to avoid them where we can. Because the path list is now sorted by the number of disabled nodes, the join prechecks must compute the count of disabled nodes during the initial cost phase instead of postponing it to final cost time. Counts of disabled nodes do not cross subquery levels; at present, there is no reason for them to do so, since the we do not postpone path selection across subquery boundaries (see make_subplan). Reviewed by Andres Freund, Heikki Linnakangas, and David Rowley. Discussion: http://postgr.es/m/CA+TgmoZ_+MS+o6NeGK2xyBv-xM+w1AfFVuHE4f_aq6ekHv7YSQ@mail.gmail.com
* Fix generate_union_paths for non-sortable types.REL_17_BETA1Robert Haas2024-05-21
| | | | | | | | | | | | The previous logic would fail to set groupList when grouping_is_sortable() returned false, which could cause queries to return wrong answers when some columns were not sortable. David Rowley, reviewed by Heikki Linnakangas and Álvaro Herrera. Reported by Hubert "depesz" Lubaczewski. Discussion: http://postgr.es/m/Zktzf926vslR35Fv@depesz.com Discussion: http://postgr.es/m/CAApHDvra=c8_zZT0J-TftByWN2Y+OJfnjNJFg4Dfdi2s+QzmqA@mail.gmail.com
* Re-allow planner to use Merge Append to efficiently implement UNION.Robert Haas2024-05-21
| | | | | | | | | | | This reverts commit 7204f35919b7e021e8d1bc9f2d76fd6bfcdd2070, thus restoring 66c0185a3 (Allow planner to use Merge Append to efficiently implement UNION) as well as the follow-on commits d5d2205c8, 3b1a7eb28, 7487044d6. Per further discussion on pgsql-release, we wish to ship beta1 with this feature, and patch the bug that was found just before wrap, rather than shipping beta1 with the feature reverted.
* Revert commit 66c0185a3 and follow-on patches.Tom Lane2024-05-20
| | | | | | | | | | | | | | | | | | | This reverts 66c0185a3 (Allow planner to use Merge Append to efficiently implement UNION) as well as the follow-on commits d5d2205c8, 3b1a7eb28, 7487044d6. In addition to those, 07746a8ef had to be removed then re-applied in a different place, because 66c0185a3 moved the relevant code. The reason for this last-minute thrashing is that depesz found a case in which the patched code creates a completely wrong plan that silently gives incorrect query results. It's unclear what the cause is or how many cases are affected, but with beta1 wrap staring us in the face, there's no time for closer investigation. After we figure that out, we can decide whether to un-revert this for beta2 or hold it for v18. Discussion: https://postgr.es/m/Zktzf926vslR35Fv@depesz.com (also some private discussion among pgsql-release)
* Finish incomplete revert of ec63622c0.Tom Lane2024-05-06
| | | | | | | | | | The code change this made might well be fine to keep, but the comment justifying it by reference to self-join removal isn't. Let's just go back to the status quo ante, pending a more thorough review/redesign of SJE. (I found this by grepping to see if any references to self-join removal remained in the tree.)
* Fix typos and duplicate wordsDaniel Gustafsson2024-04-18
| | | | | | | | | | | | This fixes various typos, duplicated words, and tiny bits of whitespace mainly in code comments but also in docs. Author: Daniel Gustafsson <daniel@yesql.se> Author: Heikki Linnakangas <hlinnaka@iki.fi> Author: Alexander Lakhin <exclusion@gmail.com> Author: David Rowley <dgrowleyml@gmail.com> Author: Nazir Bilal Yavuz <byavuz81@gmail.com> Discussion: https://postgr.es/m/3F577953-A29E-4722-98AD-2DA9EFF2CBB8@yesql.se
* Fix assert failure when planning setop subqueries with CTEsDavid Rowley2024-04-02
| | | | | | | | | | | | | | | | | | | | | 66c0185a3 adjusted the UNION planner to request that union child queries produce Paths correctly ordered to implement the UNION by way of MergeAppend followed by Unique. The code there made a bad assumption that if the root->parent_root->parse had setOperations set that the query must be the child subquery of a set operation. That's not true when it comes to planning a non-inlined CTE which is parented by a set operation. This causes issues as the CTE's targetlist has no requirement to match up to the SetOperationStmt's groupClauses Fix this by adding a new parameter to both subquery_planner() and grouping_planner() to explicitly pass the SetOperationStmt only when planning set operation child subqueries. Thank you to Tom Lane for helping to rationalize the decision on the best function signature for subquery_planner(). Reported-by: Alexander Lakhin Discussion: https://postgr.es/m/242fc7c6-a8aa-2daf-ac4c-0a231e2619c1@gmail.com
* Allow planner to use Merge Append to efficiently implement UNIONDavid Rowley2024-03-25
| | | | | | | | | | | | | | | | | | | | | Until now, UNION queries have often been suboptimal as the planner has only ever considered using an Append node and making the results unique by either using a Hash Aggregate, or by Sorting the entire Append result and running it through the Unique operator. Both of these methods always require reading all rows from the union subqueries. Here we adjust the union planner so that it can request that each subquery produce results in target list order so that these can be Merge Appended together and made unique with a Unique node. This can improve performance significantly as the union child can make use of the likes of btree indexes and/or Merge Joins to provide the top-level UNION with presorted input. This is especially good if the top-level UNION contains a LIMIT node that limits the output rows to a small subset of the unioned rows as cheap startup plans can be used. Author: David Rowley Reviewed-by: Richard Guo, Andy Fan Discussion: https://postgr.es/m/CAApHDvpb_63XQodmxKUF8vb9M7CxyUyT4sWvEgqeQU-GB7QFoQ@mail.gmail.com
* Remove unused #include's from backend .c filesPeter Eisentraut2024-03-04
| | | | | | | | | | | | | | | | | | | | | | | | | | | | as determined by include-what-you-use (IWYU) While IWYU also suggests to *add* a bunch of #include's (which is its main purpose), this patch does not do that. In some cases, a more specific #include replaces another less specific one. Some manual adjustments of the automatic result: - IWYU currently doesn't know about includes that provide global variable declarations (like -Wmissing-variable-declarations), so those includes are being kept manually. - All includes for port(ability) headers are being kept for now, to play it safe. - No changes of catalog/pg_foo.h to catalog/pg_foo_d.h, to keep the patch from exploding in size. Note that this patch touches just *.c files, so nothing declared in header files changes in hidden ways. As a small example, in src/backend/access/transam/rmgr.c, some IWYU pragma annotations are added to handle a special case there. Discussion: https://www.postgresql.org/message-id/flat/af837490-6b2f-46df-ba05-37ea6a6653fc%40eisentraut.org
* Update copyright for 2024Bruce Momjian2024-01-03
| | | | | | | | Reported-by: Michael Paquier Discussion: https://postgr.es/m/ZZKTDPxBBMt3C0J9@paquier.xyz Backpatch-through: 12
* Fix usage of the parse tree for estimate_num_groups() in set operationsAlexander Korotkov2023-11-04
| | | | | | | | | | | | | | | | | | | | | recurse_set_operations() uses the parse tree for the group number estimation, because of the "varno 0" hack. At the same time 2489d76c49 made root->parse and corresponding parent_root->simple_rte_array[]->subquery distinct copies of the parse tree, while d3d55ce571 introduced self-join removal replacing relid of removed relation only in one of the copies. The present commit fixes this bug by making recurse_set_operations() call estimate_num_groups() with the copy of the parse tree processed by self-join removal. In future, we may think about maintaining just one copy of the parse tree and/or keeping removed relids as aliases. Reported-by: Zuming Jiang Bug: #18170 Discussion: https://postgr.es/m/flat/18170-f1d17bf9a0d58b24%40postgresql.org Author: Richard Guo, Alexander Korotkov Reviewed-by: Andrei Lepikhov
* Update copyright for 2023Bruce Momjian2023-01-02
| | | | Backpatch-through: 11
* Rename shadowed local variablesDavid Rowley2022-10-05
| | | | | | | | | | | | In a similar effort to f01592f91, here we mostly rename shadowed local variables to remove the warnings produced when compiling with -Wshadow=compatible-local. This fixes 63 warnings and leaves just 5. Author: Justin Pryzby, David Rowley Reviewed-by: Justin Pryzby Discussion https://postgr.es/m/20220817145434.GC26426%40telsasoft.com
* More -Wshadow=compatible-local warning fixesDavid Rowley2022-08-26
| | | | | | | | | | | | In a similar effort to f01592f91, here we're targetting fixing the warnings where we've deemed the shadowing variable to serve a close enough purpose to the shadowed variable just to reuse the shadowed version and not declare the shadowing variable at all. By my count, this takes the warning count from 106 down to 71. Author: Justin Pryzby Discussion: https://postgr.es/m/20220825020839.GT2342@telsasoft.com
* Remove fls(), use pg_leftmost_one_pos32() instead.Thomas Munro2022-07-22
| | | | | | | | | | | | | | Commit 4f658dc8 provided the traditional BSD fls() function in src/port/fls.c so it could be used in several places. Later we added a bunch of similar facilities in pg_bitutils.h, based on compiler builtins that map to hardware instructions. It's a bit confusing to have both 1-based and 0-based variants of this operation in use in different parts of the tree, and neither is blessed by a standard. Let's drop fls.c and the configure probe, and reuse the newer code. Reviewed-by: David Rowley <dgrowleyml@gmail.com> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/CA%2BhUKG%2B7dSX1XF8yFGmYk-%3D48dbjH2kmzZj16XvhbrWP-9BzRg%40mail.gmail.com
* Estimate cost of elided SubqueryScan, Append, MergeAppend nodes better.Tom Lane2022-07-19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | setrefs.c contains logic to discard no-op SubqueryScan nodes, that is, ones that have no qual to check and copy the input targetlist unchanged. (Formally it's not very nice to be applying such optimizations so late in the planner, but there are practical reasons for it; mostly that we can't unify relids between the subquery and the parent query until we flatten the rangetable during setrefs.c.) This behavior falsifies our previous cost estimates, since we would've charged cpu_tuple_cost per row just to pass data through the node. Most of the time that's little enough to not matter, but there are cases where this effect visibly changes the plan compared to what you would've gotten with no sub-select. To improve the situation, make the callers of cost_subqueryscan tell it whether they think the targetlist is trivial. cost_subqueryscan already has the qual list, so it can check the other half of the condition easily. It could make its own determination of tlist triviality too, but doing so would be repetitive (for callers that may call it several times) or unnecessarily expensive (for callers that can determine this more cheaply than a general test would do). This isn't a 100% solution, because createplan.c also does things that can falsify any earlier estimate of whether the tlist is trivial. However, it fixes nearly all cases in practice, if results for the regression tests are anything to go by. setrefs.c also contains logic to discard no-op Append and MergeAppend nodes. We did have knowledge of that behavior at costing time, but somebody failed to update it when a check on parallel-awareness was added to the setrefs.c logic. Fix that while we're here. These changes result in two minor changes in query plans shown in our regression tests. Neither is relevant to the purposes of its test case AFAICT. Patch by me; thanks to Richard Guo for review. Discussion: https://postgr.es/m/2581077.1651703520@sss.pgh.pa.us
* Update copyright for 2022Bruce Momjian2022-01-07
| | | | Backpatch-through: 10
* Get rid of artificial restriction on hash table sizes on Windows.Tom Lane2021-07-25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The point of introducing the hash_mem_multiplier GUC was to let users reproduce the old behavior of hash aggregation, i.e. that it could use more than work_mem at need. However, the implementation failed to get the job done on Win64, where work_mem is clamped to 2GB to protect various places that calculate memory sizes using "long int". As written, the same clamp was applied to hash_mem. This resulted in severe performance regressions for queries requiring a bit more than 2GB for hash aggregation, as they now spill to disk and there's no way to stop that. Getting rid of the work_mem restriction seems like a good idea, but it's a big job and could not conceivably be back-patched. However, there's only a fairly small number of places that are concerned with the hash_mem value, and it turns out to be possible to remove the restriction there without too much code churn or any ABI breaks. So, let's do that for now to fix the regression, and leave the larger task for another day. This patch does introduce a bit more infrastructure that should help with the larger task, namely pg_bitutils.h support for working with size_t values. Per gripe from Laurent Hasson. Back-patch to v13 where the behavior change came in. Discussion: https://postgr.es/m/997817.1627074924@sss.pgh.pa.us Discussion: https://postgr.es/m/MN2PR15MB25601E80A9B6D1BA6F592B1985E39@MN2PR15MB2560.namprd15.prod.outlook.com
* Allow estimate_num_groups() to pass back further details about the estimationDavid Rowley2021-03-30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Here we add a new output parameter to estimate_num_groups() to allow it to inform the caller of additional, possibly useful information about the estimation. The new output parameter is a struct that currently contains just a single field with a set of flags. This was done rather than having the flags as an output parameter to allow future fields to be added without having to change the signature of the function at a later date when we want to pass back further information that might not be suitable to store in the flags field. It seems reasonable that one day in the future that the planner would want to know more about the estimation. For example, how many individual sets of statistics was the estimation generated from? The planner may want to take that into account if we ever want to consider risks as well as costs when generating plans. For now, there's only 1 flag we set in the flags field. This is to indicate if the estimation fell back on using the hard-coded constants in any part of the estimation. Callers may like to change their behavior if this is set, and this gives them the ability to do so. Callers may pass the flag pointer as NULL if they have no interest in obtaining any additional information about the estimate. We're not adding any actual usages of these flags here. Some follow-up commits will make use of this feature. Additionally, we're also not making any changes to add support for clauselist_selectivity() and clauselist_selectivity_ext(). However, if this is required in the future then the same struct being added here should be fine to use as a new output argument for those functions too. Author: David Rowley Discussion: https://postgr.es/m/CAApHDvqQqpk=1W-G_ds7A9CsXX3BggWj_7okinzkLVhDubQzjA@mail.gmail.com
* Remove [Merge]AppendPath.partitioned_rels.Tom Lane2021-02-01
| | | | | | | | | | | | | | | | | It turns out that the calculation of [Merge]AppendPath.partitioned_rels in allpaths.c is faulty and sometimes omits relevant non-leaf partitions, allowing an assertion added by commit a929e17e5a8 to trigger. Rather than fix that, it seems better to get rid of those fields altogether. We don't really need the info until create_plan time, and calculating it once for the selected plan should be cheaper than calculating it for each append path we consider. The preceding two commits did away with all use of the partitioned_rels values; this commit just mechanically removes the fields and the code that calculated them. Discussion: https://postgr.es/m/87sg8tqhsl.fsf@aurora.ydns.eu Discussion: https://postgr.es/m/CAJKUy5gCXDSmFs2c=R+VGgn7FiYcLCsEFEuDNNLGfoha=pBE_g@mail.gmail.com
* Update copyright for 2021Bruce Momjian2021-01-02
| | | | Backpatch-through: 9.5
* Suppress unnecessary RelabelType nodes in yet more cases.Tom Lane2020-08-19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit a477bfc1d fixed eval_const_expressions() to ensure that it didn't generate unnecessary RelabelType nodes, but I failed to notice that some other places in the planner had the same issue. Really noplace in the planner should be using plain makeRelabelType(), for fear of generating expressions that should be equal() to semantically equivalent trees, but aren't. An example is that because canonicalize_ec_expression() failed to be careful about this, we could end up with an equivalence class containing both a plain Const, and a Const-with-RelabelType representing exactly the same value. So far as I can tell this led to no visible misbehavior, but we did waste a bunch of cycles generating and evaluating "Const = Const-with-RelabelType" to prove such entries are redundant. Hence, move the support function added by a477bfc1d to where it can be more generally useful, and use it in the places where planner code previously used makeRelabelType. Back-patch to v12, like the previous patch. While I have no concrete evidence of any real misbehavior here, it's certainly possible that I overlooked a case where equivalent expressions that aren't equal() could cause a user-visible problem. In any case carrying extra RelabelType nodes through planning to execution isn't very desirable. Discussion: https://postgr.es/m/1311836.1597781384@sss.pgh.pa.us
* Add hash_mem_multiplier GUC.Peter Geoghegan2020-07-29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add a GUC that acts as a multiplier on work_mem. It gets applied when sizing executor node hash tables that were previously size constrained using work_mem alone. The new GUC can be used to preferentially give hash-based nodes more memory than the generic work_mem limit. It is intended to enable admin tuning of the executor's memory usage. Overall system throughput and system responsiveness can be improved by giving hash-based executor nodes more memory (especially over sort-based alternatives, which are often much less sensitive to being memory constrained). The default value for hash_mem_multiplier is 1.0, which is also the minimum valid value. This means that hash-based nodes continue to apply work_mem in the traditional way by default. hash_mem_multiplier is generally useful. However, it is being added now due to concerns about hash aggregate performance stability for users that upgrade to Postgres 13 (which added disk-based hash aggregation in commit 1f39bce0). While the old hash aggregate behavior risked out-of-memory errors, it is nevertheless likely that many users actually benefited. Hash agg's previous indifference to work_mem during query execution was not just faster; it also accidentally made aggregation resilient to grouping estimate problems (at least in cases where this didn't create destabilizing memory pressure). hash_mem_multiplier can provide a certain kind of continuity with the behavior of Postgres 12 hash aggregates in cases where the planner incorrectly estimates that all groups (plus related allocations) will fit in work_mem/hash_mem. This seems necessary because hash-based aggregation is usually much slower when only a small fraction of all groups can fit. Even when it isn't possible to totally avoid hash aggregates that spill, giving hash aggregation more memory will reliably improve performance (the same cannot be said for external sort operations, which appear to be almost unaffected by memory availability provided it's at least possible to get a single merge pass). The PostgreSQL 13 release notes should advise users that increasing hash_mem_multiplier can help with performance regressions associated with hash aggregation. That can be taken care of by a later commit. Author: Peter Geoghegan Reviewed-By: Álvaro Herrera, Jeff Davis Discussion: https://postgr.es/m/20200625203629.7m6yvut7eqblgmfo@alap3.anarazel.de Discussion: https://postgr.es/m/CAH2-WzmD%2Bi1pG6rc1%2BCjc4V6EaFJ_qSuKCCHVnH%3DoruqD-zqow%40mail.gmail.com Backpatch: 13-, where disk-based hash aggregation was introduced.
* Correct obsolete UNION hash aggs comment.Peter Geoghegan2020-07-28
| | | | | | Oversight in commit 1f39bce0, which added disk-based hash aggregation. Backpatch: 13-, where disk-based hash aggregation was introduced.
* Disk-based Hash Aggregation.Jeff Davis2020-03-18
| | | | | | | | | | | | | | | | | | | | | | While performing hash aggregation, track memory usage when adding new groups to a hash table. If the memory usage exceeds work_mem, enter "spill mode". In spill mode, new groups are not created in the hash table(s), but existing groups continue to be advanced if input tuples match. Tuples that would cause a new group to be created are instead spilled to a logical tape to be processed later. The tuples are spilled in a partitioned fashion. When all tuples from the outer plan are processed (either by advancing the group or spilling the tuple), finalize and emit the groups from the hash table. Then, create new batches of work from the spilled partitions, and select one of the saved batches and process it (possibly spilling recursively). Author: Jeff Davis Reviewed-by: Tomas Vondra, Adam Lee, Justin Pryzby, Taylor Vesely, Melanie Plageman Discussion: https://postgr.es/m/507ac540ec7c20136364b5272acbcd4574aa76ef.camel@j-davis.com
* Update copyrights for 2020Bruce Momjian2020-01-01
| | | | Backpatch-through: update all files in master, backpatch legal files through 9.4
* Cosmetic improvements in setup of planner's per-RTE arrays.Tom Lane2019-08-09
| | | | | | | | | | | | | Merge setup_append_rel_array into setup_simple_rel_arrays. There's no particularly good reason to keep them separate, and it's inconsistent with the lack of separation in expand_planner_arrays. The only apparent benefit was that the fast path for trivial queries in query_planner() doesn't need to set up the append_rel_array; but all we're saving there is an if-test and NULL assignment, which surely ought to be negligible. Also improve some obsolete comments. Discussion: https://postgr.es/m/17220.1565301350@sss.pgh.pa.us
* Speed up finding EquivalenceClasses for a given set of relsDavid Rowley2019-07-21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Previously in order to determine which ECs a relation had members in, we had to loop over all ECs stored in PlannerInfo's eq_classes and check if ec_relids mentioned the relation. For the most part, this was fine, as generally, unless queries were fairly complex, the overhead of performing the lookup would have not been that significant. However, when queries contained large numbers of joins and ECs, the overhead to find the set of classes matching a given set of relations could become a significant portion of the overall planning effort. Here we allow a much more efficient method to access the ECs which match a given relation or set of relations. A new Bitmapset field in RelOptInfo now exists to store the indexes into PlannerInfo's eq_classes list which each relation is mentioned in. This allows very fast lookups to find all ECs belonging to a single relation. When we need to lookup ECs belonging to a given pair of relations, we can simply bitwise-AND the Bitmapsets from each relation and use the result to perform the lookup. We also take the opportunity to write a new implementation of generate_join_implied_equalities which makes use of the new indexes. generate_join_implied_equalities_for_ecs must remain as is as it can be given a custom list of ECs, which we can't easily determine the indexes of. This was originally intended to fix the performance penalty of looking up foreign keys matching a join condition which was introduced by 100340e2d. However, we're speeding up much more than just that here. Author: David Rowley, Tom Lane Reviewed-by: Tom Lane, Tomas Vondra Discussion: https://postgr.es/m/6970.1545327857@sss.pgh.pa.us
* Fix inconsistencies and typos in the treeMichael Paquier2019-07-16
| | | | | | | | | | | This is numbered take 7, and addresses a set of issues around: - Fixes for typos and incorrect reference names. - Removal of unneeded comments. - Removal of unreferenced functions and structures. - Fixes regarding variable name consistency. Author: Alexander Lakhin Discussion: https://postgr.es/m/10bfd4ac-3e7c-40ab-2b2e-355ed15495e8@gmail.com
* Represent Lists as expansible arrays, not chains of cons-cells.Tom Lane2019-07-15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Originally, Postgres Lists were a more or less exact reimplementation of Lisp lists, which consist of chains of separately-allocated cons cells, each having a value and a next-cell link. We'd hacked that once before (commit d0b4399d8) to add a separate List header, but the data was still in cons cells. That makes some operations -- notably list_nth() -- O(N), and it's bulky because of the next-cell pointers and per-cell palloc overhead, and it's very cache-unfriendly if the cons cells end up scattered around rather than being adjacent. In this rewrite, we still have List headers, but the data is in a resizable array of values, with no next-cell links. Now we need at most two palloc's per List, and often only one, since we can allocate some values in the same palloc call as the List header. (Of course, extending an existing List may require repalloc's to enlarge the array. But this involves just O(log N) allocations not O(N).) Of course this is not without downsides. The key difficulty is that addition or deletion of a list entry may now cause other entries to move, which it did not before. For example, that breaks foreach() and sister macros, which historically used a pointer to the current cons-cell as loop state. We can repair those macros transparently by making their actual loop state be an integer list index; the exposed "ListCell *" pointer is no longer state carried across loop iterations, but is just a derived value. (In practice, modern compilers can optimize things back to having just one loop state value, at least for simple cases with inline loop bodies.) In principle, this is a semantics change for cases where the loop body inserts or deletes list entries ahead of the current loop index; but I found no such cases in the Postgres code. The change is not at all transparent for code that doesn't use foreach() but chases lists "by hand" using lnext(). The largest share of such code in the backend is in loops that were maintaining "prev" and "next" variables in addition to the current-cell pointer, in order to delete list cells efficiently using list_delete_cell(). However, we no longer need a previous-cell pointer to delete a list cell efficiently. Keeping a next-cell pointer doesn't work, as explained above, but we can improve matters by changing such code to use a regular foreach() loop and then using the new macro foreach_delete_current() to delete the current cell. (This macro knows how to update the associated foreach loop's state so that no cells will be missed in the traversal.) There remains a nontrivial risk of code assuming that a ListCell * pointer will remain good over an operation that could now move the list contents. To help catch such errors, list.c can be compiled with a new define symbol DEBUG_LIST_MEMORY_USAGE that forcibly moves list contents whenever that could possibly happen. This makes list operations significantly more expensive so it's not normally turned on (though it is on by default if USE_VALGRIND is on). There are two notable API differences from the previous code: * lnext() now requires the List's header pointer in addition to the current cell's address. * list_delete_cell() no longer requires a previous-cell argument. These changes are somewhat unfortunate, but on the other hand code using either function needs inspection to see if it is assuming anything it shouldn't, so it's not all bad. Programmers should be aware of these significant performance changes: * list_nth() and related functions are now O(1); so there's no major access-speed difference between a list and an array. * Inserting or deleting a list element now takes time proportional to the distance to the end of the list, due to moving the array elements. (However, it typically *doesn't* require palloc or pfree, so except in long lists it's probably still faster than before.) Notably, lcons() used to be about the same cost as lappend(), but that's no longer true if the list is long. Code that uses lcons() and list_delete_first() to maintain a stack might usefully be rewritten to push and pop at the end of the list rather than the beginning. * There are now list_insert_nth...() and list_delete_nth...() functions that add or remove a list cell identified by index. These have the data-movement penalty explained above, but there's no search penalty. * list_concat() and variants now copy the second list's data into storage belonging to the first list, so there is no longer any sharing of cells between the input lists. The second argument is now declared "const List *" to reflect that it isn't changed. This patch just does the minimum needed to get the new implementation in place and fix bugs exposed by the regression tests. As suggested by the foregoing, there's a fair amount of followup work remaining to do. Also, the ENABLE_LIST_COMPAT macros are finally removed in this commit. Code using those should have been gone a dozen years ago. Patch by me; thanks to David Rowley, Jesper Pedersen, and others for review. Discussion: https://postgr.es/m/11587.1550975080@sss.pgh.pa.us
* Phase 2 pgindent run for v12.Tom Lane2019-05-22
| | | | | | | | | Switch to 2.1 version of pg_bsd_indent. This formats multiline function declarations "correctly", that is with additional lines of parameter declarations indented to match where the first line's left parenthesis is. Discussion: https://postgr.es/m/CAEepm=0P3FeTXRcU5B2W3jv3PgRVZ-kGUXLGfd42FFhUROO3ug@mail.gmail.com
* Use Append rather than MergeAppend for scanning ordered partitions.Tom Lane2019-04-05
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If we need ordered output from a scan of a partitioned table, but the ordering matches the partition ordering, then we don't need to use a MergeAppend to combine the pre-ordered per-partition scan results: a plain Append will produce the same results. This both saves useless comparison work inside the MergeAppend proper, and allows us to start returning tuples after istarting up just the first child node not all of them. However, all is not peaches and cream, because if some of the child nodes have high startup costs then there will be big discontinuities in the tuples-returned-versus-elapsed-time curve. The planner's cost model cannot handle that (yet, anyway). If we model the Append's startup cost as being just the first child's startup cost, we may drastically underestimate the cost of fetching slightly more tuples than are available from the first child. Since we've had bad experiences with over-optimistic choices of "fast start" plans for ORDER BY LIMIT queries, that seems scary. As a klugy workaround, set the startup cost estimate for an ordered Append to be the sum of its children's startup costs (as MergeAppend would). This doesn't really describe reality, but it's less likely to cause a bad plan choice than an underestimated startup cost would. In practice, the cases where we really care about this optimization will have child plans that are IndexScans with zero startup cost, so that the overly conservative estimate is still just zero. David Rowley, reviewed by Julien Rouhaud and Antonin Houska Discussion: https://postgr.es/m/CAKJS1f-hAqhPLRk_RaSFTgYxd=Tz5hA7kQ2h4-DhJufQk8TGuw@mail.gmail.com
* Standardize some more loops that chase down parallel lists.Tom Lane2019-02-28
| | | | | | | | | | | | | | | | | | | | | | | | | We have forboth() and forthree() macros that simplify iterating through several parallel lists, but not everyplace that could reasonably use those was doing so. Also invent forfour() and forfive() macros to do the same for four or five parallel lists, and use those where applicable. The immediate motivation for doing this is to reduce the number of ad-hoc lnext() calls, to reduce the footprint of a WIP patch. However, it seems like good cleanup and error-proofing anyway; the places that were combining forthree() with a manually iterated loop seem particularly illegible and bug-prone. There was some speculation about restructuring related parsetree representations to reduce the need for parallel list chasing of this sort. Perhaps that's a win, or perhaps not, but in any case it would be considerably more invasive than this patch; and it's not particularly related to my immediate goal of improving the List infrastructure. So I'll leave that question for another day. Patch by me; thanks to David Rowley for review. Discussion: https://postgr.es/m/11587.1550975080@sss.pgh.pa.us
* Remove heapam.h include made superfluous by b60c3975990.Andres Freund2019-01-12
| | | | | | Noticed this while working on another patch. Author: Andres Freund
* Move inheritance expansion code into its own fileAlvaro Herrera2019-01-10
| | | | | | | | | | | | | | | | | This commit moves expand_inherited_tables and underlings from optimizer/prep/prepunionc.c to optimizer/utils/inherit.c. Also, all of the AppendRelInfo-based expression manipulation routines are moved to optimizer/utils/appendinfo.c. No functional code changes. One exception is the introduction of make_append_rel_info, but that's still just moving around code. Also, stop including <limits.h> in prepunion.c, which no longer needs it since 3fc6e2d7f5b6. I (Álvaro) noticed this because Amit was copying that to inherit.c, which likewise doesn't need it. Author: Amit Langote Discussion: https://postgr.es/m/3be67028-a00a-502c-199a-da00eec8fb6e@lab.ntt.co.jp
* Update copyright for 2019Bruce Momjian2019-01-02
| | | | Backpatch-through: certain files through 9.4
* Remove extra semicolons.Amit Kapila2018-12-17
| | | | | | | | Reported-by: David Rowley Author: David Rowley Reviewed-by: Amit Kapila Backpatch-through: 10 Discussion: https://postgr.es/m/CAKJS1f8EneeYyzzvdjahVZ6gbAHFkHbSFB5m_C0Y6TUJs9Dgdg@mail.gmail.com
* Redesign initialization of partition routing structuresAlvaro Herrera2018-11-16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This speeds up write operations (INSERT, UPDATE, DELETE, COPY, as well as the future MERGE) on partitioned tables. This changes the setup for tuple routing so that it does far less work during the initial setup and pushes more work out to when partitions receive tuples. PartitionDispatchData structs for sub-partitioned tables are only created when a tuple gets routed through it. The possibly large arrays in the PartitionTupleRouting struct have largely been removed. The partitions[] array remains but now never contains any NULL gaps. Previously the NULLs had to be skipped during ExecCleanupTupleRouting(), which could add a large overhead to the cleanup when the number of partitions was large. The partitions[] array is allocated small to start with and only enlarged when we route tuples to enough partitions that it runs out of space. This allows us to keep simple single-row partition INSERTs running quickly. Redesign The arrays in PartitionTupleRouting which stored the tuple translation maps have now been removed. These have been moved out into a PartitionRoutingInfo struct which is an additional field in ResultRelInfo. The find_all_inheritors() call still remains by far the slowest part of ExecSetupPartitionTupleRouting(). This commit just removes the other slow parts. In passing also rename the tuple translation maps from being ParentToChild and ChildToParent to being RootToPartition and PartitionToRoot. The old names mislead you into thinking that a partition of some sub-partitioned table would translate to the rowtype of the sub-partitioned table rather than the root partitioned table. Authors: David Rowley and Amit Langote, heavily revised by Álvaro Herrera Testing help from Jesper Pedersen and Kato Sho. Discussion: https://postgr.es/m/CAKJS1f_1RJyFquuCKRFHTdcXqoPX-PYqAd7nz=GVBwvGh4a6xA@mail.gmail.com
* Change rewriter/planner/executor/plancache to depend on RTE rellockmode.Tom Lane2018-10-02
| | | | | | | | | | | | | | | | | | | | | | | | | Instead of recomputing the required lock levels in all these places, just use what commit fdba460a2 made the parser store in the RTE fields. This already simplifies the code measurably in these places, and follow-on changes will remove a bunch of no-longer-needed infrastructure. In a few cases, this change causes us to acquire a higher lock level than we did before. This is OK primarily because said higher lock level should've been acquired already at query parse time; thus, we're saving a useless extra trip through the shared lock manager to acquire a lesser lock alongside the original lock. The only known exception to this is that re-execution of a previously planned SELECT FOR UPDATE/SHARE query, for a table that uses ROW_MARK_REFERENCE or ROW_MARK_COPY methods, might have gotten only AccessShareLock before. Now it will get RowShareLock like the first execution did, which seems fine. While there's more to do, push it in this state anyway, to let the buildfarm help verify that nothing bad happened. Amit Langote, reviewed by David Rowley and Jesper Pedersen, and whacked around a bit more by me Discussion: https://postgr.es/m/468c85d9-540e-66a2-1dde-fec2b741e688@lab.ntt.co.jp
* Improve performance of tuple conversion map generationHeikki Linnakangas2018-07-13
| | | | | | | | | | | | | | | | | | | | | | | | | | | Previously convert_tuples_by_name_map naively performed a search of each outdesc column starting at the first column in indesc and searched each indesc column until a match was found. When partitioned tables had many columns this could result in slow generation of the tuple conversion maps. For INSERT and UPDATE statements that touched few rows, this could mean a very large overhead indeed. We can do a bit better with this loop. It's quite likely that the columns in partitioned tables and their partitions are in the same order, so it makes sense to start searching for each column outer column at the inner column position 1 after where the previous match was found (per idea from Alexander Kuzmenkov). This makes the best case search O(N) instead of O(N^2). The worst case is still O(N^2), but it seems unlikely that would happen. Likewise, in the planner, make_inh_translation_list's search for the matching column could often end up falling back on an O(N^2) type search. This commit also improves that by first checking the column that follows the previous match, instead of the column with the same attnum. If we fail to match here we fallback on the syscache's hashtable lookup. Author: David Rowley Reviewed-by: Alexander Kuzmenkov Discussion: https://www.postgresql.org/message-id/CAKJS1f9-wijVgMdRp6_qDMEQDJJ%2BA_n%3DxzZuTmLx5Fz6cwf%2B8A%40mail.gmail.com
* Remove dead code for temporary relations in partition planningMichael Paquier2018-07-04
| | | | | | | | | | | | | | | | | | | Since recent commit 1c7c317c, temporary relations cannot be mixed with permanent relations within the same partition tree, and the same counts for temporary relations created by other sessions, which the planner simply discarded. Instead be paranoid and issue an error, as those should be blocked at definition time, at least for now. At the same time, a test case is added to stress what has been moved when expand_partitioned_rtentry gets called recursively but bumps on a partitioned relation with no partitions which should be handled the same way as the non-inheritance case. This code may be reworked in a close future, and covering this code path will limit surprises. Reported-by: David Rowley Author: David Rowley Reviewed-by: Amit Langote, Robert Haas, Michael Paquier Discussion: https://postgr.es/m/CAKJS1f_HyV1txn_4XSdH5EOhBMYaCwsXyAj6bHXk9gOu4JKsbw@mail.gmail.com
* Allow direct lookups of AppendRelInfo by child relidAlvaro Herrera2018-06-26
| | | | | | | | | | | | | | | | | | | | | find_appinfos_by_relids had quite a large overhead when the number of items in the append_rel_list was high, as it had to trawl through the append_rel_list looking for AppendRelInfos belonging to the given childrelids. Since there can only be a single AppendRelInfo for each child rel, it seems much better to store an array in PlannerInfo which indexes these by child relid, making the function O(1) rather than O(N). This function was only called once inside the planner, so just replace that call with a lookup to the new array. find_childrel_appendrelinfo is now unused and thus removed. This fixes a planner performance regression new to v11 reported by Thomas Reiss. Author: David Rowley Reported-by: Thomas Reiss Reviewed-by: Ashutosh Bapat Reviewed-by: Álvaro Herrera Discussion: https://postgr.es/m/94dd7a4b-5e50-0712-911d-2278e055c622@dalibo.com
* Post-feature-freeze pgindent run.Tom Lane2018-04-26
| | | | Discussion: https://postgr.es/m/15719.1523984266@sss.pgh.pa.us
* Prevent generation of bogus subquery scan paths.Robert Haas2018-04-25
| | | | | | | | | | | | | | | Commit 0927d2f46ddd4cf7d6bf2cc84b3be923e0aedc52 didn't check that consider_parallel was set for the target relation or account for the possibility that required_outer might be non-empty. To prevent future bugs of this ilk, add some assertions to add_partial_path and do a bit of future-proofing of the code recently added to recurse_set_operations. Report by Andreas Seltenreich. Patch by Jeevan Chalke. Review by Amit Kapila and by me. Discussion: http://postgr.es/m/CAM2+6=U+9otsyF2fYB8x_2TBeHTR90itarqW=qAEjN-kHaC7kw@mail.gmail.com
* Merge catalog/pg_foo_fn.h headers back into pg_foo.h headers.Tom Lane2018-04-08
| | | | | | | | | | | | | Traditionally, include/catalog/pg_foo.h contains extern declarations for functions in backend/catalog/pg_foo.c, in addition to its function as the authoritative definition of the pg_foo catalog's rowtype. In some cases, we'd been forced to split out those extern declarations into separate pg_foo_fn.h headers so that the catalog definitions could be #include'd by frontend code. That problem is gone as of commit 9c0a0de4c, so let's undo the splits to make things less confusing. Discussion: https://postgr.es/m/23690.1523031777@sss.pgh.pa.us
* Support partition pruning at execution timeAlvaro Herrera2018-04-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Existing partition pruning is only able to work at plan time, for query quals that appear in the parsed query. This is good but limiting, as there can be parameters that appear later that can be usefully used to further prune partitions. This commit adds support for pruning subnodes of Append which cannot possibly contain any matching tuples, during execution, by evaluating Params to determine the minimum set of subnodes that can possibly match. We support more than just simple Params in WHERE clauses. Support additionally includes: 1. Parameterized Nested Loop Joins: The parameter from the outer side of the join can be used to determine the minimum set of inner side partitions to scan. 2. Initplans: Once an initplan has been executed we can then determine which partitions match the value from the initplan. Partition pruning is performed in two ways. When Params external to the plan are found to match the partition key we attempt to prune away unneeded Append subplans during the initialization of the executor. This allows us to bypass the initialization of non-matching subplans meaning they won't appear in the EXPLAIN or EXPLAIN ANALYZE output. For parameters whose value is only known during the actual execution then the pruning of these subplans must wait. Subplans which are eliminated during this stage of pruning are still visible in the EXPLAIN output. In order to determine if pruning has actually taken place, the EXPLAIN ANALYZE must be viewed. If a certain Append subplan was never executed due to the elimination of the partition then the execution timing area will state "(never executed)". Whereas, if, for example in the case of parameterized nested loops, the number of loops stated in the EXPLAIN ANALYZE output for certain subplans may appear lower than others due to the subplan having been scanned fewer times. This is due to the list of matching subnodes having to be evaluated whenever a parameter which was found to match the partition key changes. This commit required some additional infrastructure that permits the building of a data structure which is able to perform the translation of the matching partition IDs, as returned by get_matching_partitions, into the list index of a subpaths list, as exist in node types such as Append, MergeAppend and ModifyTable. This allows us to translate a list of clauses into a Bitmapset of all the subpath indexes which must be included to satisfy the clause list. Author: David Rowley, based on an earlier effort by Beena Emerson Reviewers: Amit Langote, Robert Haas, Amul Sul, Rajkumar Raghuwanshi, Jesper Pedersen Discussion: https://postgr.es/m/CAOG9ApE16ac-_VVZVvv0gePSgkg_BwYEV1NBqZFqDR2bBE0X0A@mail.gmail.com
* Faster partition pruningAlvaro Herrera2018-04-06
| | | | | | | | | | | | | | | | | | | | | Add a new module backend/partitioning/partprune.c, implementing a more sophisticated algorithm for partition pruning. The new module uses each partition's "boundinfo" for pruning instead of constraint exclusion, based on an idea proposed by Robert Haas of a "pruning program": a list of steps generated from the query quals which are run iteratively to obtain a list of partitions that must be scanned in order to satisfy those quals. At present, this targets planner-time partition pruning, but there exist further patches to apply partition pruning at execution time as well. This commit also moves some definitions from include/catalog/partition.h to a new file include/partitioning/partbounds.h, in an attempt to rationalize partitioning related code. Authors: Amit Langote, David Rowley, Dilip Kumar Reviewers: Robert Haas, Kyotaro Horiguchi, Ashutosh Bapat, Jesper Pedersen. Discussion: https://postgr.es/m/098b9c71-1915-1a2a-8d52-1a7a50ce79e8@lab.ntt.co.jp
* postgres_fdw: Push down partition-wise aggregation.Robert Haas2018-04-02
| | | | | | | | | | | | | | | | | | | | | | | | | | Since commit 7012b132d07c2b4ea15b0b3cb1ea9f3278801d98, postgres_fdw has been able to push down the toplevel aggregation operation to the remote server. Commit e2f1eb0ee30d144628ab523432320f174a2c8966 made it possible to break down the toplevel aggregation into one aggregate per partition. This commit lets postgres_fdw push down aggregation in that case just as it does at the top level. In order to make this work, this commit adds an additional argument to the GetForeignUpperPaths FDW API. A matching argument is added to the signature for create_upper_paths_hook. Third-party code using either of these will need to be updated. Also adjust create_foreignscan_plan() so that it picks up the correct set of relids in this case. Jeevan Chalke, reviewed by Ashutosh Bapat and by me and with some adjustments by me. The larger patch series of which this patch is a part was also reviewed and tested by Antonin Houska, Rajkumar Raghuwanshi, David Rowley, Dilip Kumar, Konstantin Knizhnik, Pascal Legrand, and Rafia Sabih. Discussion: http://postgr.es/m/CAM2+6=V64_xhstVHie0Rz=KPEQnLJMZt_e314P0jaT_oJ9MR8A@mail.gmail.com Discussion: http://postgr.es/m/CAM2+6=XPWujjmj5zUaBTGDoB38CemwcPmjkRy0qOcsQj_V+2sQ@mail.gmail.com
* Consider Parallel Append of partial paths for UNION [ALL].Robert Haas2018-03-22
| | | | | | | | | | | | | | | | Without this patch, we can implement a UNION or UNION ALL as an Append where Gather appears beneath one or more of the Append branches, but this lets us put the Gather node on top, with a partial path for each relation underneath. There is considerably more work that could be done to improve planning in this area, but that will probably need to wait for a future release. Patch by me, reviewed and tested by Ashutosh Bapat and Rajkumar Raghuwanshi. Discussion: http://postgr.es/m/CA+TgmoaLRAOqHmMZx=ESM3VDEPceg+-XXZsRXQ8GtFJO_zbMSw@mail.gmail.com
* Generate a separate upper relation for each stage of setop planning.Robert Haas2018-03-19
| | | | | | | | | | | | | | | | | | | | | | | | | Commit 3fc6e2d7f5b652b417fa6937c34de2438d60fa9f made setop planning stages return paths rather than plans, but all such paths were loosely associated with a single RelOptInfo, and only the final path was added to the RelOptInfo. Even at the time, it was foreseen that this should be changed, because there is otherwise no good way for a single stage of setop planning to return multiple paths. With this patch, each stage of set operation planning now creates a separate RelOptInfo; these are distinguished by using appropriate relid sets. Note that this patch does nothing whatsoever about actually returning multiple paths for the same set operation; it just makes it possible for a future patch to do so. Along the way, adjust things so that create_upper_paths_hook is called for each of these new RelOptInfos rather than just once, since that might be useful to extensions using that hook. It might be a good to provide an FDW API here as well, but I didn't try to do that for now. Patch by me, reviewed and tested by Ashutosh Bapat and Rajkumar Raghuwanshi. Discussion: http://postgr.es/m/CA+TgmoaLRAOqHmMZx=ESM3VDEPceg+-XXZsRXQ8GtFJO_zbMSw@mail.gmail.com