aboutsummaryrefslogtreecommitdiff
path: root/src/backend/commands/explain.c
Commit message (Collapse)AuthorAge
...
* Pre-beta mechanical code beautification.Tom Lane2023-05-19
| | | | | | | | | | | | | | | Run pgindent, pgperltidy, and reformat-dat-files. This set of diffs is a bit larger than typical. We've updated to pg_bsd_indent 2.1.2, which properly indents variable declarations that have multi-line initialization expressions (the continuation lines are now indented one tab stop). We've also updated to perltidy version 20230309 and changed some of its settings, which reduces its desire to add whitespace to lines to make assignments etc. line up. Going forward, that should make for fewer random-seeming changes to existing code. Discussion: https://postgr.es/m/20230428092545.qfb3y5wcu4cm75ur@alvherre.pgsql
* Support "Right Anti Join" plan shapes.Tom Lane2023-04-05
| | | | | | | | | | | | | Merge and hash joins can support antijoin with the non-nullable input on the right, using very simple combinations of their existing logic for right join and anti join. This gives the planner more freedom about how to order the join. It's particularly useful for hash join, since we may now have the option to hash the smaller table instead of the larger. Richard Guo, reviewed by Ronan Dunklau and myself Discussion: https://postgr.es/m/CAMbWs48xh9hMzXzSy3VaPzGAz+fkxXXTUbCLohX1_L8THFRm2Q@mail.gmail.com
* Simplify useless 0L constantsPeter Eisentraut2023-03-29
| | | | | | | In ancient times, these belonged to arguments or fields that were actually of type long, but now they are not anymore, so this "L" decoration is just confusing. (Some other 0L and other "L" constants remain, where they are actually associated with a long type.)
* Invent GENERIC_PLAN option for EXPLAIN.Tom Lane2023-03-24
| | | | | | | | | | | | | | | | | This provides a very simple way to see the generic plan for a parameterized query. Without this, it's necessary to define a prepared statement and temporarily change plan_cache_mode, which is a bit tedious. One thing that's a bit of a hack perhaps is that we disable execution-time partition pruning when the GENERIC_PLAN option is given. That's because the pruning code may attempt to fetch the value of one of the parameters, which would fail. Laurenz Albe, reviewed by Julien Rouhaud, Christoph Berg, Michel Pelletier, Jim Jones, and myself Discussion: https://postgr.es/m/0a29b954b10b57f0d135fe12aa0909bd41883eb0.camel@cybertec.at
* Rename force_parallel_mode to debug_parallel_queryDavid Rowley2023-02-15
| | | | | | | | | | | | | | | | | | | | force_parallel_mode is meant to be used to allow us to exercise the parallel query infrastructure to ensure that it's working as we expect. It seems some users think this GUC is for forcing the query planner into picking a parallel plan regardless of the costs. A quick look at the documentation would have made them realize that they were wrong, but the GUC is likely too conveniently named which, evidently, seems to often result in users expecting that it forces the planner into usefully parallelizing queries. Here we rename the GUC to something which casual users are less likely to mistakenly think is what they need to make their query run more quickly. For now, the old name can still be used. We'll revisit if the old name mapping can be removed once the buildfarm configs are all updated. Reviewed-by: John Naylor Discussion: https://postgr.es/m/CAApHDvrsOi92_uA7PEaHZMH-S4Xv+MGhQWA+GrP8b1kjpS1HjQ@mail.gmail.com
* Remove dead NoMovementScanDirection codeDavid Rowley2023-02-01
| | | | | | | | | | | | | | | | | | | | | | Here remove some dead code from heapgettup() and heapgettup_pagemode() which was trying to support NoMovementScanDirection scans. This code can never be reached as standard_ExecutorRun() never calls ExecutePlan with NoMovementScanDirection. Additionally, plans which were scanning an unordered index would use NoMovementScanDirection rather than ForwardScanDirection. There was no real need for this, so here we adjust this so we use ForwardScanDirection for unordered index scans. A comment in pathnodes.h claimed that NoMovementScanDirection was used for PathKey reasons, but if that was true, it no longer is, per code in build_index_paths(). This does change the non-text format of the EXPLAIN output so that unordered index scans now have a "Forward" scan direction rather than "NoMovement". The text format of EXPLAIN has not changed. Author: Melanie Plageman Reviewed-by: Tom Lane, David Rowley Discussion: https://postgr.es/m/CAAKRu_bvkhka0CZQun28KTqhuUh5ZqY=_T8QEqZqOL02rpi2bw@mail.gmail.com
* Make Vars be outer-join-aware.Tom Lane2023-01-30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Traditionally we used the same Var struct to represent the value of a table column everywhere in parse and plan trees. This choice predates our support for SQL outer joins, and it's really a pretty bad idea with outer joins, because the Var's value can depend on where it is in the tree: it might go to NULL above an outer join. So expression nodes that are equal() per equalfuncs.c might not represent the same value, which is a huge correctness hazard for the planner. To improve this, decorate Var nodes with a bitmapset showing which outer joins (identified by RTE indexes) may have nulled them at the point in the parse tree where the Var appears. This allows us to trust that equal() Vars represent the same value. A certain amount of klugery is still needed to cope with cases where we re-order two outer joins, but it's possible to make it work without sacrificing that core principle. PlaceHolderVars receive similar decoration for the same reason. In the planner, we include these outer join bitmapsets into the relids that an expression is considered to depend on, and in consequence also add outer-join relids to the relids of join RelOptInfos. This allows us to correctly perceive whether an expression can be calculated above or below a particular outer join. This change affects FDWs that want to plan foreign joins. They *must* follow suit when labeling foreign joins in order to match with the core planner, but for many purposes (if postgres_fdw is any guide) they'd prefer to consider only base relations within the join. To support both requirements, redefine ForeignScan.fs_relids as base+OJ relids, and add a new field fs_base_relids that's set up by the core planner. Large though it is, this commit just does the minimum necessary to install the new mechanisms and get check-world passing again. Follow-up patches will perform some cleanup. (The README additions and comments mention some stuff that will appear in the follow-up.) Patch by me; thanks to Richard Guo for review. Discussion: https://postgr.es/m/830269.1656693747@sss.pgh.pa.us
* Make auto_explain print the query identifier in verbose modeMichael Paquier2023-01-26
| | | | | | | | | | | | | | | | | | | | | | | When auto_explain.log_verbose is on, auto_explain should print in the logs plans equivalent to the EXPLAIN (VERBOSE). However, when compute_query_id is on, query identifiers were not showing up, being only handled by EXPLAIN (VERBOSE). This brings auto_explain on par with EXPLAIN regarding that. Note that like EXPLAIN, auto_explain does not show the query identifier when compute_query_id=regress. The change is done so as the choice of printing the query identifier is done in ExplainPrintPlan() rather than in ExplainOnePlan(), to avoid a duplication of the logic dealing with the query ID. auto_explain is the only in-core caller of ExplainPrintPlan(). While looking at the area, I have noticed that more consolidation between EXPLAIN and auto_explain would be in order for the logging of the plan duration and the buffer usage. This refactoring is left as a future change. Author: Atsushi Torikoshi Reviewed-by: Justin Pryzby, Julien Rouhaud Discussion: https://postgr.es/m/1ea21936981f161bccfce05765c03bee@oss.nttdata.com
* Use appendStringInfoSpaces in more placesDavid Rowley2023-01-20
| | | | | | | | | | | | | | | | | | | | | | | This adjusts a few places which were appending a string constant containing spaces onto a StringInfo. We have appendStringInfoSpaces for that job, so let's use that instead. For the change to jsonb.c's add_indent() function, appendStringInfoString was being called inside a loop to append 4 spaces on each loop. This meant that enlargeStringInfo would get called once per loop. Here it should be much more efficient to get rid of the loop and just calculate the number of spaces with "level * 4" and just append all the spaces in one go. Here we additionally adjust the appendStringInfoSpaces function so it makes use of memset rather than a while loop to apply the required spaces to the StringInfo. One of the problems with the while loop was that it was incrementing one variable and decrementing another variable once per loop. That's more work than what's required to get the job done. We may as well use memset for this rather than trying to optimize the existing loop. Some testing has shown memset is faster even for very small sizes. Discussion: https://postgr.es/m/CAApHDvp_rKkvwudBKgBHniNRg67bzXVjyvVKfX0G2zS967K43A@mail.gmail.com
* Update copyright for 2023Bruce Momjian2023-01-02
| | | | Backpatch-through: 11
* Harmonize more parameter names in bulk.Peter Geoghegan2022-09-20
| | | | | | | | | | | | | | | | Make sure that function declarations use names that exactly match the corresponding names from function definitions in optimizer, parser, utility, libpq, and "commands" code, as well as in remaining library code. Do the same for all code related to frontend programs (with the exception of pg_dump/pg_dumpall related code). Like other recent commits that cleaned up function parameter names, this commit was written with help from clang-tidy. Later commits will handle ecpg and pg_dump/pg_dumpall. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: David Rowley <dgrowleyml@gmail.com> Discussion: https://postgr.es/m/CAH2-WznJt9CMM9KJTMjJh_zbL5hD9oX44qdJ4aqZtjFi-zA3Tg@mail.gmail.com
* Revert SQL/JSON featuresAndrew Dunstan2022-09-01
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The reverts the following and makes some associated cleanups: commit f79b803dc: Common SQL/JSON clauses commit f4fb45d15: SQL/JSON constructors commit 5f0adec25: Make STRING an unreserved_keyword. commit 33a377608: IS JSON predicate commit 1a36bc9db: SQL/JSON query functions commit 606948b05: SQL JSON functions commit 49082c2cc: RETURNING clause for JSON() and JSON_SCALAR() commit 4e34747c8: JSON_TABLE commit fadb48b00: PLAN clauses for JSON_TABLE commit 2ef6f11b0: Reduce running time of jsonb_sqljson test commit 14d3f24fa: Further improve jsonb_sqljson parallel test commit a6baa4bad: Documentation for SQL/JSON features commit b46bcf7a4: Improve readability of SQL/JSON documentation. commit 112fdb352: Fix finalization for json_objectagg and friends commit fcdb35c32: Fix transformJsonBehavior commit 4cd8717af: Improve a couple of sql/json error messages commit f7a605f63: Small cleanups in SQL/JSON code commit 9c3d25e17: Fix JSON_OBJECTAGG uniquefying bug commit a79153b7a: Claim SQL standard compliance for SQL/JSON features commit a1e7616d6: Rework SQL/JSON documentation commit 8d9f9634e: Fix errors in copyfuncs/equalfuncs support for JSON node types. commit 3c633f32b: Only allow returning string types or bytea from json_serialize commit 67b26703b: expression eval: Fix EEOP_JSON_CONSTRUCTOR and EEOP_JSONEXPR size. The release notes are also adjusted. Backpatch to release 15. Discussion: https://postgr.es/m/40d2c882-bcac-19a9-754d-4299e1d87ac7@postgresql.org
* Fix formatting and comment typosJohn Naylor2022-08-04
| | | | | | Justin Pryzby Discussion: https://www.postgresql.org/message-id/20220801181136.GJ15006%40telsasoft.com
* autho_explain: Add GUC to log query parametersMichael Paquier2022-07-06
| | | | | | | | | | | | | auto_explain.log_parameter_max_length is a new GUC part of the extension, similar to the corresponding core setting, that controls the inclusion of query parameters in the logged explain output. More tests are added to check the behavior of this new parameter: when parameters logged in full (the default of -1), when disabled (value of 0) and when partially truncated (value different than the two others). Author: Dagfinn Ilmari Mannsåker Discussion: https://postgr.es/m/87ee09mohb.fsf@wibble.ilmari.org
* Fix EXPLAIN MERGE output when no tuples are processedAlvaro Herrera2022-05-18
| | | | | | | | | An 'else' clause was misplaced in commit 598ac10be1c2, making zero-rows output look a bit silly. Add a test case for it. Pointed out by Tom Lane. Discussion: https://postgr.es/m/21030.1652893083@sss.pgh.pa.us
* Make EXPLAIN MERGE output format more compactAlvaro Herrera2022-05-18
| | | | | | | | | | We can use a single line to print all tuple counts that MERGE processed, for conciseness, and elide those that are zeroes. Non-text formats report all numbers, as is typical. Per comment from Justin Pryzby <pryzby@telsasoft.com> Discussion: https://postgr.es/m/20220511163350.GL19626@telsasoft.com
* Pre-beta mechanical code beautification.Tom Lane2022-05-12
| | | | | Run pgindent, pgperltidy, and reformat-dat-files. I manually fixed a couple of comments that pgindent uglified.
* Track I/O timing for temporary file blocks in EXPLAIN (BUFFERS)Michael Paquier2022-04-08
| | | | | | | | | | | | | | | | | | | | Previously, the output of EXPLAIN (BUFFERS) option showed only the I/O timing spent reading and writing shared and local buffers. This commit adds on top of that the I/O timing for temporary buffers in the output of EXPLAIN (for spilled external sorts, hashes, materialization. etc). This can be helpful for users in cases where the I/O related to temporary buffers is the bottleneck. Like its cousin, this information is available only when track_io_timing is enabled. Playing the patch, this is showing an extra overhead of up to 1% even when using gettimeofday() as implementation for interval timings, which is slightly within the usual range noise still that's measurable. Author: Masahiko Sawada Reviewed-by: Georgios Kokolatos, Melanie Plageman, Julien Rouhaud, Ranier Vilela Discussion: https://postgr.es/m/CAD21AoAJgotTeP83p6HiAGDhs_9Fw9pZ2J=_tYTsiO5Ob-V5GQ@mail.gmail.com
* Teach planner and executor about monotonic window funcsDavid Rowley2022-04-08
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Window functions such as row_number() always return a value higher than the previously returned value for tuples in any given window partition. Traditionally queries such as; SELECT * FROM ( SELECT *, row_number() over (order by c) rn FROM t ) t WHERE rn <= 10; were executed fairly inefficiently. Neither the query planner nor the executor knew that once rn made it to 11 that nothing further would match the outer query's WHERE clause. It would blindly continue until all tuples were exhausted from the subquery. Here we implement means to make the above execute more efficiently. This is done by way of adding a pg_proc.prosupport function to various of the built-in window functions and adding supporting code to allow the support function to inform the planner if the window function is monotonically increasing, monotonically decreasing, both or neither. The planner is then able to make use of that information and possibly allow the executor to short-circuit execution by way of adding a "run condition" to the WindowAgg to allow it to determine if some of its execution work can be skipped. This "run condition" is not like a normal filter. These run conditions are only built using quals comparing values to monotonic window functions. For monotonic increasing functions, quals making use of the btree operators for <, <= and = can be used (assuming the window function column is on the left). You can see here that once such a condition becomes false that a monotonic increasing function could never make it subsequently true again. For monotonically decreasing functions the >, >= and = btree operators for the given type can be used for run conditions. The best-case situation for this is when there is a single WindowAgg node without a PARTITION BY clause. Here when the run condition becomes false the WindowAgg node can simply return NULL. No more tuples will ever match the run condition. It's a little more complex when there is a PARTITION BY clause. In this case, we cannot return NULL as we must still process other partitions. To speed this case up we pull tuples from the outer plan to check if they're from the same partition and simply discard them if they are. When we find a tuple belonging to another partition we start processing as normal again until the run condition becomes false or we run out of tuples to process. When there are multiple WindowAgg nodes to evaluate then this complicates the situation. For intermediate WindowAggs we must ensure we always return all tuples to the calling node. Any filtering done could lead to incorrect results in WindowAgg nodes above. For all intermediate nodes, we can still save some work when the run condition becomes false. We've no need to evaluate the WindowFuncs anymore. Other WindowAgg nodes cannot reference the value of these and these tuples will not appear in the final result anyway. The savings here are small in comparison to what can be saved in the top-level WingowAgg, but still worthwhile. Intermediate WindowAgg nodes never filter out tuples, but here we change WindowAgg so that the top-level WindowAgg filters out tuples that don't match the intermediate WindowAgg node's run condition. Such filters appear in the "Filter" clause in EXPLAIN for the top-level WindowAgg node. Here we add prosupport functions to allow the above to work for; row_number(), rank(), dense_rank(), count(*) and count(expr). It appears technically possible to do the same for min() and max(), however, it seems unlikely to be useful enough, so that's not done here. Bump catversion Author: David Rowley Reviewed-by: Andy Fan, Zhihong Yu Discussion: https://postgr.es/m/CAApHDvqvp3At8++yF8ij06sdcoo1S_b2YoaT9D4Nf+MObzsrLQ@mail.gmail.com
* JSON_TABLEAndrew Dunstan2022-04-04
| | | | | | | | | | | | | | | | This feature allows jsonb data to be treated as a table and thus used in a FROM clause like other tabular data. Data can be selected from the jsonb using jsonpath expressions, and hoisted out of nested structures in the jsonb to form multiple rows, more or less like an outer join. Nikita Glukhov Reviewers have included (in no particular order) Andres Freund, Alexander Korotkov, Pavel Stehule, Andrew Alsup, Erik Rijkers, Zhihong Yu (whose name I previously misspelled), Himanshu Upadhyaya, Daniel Gustafsson, Justin Pryzby. Discussion: https://postgr.es/m/7e2cb85d-24cf-4abb-30a5-1a33715959bd@postgrespro.ru
* Add support for MERGE SQL commandAlvaro Herrera2022-03-28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | MERGE performs actions that modify rows in the target table using a source table or query. MERGE provides a single SQL statement that can conditionally INSERT/UPDATE/DELETE rows -- a task that would otherwise require multiple PL statements. For example, MERGE INTO target AS t USING source AS s ON t.tid = s.sid WHEN MATCHED AND t.balance > s.delta THEN UPDATE SET balance = t.balance - s.delta WHEN MATCHED THEN DELETE WHEN NOT MATCHED AND s.delta > 0 THEN INSERT VALUES (s.sid, s.delta) WHEN NOT MATCHED THEN DO NOTHING; MERGE works with regular tables, partitioned tables and inheritance hierarchies, including column and row security enforcement, as well as support for row and statement triggers and transition tables therein. MERGE is optimized for OLTP and is parameterizable, though also useful for large scale ETL/ELT. MERGE is not intended to be used in preference to existing single SQL commands for INSERT, UPDATE or DELETE since there is some overhead. MERGE can be used from PL/pgSQL. MERGE does not support targetting updatable views or foreign tables, and RETURNING clauses are not allowed either. These limitations are likely fixable with sufficient effort. Rewrite rules are also not supported, but it's not clear that we'd want to support them. Author: Pavan Deolasee <pavan.deolasee@gmail.com> Author: Álvaro Herrera <alvherre@alvh.no-ip.org> Author: Amit Langote <amitlangote09@gmail.com> Author: Simon Riggs <simon.riggs@enterprisedb.com> Reviewed-by: Peter Eisentraut <peter.eisentraut@enterprisedb.com> Reviewed-by: Andres Freund <andres@anarazel.de> (earlier versions) Reviewed-by: Peter Geoghegan <pg@bowt.ie> (earlier versions) Reviewed-by: Robert Haas <robertmhaas@gmail.com> (earlier versions) Reviewed-by: Japin Li <japinli@hotmail.com> Reviewed-by: Justin Pryzby <pryzby@telsasoft.com> Reviewed-by: Tomas Vondra <tomas.vondra@enterprisedb.com> Reviewed-by: Zhihong Yu <zyu@yugabyte.com> Discussion: https://postgr.es/m/CANP8+jKitBSrB7oTgT9CY2i1ObfOt36z0XMraQc+Xrz8QB0nXA@mail.gmail.com Discussion: https://postgr.es/m/CAH2-WzkJdBuxj9PO=2QaO9-3h3xGbQPZ34kJH=HukRekwM-GZg@mail.gmail.com Discussion: https://postgr.es/m/20201231134736.GA25392@alvherre.pgsql
* Fix collection of typos in the code and the documentationMichael Paquier2022-03-15
| | | | | | | | Some words were duplicated while other places were grammatically incorrect, including one variable name in the code. Author: Otto Kekalainen, Justin Pryzby Discussion: https://postgr.es/m/7DDBEFC5-09B6-4325-B942-B563D1A24BDC@amazon.com
* Add compute_query_id = regressMichael Paquier2022-02-22
| | | | | | | | | | | | | | | | | | | | | | | | | | "regress" is a new mode added to compute_query_id aimed at facilitating regression testing when a module computing query IDs is loaded into the backend, like pg_stat_statements. It works the same way as "auto", meaning that query IDs are computed if a module enables it, except that query IDs are hidden in EXPLAIN outputs to ensure regression output stability. Like any GUCs of the kind (force_parallel_mode, etc.), this new configuration can be added to an instance's postgresql.conf, or just passed down with PGOPTIONS at command level. compute_query_id uses an enum for its set of option values, meaning that this addition ensures ABI compatibility. Using this new configuration mode allows installcheck-world to pass when running the tests on an instance with pg_stat_statements enabled, stabilizing the test output while checking the paths doing query ID computations. Reported-by: Anton Melnikov Reviewed-by: Julien Rouhaud Discussion: https://postgr.es/m/1634283396.372373993@f75.i.mail.ru Discussion: https://postgr.es/m/YgHlxgc/OimuPYhH@paquier.xyz Backpatch-through: 14
* Update copyright for 2022Bruce Momjian2022-01-07
| | | | Backpatch-through: 10
* Fix index-only scan plans, take 2.Tom Lane2022-01-03
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 4ace45677 failed to fix the problem fully, because the same issue of attempting to fetch a non-returnable index column can occur when rechecking the indexqual after using a lossy index operator. Moreover, it broke EXPLAIN for such indexquals (which indicates a gap in our test cases :-(). Revert the code changes of 4ace45677 in favor of adding a new field to struct IndexOnlyScan, containing a version of the indexqual that can be executed against the index-returned tuple without using any non-returnable columns. (The restrictions imposed by check_index_only guarantee this is possible, although we may have to recompute indexed expressions.) Support construction of that during setrefs.c processing by marking IndexOnlyScan.indextlist entries as resjunk if they can't be returned, rather than removing them entirely. (We could alternatively require setrefs.c to look up the IndexOptInfo again, but abusing resjunk this way seems like a reasonably safe way to avoid needing to do that.) This solution isn't great from an API-stability standpoint: if there are any extensions out there that build IndexOnlyScan structs directly, they'll be broken in the next minor releases. However, only a very invasive extension would be likely to do such a thing. There's no change in the Path representation, so typical planner extensions shouldn't have a problem. As before, back-patch to all supported branches. Discussion: https://postgr.es/m/3179992.1641150853@sss.pgh.pa.us Discussion: https://postgr.es/m/17350-b5bdcf476e5badbb@postgresql.org
* Allow Memoize to operate in binary comparison modeDavid Rowley2021-11-24
| | | | | | | | | | | | | | | | | | | | | | Memoize would always use the hash equality operator for the cache key types to determine if the current set of parameters were the same as some previously cached set. Certain types such as floating points where -0.0 and +0.0 differ in their binary representation but are classed as equal by the hash equality operator may cause problems as unless the join uses the same operator it's possible that whichever join operator is being used would be able to distinguish the two values. In which case we may accidentally return in the incorrect rows out of the cache. To fix this here we add a binary mode to Memoize to allow it to the current set of parameters to previously cached values by comparing bit-by-bit rather than logically using the hash equality operator. This binary mode is always used for LATERAL joins and it's used for normal joins when any of the join operators are not hashable. Reported-by: Tom Lane Author: David Rowley Discussion: https://postgr.es/m/3004308.1632952496@sss.pgh.pa.us Backpatch-through: 14, where Memoize was added
* Use ExplainPropertyInteger for queryid in EXPLAINDavid Rowley2021-08-09
| | | | | | | | | | | This saves a few lines of code. Also add a comment to mention why we use ExplainPropertyInteger instead of ExplainPropertyUInteger given that queryid is a uint64 type. Author: David Rowley Reviewed-by: Julien Rouhaud Discussion: https://postgr.es/m/CAApHDvqhSLYpSU_EqUdN39w9Uvb8ogmHV7_3YhJ0S3aScGBjsg@mail.gmail.com Backpatch-through: 14, where this code was originally added
* Use the "pg_temp" schema alias in EXPLAIN and related output.Tom Lane2021-07-27
| | | | | | | | | | | | | | | | | | | | | | | | | | This patch causes EXPLAIN output to refer to objects that are in the current session's temp schema with the "pg_temp" schema alias rather than that schema's actual name. This is useful for our own testing purposes since it will stabilize EXPLAIN VERBOSE output for such cases, allowing us to use that in regression tests. It should be less confusing for end users too. Since ruleutils.c needs to change behavior for this, the change also leaks into a few other users of ruleutils.c, for example pg_get_viewdef(). AFAICS that won't cause any problems. We did find that aggressively trying to change this behavior across-the-board would cause issues, but as long as "pg_temp" only appears within generated SQL text, I think it'll be fine. Along the way, make get_namespace_name_or_temp conform to the same API as get_namespace_name, ie that it returns a palloc'd string or NULL. The current behavior hasn't caused any bugs since no callers attempt to pfree the result, but if it gets more widespread usage that could become a problem. Amul Sul, reviewed and extended by me Discussion: https://postgr.es/m/CAAJ_b97W=QaGmag9AhWNbmx3uEYsNkXWL+OVW1_E1D3BtgWvtw@mail.gmail.com
* Change the name of the Result Cache node to MemoizeDavid Rowley2021-07-14
| | | | | | | | | | | "Result Cache" was never a great name for this node, but nobody managed to come up with another name that anyone liked enough. That was until David Johnston mentioned "Node Memoization", which Tom Lane revised to just "Memoize". People seem to like "Memoize", so let's do the rename. Reviewed-by: Justin Pryzby Discussion: https://postgr.es/m/20210708165145.GG1176@momjian.us Backpatch-through: 14, where Result Cache was introduced
* Centralize the logic for protective copying of utility statements.Tom Lane2021-06-18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In the "simple Query" code path, it's fine for parse analysis or execution of a utility statement to scribble on the statement's node tree, since that'll just be thrown away afterwards. However it's not fine if the node tree is in the plan cache, as then it'd be corrupted for subsequent executions. Up to now we've dealt with that by having individual utility-statement functions apply copyObject() if they were going to modify the tree. But that's prone to errors of omission. Bug #17053 from Charles Samborski shows that CREATE/ALTER DOMAIN didn't get this memo, and can crash if executed repeatedly from plan cache. In the back branches, we'll just apply a narrow band-aid for that, but in HEAD it seems prudent to have a more principled fix that will close off the possibility of other similar bugs in future. Hence, let's hoist the responsibility for doing copyObject up into ProcessUtility from its children, thus ensuring that it happens for all utility statement types. Also, modify ProcessUtility's API so that its callers can tell it whether a copy step is necessary. It turns out that in all cases, the immediate caller knows whether the node tree is transient, so this doesn't involve a huge amount of code thrashing. In this way, while we lose a little bit in the execute-from-cache code path due to sometimes copying node trees that wouldn't be mutated anyway, we gain something in the simple-Query code path by not copying throwaway node trees. Statements that are complex enough to be expensive to copy are almost certainly ones that would have to be copied anyway, so the loss in the cache code path shouldn't be much. (Note that this whole problem applies only to utility statements. Optimizable statements don't have the issue because we long ago made the executor treat Plan trees as read-only. Perhaps someday we will make utility statement execution act likewise, but I'm not holding my breath.) Discussion: https://postgr.es/m/931771.1623893989@sss.pgh.pa.us Discussion: https://postgr.es/m/17053-3ca3f501bbc212b4@postgresql.org
* Allow compute_query_id to be set to 'auto' and make it defaultAlvaro Herrera2021-05-15
| | | | | | | | | | | | | | | | | Allowing only on/off meant that all either all existing configuration guides would become obsolete if we disabled it by default, or that we would have to accept a performance loss in the default config if we enabled it by default. By allowing 'auto' as a middle ground, the performance cost is only paid by those who enable pg_stat_statements and similar modules. I only edited the release notes to comment-out a paragraph that is now factually wrong; further edits are probably needed to describe the related change in more detail. Author: Julien Rouhaud <rjuju123@gmail.com> Reviewed-by: Justin Pryzby <pryzby@telsasoft.com> Discussion: https://postgr.es/m/20210513002623.eugftm4nk2lvvks3@nol
* Initial pgindent and pgperltidy run for v14.Tom Lane2021-05-12
| | | | | | | | Also "make reformat-dat-files". The only change worthy of note is that pgindent messed up the formatting of launcher.c's struct LogicalRepWorkerId, which led me to notice that that struct wasn't used at all anymore, so I just took it out.
* Change data type of counters in BufferUsage and WalUsage from long to int64.Fujii Masao2021-05-12
| | | | | | | | | | | | | | | | Previously long was used as the data type for some counters in BufferUsage and WalUsage. But long is only four byte, e.g., on Windows, and it's entirely possible to wrap a four byte counter. For example, emitting more than four billion WAL records in one transaction isn't actually particularly rare. To avoid the overflows of those counters, this commit changes the data type of them from long to int64. Suggested-by: Andres Freund Author: Masahiro Ikeda Reviewed-by: Fujii Masao Discussion: https://postgr.es/m/20201221211650.k7b53tcnadrciqjo@alap3.anarazel.de Discussion: https://postgr.es/m/af0964ac-7080-1984-dc23-513754987716@oss.nttdata.com
* Adjust EXPLAIN output for parallel Result Cache plansDavid Rowley2021-04-30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Here we adjust the EXPLAIN ANALYZE output for Result Cache so that we don't show any Result Cache stats for parallel workers who don't contribute anything to Result Cache plan nodes. I originally had ideas that workers who don't help could still have their Result Cache stats displayed. The idea with that was so that I could write some parallel Result Cache regression tests that show the EXPLAIN ANALYZE output. However, I realized a little too late that such tests would just not be possible to have run in a stable way on the buildfarm. With that knowledge, before 9eacee2e6 went in, I had removed all of the tests that were showing the EXPLAIN ANALYZE output of a parallel Result Cache plan, however, I forgot to put back the code that adjusts the EXPLAIN output to hide the Result Cache stats for parallel workers who were not fast enough to help out before query execution was over. All other nodes behave this way and so should Result Cache. Additionally, with this change, it now seems safe enough to remove the SET force_parallel_mode = off that I had added to the regression tests. Also, perform some cleanup in the partition_prune tests. I had adjusted the explain_parallel_append() function to sanitize the Result Cache EXPLAIN ANALYZE output. However, since I didn't actually include any parallel Result Cache tests that show their EXPLAIN ANALYZE output, that code does nothing and can be removed. In passing, move the setting of memPeakKb into the scope where it's used. Reported-by: Amit Khandekar Author: David Rowley, Amit Khandekar Discussion: https://postgr.es/m/CAJ3gD9d8SkfY95GpM1zmsOtX2-Ogx5q-WLsf8f0ykEb0hCRK3w@mail.gmail.com
* Fix wrong units in two ExplainPropertyFloat calls.Tom Lane2021-04-16
| | | | | | | | | | This is only a latent bug, since these calls are only reached for non-text output formats, and currently none of those will print the units. Still, we should get it right in case that ever changes. Justin Pryzby Discussion: https://postgr.es/m/20210415163846.GA3315@telsasoft.com
* Make use of in-core query id added by commit 5fd9dfa5f5Bruce Momjian2021-04-07
| | | | | | | | | | | | | | | | | | | | | | | Use the in-core query id computation for pg_stat_activity, log_line_prefix, and EXPLAIN VERBOSE. Similar to other fields in pg_stat_activity, only the queryid from the top level statements are exposed, and if the backends status isn't active then the queryid from the last executed statements is displayed. Add a %Q placeholder to include the queryid in log_line_prefix, which will also only expose top level statements. For EXPLAIN VERBOSE, if a query identifier has been computed, either by enabling compute_query_id or using a third-party module, display it. Bump catalog version. Discussion: https://postgr.es/m/20210407125726.tkvjdbw76hxnpwfi@nol Author: Julien Rouhaud Reviewed-by: Alvaro Herrera, Nitin Jadhav, Zhihong Yu
* Add Result Cache executor node (take 2)David Rowley2021-04-02
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Here we add a new executor node type named "Result Cache". The planner can include this node type in the plan to have the executor cache the results from the inner side of parameterized nested loop joins. This allows caching of tuples for sets of parameters so that in the event that the node sees the same parameter values again, it can just return the cached tuples instead of rescanning the inner side of the join all over again. Internally, result cache uses a hash table in order to quickly find tuples that have been previously cached. For certain data sets, this can significantly improve the performance of joins. The best cases for using this new node type are for join problems where a large portion of the tuples from the inner side of the join have no join partner on the outer side of the join. In such cases, hash join would have to hash values that are never looked up, thus bloating the hash table and possibly causing it to multi-batch. Merge joins would have to skip over all of the unmatched rows. If we use a nested loop join with a result cache, then we only cache tuples that have at least one join partner on the outer side of the join. The benefits of using a parameterized nested loop with a result cache increase when there are fewer distinct values being looked up and the number of lookups of each value is large. Also, hash probes to lookup the cache can be much faster than the hash probe in a hash join as it's common that the result cache's hash table is much smaller than the hash join's due to result cache only caching useful tuples rather than all tuples from the inner side of the join. This variation in hash probe performance is more significant when the hash join's hash table no longer fits into the CPU's L3 cache, but the result cache's hash table does. The apparent "random" access of hash buckets with each hash probe can cause a poor L3 cache hit ratio for large hash tables. Smaller hash tables generally perform better. The hash table used for the cache limits itself to not exceeding work_mem * hash_mem_multiplier in size. We maintain a dlist of keys for this cache and when we're adding new tuples and realize we've exceeded the memory budget, we evict cache entries starting with the least recently used ones until we have enough memory to add the new tuples to the cache. For parameterized nested loop joins, we now consider using one of these result cache nodes in between the nested loop node and its inner node. We determine when this might be useful based on cost, which is primarily driven off of what the expected cache hit ratio will be. Estimating the cache hit ratio relies on having good distinct estimates on the nested loop's parameters. For now, the planner will only consider using a result cache for parameterized nested loop joins. This works for both normal joins and also for LATERAL type joins to subqueries. It is possible to use this new node for other uses in the future. For example, to cache results from correlated subqueries. However, that's not done here due to some difficulties obtaining a distinct estimation on the outer plan to calculate the estimated cache hit ratio. Currently we plan the inner plan before planning the outer plan so there is no good way to know if a result cache would be useful or not since we can't estimate the number of times the subplan will be called until the outer plan is generated. The functionality being added here is newly introducing a dependency on the return value of estimate_num_groups() during the join search. Previously, during the join search, we only ever needed to perform selectivity estimations. With this commit, we need to use estimate_num_groups() in order to estimate what the hit ratio on the result cache will be. In simple terms, if we expect 10 distinct values and we expect 1000 outer rows, then we'll estimate the hit ratio to be 99%. Since cache hits are very cheap compared to scanning the underlying nodes on the inner side of the nested loop join, then this will significantly reduce the planner's cost for the join. However, it's fairly easy to see here that things will go bad when estimate_num_groups() incorrectly returns a value that's significantly lower than the actual number of distinct values. If this happens then that may cause us to make use of a nested loop join with a result cache instead of some other join type, such as a merge or hash join. Our distinct estimations have been known to be a source of trouble in the past, so the extra reliance on them here could cause the planner to choose slower plans than it did previous to having this feature. Distinct estimations are also fairly hard to estimate accurately when several tables have been joined already or when a WHERE clause filters out a set of values that are correlated to the expressions we're estimating the number of distinct value for. For now, the costing we perform during query planning for result caches does put quite a bit of faith in the distinct estimations being accurate. When these are accurate then we should generally see faster execution times for plans containing a result cache. However, in the real world, we may find that we need to either change the costings to put less trust in the distinct estimations being accurate or perhaps even disable this feature by default. There's always an element of risk when we teach the query planner to do new tricks that it decides to use that new trick at the wrong time and causes a regression. Users may opt to get the old behavior by turning the feature off using the enable_resultcache GUC. Currently, this is enabled by default. It remains to be seen if we'll maintain that setting for the release. Additionally, the name "Result Cache" is the best name I could think of for this new node at the time I started writing the patch. Nobody seems to strongly dislike the name. A few people did suggest other names but no other name seemed to dominate in the brief discussion that there was about names. Let's allow the beta period to see if the current name pleases enough people. If there's some consensus on a better name, then we can change it before the release. Please see the 2nd discussion link below for the discussion on the "Result Cache" name. Author: David Rowley Reviewed-by: Andy Fan, Justin Pryzby, Zhihong Yu, Hou Zhijie Tested-By: Konstantin Knizhnik Discussion: https://postgr.es/m/CAApHDvrPcQyQdWERGYWx8J%2B2DLUNgXu%2BfOSbQ1UscxrunyXyrQ%40mail.gmail.com Discussion: https://postgr.es/m/CAApHDvq=yQXr5kqhRviT2RhNKwToaWr9JAN5t+5_PzhuRJ3wvg@mail.gmail.com
* Revert b6002a796David Rowley2021-04-01
| | | | | | | | | | | | | This removes "Add Result Cache executor node". It seems that something weird is going on with the tracking of cache hits and misses as highlighted by many buildfarm animals. It's not yet clear what the problem is as other parts of the plan indicate that the cache did work correctly, it's just the hits and misses that were being reported as 0. This is especially a bad time to have the buildfarm so broken, so reverting before too many more animals go red. Discussion: https://postgr.es/m/CAApHDvq_hydhfovm4=izgWs+C5HqEeRScjMbOgbpC-jRAeK3Yw@mail.gmail.com
* Add Result Cache executor nodeDavid Rowley2021-04-01
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Here we add a new executor node type named "Result Cache". The planner can include this node type in the plan to have the executor cache the results from the inner side of parameterized nested loop joins. This allows caching of tuples for sets of parameters so that in the event that the node sees the same parameter values again, it can just return the cached tuples instead of rescanning the inner side of the join all over again. Internally, result cache uses a hash table in order to quickly find tuples that have been previously cached. For certain data sets, this can significantly improve the performance of joins. The best cases for using this new node type are for join problems where a large portion of the tuples from the inner side of the join have no join partner on the outer side of the join. In such cases, hash join would have to hash values that are never looked up, thus bloating the hash table and possibly causing it to multi-batch. Merge joins would have to skip over all of the unmatched rows. If we use a nested loop join with a result cache, then we only cache tuples that have at least one join partner on the outer side of the join. The benefits of using a parameterized nested loop with a result cache increase when there are fewer distinct values being looked up and the number of lookups of each value is large. Also, hash probes to lookup the cache can be much faster than the hash probe in a hash join as it's common that the result cache's hash table is much smaller than the hash join's due to result cache only caching useful tuples rather than all tuples from the inner side of the join. This variation in hash probe performance is more significant when the hash join's hash table no longer fits into the CPU's L3 cache, but the result cache's hash table does. The apparent "random" access of hash buckets with each hash probe can cause a poor L3 cache hit ratio for large hash tables. Smaller hash tables generally perform better. The hash table used for the cache limits itself to not exceeding work_mem * hash_mem_multiplier in size. We maintain a dlist of keys for this cache and when we're adding new tuples and realize we've exceeded the memory budget, we evict cache entries starting with the least recently used ones until we have enough memory to add the new tuples to the cache. For parameterized nested loop joins, we now consider using one of these result cache nodes in between the nested loop node and its inner node. We determine when this might be useful based on cost, which is primarily driven off of what the expected cache hit ratio will be. Estimating the cache hit ratio relies on having good distinct estimates on the nested loop's parameters. For now, the planner will only consider using a result cache for parameterized nested loop joins. This works for both normal joins and also for LATERAL type joins to subqueries. It is possible to use this new node for other uses in the future. For example, to cache results from correlated subqueries. However, that's not done here due to some difficulties obtaining a distinct estimation on the outer plan to calculate the estimated cache hit ratio. Currently we plan the inner plan before planning the outer plan so there is no good way to know if a result cache would be useful or not since we can't estimate the number of times the subplan will be called until the outer plan is generated. The functionality being added here is newly introducing a dependency on the return value of estimate_num_groups() during the join search. Previously, during the join search, we only ever needed to perform selectivity estimations. With this commit, we need to use estimate_num_groups() in order to estimate what the hit ratio on the result cache will be. In simple terms, if we expect 10 distinct values and we expect 1000 outer rows, then we'll estimate the hit ratio to be 99%. Since cache hits are very cheap compared to scanning the underlying nodes on the inner side of the nested loop join, then this will significantly reduce the planner's cost for the join. However, it's fairly easy to see here that things will go bad when estimate_num_groups() incorrectly returns a value that's significantly lower than the actual number of distinct values. If this happens then that may cause us to make use of a nested loop join with a result cache instead of some other join type, such as a merge or hash join. Our distinct estimations have been known to be a source of trouble in the past, so the extra reliance on them here could cause the planner to choose slower plans than it did previous to having this feature. Distinct estimations are also fairly hard to estimate accurately when several tables have been joined already or when a WHERE clause filters out a set of values that are correlated to the expressions we're estimating the number of distinct value for. For now, the costing we perform during query planning for result caches does put quite a bit of faith in the distinct estimations being accurate. When these are accurate then we should generally see faster execution times for plans containing a result cache. However, in the real world, we may find that we need to either change the costings to put less trust in the distinct estimations being accurate or perhaps even disable this feature by default. There's always an element of risk when we teach the query planner to do new tricks that it decides to use that new trick at the wrong time and causes a regression. Users may opt to get the old behavior by turning the feature off using the enable_resultcache GUC. Currently, this is enabled by default. It remains to be seen if we'll maintain that setting for the release. Additionally, the name "Result Cache" is the best name I could think of for this new node at the time I started writing the patch. Nobody seems to strongly dislike the name. A few people did suggest other names but no other name seemed to dominate in the brief discussion that there was about names. Let's allow the beta period to see if the current name pleases enough people. If there's some consensus on a better name, then we can change it before the release. Please see the 2nd discussion link below for the discussion on the "Result Cache" name. Author: David Rowley Reviewed-by: Andy Fan, Justin Pryzby, Zhihong Yu Tested-By: Konstantin Knizhnik Discussion: https://postgr.es/m/CAApHDvrPcQyQdWERGYWx8J%2B2DLUNgXu%2BfOSbQ1UscxrunyXyrQ%40mail.gmail.com Discussion: https://postgr.es/m/CAApHDvq=yQXr5kqhRviT2RhNKwToaWr9JAN5t+5_PzhuRJ3wvg@mail.gmail.com
* Rework planning and execution of UPDATE and DELETE.Tom Lane2021-03-31
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch makes two closely related sets of changes: 1. For UPDATE, the subplan of the ModifyTable node now only delivers the new values of the changed columns (i.e., the expressions computed in the query's SET clause) plus row identity information such as CTID. ModifyTable must re-fetch the original tuple to merge in the old values of any unchanged columns. The core advantage of this is that the changed columns are uniform across all tables of an inherited or partitioned target relation, whereas the other columns might not be. A secondary advantage, when the UPDATE involves joins, is that less data needs to pass through the plan tree. The disadvantage of course is an extra fetch of each tuple to be updated. However, that seems to be very nearly free in context; even worst-case tests don't show it to add more than a couple percent to the total query cost. At some point it might be interesting to combine the re-fetch with the tuple access that ModifyTable must do anyway to mark the old tuple dead; but that would require a good deal of refactoring and it seems it wouldn't buy all that much, so this patch doesn't attempt it. 2. For inherited UPDATE/DELETE, instead of generating a separate subplan for each target relation, we now generate a single subplan that is just exactly like a SELECT's plan, then stick ModifyTable on top of that. To let ModifyTable know which target relation a given incoming row refers to, a tableoid junk column is added to the row identity information. This gets rid of the horrid hack that was inheritance_planner(), eliminating O(N^2) planning cost and memory consumption in cases where there were many unprunable target relations. Point 2 of course requires point 1, so that there is a uniform definition of the non-junk columns to be returned by the subplan. We can't insist on uniform definition of the row identity junk columns however, if we want to keep the ability to have both plain and foreign tables in a partitioning hierarchy. Since it wouldn't scale very far to have every child table have its own row identity column, this patch includes provisions to merge similar row identity columns into one column of the subplan result. In particular, we can merge the whole-row Vars typically used as row identity by FDWs into one column by pretending they are type RECORD. (It's still okay for the actual composite Datums to be labeled with the table's rowtype OID, though.) There is more that can be done to file down residual inefficiencies in this patch, but it seems to be committable now. FDW authors should note several API changes: * The argument list for AddForeignUpdateTargets() has changed, and so has the method it must use for adding junk columns to the query. Call add_row_identity_var() instead of manipulating the parse tree directly. You might want to reconsider exactly what you're adding, too. * PlanDirectModify() must now work a little harder to find the ForeignScan plan node; if the foreign table is part of a partitioning hierarchy then the ForeignScan might not be the direct child of ModifyTable. See postgres_fdw for sample code. * To check whether a relation is a target relation, it's no longer sufficient to compare its relid to root->parse->resultRelation. Instead, check it against all_result_relids or leaf_result_relids, as appropriate. Amit Langote and Tom Lane Discussion: https://postgr.es/m/CA+HiwqHpHdqdDn48yCEhynnniahH78rwcrv1rEX65-fsZGBOLQ@mail.gmail.com
* Add support for asynchronous execution.Etsuro Fujita2021-03-31
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | This implements asynchronous execution, which runs multiple parts of a non-parallel-aware Append concurrently rather than serially to improve performance when possible. Currently, the only node type that can be run concurrently is a ForeignScan that is an immediate child of such an Append. In the case where such ForeignScans access data on different remote servers, this would run those ForeignScans concurrently, and overlap the remote operations to be performed simultaneously, so it'll improve the performance especially when the operations involve time-consuming ones such as remote join and remote aggregation. We may extend this to other node types such as joins or aggregates over ForeignScans in the future. This also adds the support for postgres_fdw, which is enabled by the table-level/server-level option "async_capable". The default is false. Robert Haas, Kyotaro Horiguchi, Thomas Munro, and myself. This commit is mostly based on the patch proposed by Robert Haas, but also uses stuff from the patch proposed by Kyotaro Horiguchi and from the patch proposed by Thomas Munro. Reviewed by Kyotaro Horiguchi, Konstantin Knizhnik, Andrey Lepikhov, Movead Li, Thomas Munro, Justin Pryzby, and others. Discussion: https://postgr.es/m/CA%2BTgmoaXQEt4tZ03FtQhnzeDEMzBck%2BLrni0UWHVVgOTnA6C1w%40mail.gmail.com Discussion: https://postgr.es/m/CA%2BhUKGLBRyu0rHrDCMC4%3DRn3252gogyp1SjOgG8SEKKZv%3DFwfQ%40mail.gmail.com Discussion: https://postgr.es/m/20200228.170650.667613673625155850.horikyota.ntt%40gmail.com
* Add TID Range Scans to support efficient scanning ranges of TIDsDavid Rowley2021-02-27
| | | | | | | | | | | | | | | | | | | | | This adds a new executor node named TID Range Scan. The query planner will generate paths for TID Range scans when quals are discovered on base relations which search for ranges on the table's ctid column. These ranges may be open at either end. For example, WHERE ctid >= '(10,0)'; will return all tuples on page 10 and over. To support this, two new optional callback functions have been added to table AM. scan_set_tidrange is used to set the scan range to just the given range of TIDs. scan_getnextslot_tidrange fetches the next tuple in the given range. For AMs were scanning ranges of TIDs would not make sense, these functions can be set to NULL in the TableAmRoutine. The query planner won't generate TID Range Scan Paths in that case. Author: Edmund Horner, David Rowley Reviewed-by: David Rowley, Tomas Vondra, Tom Lane, Andres Freund, Zhihong Yu Discussion: https://postgr.es/m/CAMyN-kB-nFTkF=VA_JPwFNo08S0d-Yk0F741S2B7LDmYAi8eyA@mail.gmail.com
* Fix permission checks on constraint violation errors on partitions.Heikki Linnakangas2021-02-08
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If a cross-partition UPDATE violates a constraint on the target partition, and the columns in the new partition are in different physical order than in the parent, the error message can reveal columns that the user does not have SELECT permission on. A similar bug was fixed earlier in commit 804b6b6db4. The cause of the bug is that the callers of the ExecBuildSlotValueDescription() function got confused when constructing the list of modified columns. If the tuple was routed from a parent, we converted the tuple to the parent's format, but the list of modified columns was grabbed directly from the child's RTE entry. ExecUpdateLockMode() had a similar issue. That lead to confusion on which columns are key columns, leading to wrong tuple lock being taken on tables referenced by foreign keys, when a row is updated with INSERT ON CONFLICT UPDATE. A new isolation test is added for that corner case. With this patch, the ri_RangeTableIndex field is no longer set for partitions that don't have an entry in the range table. Previously, it was set to the RTE entry of the parent relation, but that was confusing. NOTE: This modifies the ResultRelInfo struct, replacing the ri_PartitionRoot field with ri_RootResultRelInfo. That's a bit risky to backpatch, because it breaks any extensions accessing the field. The change that ri_RangeTableIndex is not set for partitions could potentially break extensions, too. The ResultRelInfos are visible to FDWs at least, and this patch required small changes to postgres_fdw. Nevertheless, this seem like the least bad option. I don't think these fields widely used in extensions; I don't think there are FDWs out there that uses the FDW "direct update" API, other than postgres_fdw. If there is, you will get a compilation error, so hopefully it is caught quickly. Backpatch to 11, where support for both cross-partition UPDATEs, and unique indexes on partitioned tables, were added. Reviewed-by: Amit Langote Security: CVE-2021-3393
* Update copyright for 2021Bruce Momjian2021-01-02
| | | | Backpatch-through: 9.5
* Sanitize IF NOT EXISTS in EXPLAIN for CTAS and matviewsMichael Paquier2020-12-30
| | | | | | | | | | | | | | | | | | IF NOT EXISTS was ignored when specified in an EXPLAIN query for CREATE MATERIALIZED VIEW or CREATE TABLE AS. Hence, if this clause was specified, the caller would get a failure if the relation already exists instead of a success with a NOTICE message. This commit makes the behavior of IF NOT EXISTS in EXPLAIN consistent with the non-EXPLAIN'd DDL queries, preventing a failure with IF NOT EXISTS if the relation to-be-created already exists. The skip is done before the SELECT query used for the relation is planned or executed, and a "dummy" plan is generated instead depending on the format used by EXPLAIN. Author: Bharath Rupireddy Reviewed-by: Zhijie Hou, Michael Paquier Discussion: https://postgr.es/m/CALj2ACVa3oJ9O_wcGd+FtHWZds04dEKcakxphGz5POVgD4wC7Q@mail.gmail.com
* Fix broken XML formatting in EXPLAIN output for incremental sorts.Tom Lane2020-10-23
| | | | | | | | | | The ExplainCloseGroup arguments for incremental sort usage data didn't match the corresponding ExplainOpenGroup. This only matters for XML-format output, which is probably why we'd not noticed. Daniel Gustafsson, per bug #16683 from Frits Jalvingh Discussion: https://postgr.es/m/16683-8005033324ad34e9@postgresql.org
* Fixup some appendStringInfo and appendPQExpBuffer callsDavid Rowley2020-10-15
| | | | | | | | | | | | | | | | | | A number of places were using appendStringInfo() when they could have been using appendStringInfoString() instead. While there's no functionality change there, it's just more efficient to use appendStringInfoString() when no formatting is required. Likewise for some appendStringInfoString() calls which were just appending a single char. We can just use appendStringInfoChar() for that. Additionally, many places were using appendPQExpBuffer() when they could have used appendPQExpBufferStr(). Change those too. Patch by Zhijie Hou, but further searching by me found significantly more places that deserved the same treatment. Author: Zhijie Hou, David Rowley Discussion: https://postgr.es/m/cb172cf4361e4c7ba7167429070979d4@G08CNEXMBPEKD05.g08.fujitsu.local
* Create ResultRelInfos later in InitPlan, index them by RT index.Heikki Linnakangas2020-10-13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Instead of allocating all the ResultRelInfos upfront in one big array, allocate them in ExecInitModifyTable(). es_result_relations is now an array of ResultRelInfo pointers, rather than an array of structs, and it is indexed by the RT index. This simplifies things: we get rid of the separate concept of a "result rel index", and don't need to set it in setrefs.c anymore. This also allows follow-up optimizations (not included in this commit yet) to skip initializing ResultRelInfos for target relations that were not needed at runtime, and removal of the es_result_relation_info pointer. The EState arrays of regular result rels and root result rels are merged into one array. Similarly, the resultRelations and rootResultRelations lists in PlannedStmt are merged into one. It's not actually clear to me why they were kept separate in the first place, but now that the es_result_relations array is indexed by RT index, it certainly seems pointless. The PlannedStmt->resultRelations list is now only needed for ExecRelationIsTargetRelation(). One visible effect of this change is that ExecRelationIsTargetRelation() will now return 'true' also for the partition root, if a partitioned table is updated. That seems like a good thing, although the function isn't used in core code, and I don't see any reason for an FDW to call it on a partition root. Author: Amit Langote Discussion: https://www.postgresql.org/message-id/CA%2BHiwqGEmiib8FLiHMhKB%2BCH5dRgHSLc5N5wnvc4kym%2BZYpQEQ%40mail.gmail.com
* Rework EXPLAIN for planner's buffer usage.Fujii Masao2020-08-21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit ce77abe63c allowed EXPLAIN (BUFFERS) to report the information on buffer usage during planning phase. However three issues were reported regarding this feature. (1) Previously, EXPLAIN option BUFFERS required ANALYZE. So the query had to be actually executed by specifying ANALYZE even when we want to see only the planner's buffer usage. This was inconvenient especially when the query was write one like DELETE. (2) EXPLAIN included the planner's buffer usage in summary information. So SUMMARY option had to be enabled to report that. Also this format was confusing. (3) The output structure for planning information was not consistent between TEXT format and the others. For example, "Planning" tag was output in JSON format, but not in TEXT format. For (1), this commit allows us to perform EXPLAIN (BUFFERS) without ANALYZE to report the planner's buffer usage. For (2), this commit changed EXPLAIN output so that the planner's buffer usage is reported before summary information. For (3), this commit made the output structure for planning information more consistent between the formats. Back-patch to v13 where the planner's buffer usage was allowed to be reported in EXPLAIN. Reported-by: Pierre Giraud, David Rowley Author: Fujii Masao Reviewed-by: David Rowley, Julien Rouhaud, Pierre Giraud Discussion: https://postgr.es/m/07b226e6-fa49-687f-b110-b7c37572f69e@dalibo.com
* Fix bogus EXPLAIN output for Hash AggregateDavid Rowley2020-08-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | 9bdb300de modified the EXPLAIN output for Hash Aggregate to show details from parallel workers. However, it neglected to consider that a given parallel worker may not have assisted with the given Hash Aggregate. This can occur when workers fail to start or during Parallel Append with enable_partitionwise_join enabled when only a single worker is working on a non-parallel aware sub-plan. It could also happen if a worker simply wasn't fast enough to get any work done before other processes went and finished all the work. The bogus output came from the fact that ExplainOpenWorker() skipped showing any details for non-initialized workers but show_hashagg_info() did show details from the worker. This meant that the worker properties that were shown were not properly attributed to the worker that they belong to. In passing, we also now don't show Hash Aggregate properties for the leader process when it did not contribute any work to the Hash Aggregate. This can occur either during Parallel Append when only a parallel worker worked on a given sub plan or with parallel_leader_participation set to off. This aims to make the behavior of Hash Aggregate's EXPLAIN output more similar to Sort's. Reported-by: Justin Pryzby Discussion: https://postgr.es/m/20200805012105.GZ28072%40telsasoft.com Backpatch-through: 13, where the original breakage was introduced