aboutsummaryrefslogtreecommitdiff
path: root/src/backend/executor
Commit message (Collapse)AuthorAge
...
* Don't use static storage for SaveTransactionCharacteristics().Tom Lane2022-02-28
| | | | | | | | | | | This is pretty queasy-making on general principles, and the more so once you notice that CommitTransactionCommand() is actually stomping on the values saved by _SPI_commit(). It's okay as long as the active values didn't change during HoldPinnedPortals(); but that's a larger assumption than I think we want to make, especially since the fix is so simple. Discussion: https://postgr.es/m/1533956.1645731245@sss.pgh.pa.us
* Fix SPI's handling of errors during transaction commit.Tom Lane2022-02-28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | SPI_commit previously left it up to the caller to recover from any error occurring during commit. Since that's complicated and requires use of low-level xact.c facilities, it's not too surprising that no caller got it right. Let's move the responsibility for cleanup into spi.c. Doing that requires redefining SPI_commit as starting a new transaction, so that it becomes equivalent to SPI_commit_and_chain except that you get default transaction characteristics instead of preserving the prior transaction's characteristics. We can make this pretty transparent API-wise by redefining SPI_start_transaction() as a no-op. Callers that expect to do something in between might be surprised, but available evidence is that no callers do so. Having made that API redefinition, we can fix this mess by having SPI_commit[_and_chain] trap errors and start a new, clean transaction before re-throwing the error. Likewise for SPI_rollback[_and_chain]. Some cleanup is also needed in AtEOXact_SPI, which was nowhere near smart enough to deal with SPI contexts nested inside a committing context. While plperl and pltcl need no changes beyond removing their now-useless SPI_start_transaction() calls, plpython needs some more work because it hadn't gotten the memo about catching commit/rollback errors in the first place. Such an error resulted in longjmp'ing out of the Python interpreter, which leaks Python stack entries at present and is reported to crash Python 3.11 altogether. Add the missing logic to catch such errors and convert them into Python exceptions. We are probably going to have to back-patch this once Python 3.11 ships, but it's a sufficiently basic change that I'm a bit nervous about doing so immediately. Let's let it bake awhile in HEAD first. Peter Eisentraut and Tom Lane Discussion: https://postgr.es/m/3375ffd8-d71c-2565-e348-a597d6e739e3@enterprisedb.com Discussion: https://postgr.es/m/17416-ed8fe5d7213d6c25@postgresql.org
* Fix statenames in mergejoin commentsDaniel Gustafsson2022-02-23
| | | | | | | | The names in the comments were on a few states not consistent with the documented state. Author: Zhihong Yu <zyu@yugabyte.com> Discussion: https://postgr.es/m/CALNJ-vQVthfQXVqmrHR8BKHtC4fMGbhM1xbvJNJAPexTq_dH=w@mail.gmail.com
* Allow specifying row filters for logical replication of tables.Amit Kapila2022-02-22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This feature adds row filtering for publication tables. When a publication is defined or modified, an optional WHERE clause can be specified. Rows that don't satisfy this WHERE clause will be filtered out. This allows a set of tables to be partially replicated. The row filter is per table. A new row filter can be added simply by specifying a WHERE clause after the table name. The WHERE clause must be enclosed by parentheses. The row filter WHERE clause for a table added to a publication that publishes UPDATE and/or DELETE operations must contain only columns that are covered by REPLICA IDENTITY. The row filter WHERE clause for a table added to a publication that publishes INSERT can use any column. If the row filter evaluates to NULL, it is regarded as "false". The WHERE clause only allows simple expressions that don't have user-defined functions, user-defined operators, user-defined types, user-defined collations, non-immutable built-in functions, or references to system columns. These restrictions could be addressed in the future. If you choose to do the initial table synchronization, only data that satisfies the row filters is copied to the subscriber. If the subscription has several publications in which a table has been published with different WHERE clauses, rows that satisfy ANY of the expressions will be copied. If a subscriber is a pre-15 version, the initial table synchronization won't use row filters even if they are defined in the publisher. The row filters are applied before publishing the changes. If the subscription has several publications in which the same table has been published with different filters (for the same publish operation), those expressions get OR'ed together so that rows satisfying any of the expressions will be replicated. This means all the other filters become redundant if (a) one of the publications have no filter at all, (b) one of the publications was created using FOR ALL TABLES, (c) one of the publications was created using FOR ALL TABLES IN SCHEMA and the table belongs to that same schema. If your publication contains a partitioned table, the publication parameter publish_via_partition_root determines if it uses the partition's row filter (if the parameter is false, the default) or the root partitioned table's row filter. Psql commands \dRp+ and \d <table-name> will display any row filters. Author: Hou Zhijie, Euler Taveira, Peter Smith, Ajin Cherian Reviewed-by: Greg Nancarrow, Haiying Tang, Amit Kapila, Tomas Vondra, Dilip Kumar, Vignesh C, Alvaro Herrera, Andres Freund, Wei Wang Discussion: https://www.postgresql.org/message-id/flat/CAHE3wggb715X%2BmK_DitLXF25B%3DjE6xyNCH4YOwM860JR7HarGQ%40mail.gmail.com
* Use bitwise rotate functions in more placesJohn Naylor2022-02-20
| | | | | | | | | | | | | There were a number of places in the code that used bespoke bit-twiddling expressions to do bitwise rotation. While we've had pg_rotate_right32() for a while now, we hadn't gotten around to standardizing on that. Do so now. Since many potential call sites look more natural with the "left" equivalent, add that function too. Reviewed by Tom Lane and Yugo Nagata Discussion: https://www.postgresql.org/message-id/CAFBsxsH7c1LC0CGZ0ADCBXLHU5-%3DKNXx-r7tHYPAW51b2HK4Qw%40mail.gmail.com
* Fix memory leak in IndexScan node with reorderingAlexander Korotkov2022-02-14
| | | | | | | | | | Fix ExecReScanIndexScan() to free the referenced tuples while emptying the priority queue. Backpatch to all supported versions. Discussion: https://postgr.es/m/CAHqSB9gECMENBQmpbv5rvmT3HTaORmMK3Ukg73DsX5H7EJV7jw%40mail.gmail.com Author: Aliaksandr Kalenik Reviewed-by: Tom Lane, Alexander Korotkov Backpatch-through: 10
* Test, don't just Assert, that mergejoin's inputs are in order.Tom Lane2022-02-05
| | | | | | | | | | | | | | | | | There are two Asserts in nodeMergejoin.c that are reachable if the input data is not in the expected order. This seems way too fragile. Alexander Lakhin reported a case where the assertions could be triggered with misconfigured foreign-table partitions, and bitter experience with unstable operating system collation definitions suggests another easy route to hitting them. Neither Assert is in a place where we can't afford one more test-and-branch, so replace 'em with plain test-and-elog logic. Per bug #17395. While the reported symptom is relatively recent, collation changes could happen anytime, so back-patch to all supported branches. Discussion: https://postgr.es/m/17395-8c326292078d1a57@postgresql.org
* Fix compiler warning in non-assert builds, introduced in f862d57057f.Andres Freund2022-02-03
| | | | | Discussion: https://postgr.es/m/20220203183655.ralgkh54sdcgysmn@alap3.anarazel.de Backpatch: 14-, like f862d57057f
* Further fix for EvalPlanQual with mix of local and foreign partitions.Etsuro Fujita2022-02-03
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | We assume that direct-modify ForeignScan nodes cannot be re-evaluated during EvalPlanQual processing, but the rework for inherited UPDATE/DELETE in commit 86dc90056 changed things, without considering that, so that such ForeignScan nodes get called as part of the EvalPlanQual subtree during EvalPlanQual processing in the case of an inherited UPDATE/DELETE where the inheritance set contains foreign target relations. To avoid re-evaluating such ForeignScan nodes during EvalPlanQual processing, commit c3928b467 modified nodeForeignscan.c, but the assumption made there that ExecForeignScan() should never be called for such ForeignScan nodes during EvalPlanQual processing turned out to be wrong in some cases, leading to a segmentation fault or a "cannot re-evaluate a Foreign Update or Delete during EvalPlanQual" error. Fix by modifying nodeForeignscan.c further to avoid re-evaluating such ForeignScan nodes even in ExecForeignScan()/ExecReScanForeignScan() during EvalPlanQual processing. Since this makes non-reachable the test-and-elog added to ForeignNext() by commit c3928b467 that produced the aforesaid error, convert the test-and-elog to an Assert. Per bug #17355 from Alexander Lakhin. Back-patch to v14 where both commits came in. Patch by me, reviewed and tested by Alexander Lakhin and Amit Langote. Discussion: https://postgr.es/m/17355-de8e362eb7001a96@postgresql.org
* Fix typo in comment.Etsuro Fujita2022-01-28
|
* Fix memory leak in indexUnchanged hint mechanism.Peter Geoghegan2022-01-12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 9dc718bd added a "logically unchanged by UPDATE" hinting mechanism, which is currently used within nbtree indexes only (see commit d168b666). This mechanism determined whether or not the incoming item is a logically unchanged duplicate (a duplicate needed only for MVCC versioning purposes) once per row updated per non-HOT update. This approach led to memory leaks which were noticeable with an UPDATE statement that updated sufficiently many rows, at least on tables that happen to have an expression index. On HEAD, fix the issue by adding a cache to the executor's per-index IndexInfo struct. Take a different approach on Postgres 14 to avoid an ABI break: simply pass down the hint to all indexes unconditionally with non-HOT UPDATEs. This is deemed acceptable because the hint is currently interpreted within btinsert() as "perform a bottom-up index deletion pass if and when the only alternative is splitting the leaf page -- prefer to delete any LP_DEAD-set items first". nbtree must always treat the hint as a noisy signal about what might work, as a strategy of last resort, with costs imposed on non-HOT updaters. (The same thing might not be true within another index AM that applies the hint, which is why the original behavior is preserved on HEAD.) Author: Peter Geoghegan <pg@bowt.ie> Reported-By: Klaudie Willis <Klaudie.Willis@protonmail.com> Diagnosed-By: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/261065.1639497535@sss.pgh.pa.us Backpatch: 14-, where the hinting mechanism was added.
* Update copyright for 2022Bruce Momjian2022-01-07
| | | | Backpatch-through: 10
* Fix index-only scan plans, take 2.Tom Lane2022-01-03
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 4ace45677 failed to fix the problem fully, because the same issue of attempting to fetch a non-returnable index column can occur when rechecking the indexqual after using a lossy index operator. Moreover, it broke EXPLAIN for such indexquals (which indicates a gap in our test cases :-(). Revert the code changes of 4ace45677 in favor of adding a new field to struct IndexOnlyScan, containing a version of the indexqual that can be executed against the index-returned tuple without using any non-returnable columns. (The restrictions imposed by check_index_only guarantee this is possible, although we may have to recompute indexed expressions.) Support construction of that during setrefs.c processing by marking IndexOnlyScan.indextlist entries as resjunk if they can't be returned, rather than removing them entirely. (We could alternatively require setrefs.c to look up the IndexOptInfo again, but abusing resjunk this way seems like a reasonably safe way to avoid needing to do that.) This solution isn't great from an API-stability standpoint: if there are any extensions out there that build IndexOnlyScan structs directly, they'll be broken in the next minor releases. However, only a very invasive extension would be likely to do such a thing. There's no change in the Path representation, so typical planner extensions shouldn't have a problem. As before, back-patch to all supported branches. Discussion: https://postgr.es/m/3179992.1641150853@sss.pgh.pa.us Discussion: https://postgr.es/m/17350-b5bdcf476e5badbb@postgresql.org
* Always use ReleaseTupleDesc after lookup_rowtype_tupdesc et al.Tom Lane2021-12-15
| | | | | | | | | | | | | | | | | | | | The API spec for lookup_rowtype_tupdesc previously said you could use either ReleaseTupleDesc or DecrTupleDescRefCount. However, the latter choice means the caller must be certain that the returned tupdesc is refcounted. I don't recall right now whether that was always true when this spec was written, but it's certainly not always true since we introduced shared record typcaches for parallel workers. That means that callers using DecrTupleDescRefCount are dependent on typcache behavior details that they probably shouldn't be. Hence, change the API spec to say that you must call ReleaseTupleDesc, and fix the half-dozen callers that weren't. AFAICT this is just future-proofing, there's no live bug here. So no back-patch. Per gripe from Chapman Flack. Discussion: https://postgr.es/m/61B901A4.1050808@anastigmatix.net
* Replace random(), pg_erand48(), etc with a better PRNG API and algorithm.Tom Lane2021-11-28
| | | | | | | | | | | | | | | | | | | Standardize on xoroshiro128** as our basic PRNG algorithm, eliminating a bunch of platform dependencies as well as fundamentally-obsolete PRNG code. In addition, this API replacement will ease replacing the algorithm again in future, should that become necessary. xoroshiro128** is a few percent slower than the drand48 family, but it can produce full-width 64-bit random values not only 48-bit, and it should be much more trustworthy. It's likely to be noticeably faster than the platform's random(), depending on which platform you are thinking about; and we can have non-global state vectors easily, unlike with random(). It is not cryptographically strong, but neither are the functions it replaces. Fabien Coelho, reviewed by Dean Rasheed, Aleksander Alekseev, and myself Discussion: https://postgr.es/m/alpine.DEB.2.22.394.2105241211230.165418@pseudo
* Flush Memoize cache when non-key parameters change, take 2David Rowley2021-11-24
| | | | | | | | | | | | | | | | It's possible that a subplan below a Memoize node contains a parameter from above the Memoize node. If this parameter changes then cache entries may become out-dated due to the new parameter value. Previously Memoize was mistakenly not aware of this. We fix this here by flushing the cache whenever a parameter that's not part of the cache key changes. Bug: #17213 Reported by: Elvis Pranskevichus Author: David Rowley Discussion: https://postgr.es/m/17213-988ed34b225a2862@postgresql.org Backpatch-through: 14, where Memoize was added
* Revert "Flush Memoize cache when non-key parameters change"David Rowley2021-11-24
| | | | This reverts commit 1050048a315790a505465bfcceb26eaf8dbc7e2e.
* Flush Memoize cache when non-key parameters changeDavid Rowley2021-11-24
| | | | | | | | | | | | | | | | It's possible that a subplan below a Memoize node contains a parameter from above the Memoize node. If this parameter changes then cache entries may become out-dated due to the new parameter value. Previously Memoize was mistakenly not aware of this. We fix this here by flushing the cache whenever a parameter that's not part of the cache key changes. Bug: #17213 Reported by: Elvis Pranskevichus Author: David Rowley Discussion: https://postgr.es/m/17213-988ed34b225a2862@postgresql.org Backpatch-through: 14, where Memoize was added
* Allow Memoize to operate in binary comparison modeDavid Rowley2021-11-24
| | | | | | | | | | | | | | | | | | | | | | Memoize would always use the hash equality operator for the cache key types to determine if the current set of parameters were the same as some previously cached set. Certain types such as floating points where -0.0 and +0.0 differ in their binary representation but are classed as equal by the hash equality operator may cause problems as unless the join uses the same operator it's possible that whichever join operator is being used would be able to distinguish the two values. In which case we may accidentally return in the incorrect rows out of the cache. To fix this here we add a binary mode to Memoize to allow it to the current set of parameters to previously cached values by comparing bit-by-bit rather than logically using the hash equality operator. This binary mode is always used for LATERAL joins and it's used for normal joins when any of the join operators are not hashable. Reported-by: Tom Lane Author: David Rowley Discussion: https://postgr.es/m/3004308.1632952496@sss.pgh.pa.us Backpatch-through: 14, where Memoize was added
* Fix variable lifespan in ExecInitCoerceToDomain().Tom Lane2021-11-02
| | | | | | | | | | | | This undoes a mistake in 1ec7679f1: domainval and domainnull were meant to live across loop iterations, but they were incorrectly moved inside the loop. The effect was only to emit useless extra EEOP_MAKE_READONLY steps, so it's not a big deal; nonetheless, back-patch to v13 where the mistake was introduced. Ranier Vilela Discussion: https://postgr.es/m/CAEudQAqXuhbkaAp-sGH6dR6Nsq7v28_0TPexHOm6FiDYqwQD-w@mail.gmail.com
* Avoid some other O(N^2) hazards in list manipulation.Tom Lane2021-11-01
| | | | | | | | | | | | In the same spirit as 6301c3ada, fix some more places where we were using list_delete_first() in a loop and thereby risking O(N^2) behavior. It's not clear that the lists manipulated in these spots can get long enough to be really problematic ... but it's not clear that they can't, either, and the fixes are simple enough. As before, back-patch to v13. Discussion: https://postgr.es/m/CD2F0E7F-9822-45EC-A411-AE56F14DEA9F@amazon.com
* Fix assignment to array of domain over composite.Tom Lane2021-10-19
| | | | | | | | | | | | | | | An update such as "UPDATE ... SET fld[n].subfld = whatever" failed if the array elements were domains rather than plain composites. That's because isAssignmentIndirectionExpr() failed to cope with the CoerceToDomain node that would appear in the expression tree in this case. The result would typically be a crash, and even if we accidentally didn't crash, we'd not correctly preserve other fields of the same array element. Per report from Onder Kalaci. Back-patch to v11 where arrays of domains came in. Discussion: https://postgr.es/m/PH0PR21MB132823A46AA36F0685B7A29AD8BD9@PH0PR21MB1328.namprd21.prod.outlook.com
* Refactor LogicalTapeSet/LogicalTape interface.Heikki Linnakangas2021-10-18
| | | | | | | | | | | | All the tape functions, like LogicalTapeRead and LogicalTapeWrite, now take a LogicalTape as argument, instead of LogicalTapeSet+tape number. You can create any number of LogicalTapes in a single LogicalTapeSet, and you don't need to decide the number upfront, when you create the tape set. This makes the tape management in hash agg spilling in nodeAgg.c simpler. Discussion: https://www.postgresql.org/message-id/420a0ec7-602c-d406-1e75-1ef7ddc58d83%40iki.fi Reviewed-by: Peter Geoghegan, Zhihong Yu, John Naylor
* shm_mq: Update mq_bytes_written less often.Robert Haas2021-10-14
| | | | | | | | | | | | | | Do not update shm_mq's mq_bytes_written until we have written an amount of data greater than 1/4th of the ring size, unless the caller of shm_mq_send(v) requests a flush at the end of the message. This reduces the number of calls to SetLatch(), and also the number of CPU cache misses, considerably, and thus makes shm_mq significantly faster. Dilip Kumar, reviewed by Zhihong Yu and Tomas Vondra. Some minor cosmetic changes by me. Discussion: http://postgr.es/m/CAFiTN-tVXqn_OG7tHNeSkBbN+iiCZTiQ83uakax43y1sQb2OBA@mail.gmail.com
* Clean up more code using "(expr) ? true : false"Michael Paquier2021-10-11
| | | | | | | | | This is similar to fd0625c, taking care of any remaining code paths that are worth the cleanup. This also changes some cases using opposite expression patterns. Author: Justin Pryzby, Masahiko Sawada Discussion: https://postgr.es/m/CAD21AoCdF8dnUvr-BUWWGvA_XhKSoANacBMZb6jKyCk4TYfQ2Q@mail.gmail.com
* Fix checking of query type in plpgsql's RETURN QUERY command.Tom Lane2021-10-03
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Prior to v14, we insisted that the query in RETURN QUERY be of a type that returns tuples. (For instance, INSERT RETURNING was allowed, but not plain INSERT.) That happened indirectly because we opened a cursor for the query, so spi.c checked SPI_is_cursor_plan(). As a consequence, the error message wasn't terribly on-point, but at least it was there. Commit 2f48ede08 lost this detail. Instead, plain RETURN QUERY insisted that the query be a SELECT (by checking for SPI_OK_SELECT) while RETURN QUERY EXECUTE failed to check the query type at all. Neither of these changes was intended. The only convenient place to check this in the EXECUTE case is inside _SPI_execute_plan, because we haven't done parse analysis until then. So we need to pass down a flag saying whether to enforce that the query returns tuples. Fortunately, we can squeeze another boolean into struct SPIExecuteOptions without an ABI break, since there's padding space there. (It's unlikely that any extensions would already be using this new struct, but preserving ABI in v14 seems like a smart idea anyway.) Within spi.c, it seemed like _SPI_execute_plan's parameter list was already ridiculously long, and I didn't want to make it longer. So I thought of passing SPIExecuteOptions down as-is, allowing that parameter list to become much shorter. This makes the patch a bit more invasive than it might otherwise be, but it's all internal to spi.c, so that seems fine. Per report from Marc Bachmann. Back-patch to v14 where the faulty code came in. Discussion: https://postgr.es/m/1F2F75F0-27DF-406F-848D-8B50C7EEF06A@gmail.com
* Fix typos and grammar in code commentsMichael Paquier2021-09-27
| | | | | | | | | | | Several mistakes have piled in the code comments over the time, including incorrect grammar, function names and simple typos. This commit takes care of a portion of these. No backpatch is done as this is only cosmetic. Author: Justin Pryzby Discussion: https://postgr.es/m/20210924215827.GS831@telsasoft.com
* Remove arbitrary 64K-or-so limit on rangetable size.Tom Lane2021-09-15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Up to now the size of a query's rangetable has been limited by the constants INNER_VAR et al, which mustn't be equal to any real rangetable index. 65000 doubtless seemed like enough for anybody, and it still is orders of magnitude larger than the number of joins we can realistically handle. However, we need a rangetable entry for each child partition that is (or might be) processed by a query. Queries with a few thousand partitions are getting more realistic, so that the day when that limit becomes a problem is in sight, even if it's not here yet. Hence, let's raise the limit. Rather than just increase the values of INNER_VAR et al, this patch adopts the approach of making them small negative values, so that rangetables could theoretically become as long as INT_MAX. The bulk of the patch is concerned with changing Var.varno and some related variables from "Index" (unsigned int) to plain "int". This is basically cosmetic, with little actual effect other than to help debuggers print their values nicely. As such, I've only bothered with changing places that could actually see INNER_VAR et al, which the parser and most of the planner don't. We do have to be careful in places that are performing less/greater comparisons on varnos, but there are very few such places, other than the IS_SPECIAL_VARNO macro itself. A notable side effect of this patch is that while it used to be possible to add INNER_VAR et al to a Bitmapset, that will now draw an error. I don't see any likelihood that it wouldn't be a bug to include these fake varnos in a bitmapset of real varnos, so I think this is all to the good. Although this touches outfuncs/readfuncs, I don't think a catversion bump is required, since stored rules would never contain Vars with these fake varnos. Andrey Lepikhov and Tom Lane, after a suggestion by Peter Eisentraut Discussion: https://postgr.es/m/43c7f2f5-1e27-27aa-8c65-c91859d15190@postgrespro.ru
* Remove Value node structPeter Eisentraut2021-09-09
| | | | | | | | | | | | | | | | | | | | | | | | | | | The Value node struct is a weird construct. It is its own node type, but most of the time, it actually has a node type of Integer, Float, String, or BitString. As a consequence, the struct name and the node type don't match most of the time, and so it has to be treated specially a lot. There doesn't seem to be any value in the special construct. There is very little code that wants to accept all Value variants but nothing else (and even if it did, this doesn't provide any convenient way to check it), and most code wants either just one particular node type (usually String), or it accepts a broader set of node types besides just Value. This change removes the Value struct and node type and replaces them by separate Integer, Float, String, and BitString node types that are proper node types and structs of their own and behave mostly like normal node types. Also, this removes the T_Null node tag, which was previously also a possible variant of Value but wasn't actually used outside of the Value contained in A_Const. Replace that by an isnull field in A_Const. Reviewed-by: Dagfinn Ilmari Mannsåker <ilmari@ilmari.org> Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/5ba6bc5b-3f95-04f2-2419-f8ddb4c046fb@enterprisedb.com
* Clean up some code using "(expr) ? true : false"Michael Paquier2021-09-08
| | | | | | | | | | All the code paths simplified here were already using a boolean or used an expression that led to zero or one, making the extra bits unnecessary. Author: Justin Pryzby Reviewed-by: Tom Lane, Michael Paquier, Peter Smith Discussion: https://postgr.es/m/20210428182936.GE27406@telsasoft.com
* Fix missing words in comment.Heikki Linnakangas2021-09-07
| | | | | | | Introduced by commit c3928b467a, backpatch to v14 like that one. Author: Amit Langote Discussion: https://www.postgresql.org/message-id/CA+HiwqFQgNLS6VGntMcuJV6erBFV425xA6wBVnY=41GK4zC0Bw@mail.gmail.com
* Improve error messages about misuse of SELECT INTO.Tom Lane2021-08-21
| | | | | | | | | | | | | | | | | | | | | | | | | Improve two places in plpgsql, and one in spi.c, where an error message would confusingly tell you that you couldn't use a SELECT query, when what you had written *was* a SELECT query. The actual problem is that you can't use SELECT ... INTO in these contexts, but the messages failed to make that apparent. Special-case SELECT INTO to make these errors more helpful. Also, fix the same spots in plpgsql, as well as several messages in exec_eval_expr(), to not quote the entire complained-of query or expression in the primary error message. That behavior very easily led to violating our message style guideline about keeping the primary error message short and single-line. Also, since the important part of the message was after the inserted text, it could make the real problem very hard to see. We can report the query or expression as the first line of errcontext instead. Per complaint from Roger Mason. Back-patch to v14, since (a) some of these messages are new in v14 and (b) v14's translatable strings are still somewhat in flux. The problem's older than that of course, but I'm hesitant to change the behavior further back. Discussion: https://postgr.es/m/1914708.1629474624@sss.pgh.pa.us
* Use appropriate tuple descriptor in FDW batchingTomas Vondra2021-08-12
| | | | | | | | | | | | | | | The FDW batching code was using the same tuple descriptor both for all slots (regular and plan slots), but that's incorrect - the subplan may use a different descriptor. Currently this is benign, because batching is used only for INSERTs, and in that case the descriptors always match. But that would change if we allow batching UPDATEs. Fix by copying the appropriate tuple descriptor. Backpatch to 14, where the FDW batching was implemented. Author: Amit Langote Backpatch-through: 14, where FDW batching was added Discussion: https://postgr.es/m/CA%2BHiwqEWd5B0-e-RvixGGUrNvGkjH2s4m95%3DJcwUnyV%3Df0rAKQ%40mail.gmail.com
* Fix segfault during EvalPlanQual with mix of local and foreign partitions.Heikki Linnakangas2021-08-12
| | | | | | | | | | | | | | | | | | | | | It's not sensible to re-evaluate a direct-modify Foreign Update or Delete during EvalPlanQual. However, ExecInitForeignScan() can still get called if a table mixes local and foreign partitions. EvalPlanQualStart() left the es_result_relations array uninitialized in the child EPQ EState, but ExecInitForeignScan() still expected to find it. That caused a segfault. Fix by skipping the es_result_relations lookup during EvalPlanQual processing. To make things a bit more robust, also skip the BeginDirectModify calls, and add a runtime check that ExecForeignScan() is not called on direct-modify foreign scans during EvalPlanQual processing. This is new in v14, commit 1375422c782. Before that, EvalPlanQualStart() copied the whole ResultRelInfo array to the EPQ EState. Backpatch to v14. Report and diagnosis by Andrey Lepikhov. Discussion: https://www.postgresql.org/message-id/cb2b808d-cbaa-4772-76ee-c8809bafcf3d%40postgrespro.ru
* Change SeqScan node to contain Scan nodePeter Eisentraut2021-08-08
| | | | | | | This makes the structure of all Scan-derived nodes the same, independent of whether they have additional fields. Discussion: https://www.postgresql.org/message-id/flat/c1097590-a6a4-486a-64b1-e1f9cc0533ce@enterprisedb.com
* Fix oversight in commit 1ec7fca8592178281cd5cdada0f27a340fb813fc.Etsuro Fujita2021-08-02
| | | | | | | | | | | | | | | | | I failed to account for the possibility that when ExecAppendAsyncEventWait() notifies multiple async-capable nodes using postgres_fdw, a preceding node might invoke process_pending_request() to process a pending asynchronous request made by a succeeding node. In that case the succeeding node should produce a tuple to return to the parent Append node from tuples fetched by process_pending_request() when notified. Repair. Per buildfarm via Michael Paquier. Back-patch to v14, like the previous commit. Thanks to Tom Lane for testing. Discussion: https://postgr.es/m/YQP0UPT8KmPiHTMs%40paquier.xyz
* postgres_fdw: Fix handling of pending asynchronous requests.Etsuro Fujita2021-07-30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A pending asynchronous request is handled by process_pending_request(), which previously not only processed an in-progress remote query but performed ExecForeignScan() to produce a tuple to return to the local server asynchronously from the result of the remote query. But that led to a server crash when executing a query or led to an "InstrStartNode called twice in a row" or "InstrEndLoop called on running node" failure when doing EXPLAIN ANALYZE of it, in cases where the plan tree for it contained multiple async-capable nodes accessing the same initplan/subplan that contained multiple async-capable nodes scanning the same foreign tables as for the parent async-capable nodes, as reported by Andrey Lepikhov. The reason is that the second step in process_pending_request() invoked when executing the initplan/subplan for one of the parent async-capable nodes caused recursive execution of the initplan/subplan for another of the parent async-capable nodes. To fix, split process_pending_request() into the two steps and postpone the second step until ForeignAsyncConfigureWait() is called for each of the pending asynchronous requests. Also, in ExecAppendAsyncEventWait() we assumed that FDWs would register at least one wait event in a WaitEventSet created there when they were called from ForeignAsyncConfigureWait() in that function, but allow FDWs to register zero wait events in the WaitEventSet; modify ExecAppendAsyncEventWait() to just return in that case. Oversight in commit 27e1f1456. Back-patch to v14 where that commit went in. Andrey Lepikhov and Etsuro Fujita Discussion: https://postgr.es/m/fe5eaa19-1704-e4a4-76ee-3b9d37ade399@postgrespro.ru
* Get rid of artificial restriction on hash table sizes on Windows.Tom Lane2021-07-25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The point of introducing the hash_mem_multiplier GUC was to let users reproduce the old behavior of hash aggregation, i.e. that it could use more than work_mem at need. However, the implementation failed to get the job done on Win64, where work_mem is clamped to 2GB to protect various places that calculate memory sizes using "long int". As written, the same clamp was applied to hash_mem. This resulted in severe performance regressions for queries requiring a bit more than 2GB for hash aggregation, as they now spill to disk and there's no way to stop that. Getting rid of the work_mem restriction seems like a good idea, but it's a big job and could not conceivably be back-patched. However, there's only a fairly small number of places that are concerned with the hash_mem value, and it turns out to be possible to remove the restriction there without too much code churn or any ABI breaks. So, let's do that for now to fix the regression, and leave the larger task for another day. This patch does introduce a bit more infrastructure that should help with the larger task, namely pg_bitutils.h support for working with size_t values. Per gripe from Laurent Hasson. Back-patch to v13 where the behavior change came in. Discussion: https://postgr.es/m/997817.1627074924@sss.pgh.pa.us Discussion: https://postgr.es/m/MN2PR15MB25601E80A9B6D1BA6F592B1985E39@MN2PR15MB2560.namprd15.prod.outlook.com
* Make nodeSort.c use Datum sorts for single column sortsDavid Rowley2021-07-22
| | | | | | | | | | | | | | | Datum sorts can be significantly faster than tuple sorts, especially when the data type being sorted is a pass-by-value type. Something in the region of 50-70% performance improvements appear to be possible. Just in case there's any confusion; the Datum sort is only used when the targetlist of the Sort node contains a single column, not when there's a single column in the sort key and multiple items in the target list. Author: Ronan Dunklau Reviewed-by: James Coleman, David Rowley, Ranier Vilela, Hou Zhijie Tested-by: John Naylor Discussion: https://postgr.es/m/3177670.itZtoPt7T5@aivenronan
* Rename NodeTag of ExprStatePeter Eisentraut2021-07-21
| | | | | | Rename from tag to type, for consistency with all other node structs. Discussion: https://www.postgresql.org/message-id/flat/c1097590-a6a4-486a-64b1-e1f9cc0533ce@enterprisedb.com
* More improvements of error messages about mismatching relkindPeter Eisentraut2021-07-21
| | | | | | | | | | | Follow-up to 2ed532ee8c474e9767e76e1f3251cc3a0224358c, a few error messages in the logical replication area currently only deal with tables, but if we're anticipating more relkinds such as sequences being handled, then these messages also fall into the category affected by the previous patch, so adjust them too. Reviewed-by: Michael Paquier <michael@paquier.xyz> Discussion: https://www.postgresql.org/message-id/c9ba5c6a-4bd5-e12c-1b3c-edbcaedbf392@enterprisedb.com
* Use l*_node() family of functions where appropriatePeter Eisentraut2021-07-19
| | | | | | | Instead of castNode(…, lfoo(…)) Author: Dagfinn Ilmari Mannsåker <ilmari@ilmari.org> Discussion: https://www.postgresql.org/message-id/flat/87eecahraj.fsf@wibble.ilmari.org
* Change the name of the Result Cache node to MemoizeDavid Rowley2021-07-14
| | | | | | | | | | | "Result Cache" was never a great name for this node, but nobody managed to come up with another name that anyone liked enough. That was until David Johnston mentioned "Node Memoization", which Tom Lane revised to just "Memoize". People seem to like "Memoize", so let's do the rename. Reviewed-by: Justin Pryzby Discussion: https://postgr.es/m/20210708165145.GG1176@momjian.us Backpatch-through: 14, where Result Cache was introduced
* Use a hash table to speed up NOT IN(values)David Rowley2021-07-07
| | | | | | | | | | | | | | | | | | | | | Similar to 50e17ad28, which allowed hash tables to be used for IN clauses with a set of constants, here we add the same feature for NOT IN clauses. NOT IN evaluates the same as: WHERE a <> v1 AND a <> v2 AND a <> v3. Obviously, if we're using a hash table we must be exactly equivalent to that and return the same result taking into account that either side of the condition could contain a NULL. This requires a little bit of special handling to make work with the hash table version. When processing NOT IN, the ScalarArrayOpExpr's operator will be the <> operator. To be able to build and lookup a hash table we must use the <>'s negator operator. The planner checks if that exists and is hashable and sets the relevant fields in ScalarArrayOpExpr to instruct the executor to use hashing. Author: David Rowley, James Coleman Reviewed-by: James Coleman, Zhihong Yu Discussion: https://postgr.es/m/CAApHDvoF1mum_FRk6D621edcB6KSHBi2+GAgWmioj5AhOu2vwQ@mail.gmail.com
* Allow CustomScan providers to say whether they support projections.Tom Lane2021-07-06
| | | | | | | | | | | | | | | | | Previously, all CustomScan providers had to support projections, but there may be cases where this is inconvenient. Add a flag bit to say if it's supported. Important item for the release notes: this is non-backwards-compatible since the default is now to assume that CustomScan providers can't project, instead of assuming that they can. It's fail-soft, but could result in visible performance penalties due to adding unnecessary Result nodes. Sven Klemm, reviewed by Aleksander Alekseev; some cosmetic fiddling by me. Discussion: https://postgr.es/m/CAMCrgp1kyakOz6c8aKhNDJXjhQ1dEjEnp+6KNT3KxPrjNtsrDg@mail.gmail.com
* Cleanup some aggregate code in the executorDavid Rowley2021-07-04
| | | | | | | | | | | | | | | | | | | | | Here we alter the code that calls build_pertrans_for_aggref() so that the function no longer needs to special-case whether it's dealing with an aggtransfn or an aggcombinefn. This allows us to reuse the build_aggregate_transfn_expr() function and just get rid of the build_aggregate_combinefn_expr() completely. All of the special case code that was in build_pertrans_for_aggref() has been moved up to the calling functions. This saves about a dozen lines of code in nodeAgg.c and a few dozen more in parse_agg.c Also, rename a few variables in nodeAgg.c to try to make it more clear that we're working with either a aggtransfn or an aggcombinefn. Some of the old names would have you believe that we were always working with an aggtransfn. Discussion: https://postgr.es/m/CAApHDvptMQ9FmF0D67zC_w88yVnoNVR2+kkOQGUrCmdxWxLULQ@mail.gmail.com
* Pre branch pgindent / pgperltidy runAndrew Dunstan2021-06-28
| | | | | Along the way make a slight adjustment to src/include/utils/queryjumble.h to avoid an unused typedef.
* Centralize the logic for protective copying of utility statements.Tom Lane2021-06-18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In the "simple Query" code path, it's fine for parse analysis or execution of a utility statement to scribble on the statement's node tree, since that'll just be thrown away afterwards. However it's not fine if the node tree is in the plan cache, as then it'd be corrupted for subsequent executions. Up to now we've dealt with that by having individual utility-statement functions apply copyObject() if they were going to modify the tree. But that's prone to errors of omission. Bug #17053 from Charles Samborski shows that CREATE/ALTER DOMAIN didn't get this memo, and can crash if executed repeatedly from plan cache. In the back branches, we'll just apply a narrow band-aid for that, but in HEAD it seems prudent to have a more principled fix that will close off the possibility of other similar bugs in future. Hence, let's hoist the responsibility for doing copyObject up into ProcessUtility from its children, thus ensuring that it happens for all utility statement types. Also, modify ProcessUtility's API so that its callers can tell it whether a copy step is necessary. It turns out that in all cases, the immediate caller knows whether the node tree is transient, so this doesn't involve a huge amount of code thrashing. In this way, while we lose a little bit in the execute-from-cache code path due to sometimes copying node trees that wouldn't be mutated anyway, we gain something in the simple-Query code path by not copying throwaway node trees. Statements that are complex enough to be expensive to copy are almost certainly ones that would have to be copied anyway, so the loss in the cache code path shouldn't be much. (Note that this whole problem applies only to utility statements. Optimizable statements don't have the issue because we long ago made the executor treat Plan trees as read-only. Perhaps someday we will make utility statement execution act likewise, but I'm not holding my breath.) Discussion: https://postgr.es/m/931771.1623893989@sss.pgh.pa.us Discussion: https://postgr.es/m/17053-3ca3f501bbc212b4@postgresql.org
* Fix copying data into slots with FDW batchingTomas Vondra2021-06-16
| | | | | | | | | | | | | | | | Commit b676ac443b optimized handling of tuple slots with bulk inserts into foreign tables, so that the slots are initialized only once and reused for all batches. The data was however copied into the slots only after the initialization, inserting duplicate values when the slot gets reused. Fixed by moving the ExecCopySlot outside the init branch. The existing postgres_fdw tests failed to catch this due to inserting data into foreign tables without unique indexes, and then checking only the number of inserted rows. This adds a new test with both a unique index and a check of inserted values. Reported-by: Alexander Pyhalov Discussion: https://postgr.es/m/7a8cf8d56b3d18e5c0bccd6cd42d04ac%40postgrespro.ru
* Optimize creation of slots for FDW bulk insertsTomas Vondra2021-06-11
| | | | | | | | | | | | | | | | | | | | | | Commit b663a41363 introduced bulk inserts for FDW, but the handling of tuple slots turned out to be problematic for two reasons. Firstly, the slots were re-created for each individual batch. Secondly, all slots referenced the same tuple descriptor - with reasonably small batches this is not an issue, but with large batches this triggers O(N^2) behavior in the resource owner code. These two issues work against each other - to reduce the number of times a slot has to be created/dropped, larger batches are needed. However, the larger the batch, the more expensive the resource owner gets. For practical batch sizes (100 - 1000) this would not be a big problem, as the benefits (latency savings) greatly exceed the resource owner costs. But for extremely large batches it might be much worse, possibly even losing with non-batching mode. Fixed by initializing tuple slots only once (and reusing them across batches) and by using a new tuple descriptor copy for each slot. Discussion: https://postgr.es/m/ebbbcc7d-4286-8c28-0272-61b4753af761%40enterprisedb.com