path: root/src/backend/executor/execMain.c
...
* Update copyright for 2021 (Bruce Momjian, 2021-01-02)
  Backpatch-through: 9.5
* Fix initialization of es_result_relations in EvalPlanQualStart(). (Heikki Linnakangas, 2020-10-23)
  Thinko in commit 1375422c782. EvalPlanQualStart() was mistakenly resetting the parent EState's es_result_relations, when it should initialize the field in the child EPQ EState it just created. That was clearly wrong, but it didn't cause any ill effects, because es_result_relations is currently not used after the ExecInit* phase.
  Author: Amit Langote
  Discussion: https://www.postgresql.org/message-id/CA%2BHiwqFEuq8AAAmxXsTDVZ1r38cHbfYuiPQx_%3DYyKe2DC-6q4A%40mail.gmail.com
* Remove PartitionRoutingInfo struct. (Heikki Linnakangas, 2020-10-19)
  The extra indirection needed to access its members via its enclosing ResultRelInfo seems pointless. Move all the fields from PartitionRoutingInfo to ResultRelInfo.
  Author: Amit Langote
  Reviewed-by: Alvaro Herrera
  Discussion: https://www.postgresql.org/message-id/CA%2BHiwqFViT47Zbr_ASBejiK7iDG8%3DQ1swQ-tjM6caRPQ67pT%3Dw%40mail.gmail.com
* Revise child-to-root tuple conversion map management. (Heikki Linnakangas, 2020-10-19)
  Store the tuple conversion map to convert a tuple from a child table's format to the root format in a new ri_ChildToRootMap field in ResultRelInfo. It is initialized if transition tuple capture for FOR STATEMENT triggers or INSERT tuple routing on a partitioned table is needed.
  Previously, ModifyTable kept the maps in the per-subplan ModifyTableState->mt_per_subplan_tupconv_maps array, or when tuple routing was used, in ResultRelInfo->ri_Partitioninfo->pi_PartitionToRootMap. The new field replaces both of those.
  Now that the child-to-root tuple conversion map is always available in ResultRelInfo (when needed), remove the TransitionCaptureState.tcs_map field. The callers of Exec*Trigger() functions no longer need to set or save it, which is much less confusing and bug-prone. Also, as a future optimization, this will allow us to delay creating the map for a given result relation until the relation is actually processed during execution.
  Author: Amit Langote
  Discussion: https://www.postgresql.org/message-id/CA%2BHiwqHtCWLdK-LO%3DNEsvOdHx%2B7yv4mE_zYK0i3BH7dXb-wxog%40mail.gmail.com
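  For illustration only, a minimal standalone C sketch of the idea behind keeping the conversion map in the per-result-relation struct (names are hypothetical, not the real ResultRelInfo or tupconvert code): callers fetch the map from one place, and it is built lazily on first use.

      #include <stdio.h>
      #include <stdlib.h>

      /* Hypothetical stand-ins for the real structs. */
      typedef struct TupleConversionMapSketch { int dummy; } TupleConversionMapSketch;

      typedef struct ResultRelSketch
      {
          const char               *relname;
          TupleConversionMapSketch *child_to_root_map;   /* NULL until first use */
      } ResultRelSketch;

      static TupleConversionMapSketch *
      build_map(const char *relname)
      {
          printf("building child-to-root map for %s\n", relname);
          return calloc(1, sizeof(TupleConversionMapSketch));
      }

      /* Single lookup point: every caller asks the result relation for its map. */
      static TupleConversionMapSketch *
      get_child_to_root_map(ResultRelSketch *rri)
      {
          if (rri->child_to_root_map == NULL)
              rri->child_to_root_map = build_map(rri->relname);
          return rri->child_to_root_map;
      }

      int
      main(void)
      {
          ResultRelSketch part = { "part_1", NULL };

          get_child_to_root_map(&part);   /* built on first call */
          get_child_to_root_map(&part);   /* reused afterwards */
          free(part.child_to_root_map);
          return 0;
      }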
* Remove es_result_relation_info from EState. (Heikki Linnakangas, 2020-10-14)
  Maintaining 'es_result_relation_info' correctly at all times has become cumbersome, especially with partitioning where each partition gets its own result relation info. Having to set and reset it across arbitrary operations has caused bugs in the past.
  This changes all the places that used 'es_result_relation_info' to receive the currently active ResultRelInfo via function parameters instead.
  Author: Amit Langote
  Discussion: https://www.postgresql.org/message-id/CA%2BHiwqGEmiib8FLiHMhKB%2BCH5dRgHSLc5N5wnvc4kym%2BZYpQEQ%40mail.gmail.com
* Create ResultRelInfos later in InitPlan, index them by RT index. (Heikki Linnakangas, 2020-10-13)
  Instead of allocating all the ResultRelInfos upfront in one big array, allocate them in ExecInitModifyTable(). es_result_relations is now an array of ResultRelInfo pointers, rather than an array of structs, and it is indexed by the RT index.
  This simplifies things: we get rid of the separate concept of a "result rel index", and don't need to set it in setrefs.c anymore. This also allows follow-up optimizations (not included in this commit yet) to skip initializing ResultRelInfos for target relations that were not needed at runtime, and removal of the es_result_relation_info pointer.
  The EState arrays of regular result rels and root result rels are merged into one array. Similarly, the resultRelations and rootResultRelations lists in PlannedStmt are merged into one. It's not actually clear to me why they were kept separate in the first place, but now that the es_result_relations array is indexed by RT index, it certainly seems pointless.
  The PlannedStmt->resultRelations list is now only needed for ExecRelationIsTargetRelation(). One visible effect of this change is that ExecRelationIsTargetRelation() will now return 'true' also for the partition root, if a partitioned table is updated. That seems like a good thing, although the function isn't used in core code, and I don't see any reason for an FDW to call it on a partition root.
  Author: Amit Langote
  Discussion: https://www.postgresql.org/message-id/CA%2BHiwqGEmiib8FLiHMhKB%2BCH5dRgHSLc5N5wnvc4kym%2BZYpQEQ%40mail.gmail.com
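  As a rough illustration (hypothetical names, not the actual executor code), an array of pointers indexed by range-table index lets entries be created only for the relations a plan node actually targets, while unused slots stay NULL:

      #include <stdlib.h>

      typedef struct ResultRelSketch { int rt_index; } ResultRelSketch;

      typedef struct EStateSketch
      {
          int               range_table_size;
          ResultRelSketch **result_relations;   /* indexed by (RT index - 1) */
      } EStateSketch;

      static ResultRelSketch *
      get_result_relation(EStateSketch *estate, int rt_index)
      {
          ResultRelSketch **slot = &estate->result_relations[rt_index - 1];

          if (*slot == NULL)   /* created only when this RT entry is a target */
          {
              *slot = calloc(1, sizeof(ResultRelSketch));
              (*slot)->rt_index = rt_index;
          }
          return *slot;
      }

      int
      main(void)
      {
          EStateSketch estate;

          estate.range_table_size = 4;
          estate.result_relations = calloc(4, sizeof(ResultRelSketch *));

          get_result_relation(&estate, 2);   /* only RT entry 2 gets a result rel */

          free(estate.result_relations[1]);
          free(estate.result_relations);
          return 0;
      }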
* Don't fetch partition check expression during InitResultRelInfo. (Tom Lane, 2020-09-16)
  Since there is only one place that actually needs the partition check expression, namely ExecPartitionCheck, it's better to fetch it from the relcache there. In this way we will never fetch it at all if the query never has use for it, and we still fetch it just once when we do need it.
  The reason for taking an interest in this is that if the relcache doesn't already have the check expression cached, fetching it requires obtaining AccessShareLock on the partition root. That means that operations that look like they should only touch the partition itself will also take a lock on the root. In particular we observed that TRUNCATE on a partition may take a lock on the partition's root, contributing to a deadlock situation in parallel pg_restore.
  As written, this patch does have a small cost, which is that we are microscopically reducing efficiency for the case where a partition has an empty check expression. ExecPartitionCheck will be called, and will go through the motions of setting up and checking an empty qual, where before it would not have been called at all. We could avoid that by adding a separate boolean flag to track whether there is a partition expression to test. However, this case only arises for a default partition with no siblings, which surely is not an interesting case in practice. Hence adding complexity for it does not seem like a good trade-off.
  Amit Langote, per a suggestion by me
  Discussion: https://postgr.es/m/VI1PR03MB31670CA1BD9625C3A8C5DD05EB230@VI1PR03MB3167.eurprd03.prod.outlook.com
* Add object names to partition integrity violations. (Amit Kapila, 2020-03-23)
  All errors of SQLSTATE class 23 should include the name of an object associated with the error in separate fields of the error report message. We do this so that applications need not try to extract them from the possibly-localized human-readable text of the message.
  Reported-by: Chris Bandy
  Author: Chris Bandy
  Reviewed-by: Amit Kapila and Amit Langote
  Discussion: https://postgr.es/m/0aa113a3-3c7f-db48-bcd8-f9290b2269ae@gmail.com
* Represent command completion tags as structs (Alvaro Herrera, 2020-03-02)
  The backend was using strings to represent command tags and doing string comparisons in multiple places, but that's slow and unhelpful. Create a new command list with a supporting structure to use instead; this is stored in a tag-list-file that can be tailored to specific purposes with a caller-definable C macro, similar to what we do for WAL resource managers. The first such uses are a new CommandTag enum and a CommandTagBehavior struct.
  Replace numerous occurrences of char *completionTag with a QueryCompletion struct so that the code no longer stores information about completed queries in a cstring. Only at the last moment, in EndCommand(), does this get converted to a string.
  EventTriggerCacheItem no longer holds an array of palloc'd tag strings in sorted order, but rather just a Bitmapset over the CommandTags.
  Author: Mark Dilger, with unsolicited help from Álvaro Herrera
  Reviewed-by: John Naylor, Tom Lane
  Discussion: https://postgr.es/m/981A9DB4-3F0C-4DA5-88AD-CB9CFF4D6CAD@enterprisedb.com
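  To illustrate the "tag-list-file with a caller-definable C macro" idea in miniature (macro and symbol names invented here, not PostgreSQL's actual tag list), the same central list can be expanded once into an enum and once into the matching display strings, so tags are stored and compared as integers and only rendered as text at the end:

      #include <stdio.h>

      /* One central list; callers define the expansion macro. */
      #define COMMAND_TAG_LIST(X) \
          X(CMDTAG_INSERT, "INSERT") \
          X(CMDTAG_UPDATE, "UPDATE") \
          X(CMDTAG_DELETE, "DELETE")

      /* Expand once into an enum... */
      typedef enum CommandTagSketch
      {
      #define MAKE_ENUM(sym, name) sym,
          COMMAND_TAG_LIST(MAKE_ENUM)
      #undef MAKE_ENUM
          CMDTAG_SENTINEL
      } CommandTagSketch;

      /* ...and once into the matching display strings. */
      static const char *const tag_names[] = {
      #define MAKE_NAME(sym, name) name,
          COMMAND_TAG_LIST(MAKE_NAME)
      #undef MAKE_NAME
      };

      int
      main(void)
      {
          CommandTagSketch tag = CMDTAG_UPDATE;   /* cheap integer comparisons */

          printf("%s\n", tag_names[tag]);         /* string only at output time */
          return 0;
      }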
* Fix dangling pointer in EvalPlanQual machinery. (Tom Lane, 2020-01-28)
  EvalPlanQualStart() supposed that it could re-use the relsubs_rowmark and relsubs_done arrays from a prior instantiation. But since they are allocated in the es_query_cxt of the recheckestate, that's just wrong; EvalPlanQualEnd() will blow away that storage. Therefore we were using storage that could have been reallocated to something else, causing all sorts of havoc.
  I think this was modeled on the old code's handling of es_epqTupleSlot, but since the code was anyway clearing the arrays at re-use, there's clearly no expectation of importing any outside state. So it's just a dubious savings of a couple of pallocs, which is negligible compared to setting up a new planstate tree. Therefore, just allocate the arrays always. (I moved the allocations slightly for readability.)
  In principle this bug could cause a problem whenever EPQ rechecks are needed in more than one target table of a ModifyTable plan node. In practice it seems not quite so easy to trigger as that; I couldn't readily duplicate a crash with a partitioned target table, for instance. That's probably down to incidental choices about when to free or reallocate stuff. The added isolation test case does seem to reliably show an assertion failure, though.
  Per report from Oleksii Kliukin. Back-patch to v12 where the bug was introduced (evidently by commit 3fb307bc4).
  Discussion: https://postgr.es/m/EEF05F66-2871-4786-992B-5F45C92FEE2E@hintbits.com
* Added relation name in error messages for constraint checks. (Amit Kapila, 2020-01-28)
  This gives more information to the user about the error and it makes such messages consistent with the other similar messages in the code.
  Reported-by: Simon Riggs
  Author: Mahendra Singh and Simon Riggs
  Reviewed-by: Beena Emerson and Amit Kapila
  Discussion: https://postgr.es/m/CANP8+j+7YUvQvGxTrCiw77R23enMJ7DFmyA3buR+fa2pKs4XhA@mail.gmail.com
* Make rewriter prevent auto-updates on views with conditional INSTEAD rules. (Dean Rasheed, 2020-01-14)
  A view with conditional INSTEAD rules and no unconditional INSTEAD rules or INSTEAD OF triggers is not auto-updatable. Previously we relied on a check in the executor to catch this, but that's problematic since the planner may fail to properly handle such a query and thus return a particularly unhelpful error to the user, before reaching the executor check.
  Instead, trap this in the rewriter and report the correct error there. Doing so also allows us to include more useful error detail than the executor check can provide. This doesn't change the existing behaviour of updatable views; it merely ensures that useful error messages are reported when a view isn't updatable.
  Per report from Pengzhou Tang, though not adopting that suggested fix. Back-patch to all supported branches.
  Discussion: https://postgr.es/m/CAG4reAQn+4xB6xHJqWdtE0ve_WqJkdyCV4P=trYr4Kn8_3_PEA@mail.gmail.com
* Update copyrights for 2020 (Bruce Momjian, 2020-01-01)
  Backpatch-through: update all files in master, backpatch legal files through 9.4
* Refactor attribute mappings used in logical tuple conversion (Michael Paquier, 2019-12-18)
  Tuple conversion support in tupconvert.c is able to convert rowtypes between two relations, inner and outer, which are logically equivalent but have a different ordering or even dropped columns (used mainly for inheritance trees and partitions). This makes use of attribute mappings, which are simple arrays made of AttrNumber elements with a length matching the number of attributes of the outer relation. The length of the attribute mapping has been treated as completely independent of the mapping itself until now, making it easy to pass down an incorrect mapping length.
  This commit refactors the code related to attribute mappings and moves it into an independent facility called attmap.c, extracted from tupconvert.c. This merges the attribute mapping with its length, avoiding the need to guess what length a mapping has, as this is computed once, when the map is built.
  This will avoid mistakes like what has been fixed in dc816e58, which used an incorrect mapping length by matching it with the number of attributes of an inner relation (a child partition) instead of an outer relation (a partitioned table).
  Author: Michael Paquier
  Reviewed-by: Amit Langote
  Discussion: https://postgr.es/m/20191121042556.GD153437@paquier.xyz
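  A simplified sketch of the underlying idea (not the real attmap.c definitions): bundling the mapping array with its length in one struct means callers can no longer pass a mismatched length alongside the map.

      #include <stdlib.h>

      /* Keep the mapping and its length together (simplified, hypothetical). */
      typedef struct AttrMapSketch
      {
          int *attnums;   /* outer attno -> inner attno, 0 for a dropped/missing column */
          int  maplen;    /* number of attributes of the outer relation */
      } AttrMapSketch;

      static AttrMapSketch *
      make_attrmap_sketch(int maplen)
      {
          AttrMapSketch *map = malloc(sizeof(AttrMapSketch));

          map->maplen = maplen;                       /* length fixed at build time */
          map->attnums = calloc(maplen, sizeof(int));
          return map;
      }

      static void
      free_attrmap_sketch(AttrMapSketch *map)
      {
          free(map->attnums);
          free(map);
      }

      int
      main(void)
      {
          AttrMapSketch *map = make_attrmap_sketch(3);

          map->attnums[0] = 2;   /* outer column 1 comes from inner column 2 */
          free_attrmap_sketch(map);
          return 0;
      }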
* Always call ExecShutdownNode() if appropriate. (Thomas Munro, 2019-11-16)
  Call ExecShutdownNode() after ExecutePlan()'s loop, rather than at each break. We had forgotten to do that in one case. The omission caused intermittent "temporary file leak" warnings from multi-batch parallel hash joins with a LIMIT clause.
  Back-patch to 11. Though the problem exists in theory in earlier parallel query releases, nothing really depended on it.
  Author: Kyotaro Horiguchi
  Reviewed-by: Thomas Munro, Amit Kapila
  Discussion: https://postgr.es/m/20191111.212418.2222262873417235945.horikyota.ntt%40gmail.com
* Fix bogus sizeof calculations. (Tom Lane, 2019-09-15)
  Noted by Coverity. Typo in 27cc7cd2b, so back-patch to v12 as that was.
* Reorder EPQ work, to fix rowmark related bugs and improve efficiency. (Andres Freund, 2019-09-09)
  In ad0bda5d24ea I changed the EvalPlanQual machinery to store substitution tuples in slots, instead of using plain HeapTuples. The main motivation for that was that using HeapTuples will be inefficient for future tableams. But it turns out that that conversion was buggy for non-locking rowmarks - the wrong tuple descriptor was used to create the slot.
  As a secondary issue, 5db6df0c0 changed ExecLockRows() to begin EPQ earlier, to allow fetching the locked rows directly into the EPQ slots, instead of having to copy tuples around. Unfortunately, as Tom complained, that forces some expensive initialization to happen earlier.
  As a third issue, the test coverage for EPQ was clearly insufficient.
  Fixing the first issue is unfortunately not trivial: non-locked row marks were fetched at the start of EPQ, and we don't have the type information for the rowmarks available at that point. While we could change that, it's not easy. It might be worthwhile to change that at some point, but to fix this bug, it seems better to delay fetching non-locking rowmarks until they're actually needed, rather than eagerly. They're referenced at most once, and in cases where EPQ fails, might never be referenced. Fetching them when needed also increases locality a bit.
  To be able to fetch rowmarks during execution, rather than initialization, we need to be able to access the active EPQState, as that contains necessary data. To do so, move EPQ related data from EState to EPQState, and, only for EStates created as part of EPQ, reference the associated EPQState from EState.
  To fix the second issue, change EPQ initialization to allow EvalPlanQualSlot() to be used before EvalPlanQualBegin() (but obviously still requiring EvalPlanQualInit() to have been done).
  As these changes made struct EState harder to understand, e.g. by adding multiple EStates, significantly reorder the members, and add a lot more comments.
  Also add a few more EPQ tests, including one that fails for the first issue above. More is needed.
  Reported-By: yi huang
  Author: Andres Freund
  Reviewed-By: Tom Lane
  Discussion: https://postgr.es/m/CAHU7rYZo_C4ULsAx_LAj8az9zqgrD8WDd4hTegDTMM1LMqrBsg@mail.gmail.com https://postgr.es/m/24530.1562686693@sss.pgh.pa.us
  Backpatch: 12-, where the EPQ changes were introduced
* Remove 'msg' parameter from convert_tuples_by_name (Alvaro Herrera, 2019-09-03)
  The message was included as a parameter when this function was added in dcb2bda9b704, but I don't think it has ever served any useful purpose. Let's stop spreading it pointlessly.
  Reviewed by Amit Langote and Peter Eisentraut.
  Discussion: https://postgr.es/m/20190806224728.GA17233@alvherre.pgsql
* Remove EState.es_range_table_array. (Tom Lane, 2019-08-12)
  Now that list_nth is O(1), there's no good reason to maintain a separate array of RTE pointers rather than indexing into estate->es_range_table. Deleting the array doesn't save all that much either; but just on cleanliness grounds, it's better not to have duplicate representations of the identical information.
  Discussion: https://postgr.es/m/14960.1565384592@sss.pgh.pa.us
* Use appendBinaryStringInfo in more places where the length is known (David Rowley, 2019-07-23)
  When we already know the length that we're going to append, it makes sense to use appendBinaryStringInfo instead of appendStringInfoString so that the append can be performed with a simple memcpy() using a known length, rather than having to first perform a strlen() call to obtain the length.
  Discussion: https://postgr.es/m/CAKJS1f8+FRAM1s5+mAa3isajeEoAaicJ=4e0WzrH3tAusbbiMQ@mail.gmail.com
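  A small backend-style fragment (for illustration; helper name and parameters are made up, the StringInfo calls are the real API) showing the choice this commit is about:

      #include "postgres.h"
      #include "lib/stringinfo.h"

      /* If the caller already knows the length, appendBinaryStringInfo() can
       * copy with a plain memcpy(); otherwise appendStringInfoString() must
       * run strlen() on the input first. */
      static void
      append_label(StringInfo buf, const char *label, int label_len)
      {
          if (label_len >= 0)
              appendBinaryStringInfo(buf, label, label_len);   /* length known */
          else
              appendStringInfoString(buf, label);              /* length unknown */
      }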
* Pass QueryEnvironment down to EvalPlanQual's EState. (Thomas Munro, 2019-07-10)
  Otherwise the executor can't see trigger transition tables during EPQ evaluation. Fixes bug #15900 and almost certainly also #15720. Back-patch to 10, where trigger transition tables landed.
  Author: Alex Aktsipetrov
  Reviewed-by: Thomas Munro, Tom Lane
  Discussion: https://postgr.es/m/15900-bc482754fe8d7415%40postgresql.org
  Discussion: https://postgr.es/m/15720-38c2b29e5d720187%40postgresql.org
* Fix more typos and inconsistencies in the tree (Michael Paquier, 2019-06-17)
  Author: Alexander Lakhin
  Discussion: https://postgr.es/m/0a5419ea-1452-a4e6-72ff-545b1a5a8076@gmail.com
* Fix typos. (Amit Kapila, 2019-05-26)
  Reported-by: Alexander Lakhin
  Author: Alexander Lakhin
  Reviewed-by: Amit Kapila and Tom Lane
  Discussion: https://postgr.es/m/7208de98-add8-8537-91c0-f8b089e2928c@gmail.com
* tableam: Rename wrapper functions to match callback names. (Andres Freund, 2019-05-23)
  Some of the wrapper functions didn't match the callback names. Many of them did so to stay "consistent" with historic naming of the wrapped functionality. We decided that in most cases it's more important for tableam to be consistent going forward than with the past.
  The one exception is beginscan/endscan/... because it'd have looked odd to have systable_beginscan/endscan/... with a different naming scheme, and changing the systable_* APIs would have caused way too much churn (including breaking a lot of external users).
  Author: Ashwin Agrawal, with some small additions by Andres Freund
  Reviewed-By: Andres Freund
  Discussion: https://postgr.es/m/CALfoeiugyrXZfX7n0ORCa4L-m834dzmaE8eFdbNR6PMpetU4Ww@mail.gmail.com
* Phase 2 pgindent run for v12. (Tom Lane, 2019-05-22)
  Switch to 2.1 version of pg_bsd_indent. This formats multiline function declarations "correctly", that is with additional lines of parameter declarations indented to match where the first line's left parenthesis is.
  Discussion: https://postgr.es/m/CAEepm=0P3FeTXRcU5B2W3jv3PgRVZ-kGUXLGfd42FFhUROO3ug@mail.gmail.com
* Initial pgindent run for v12. (Tom Lane, 2019-05-22)
  This is still using the 2.0 version of pg_bsd_indent. I thought it would be good to commit this separately, so as to document the differences between 2.0 and 2.1 behavior.
  Discussion: https://postgr.es/m/16296.1558103386@sss.pgh.pa.us
* Fix EvalPlanQualStart to handle partitioned result rels correctly. (Tom Lane, 2019-04-08)
  The es_root_result_relations array needs to be shallow-copied in the same way as the main es_result_relations array, else EPQ rechecks on partitioned result relations fail, as seen in bug #15677 from Norbert Benkocs.
  Amit Langote, isolation test case added by me
  Discussion: https://postgr.es/m/15677-0bf089579b4cd02d@postgresql.org
  Discussion: https://postgr.es/m/19321.1554567786@sss.pgh.pa.us
* tableam: Add table_multi_insert() and revamp/speed-up COPY FROM buffering. (Andres Freund, 2019-04-04)
  This adds table_multi_insert(), and converts COPY FROM, the only user of heap_multi_insert, to it.
  A simple conversion of COPY FROM to use slots would have yielded a slowdown when inserting into a partitioned table for some workloads. Different partitions might need different slots (both slot types and their descriptors), and dropping / creating slots when there are constant partition changes is measurable.
  Thus instead revamp the COPY FROM buffering for partitioned tables to allow buffering inserts into multiple tables, flushing only when limits are reached across all partition buffers. By only dropping slots when there've been inserts into too many different partitions, the aforementioned overhead is gone. By allowing larger batches, even when there are frequent partition changes, we actually speed such cases up significantly.
  By using slots, COPY of very narrow rows into unlogged / temporary tables might slow down very slightly (due to the indirect function calls).
  Author: David Rowley, Andres Freund, Haribabu Kommi
  Discussion: https://postgr.es/m/20180703070645.wchpu5muyto5n647@alap3.anarazel.de https://postgr.es/m/20190327054923.t3epfuewxfqdt22e@alap3.anarazel.de
* Generated columns (Peter Eisentraut, 2019-03-30)
  This is an SQL-standard feature that allows creating columns that are computed from expressions rather than assigned, similar to a view or materialized view but on a column basis.
  This implements one kind of generated column: stored (computed on write). Another kind, virtual (computed on read), is planned for the future, and some room is left for it.
  Reviewed-by: Michael Paquier <michael@paquier.xyz>
  Reviewed-by: Pavel Stehule <pavel.stehule@gmail.com>
  Discussion: https://www.postgresql.org/message-id/flat/b151f851-4019-bdb1-699e-ebab07d2f40a@2ndquadrant.com
* tableam: Add and use table_fetch_row_version(). (Andres Freund, 2019-03-25)
  This is essentially the tableam version of heapam_fetch(), i.e. fetching a tuple identified by a tid, performing visibility checks.
  Note that this is different from table_index_fetch_tuple(), which is for index lookups. It therefore has to handle a tid pointing to an earlier version of a tuple if the AM uses an optimization like heap's HOT. Add comments to that end.
  This commit removes the stats_relation argument from heap_fetch, as it's been unused for a long time.
  Author: Andres Freund
  Reviewed-By: Haribabu Kommi
  Discussion: https://postgr.es/m/20180703070645.wchpu5muyto5n647@alap3.anarazel.de
* tableam: Add tuple_{insert, delete, update, lock} and use. (Andres Freund, 2019-03-23)
  This adds new, required, table AM callbacks for insert/delete/update and lock_tuple. To be able to reasonably use those, the EvalPlanQual mechanism had to be adapted, moving more logic into the AM.
  Previously both delete/update/lock call-sites and the EPQ mechanism had to have awareness of the specific tuple format to be able to fetch the latest version of a tuple. Obviously that needs to be abstracted away. To do so, move the logic that finds the latest row version into the AM. lock_tuple has a new flag argument, TUPLE_LOCK_FLAG_FIND_LAST_VERSION, that forces it to lock the last version, rather than the current one. It'd have been possible to do so via a separate callback as well, but finding the last version usually also necessitates locking the newest version, making it sensible to combine the two. This replaces the previous use of EvalPlanQualFetch(). Additionally HeapTupleUpdated, which previously signaled either a concurrent update or delete, is now split into two, to avoid callers needing AM specific knowledge to differentiate.
  The move of finding the latest row version into tuple_lock means that encountering a row concurrently moved into another partition will now raise an error about "tuple to be locked" rather than "tuple to be updated/deleted" - which is accurate, as that always happens when locking rows. While possibly slightly less helpful for users, it seems like an acceptable trade-off.
  As part of this commit HTSU_Result has been renamed to TM_Result, and its members expanded to differentiate between updating and deleting. HeapUpdateFailureData has been renamed to TM_FailureData.
  The interface to speculative insertion is changed so nodeModifyTable.c does not have to set the speculative token itself anymore. Instead there's a version of tuple_insert, tuple_insert_speculative, that performs the speculative insertion (without requiring a flag to signal that fact), and the speculative insertion is either made permanent with table_complete_speculative(succeeded = true) or aborted with succeeded = false.
  Note that multi_insert is not yet routed through tableam, nor is COPY. Changing multi_insert requires changes to copy.c that are large enough to better be done separately. Similarly, although simpler, CREATE TABLE AS and CREATE MATERIALIZED VIEW are also only going to be adjusted in a later commit.
  Author: Andres Freund and Haribabu Kommi
  Discussion: https://postgr.es/m/20180703070645.wchpu5muyto5n647@alap3.anarazel.de https://postgr.es/m/20190313003903.nwvrxi7rw3ywhdel@alap3.anarazel.de https://postgr.es/m/20160812231527.GA690404@alvherre.pgsql
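  A hypothetical miniature of the callback pattern involved (names invented, not the real TableAmRoutine or its signatures): the executor-facing wrapper dispatches through the access method's function table, so callers never need AM-specific knowledge.

      #include <stdio.h>

      typedef struct MiniTuple { int id; } MiniTuple;

      /* Invented function table standing in for an AM's callback struct. */
      typedef struct MiniAmRoutine
      {
          void (*tuple_insert)(MiniTuple *tup);
          void (*tuple_delete)(int id);
      } MiniAmRoutine;

      static void heapish_insert(MiniTuple *tup) { printf("insert %d\n", tup->id); }
      static void heapish_delete(int id)         { printf("delete %d\n", id); }

      static const MiniAmRoutine heapish_routine = {
          .tuple_insert = heapish_insert,
          .tuple_delete = heapish_delete,
      };

      /* Wrapper: the caller never knows which AM backs the table. */
      static void
      table_tuple_insert_sketch(const MiniAmRoutine *am, MiniTuple *tup)
      {
          am->tuple_insert(tup);
      }

      int
      main(void)
      {
          MiniTuple t = { 42 };

          table_tuple_insert_sketch(&heapish_routine, &t);
          return 0;
      }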
* Remove unused #include (Peter Eisentraut, 2019-03-14)
* tableam: Add and use scan APIs. (Andres Freund, 2019-03-11)
  To allow table accesses to be not directly dependent on heap, several new abstractions are needed. Specifically:
  1) Heap scans need to be generalized into table scans. Do this by introducing TableScanDesc, which will be the "base class" for individual AMs. This contains the AM independent fields from HeapScanDesc. The previous heap_{beginscan,rescan,endscan} et al. have been replaced with a table_ version. There's no direct replacement for heap_getnext(), as that returned a HeapTuple, which is undesirable for other AMs. Instead there's table_scan_getnextslot(). But note that heap_getnext() lives on; it's still used widely to access catalog tables. This is achieved by new scan_begin, scan_end, scan_rescan, scan_getnextslot callbacks.
  2) The portion of parallel scans that's shared between backends needs to work without the user doing per-AM work. To achieve that, new parallelscan_{estimate, initialize, reinitialize} callbacks are introduced, which operate on a new ParallelTableScanDesc, which again can be subclassed by AMs. As it is likely that several AMs are going to be block oriented, block oriented callbacks that can be shared between such AMs are provided and used by heap: table_block_parallelscan_{estimate, initialize, reinitialize} as callbacks, and table_block_parallelscan_{nextpage, init} for use in AMs. These operate on a ParallelBlockTableScanDesc.
  3) Index scans need to be able to access tables to return a tuple, and there needs to be state across individual accesses to the heap to store state like buffers. That's now handled by introducing a sort-of-scan IndexFetchTable, which again is intended to be subclassed by individual AMs (for heap, IndexFetchHeap). The relevant callbacks for an AM are index_fetch_{end, begin, reset} to create the necessary state, and index_fetch_tuple to retrieve an indexed tuple. Note that index_fetch_tuple implementations need to be smarter than just blindly fetching the tuples for AMs that have optimizations similar to heap's HOT - the currently alive tuple in the update chain needs to be fetched if appropriate. Similar to table_scan_getnextslot(), it's undesirable to continue to return HeapTuples. Thus index_fetch_heap (might want to rename that later) now accepts a slot as an argument. Core code doesn't have a lot of call sites performing index scans without going through the systable_* API (in contrast to loads of heap_getnext calls and working directly with HeapTuples).
  Index scans now store the result of a search in IndexScanDesc->xs_heaptid, rather than xs_ctup->t_self. As the target is not generally a HeapTuple anymore, that seems cleaner.
  To be able to sensibly adapt code to use the above, two further callbacks have been introduced:
  a) slot_callbacks returns a TupleTableSlotOps* suitable for creating slots capable of holding a tuple of the AM's type. table_slot_callbacks() and table_slot_create() are based upon that, but have additional logic to deal with views, foreign tables, etc. While this change could have been done separately, nearly all the call sites that needed to be adapted for the rest of this commit also would have needed to be adapted for table_slot_callbacks(), making separation not worthwhile.
  b) tuple_satisfies_snapshot checks whether the tuple in a slot is currently visible according to a snapshot. That's required as a few places now don't have a buffer + HeapTuple around, but a slot (which in heap's case internally has that information).
  Additionally a few infrastructure changes were needed:
  I) SysScanDesc, as used by systable_{beginscan, getnext} et al. now internally uses a slot to keep track of tuples. While systable_getnext() still returns HeapTuples, and will do so for the foreseeable future, the index API (see 1) above) now only deals with slots.
  The remainder, and largest part, of this commit is then adjusting all scans in postgres to use the new APIs.
  Author: Andres Freund, Haribabu Kommi, Alvaro Herrera
  Discussion: https://postgr.es/m/20180703070645.wchpu5muyto5n647@alap3.anarazel.de https://postgr.es/m/20160812231527.GA690404@alvherre.pgsql
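  A hypothetical sketch (invented names, not the actual table_scan_getnextslot() signature) of the shape of a slot-based scan loop: instead of the scan returning an AM-specific tuple, the caller supplies a slot and the AM fills it in its own native format.

      #include <stdbool.h>
      #include <stdio.h>

      typedef struct MiniSlot { int value; bool empty; } MiniSlot;
      typedef struct MiniScan { int next; int limit; } MiniScan;

      static bool
      scan_getnextslot_sketch(MiniScan *scan, MiniSlot *slot)
      {
          if (scan->next >= scan->limit)
          {
              slot->empty = true;
              return false;               /* end of scan */
          }
          slot->value = scan->next++;     /* AM stores the row in the caller's slot */
          slot->empty = false;
          return true;
      }

      int
      main(void)
      {
          MiniScan scan = { 0, 3 };
          MiniSlot slot;

          while (scan_getnextslot_sketch(&scan, &slot))
              printf("row %d\n", slot.value);
          return 0;
      }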
* Store tuples for EvalPlanQual in slots, rather than as HeapTuples. (Andres Freund, 2019-03-01)
  For the upcoming pluggable table access methods it's quite inconvenient to store tuples as HeapTuples, as that'd require converting tuples from their native format into HeapTuples. Instead use slots to manage epq tuples.
  To fit into that scheme, change the foreign data wrapper callback RefetchForeignRow to store the tuple in a slot. Insist on using the caller provided slot, so it conveniently can be stored in the corresponding EPQ slot. As there is no in core user of RefetchForeignRow, that change was done blindly, but we plan to test that soon.
  To avoid duplicating that work for row locks, move row locks to just directly use the EPQ slots - it previously temporarily stored tuples in LockRowsState.lr_curtuples, but that doesn't seem beneficial, given we'd possibly end up with a significant number of additional slots.
  The behaviour of es_epqTupleSet[rti -1] is now checked by es_epqTupleSlot[rti -1] != NULL, as that is distinguishable from a slot containing an empty tuple.
  Author: Andres Freund, Haribabu Kommi, Ashutosh Bapat
  Discussion: https://postgr.es/m/20180703070645.wchpu5muyto5n647@alap3.anarazel.de
* Use slots in trigger infrastructure, except for the actual invocation. (Andres Freund, 2019-02-26)
  In preparation for abstracting table storage, convert trigger.c to track tuples in slots. Which also happens to make code calling triggers simpler.
  As the calling interface for triggers themselves is not changed in this patch, HeapTuples still are extracted from the slot at that time. But that's handled solely inside trigger.c, not visible to callers. It's quite likely that we'll want to revise the external trigger interface, but that's a separate large project.
  As part of this work the slots used for old/new/return tuples are moved from EState into ResultRelInfo, as different updated tables might need different slots. The slots are also now created on-demand, which is good from an efficiency POV and also makes the modifying code simpler.
  Author: Andres Freund, Amit Khandekar and Ashutosh Bapat
  Discussion: https://postgr.es/m/20180703070645.wchpu5muyto5n647@alap3.anarazel.de
* Refactor planner's header files. (Tom Lane, 2019-01-29)
  Create a new header optimizer/optimizer.h, which exposes just the planner functions that can be used "at arm's length", without need to access Paths or the other planner-internal data structures defined in nodes/relation.h. This is intended to provide the whole planner API seen by most of the rest of the system; although FDWs still need to use additional stuff, and more thought is also needed about just what selfuncs.c should rely on.
  The main point of doing this now is to limit the amount of new #include baggage that will be needed by "planner support functions", which I expect to introduce later, and which will be in relevant datatype modules rather than anywhere near the planner.
  This commit just moves relevant declarations into optimizer.h from other header files (a couple of which go away because everything got moved), and adjusts #include lists to match. There's further cleanup that could be done if we want to decide that some stuff being exposed by optimizer.h doesn't belong in the planner at all, but I'll leave that for another day.
  Discussion: https://postgr.es/m/11460.1548706639@sss.pgh.pa.us
* Remove superfluous tqual.h includes. (Andres Freund, 2019-01-21)
  Most of these had been obsoleted by 568d4138c / the SnapshotNow removal.
  This is in preparation for moving most of tqual.[ch] into either snapmgr.h or heapam.h, which in turn is in preparation for pluggable table AMs.
  Author: Andres Freund
  Discussion: https://postgr.es/m/20180703070645.wchpu5muyto5n647@alap3.anarazel.de
* Replace uses of heap_open et al with the corresponding table_* function. (Andres Freund, 2019-01-21)
  Author: Andres Freund
  Discussion: https://postgr.es/m/20190111000539.xbv7s6w7ilcvm7dp@alap3.anarazel.de
* Don't include heapam.h from other headers. (Andres Freund, 2019-01-14)
  heapam.h previously was included in a number of widely used headers (e.g. execnodes.h, indirectly in executor.h, ...). That's problematic on its own, as heapam.h contains a lot of low-level details that don't need to be exposed that widely, but becomes more problematic with the upcoming introduction of pluggable table storage - it seems inappropriate for heapam.h to be included that widely afterwards.
  heapam.h was largely only included in other headers to get the HeapScanDesc typedef (which was defined in heapam.h, even though HeapScanDescData is defined in relscan.h). The better solution here seems to be to just use the underlying struct (forward declared where necessary). Similar for BulkInsertState.
  Another problem was that LockTupleMode was used in executor.h - parts of the file tried to cope without heapam.h, but due to the fact that it indirectly included it, several subsequent violations of that goal were not noticed. We could just reuse the approach of declaring parameters as int, but it seems nicer to move LockTupleMode to lockoptions.h - that's not a perfect location, but also doesn't seem bad.
  As a number of files relied on implicitly included heapam.h, a significant number of files grew an explicit include. It's quite probable that a few external projects will need to do the same.
  Author: Andres Freund
  Reviewed-By: Alvaro Herrera
  Discussion: https://postgr.es/m/20190114000701.y4ttcb74jpskkcfb@alap3.anarazel.de
* Update copyright for 2019 (Bruce Momjian, 2019-01-02)
  Backpatch-through: certain files through 9.4
* Fix thinko in previous commit (Alvaro Herrera, 2018-12-28)
* Rewrite ExecPartitionCheckEmitError for clarity (Alvaro Herrera, 2018-12-28)
  The original was hard to follow and failed to comply with the DRY principle.
  Discussion: https://postgr.es/m/20181206222221.g5witbsklvqthjll@alvherre.pgsql
* Remove WITH OIDS support, change oid catalog column visibility. (Andres Freund, 2018-11-20)
  Previously tables declared WITH OIDS, including a significant fraction of the catalog tables, stored the oid column not as a normal column, but as part of the tuple header. This special column was not shown by default, which was somewhat odd, as it's often (consider e.g. pg_class.oid) one of the more important parts of a row. Neither pg_dump nor COPY included the contents of the oid column by default.
  The fact that the oid column was not an ordinary column necessitated a significant amount of special case code to support oid columns. That already was painful for the existing code, but upcoming work aiming to make table storage pluggable would have required expanding and duplicating that "specialness" significantly.
  WITH OIDS has been deprecated since 2005 (commit ff02d0a05280e0). Remove it. Removing it includes:
  - CREATE TABLE and ALTER TABLE syntax for declaring the table to be WITH OIDS has been removed (WITH (oids[ = true]) will error out)
  - pg_dump does not support dumping tables declared WITH OIDS and will issue a warning when dumping one (and ignore the oid column).
  - restoring a pg_dump archive with pg_restore will warn when restoring a table with oid contents (and ignore the oid column)
  - COPY will refuse to load a binary dump that includes oids.
  - pg_upgrade will error out when encountering tables declared WITH OIDS; they have to be altered to remove the oid column first.
  - Functionality to access the oid of the last inserted row (like plpgsql's RESULT_OID, spi's SPI_lastoid, ...) has been removed.
  The syntax for declaring a table WITHOUT OIDS (or WITH (oids = false) for CREATE TABLE) is still supported. While that requires a bit of support code, it seems unnecessary to break applications / dumps that do not use oids, and are explicit about not using them.
  The biggest user of WITH OID columns was postgres' catalog. This commit changes all 'magic' oid columns to be columns that are normally declared and stored. To reduce unnecessary query breakage all the newly added columns are still named 'oid', even if a table's column naming scheme would indicate 'reloid' or such.
  This obviously requires adapting a lot of code, mostly replacing oid access via HeapTupleGetOid() with access to the underlying Form_pg_*->oid column. The bootstrap process now assigns oids for all oid columns in genbki.pl that do not have an explicit value (starting at the largest oid previously used); only oids assigned later will be above FirstBootstrapObjectId. As the oid column now is a normal column, the special bootstrap syntax for oids has been removed.
  Oids are not automatically assigned during insertion anymore; all backend code explicitly assigns oids with GetNewOidWithIndex(). For the rare case that insertions into the catalog via SQL are called for, the new pg_nextoid() function can be used (which only works on catalog tables).
  The fact that oid columns on system tables are now normal columns means that they will be included in the set of columns expanded by * (i.e. SELECT * FROM pg_class will now include the table's oid, previously it did not). It'd not technically be hard to hide the oid column by default, but that'd mean confusing behavior would either have to be carried forward forever, or it'd cause breakage down the line.
  While it's not unlikely that further adjustments are needed, the scope/invasiveness of the patch makes it worthwhile to merge this now. It's painful to maintain externally, too complicated to commit after the code freeze, and a dependency of a number of other patches.
  Catversion bump, for obvious reasons.
  Author: Andres Freund, with contributions by John Naylor
  Discussion: https://postgr.es/m/20180930034810.ywp2c7awz7opzcfr@alap3.anarazel.de
* Redesign initialization of partition routing structures (Alvaro Herrera, 2018-11-16)
  This speeds up write operations (INSERT, UPDATE, DELETE, COPY, as well as the future MERGE) on partitioned tables.
  This changes the setup for tuple routing so that it does far less work during the initial setup and pushes more work out to when partitions receive tuples. PartitionDispatchData structs for sub-partitioned tables are only created when a tuple gets routed through it. The possibly large arrays in the PartitionTupleRouting struct have largely been removed. The partitions[] array remains but now never contains any NULL gaps. Previously the NULLs had to be skipped during ExecCleanupTupleRouting(), which could add a large overhead to the cleanup when the number of partitions was large. The partitions[] array is allocated small to start with and only enlarged when we route tuples to enough partitions that it runs out of space. This allows us to keep simple single-row partition INSERTs running quickly.
  The arrays in PartitionTupleRouting which stored the tuple translation maps have now been removed. These have been moved out into a PartitionRoutingInfo struct which is an additional field in ResultRelInfo.
  The find_all_inheritors() call still remains by far the slowest part of ExecSetupPartitionTupleRouting(). This commit just removes the other slow parts.
  In passing also rename the tuple translation maps from being ParentToChild and ChildToParent to being RootToPartition and PartitionToRoot. The old names mislead you into thinking that a partition of some sub-partitioned table would translate to the rowtype of the sub-partitioned table rather than the root partitioned table.
  Authors: David Rowley and Amit Langote, heavily revised by Álvaro Herrera
  Testing help from Jesper Pedersen and Kato Sho.
  Discussion: https://postgr.es/m/CAKJS1f_1RJyFquuCKRFHTdcXqoPX-PYqAd7nz=GVBwvGh4a6xA@mail.gmail.com
* Introduce notion of different types of slots (without implementing them). (Andres Freund, 2018-11-15)
  Upcoming work intends to allow pluggable ways to introduce new ways of storing table data. Accessing those table access methods from the executor requires TupleTableSlots to carry tuples in the native format of such storage methods; otherwise there'll be a significant conversion overhead.
  Different access methods will require different data to store tuples efficiently (just like virtual, minimal, heap already require fields in TupleTableSlot). To allow that without requiring additional pointer indirections, we want to have different structs (embedding TupleTableSlot) for different types of slots. Thus different types of slots are needed, which requires adapting creators of slots.
  The slot that most efficiently can represent a type of tuple in an executor node will often depend on the type of slot a child node uses. Therefore we need to track the type of slot that is returned by nodes, so parent nodes can create slots based on that.
  Relatedly, JIT compilation of tuple deforming needs to know which type of slot a certain expression refers to, so it can create an appropriate deforming function for the type of tuple in the slot.
  But not all nodes will only return one type of slot, e.g. an append node will potentially return different types of slots for each of its subplans. Therefore add a function that allows querying the type of a node's result slot, and whether it'll always be the same type (whether it's fixed). This can be queried using ExecGetResultSlotOps().
  The scan, result, inner, and outer types of slots are automatically inferred from ExecInitScanTupleSlot(), ExecInitResultSlot(), and the left/right subtrees respectively. If that's not correct for a node, that can be overwritten using new fields in PlanState.
  This commit does not introduce the actually abstracted implementation of different kinds of TupleTableSlots; that will be left for a followup commit. The different types of slots introduced will, for now, still use the same backing implementation.
  While this already partially invalidates the big comment in tuptable.h, it seems to make more sense to update it later, when the different TupleTableSlot implementations actually exist.
  Author: Ashutosh Bapat and Andres Freund, with changes by Amit Khandekar
  Discussion: https://postgr.es/m/20181105210039.hh4vvi4vwoq5ba2q@alap3.anarazel.de
* Rejigger materializing and fetching a HeapTuple from a slot. (Andres Freund, 2018-11-15)
  Previously materializing a slot always returned a HeapTuple. As current work aims to reduce the reliance on HeapTuples (so other storage systems can work efficiently), that needs to change. Thus split the task of materializing a slot (i.e. making it independent from the underlying storage / other memory contexts) from fetching a HeapTuple from the slot.
  For brevity, allow fetching a HeapTuple from a slot and materializing the slot at the same time, controlled by a parameter.
  For now some callers of ExecFetchSlotHeapTuple, with materialize = true, expect that changes to the heap tuple will be reflected in the underlying slot. Those places will be adapted in due course, so while not pretty, that's OK for now.
  Also rename ExecFetchSlotTuple to ExecFetchSlotHeapTuple and ExecFetchSlotTupleDatum to ExecFetchSlotHeapTupleDatum, as it's likely that future storage methods will need similar methods. There already is ExecFetchSlotMinimalTuple, so the new names make the naming scheme more coherent.
  Author: Ashutosh Bapat and Andres Freund, with changes by Amit Khandekar
  Discussion: https://postgr.es/m/20181105210039.hh4vvi4vwoq5ba2q@alap3.anarazel.de
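  A loose, hypothetical sketch of the split described above (invented names and a string standing in for a tuple, not the real slot code): "materialize" makes the slot's contents self-contained, "fetch" hands back a concrete value, and a flag lets callers do both in one call.

      #include <stdbool.h>
      #include <stdlib.h>
      #include <string.h>

      typedef struct SlotSketch
      {
          const char *source;   /* data owned by someone else */
          char       *owned;    /* private copy, NULL until materialized */
      } SlotSketch;

      static void
      slot_materialize(SlotSketch *slot)
      {
          if (slot->owned == NULL)
              slot->owned = strdup(slot->source);   /* now independent of the source */
      }

      static const char *
      slot_fetch(SlotSketch *slot, bool materialize)
      {
          if (materialize)
              slot_materialize(slot);
          return slot->owned ? slot->owned : slot->source;
      }

      int
      main(void)
      {
          SlotSketch  slot = { "original row data", NULL };
          const char *tup = slot_fetch(&slot, true);   /* fetch + materialize */

          (void) tup;
          free(slot.owned);
          return 0;
      }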
* Improve some comments related to executor result relations. (Tom Lane, 2018-10-17)
  es_leaf_result_relations doesn't exist; perhaps this was an old name for es_tuple_routing_result_relations, or maybe this comment has gone unmaintained through multiple rounds of whacking the code around. Related comment in execnodes.h was both obsolete and ungrammatical.
* Avoid O(N^2) cost in ExecFindRowMark(). (Tom Lane, 2018-10-08)
  If there are many ExecRowMark structs, we spent O(N^2) time in ExecFindRowMark during executor startup. Once upon a time this was not of great concern, but the addition of native partitioning has squeezed out enough other costs that this can become the dominant overhead in some use-cases for tables with many partitions.
  To fix, simply replace that List data structure with an array. This adds a little bit of cost to execCurrentOf(), but not much, and anyway that code path is neither of large importance nor very efficient now. If we ever decide it is a bottleneck, constructing a hash table for lookup-by-tableoid would likely be the thing to do.
  Per complaint from Amit Langote, though this is different from his fix proposal.
  Discussion: https://postgr.es/m/468c85d9-540e-66a2-1dde-fec2b741e688@lab.ntt.co.jp
* Restore sane locking behavior during parallel query. (Tom Lane, 2018-10-06)
  Commit 9a3cebeaa changed things so that parallel workers didn't obtain any lock of their own on tables they access. That was clearly a bad idea, but I'd mistakenly supposed that it was the intended end result of the series of patches for simplifying the executor's lock management. Undo that change in relation_open(), and adjust ExecOpenScanRelation() so that it gets the correct lock if inside a parallel worker.
  In passing, clean up some more obsolete comments about when locks are acquired.
  Discussion: https://postgr.es/m/468c85d9-540e-66a2-1dde-fec2b741e688@lab.ntt.co.jp
* Remove more redundant relation locking during executor startup. (Tom Lane, 2018-10-06)
  We already have appropriate locks on every relation listed in the query's rangetable before we reach the executor. Take the next step in exploiting that knowledge by removing code that worries about taking locks on non-leaf result relations in a partitioned table. In particular, get rid of ExecLockNonLeafAppendTables and a stanza in InitPlan that asserts we already have locks on certain such tables.
  In passing, clean up some now-obsolete comments in InitPlan.
  Amit Langote, reviewed by David Rowley and Jesper Pedersen, and whacked around a bit more by me
  Discussion: https://postgr.es/m/468c85d9-540e-66a2-1dde-fec2b741e688@lab.ntt.co.jp