aboutsummaryrefslogtreecommitdiff
path: root/src/backend/executor
Commit message (Collapse)AuthorAge
...
* tidbitmap: Support shared iteration.Robert Haas2017-03-08
| | | | | | | | | | | | | | | | When a shared iterator is used, each call to tbm_shared_iterate() returns a result that has not yet been returned to any process attached to the shared iterator. In other words, each cooperating processes gets a disjoint subset of the full result set, but all results are returned exactly once. This is infrastructure for parallel bitmap heap scan. Dilip Kumar. The larger patch set of which this is a part has been reviewed and tested by (at least) Andres Freund, Amit Khandekar, Tushar Ahuja, Rafia Sabih, Haribabu Kommi, and Thomas Munro. Discussion: http://postgr.es/m/CAFiTN-uc4=0WxRGfCzs-xfkMYcSEWUC-Fon6thkJGjkh9i=13A@mail.gmail.com
* Improve error reporting for tuple-routing failures.Robert Haas2017-03-03
| | | | | | | | | | | Currently, the whole row is shown without column names. Instead, adopt a style similar to _bt_check_unique() in ExecFindPartition() and show the failing key: (key1, ...) = (val1, ...). Amit Langote, per a complaint from Simon Riggs. Reviewed by me; I also adjusted the grammar in one of the comments. Discussion: http://postgr.es/m/9f9dc7ae-14f0-4a25-5485-964d9bfc19bd@lab.ntt.co.jp
* Refactor bitmap heap scan in preparation for parallel support.Robert Haas2017-03-02
| | | | | | | | | | The final patch will be less messy if the prefetching support is a bit better isolated, so do that. Dilip Kumar, with some changes by me. The larger patch set of which this is a part has been reviewed and tested by (at least) Andres Freund, Amit Khandekar, Tushar Ahuja, Rafia Sabih, Haribabu Kommi, and Thomas Munro.
* Use proper enum constants for LockWaitPolicyPeter Eisentraut2017-02-28
|
* Allow index AMs to return either HeapTuple or IndexTuple format during IOS.Tom Lane2017-02-27
| | | | | | | | | | | | | | | | | | | | | | Previously, only IndexTuple format was supported for the output data of an index-only scan. This is fine for btree, which is just returning a verbatim index tuple anyway. It's not so fine for SP-GiST, which can return reconstructed data that's much larger than a page. To fix, extend the index AM API so that index-only scan data can be returned in either HeapTuple or IndexTuple format. There's other ways we could have done it, but this way avoids an API break for index AMs that aren't concerned with the issue, and it costs little except a couple more fields in IndexScanDescs. I changed both GiST and SP-GiST to use the HeapTuple method. I'm not very clear on whether GiST can reconstruct data that's too large for an IndexTuple, but that seems possible, and it's not much of a code change to fix. Per a complaint from Vik Fearing. Reviewed by Jason Li. Discussion: https://postgr.es/m/49527f79-530d-0bfe-3dad-d183596afa92@2ndquadrant.fr
* Allow custom and foreign scans to have shutdown callbacks.Robert Haas2017-02-26
| | | | | | | | | | | | This is expected to be useful mostly when performing such scans in parallel, because in that case it allows (in combination with commit acf555bc53acb589b5a2827e65d655fa8c9adee0) nodes below a Gather to get control just before the DSM segment goes away. KaiGai Kohei, except that I rewrote the documentation. Reviewed by Claudio Freire. Discussion: http://postgr.es/m/CADyhKSXJK0jUJ8rWv4AmKDhsUh124_rEn39eqgfC5D8fu6xVuw@mail.gmail.com
* Consistently declare timestamp variables as TimestampTz.Tom Lane2017-02-23
| | | | | | | | | | | | | | | | | | | Twiddle the replication-related code so that its timestamp variables are declared TimestampTz, rather than the uninformative "int64" that was previously used for meant-to-be-always-integer timestamps. This resolves the int64-vs-TimestampTz declaration inconsistencies introduced by commit 7c030783a, though in the opposite direction to what was originally suggested. This required including datatype/timestamp.h in a couple more places than before. I decided it would be a good idea to slim down that header by not having it pull in <float.h> etc, as those headers are no longer at all relevant to its purpose. Unsurprisingly, a small number of .c files turn out to have been depending on those inclusions, so add them back in the .c files as needed. Discussion: https://postgr.es/m/26788.1487455319@sss.pgh.pa.us Discussion: https://postgr.es/m/27694.1487456324@sss.pgh.pa.us
* Pass the source text for a parallel query to the workers.Robert Haas2017-02-22
| | | | | | | | | With this change, you can see the query that a parallel worker is executing in pg_stat_activity, and if the worker crashes you can see what query it was executing when it crashed. Rafia Sabih, reviewed by Kuntal Ghosh and Amit Kapila and slightly revised by me.
* Shut down Gather's children before shutting down Gather itself.Robert Haas2017-02-22
| | | | | | | | | | | | | It turns out that the original shutdown order here does not work well. Multiple people attempting to develop further parallel query patches have discovered that they need to do cleanup before the DSM goes away, and you can't do that if the parent node gets cleaned up first. Patch by me, reviewed by KaiGai Kohei and Dilip Kumar. Discussion: http://postgr.es/m/CA+TgmoY6bOc1YnhcAQnMfCBDbsJzROQ3sYxSAL-SYB5tMJcTKg@mail.gmail.com Discussion: http://postgr.es/m/9A28C8860F777E439AA12E8AEA7694F8012AEB82@BPXM15GP.gisp.nec.co.jp Discussion: http://postgr.es/m/CA+TgmoYuPOc=+xrG1v0fCsoLbKAab9F1ddOeaaiLMzKOiBar1Q@mail.gmail.com
* Add optimizer and executor support for parallel index-only scans.Robert Haas2017-02-19
| | | | | | | | | | | | | Commit 5262f7a4fc44f651241d2ff1fa688dd664a34874 added similar support for parallel index scans; this extends that work to index-only scans. As with parallel index scans, this requires support from the index AM, so currently parallel index-only scans will only be possible for btree indexes. Rafia Sabih, reviewed and tested by Rahila Syed, Tushar Ahuja, and Amit Kapila Discussion: http://postgr.es/m/CAOGQiiPEAs4C=TBp0XShxBvnWXuzGL2u++Hm1=qnCpd6_Mf8Fw@mail.gmail.com
* Make sure that hash join's bulk-tuple-transfer loops are interruptible.Tom Lane2017-02-15
| | | | | | | | | | | | | | | | | | The loops in ExecHashJoinNewBatch(), ExecHashIncreaseNumBatches(), and ExecHashRemoveNextSkewBucket() are all capable of iterating over many tuples without ever doing a CHECK_FOR_INTERRUPTS, so that the backend might fail to respond to SIGINT or SIGTERM for an unreasonably long time. Fix that. In the case of ExecHashJoinNewBatch(), it seems useful to put the added CHECK_FOR_INTERRUPTS into ExecHashJoinGetSavedTuple() rather than directly in the loop, because that will also ensure that both principal code paths through ExecHashJoinOuterGetTuple() will do a CHECK_FOR_INTERRUPTS, which seems like a good idea to avoid surprises. Back-patch to all supported branches. Tom Lane and Thomas Munro Discussion: https://postgr.es/m/6044.1487121720@sss.pgh.pa.us
* Add optimizer and executor support for parallel index scans.Robert Haas2017-02-15
| | | | | | | | | | | | In combination with 569174f1be92be93f5366212cc46960d28a5c5cd, which taught the btree AM how to perform parallel index scans, this allows parallel index scan plans on btree indexes. This infrastructure should be general enough to support parallel index scans for other index AMs as well, if someone updates them to support parallel scans. Amit Kapila, reviewed and tested by Anastasia Lubennikova, Tushar Ahuja, and Haribabu Kommi, and me.
* Allow parallel workers to execute subplans.Robert Haas2017-02-14
| | | | | | | | | | | | | | | This doesn't do anything to make Param nodes anything other than parallel-restricted, so this only helps with uncorrelated subplans, and it's not necessarily very cheap because each worker will run the subplan separately (just as a Hash Join will build a separate copy of the hash table in each participating process), but it's a first step toward supporting cases that are more likely to help in practice, and is occasionally useful on its own. Amit Kapila, reviewed and tested by Rafia Sabih, Dilip Kumar, and me. Discussion: http://postgr.es/m/CAA4eK1+e8Z45D2n+rnDMDYsVEb5iW7jqaCH_tvPMYau=1Rru9w@mail.gmail.com
* simplehash: Additional tweaks to make specifying an allocator work.Robert Haas2017-02-09
| | | | | | | | | | | | | Even if we don't emit definitions for SH_ALLOCATE and SH_FREE, we still need prototypes. The user can't define them before including simplehash.h because SH_TYPE isn't available yet. For the allocator to be able to access private_data, it needs to become an argument to SH_CREATE. Previously we relied on callers to set that after returning from SH_CREATE, but SH_CREATE calls SH_ALLOCATE before returning. Dilip Kumar, reviewed by me.
* Allow index AMs to cache data across aminsert calls within a SQL command.Tom Lane2017-02-09
| | | | | | | | | | | | | | | | | | | | | | | | | | It's always been possible for index AMs to cache data across successive amgettuple calls within a single SQL command: the IndexScanDesc.opaque field is meant for precisely that. However, no comparable facility exists for amortizing setup work across successive aminsert calls. This patch adds such a feature and teaches GIN, GIST, and BRIN to use it to amortize catalog lookups they'd previously been doing on every call. (The other standard index AMs keep everything they need in the relcache, so there's little to improve there.) For GIN, the overall improvement in a statement that inserts many rows can be as much as 10%, though it seems a bit less for the other two. In addition, this makes a really significant difference in runtime for CLOBBER_CACHE_ALWAYS tests, since in those builds the repeated catalog lookups are vastly more expensive. The reason this has been hard up to now is that the aminsert function is not passed any useful place to cache per-statement data. What I chose to do is to add suitable fields to struct IndexInfo and pass that to aminsert. That's not widening the index AM API very much because IndexInfo is already within the ken of ambuild; in fact, by passing the same info to aminsert as to ambuild, this is really removing an inconsistency in the AM API. Discussion: https://postgr.es/m/27568.1486508680@sss.pgh.pa.us
* Revise the way the element allocator for a simplehash is specified.Robert Haas2017-02-07
| | | | | | | This method is more elegant and more efficient. Per a suggestion from Andres Freund, who also briefly reviewed the patch.
* Allow the element allocator for a simplehash to be specified.Robert Haas2017-02-07
| | | | | | | | This is infrastructure for a pending patch to allow parallel bitmap heap scans. Dilip Kumar, reviewed (in earlier versions) by Andres Freund and (more recently) by me. Some further renaming by me, also.
* Fix typos in comments.Heikki Linnakangas2017-02-06
| | | | | | | | | Backpatch to all supported versions, where applicable, to make backpatching of future fixes go more smoothly. Josh Soref Discussion: https://www.postgresql.org/message-id/CACZqfqCf+5qRztLPgmmosr-B0Ye4srWzzw_mo4c_8_B_mtjmJQ@mail.gmail.com
* Remove redundant comment.Robert Haas2017-02-03
| | | | Rafia Sabih
* Use castNode() in a bunch of statement-list-related code.Tom Lane2017-01-26
| | | | | | | | | | | | | When I wrote commit ab1f0c822, I really missed the castNode() macro that Peter E. had proposed shortly before. This back-fills the uses I would have put it to. It's probably not all that significant, but there are more assertions here than there were before, and conceivably they will help catch any bugs associated with those representation changes. I left behind a number of usages like "(Query *) copyObject(query_var)". Those could have been converted as well, but Peter has proposed another notational improvement that would handle copyObject cases automatically, so I let that be for now.
* Use the new castNode() macro in a number of places.Andres Freund2017-01-26
| | | | | | | | | This is far from a pervasive conversion, but it's a good starting point. Author: Peter Eisentraut, with some minor changes by me Reviewed-By: Tom Lane Discussion: https://postgr.es/m/c5d387d9-3440-f5e0-f9d4-71d53b9fbe52@2ndquadrant.com
* Update copyright years in some recently added filesPeter Eisentraut2017-01-25
|
* Fix things so that updatable views work with partitioned tables.Robert Haas2017-01-24
| | | | | | | | Previously, ExecInitModifyTable was missing handling for WITH CHECK OPTION, and view_query_is_auto_updatable was missing handling for RELKIND_PARTITIONED_TABLE. Amit Langote, reviewed by me.
* Set ecxt_scantuple correctly for tuple routing.Robert Haas2017-01-24
| | | | | | | | | | | | In 2ac3ef7a01df859c62d0a02333b646d65eaec5ff, we changed things so that it's possible for a different TupleTableSlot to be used for partitioned tables at successively lower levels. If we do end up changing the slot from the original, we must update ecxt_scantuple to point to the new one for partition key of the tuple to be computed correctly. Reported by Rajkumar Raghuwanshi. Patch by Amit Langote. Discussion: http://postgr.es/m/CAKcux6%3Dm1qyqB2k6cjniuMMrYXb75O-MB4qGQMu8zg-iGGLjDw%40mail.gmail.com
* Reindent table partitioning code.Robert Haas2017-01-24
| | | | | | We've accumulated quite a bit of stuff with which pgindent is not quite happy in this code; clean it up to provide a less-annoying base for future pgindent runs.
* Remove no-longer-needed loop in ExecGather().Tom Lane2017-01-22
| | | | | | | | | Coverity complained quite properly that commit ea15e1867 had introduced unreachable code into ExecGather(); to wit, it was no longer possible to iterate the final for-loop more or less than once. So remove the for(). In passing, clean up a couple of comments, and make better use of a local variable.
* Logical replicationPeter Eisentraut2017-01-20
| | | | | | | | | | | | | - Add PUBLICATION catalogs and DDL - Add SUBSCRIPTION catalog and DDL - Define logical replication protocol and output plugin - Add logical replication workers From: Petr Jelinek <petr@2ndquadrant.com> Reviewed-by: Steve Singer <steve@ssinger.info> Reviewed-by: Andres Freund <andres@anarazel.de> Reviewed-by: Erik Rijkers <er@xs4all.nl> Reviewed-by: Peter Eisentraut <peter.eisentraut@2ndquadrant.com>
* Remove obsoleted code relating to targetlist SRF evaluation.Andres Freund2017-01-19
| | | | | | | | | | | | | Since 69f4b9c plain expression evaluation (and thus normal projection) can't return sets of tuples anymore. Thus remove code dealing with that possibility. This will require adjustments in external code using ExecEvalExpr()/ExecProject() - that should neither be hard nor very common. Author: Andres Freund and Tom Lane Discussion: https://postgr.es/m/20160822214023.aaxz5l4igypowyri@alap3.anarazel.de
* Fix RETURNING to work correctly with partition tuple routing.Robert Haas2017-01-19
| | | | | | | | | | In ExecInsert(), do not switch back to the root partitioned table ResultRelInfo until after we finish ExecProcessReturning(), so that RETURNING projection is done using the partition's descriptor. For the projection to work correctly, we must initialize the same for each leaf partition during ModifyTableState initialization. Amit Langote
* Fix failure to enforce partitioning contraint for internal partitions.Robert Haas2017-01-19
| | | | | | | | | | | | | When a tuple is inherited into a partitioning root, no partition constraints need to be enforced; when it is inserted into a leaf, the parent's partitioning quals needed to be enforced. The previous coding got both of those cases right. When a tuple is inserted into an intermediate level of the partitioning hierarchy (i.e. a table which is both a partition itself and in turn partitioned), it must enforce the partitioning qual inherited from its parent. That case got overlooked; repair. Amit Langote
* Move targetlist SRF handling from expression evaluation to new executor node.Andres Freund2017-01-18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Evaluation of set returning functions (SRFs_ in the targetlist (like SELECT generate_series(1,5)) so far was done in the expression evaluation (i.e. ExecEvalExpr()) and projection (i.e. ExecProject/ExecTargetList) code. This meant that most executor nodes performing projection, and most expression evaluation functions, had to deal with the possibility that an evaluated expression could return a set of return values. That's bad because it leads to repeated code in a lot of places. It also, and that's my (Andres's) motivation, made it a lot harder to implement a more efficient way of doing expression evaluation. To fix this, introduce a new executor node (ProjectSet) that can evaluate targetlists containing one or more SRFs. To avoid the complexity of the old way of handling nested expressions returning sets (e.g. having to pass up ExprDoneCond, and dealing with arguments to functions returning sets etc.), those SRFs can only be at the top level of the node's targetlist. The planner makes sure (via split_pathtarget_at_srfs()) that SRF evaluation is only necessary in ProjectSet nodes and that SRFs are only present at the top level of the node's targetlist. If there are nested SRFs the planner creates multiple stacked ProjectSet nodes. The ProjectSet nodes always get input from an underlying node. We also discussed and prototyped evaluating targetlist SRFs using ROWS FROM(), but that turned out to be more complicated than we'd hoped. While moving SRF evaluation to ProjectSet would allow to retain the old "least common multiple" behavior when multiple SRFs are present in one targetlist (i.e. continue returning rows until all SRFs are at the end of their input at the same time), we decided to instead only return rows till all SRFs are exhausted, returning NULL for already exhausted ones. We deemed the previous behavior to be too confusing, unexpected and actually not particularly useful. As a side effect, the previously prohibited case of multiple set returning arguments to a function, is now allowed. Not because it's particularly desirable, but because it ends up working and there seems to be no argument for adding code to prohibit it. Currently the behavior for COALESCE and CASE containing SRFs has changed, returning multiple rows from the expression, even when the SRF containing "arm" of the expression is not evaluated. That's because the SRFs are evaluated in a separate ProjectSet node. As that's quite confusing, we're likely to instead prohibit SRFs in those places. But that's still being discussed, and the code would reside in places not touched here, so that's a task for later. There's a lot of, now superfluous, code dealing with set return expressions around. But as the changes to get rid of those are verbose largely boring, it seems better for readability to keep the cleanup as a separate commit. Author: Tom Lane and Andres Freund Discussion: https://postgr.es/m/20160822214023.aaxz5l4igypowyri@alap3.anarazel.de
* Generate fmgr prototypes automaticallyPeter Eisentraut2017-01-17
| | | | | | | | | | | | Gen_fmgrtab.pl creates a new file fmgrprotos.h, which contains prototypes for all functions registered in pg_proc.h. This avoids having to manually maintain these prototypes across a random variety of header files. It also automatically enforces a correct function signature, and since there are warnings about missing prototypes, it will detect functions that are defined but not registered in pg_proc.h (or otherwise used). Reviewed-by: Pavel Stehule <pavel.stehule@gmail.com>
* Change representation of statement lists, and add statement location info.Tom Lane2017-01-14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch makes several changes that improve the consistency of representation of lists of statements. It's always been the case that the output of parse analysis is a list of Query nodes, whatever the types of the individual statements in the list. This patch brings similar consistency to the outputs of raw parsing and planning steps: * The output of raw parsing is now always a list of RawStmt nodes; the statement-type-dependent nodes are one level down from that. * The output of pg_plan_queries() is now always a list of PlannedStmt nodes, even for utility statements. In the case of a utility statement, "planning" just consists of wrapping a CMD_UTILITY PlannedStmt around the utility node. This list representation is now used in Portal and CachedPlan plan lists, replacing the former convention of intermixing PlannedStmts with bare utility-statement nodes. Now, every list of statements has a consistent head-node type depending on how far along it is in processing. This allows changing many places that formerly used generic "Node *" pointers to use a more specific pointer type, thus reducing the number of IsA() tests and casts needed, as well as improving code clarity. Also, the post-parse-analysis representation of DECLARE CURSOR is changed so that it looks more like EXPLAIN, PREPARE, etc. That is, the contained SELECT remains a child of the DeclareCursorStmt rather than getting flipped around to be the other way. It's now true for both Query and PlannedStmt that utilityStmt is non-null if and only if commandType is CMD_UTILITY. That allows simplifying a lot of places that were testing both fields. (I think some of those were just defensive programming, but in many places, it was actually necessary to avoid confusing DECLARE CURSOR with SELECT.) Because PlannedStmt carries a canSetTag field, we're also able to get rid of some ad-hoc rules about how to reconstruct canSetTag for a bare utility statement; specifically, the assumption that a utility is canSetTag if and only if it's the only one in its list. While I see no near-term need for relaxing that restriction, it's nice to get rid of the ad-hocery. The API of ProcessUtility() is changed so that what it's passed is the wrapper PlannedStmt not just the bare utility statement. This will affect all users of ProcessUtility_hook, but the changes are pretty trivial; see the affected contrib modules for examples of the minimum change needed. (Most compilers should give pointer-type-mismatch warnings for uncorrected code.) There's also a change in the API of ExplainOneQuery_hook, to pass through cursorOptions instead of expecting hook functions to know what to pick. This is needed because of the DECLARE CURSOR changes, but really should have been done in 9.6; it's unlikely that any extant hook functions know about using CURSOR_OPT_PARALLEL_OK. Finally, teach gram.y to save statement boundary locations in RawStmt nodes, and pass those through to Query and PlannedStmt nodes. This allows more intelligent handling of cases where a source query string contains multiple statements. This patch doesn't actually do anything with the information, but a follow-on patch will. (Passing this information through cleanly is the true motivation for these changes; while I think this is all good cleanup, it's unlikely we'd have bothered without this end goal.) catversion bump because addition of location fields to struct Query affects stored rules. This patch is by me, but it owes a good deal to Fabien Coelho who did a lot of preliminary work on the problem, and also reviewed the patch. Discussion: https://postgr.es/m/alpine.DEB.2.20.1612200926310.29821@lancre
* Throw suitable error for COPY TO STDOUT/FROM STDIN in a SQL function.Tom Lane2017-01-14
| | | | | | | | | | | | | | | | | A client copy can't work inside a function because the FE/BE wire protocol doesn't support nesting of a COPY operation within query results. (Maybe it could, but the protocol spec doesn't suggest that clients should support this, and libpq for one certainly doesn't.) In most PLs, this prohibition is enforced by spi.c, but SQL functions don't use SPI. A comparison of _SPI_execute_plan() and init_execution_state() shows that rejecting client COPY is the only discrepancy in what they allow, so there's no other similar bugs. This is an astonishingly ancient oversight, so back-patch to all supported branches. Report: https://postgr.es/m/BY2PR05MB2309EABA3DEFA0143F50F0D593780@BY2PR05MB2309.namprd05.prod.outlook.com
* Fix incorrect function name in comment.Robert Haas2017-01-12
| | | | Amit Langote
* Repair commit b81b5a96f424531b97cdd1dba97d9d1b9c9d372e.Robert Haas2017-01-06
| | | | | | This commit purported to use a variable hash seed for Partial HashAggregate, but actually did the opposite - it made us use a variable seed for any HashAggregate that is NOT partial. Woops.
* Fix possible crash reading pg_stat_activity.Robert Haas2017-01-05
| | | | | | | | | | With the old code, a backend that read pg_stat_activity without ever having executed a parallel query might see a backend in the midst of executing one waiting on a DSA LWLock, resulting in a crash. The solution is for backends to register the tranche at startup time, not the first time a parallel query is executed. Report by Andreas Seltenreich. Patch by me, reviewed by Thomas Munro.
* Remove unnecessary arguments from partitioning functions.Robert Haas2017-01-04
| | | | | | | RelationGetPartitionQual() and generate_partition_qual() are always called with recurse = true, so we don't need an argument for that. Extracted by me from a larger patch by Amit Langote.
* Fix reporting of constraint violations for table partitioning.Robert Haas2017-01-04
| | | | | | | | | After a tuple is routed to a partition, it has been converted from the root table's row type to the partition's row type. ExecConstraints needs to report the failure using the original tuple and the parent's tuple descriptor rather than the ones for the selected partition. Amit Langote
* Move partition_tuple_slot out of EState.Robert Haas2017-01-04
| | | | | | | | | | | Commit 2ac3ef7a01df859c62d0a02333b646d65eaec5ff added a TupleTapleSlot for partition tuple slot to EState (es_partition_tuple_slot) but it's more logical to have it as part of ModifyTableState (mt_partition_tuple_slot) and CopyState (partition_tuple_slot). Discussion: http://postgr.es/m/1bd459d9-4c0c-197a-346e-e5e59e217d97@lab.ntt.co.jp Amit Langote, per a gripe from me
* Update copyright via script for 2017Bruce Momjian2017-01-03
|
* Remove unnecessary casts of makeNode() resultPeter Eisentraut2016-12-23
| | | | | makeNode() is already a macro that has the right result pointer type, so casting it again to the same type is unnecessary.
* Spellcheck: s/descendent/descendant/gTom Lane2016-12-23
| | | | | | | | | | I got a little annoyed by reading documentation paragraphs containing both spellings within a few lines of each other. My dictionary says "descendant" is the preferred spelling, and it's certainly the majority usage in our tree, so standardize on that. For one usage in parallel.sgml, I thought it better to rewrite to avoid the term altogether.
* Fix tuple routing in cases where tuple descriptors don't match.Robert Haas2016-12-22
| | | | | | | | | | | | | | | The previous coding failed to work correctly when we have a multi-level partitioned hierarchy where tables at successive levels have different attribute numbers for the partition key attributes. To fix, have each PartitionDispatch object store a standalone TupleTableSlot initialized with the TupleDesc of the corresponding partitioned table, along with a TupleConversionMap to map tuples from the its parent's rowtype to own rowtype. After tuple routing chooses a leaf partition, we must use the leaf partition's tuple descriptor, not the root table's. To that end, a dedicated TupleTableSlot for tuple routing is now allocated in EState. Amit Langote
* Fix handling of expanded objects in CoerceToDomain and CASE execution.Tom Lane2016-12-22
| | | | | | | | | | | | | | | | | | | | | | | | | | | When the input value to a CoerceToDomain expression node is a read-write expanded datum, we should pass a read-only pointer to any domain CHECK expressions and then return the original read-write pointer as the expression result. Previously we were blindly passing the same pointer to all the consumers of the value, making it possible for a function in CHECK to modify or even delete the expanded value. (Since a plpgsql function will absorb a passed-in read-write expanded array as a local variable value, it will in fact delete the value on exit.) A similar hazard of passing the same read-write pointer to multiple consumers exists in domain_check() and in ExecEvalCase, so fix those too. The fix requires adding MakeExpandedObjectReadOnly calls at the appropriate places, which is simple enough except that we need to get the data type's typlen from somewhere. For the domain cases, solve this by redefining DomainConstraintRef.tcache as okay for callers to access; there wasn't any reason for the original convention against that, other than not wanting the API of typcache.c to be any wider than it had to be. For CASE, there's no good solution except to add a syscache lookup during executor start. Per bug #14472 from Marcos Castedo. Back-patch to 9.5 where expanded values were introduced. Discussion: https://postgr.es/m/15225.1482431619@sss.pgh.pa.us
* Refactor partition tuple routing code to reduce duplication.Robert Haas2016-12-21
| | | | Amit Langote
* Fix minor oversights in nodeAgg.c.Tom Lane2016-12-20
| | | | | | | | | | | | | aggstate->evalproj is always set up by ExecInitAgg, so there's no need to test. Doing so led Coverity to think that we might be intending "slot" to be possibly NULL here, and it quite properly complained that the rest of combine_aggregates() wasn't prepared for that. Also fix a couple of obvious thinkos in Asserts checking that "inputoff" isn't past the end of the slot. Errors introduced in commit 8ed3f11bb, so no need for back-patch.
* Fix sharing Agg transition state of DISTINCT or ordered aggs.Heikki Linnakangas2016-12-20
| | | | | | | | | | | | | | | If a query contained two aggregates that could share the transition value, we would correctly collect the input into a tuplesort only once, but incorrectly run the transition function over the accumulated input twice, in finalize_aggregates(). That caused a crash, when we tried to call tuplesort_performsort() on an already-freed NULL tuplestore. Backport to 9.6, where sharing of transition state and this bug were introduced. Analysis by Tom Lane. Discussion: https://www.postgresql.org/message-id/ac5b0b69-744c-9114-6218-8300ac920e61@iki.fi
* Provide a DSA area for all parallel queries.Robert Haas2016-12-19
| | | | | | | This will allow future parallel query code to dynamically allocate storage shared by all participants. Thomas Munro, with assorted changes by me.
* Unbreak Finalize HashAggregate over Partial HashAggregate.Robert Haas2016-12-16
| | | | | | | | | | | | | | | Commit 5dfc198146b49ce7ecc8a1fc9d5e171fb75f6ba5 introduced the use of a new type of hash table with linear reprobing for hash aggregates. Such a hash table behaves very poorly if keys are inserted in hash order, which does in fact happen in the case where a query use a Finalize HashAggregate node fed (via Gather) by a Partial HashAggregate node. In fact, queries with this type of plan tend to run effectively forever. Fix that by seeding the hash value differently in each worker (and in the leader, if it participates). Andres Freund and Robert Haas