aboutsummaryrefslogtreecommitdiff
path: root/src/backend/executor
Commit message (Collapse)AuthorAge
* Fix EvalPlanQualStart to handle partitioned result rels correctly.Tom Lane2019-04-08
| | | | | | | | | | | | The es_root_result_relations array needs to be shallow-copied in the same way as the main es_result_relations array, else EPQ rechecks on partitioned result relations fail, as seen in bug #15677 from Norbert Benkocs. Amit Langote, isolation test case added by me Discussion: https://postgr.es/m/15677-0bf089579b4cd02d@postgresql.org Discussion: https://postgr.es/m/19321.1554567786@sss.pgh.pa.us
* Clean up side-effects of commits ab5fcf2b0 et al.Tom Lane2019-04-07
| | | | | | | | | | | | | | | | | | | | | | | Before those commits, partitioning-related code in the executor could assume that ModifyTableState.resultRelInfo[] contains only leaf partitions. However, now a fully-pruned update results in a dummy ModifyTable that references the root partitioned table, and that breaks some stuff. In v11, this led to an assertion or core dump in the tuple routing code. Fix by disabling tuple routing, since we don't need that anyway. (I chose to do that in HEAD as well for safety, even though the problem doesn't manifest in HEAD as it stands.) In v10, this confused ExecInitModifyTable's decision about whether it needed to close the root table. But we can get rid of that altogether by being smarter about where to find the root table. Note that since the referenced commits haven't shipped yet, this isn't fixing any bug the field has seen. Amit Langote, per a report from me Discussion: https://postgr.es/m/20710.1554582479@sss.pgh.pa.us
* Fix outdated commentPeter Eisentraut2019-01-19
| | | | | The issue the comment is referring to was fixed by 08859bb5c2cebc132629ca838113d27bb31b990c.
* Allow btree comparison functions to return INT_MIN.Tom Lane2018-10-05
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | Historically we forbade datatype-specific comparison functions from returning INT_MIN, so that it would be safe to invert the sort order just by negating the comparison result. However, this was never really safe for comparison functions that directly return the result of memcmp(), strcmp(), etc, as POSIX doesn't place any such restriction on those library functions. Buildfarm results show that at least on recent Linux on s390x, memcmp() actually does return INT_MIN sometimes, causing sort failures. The agreed-on answer is to remove this restriction and fix relevant call sites to not make such an assumption; code such as "res = -res" should be replaced by "INVERT_COMPARE_RESULT(res)". The same is needed in a few places that just directly negated the result of memcmp or strcmp. To help find places having this problem, I've also added a compile option to nbtcompare.c that causes some of the commonly used comparators to return INT_MIN/INT_MAX instead of their usual -1/+1. It'd likely be a good idea to have at least one buildfarm member running with "-DSTRESS_SORT_INT_MIN". That's far from a complete test of course, but it should help to prevent fresh introductions of such bugs. This is a longstanding portability hazard, so back-patch to all supported branches. Discussion: https://postgr.es/m/20180928185215.ffoq2xrq5d3pafna@alap3.anarazel.de
* Remove obsolete commentAlvaro Herrera2018-09-25
| | | | | The documented shortcoming was actually fixed in 4c728f3829 so the comment is not true anymore.
* Fix failure in WHERE CURRENT OF after rewinding the referenced cursor.Tom Lane2018-09-23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In a case where we have multiple relation-scan nodes in a cursor plan, such as a scan of an inheritance tree, it's possible to fetch from a given scan node, then rewind the cursor and fetch some row from an earlier scan node. In such a case, execCurrent.c mistakenly thought that the later scan node was still active, because ExecReScan hadn't done anything to make it look not-active. We'd get some sort of failure in the case of a SeqScan node, because the node's scan tuple slot would be pointing at a HeapTuple whose t_self gets reset to invalid by heapam.c. But it seems possible that for other relation scan node types we'd actually return a valid tuple TID to the caller, resulting in updating or deleting a tuple that shouldn't have been considered current. To fix, forcibly clear the ScanTupleSlot in ExecScanReScan. Another issue here, which seems only latent at the moment but could easily become a live bug in future, is that rewinding a cursor does not necessarily lead to *immediately* applying ExecReScan to every scan-level node in the plan tree. Upper-level nodes will think that they can postpone that call if their child node is already marked with chgParam flags. I don't see a way for that to happen today in a plan tree that's simple enough for execCurrent.c's search_plan_tree to understand, but that's one heck of a fragile assumption. So, add some logic in search_plan_tree to detect chgParam flags being set on nodes that it descended to/through, and assume that that means we should consider lower scan nodes to be logically reset even if their ReScan call hasn't actually happened yet. Per bug #15395 from Matvey Arye. This has been broken for a long time, so back-patch to all supported branches. Discussion: https://postgr.es/m/153764171023.14986.280404050547008575@wrigleys.postgresql.org
* Fix parsetree representation of XMLTABLE(XMLNAMESPACES(DEFAULT ...)).Tom Lane2018-09-17
| | | | | | | | | | | | | | | | | | | | | | | The original coding for XMLTABLE thought it could represent a default namespace by a T_String Value node with a null string pointer. That's not okay, though; in particular outfuncs.c/readfuncs.c are not on board with such a representation, meaning you'll get a null pointer crash if you try to store a view or rule containing this construct. To fix, change the parsetree representation so that we have a NULL list element, instead of a bogus Value node. This isn't really a functional limitation since default XML namespaces aren't yet implemented in the executor; you'd just get "DEFAULT namespace is not supported" anyway. But crashes are not nice, so back-patch to v10 where this syntax was added. Ordinarily we'd consider a parsetree representation change to be un-backpatchable; but since existing releases would crash on the way to storing such constructs, there can't be any existing views/rules to be incompatible with. Per report from Andrey Lepikhov. Discussion: https://postgr.es/m/3690074f-abd2-56a9-144a-aa5545d7a291@postgrespro.ru
* Fix failure with initplans used conditionally during EvalPlanQual rechecks.Tom Lane2018-09-15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The EvalPlanQual machinery assumes that any initplans (that is, uncorrelated sub-selects) used during an EPQ recheck would have already been evaluated during the main query; this is implicit in the fact that execPlan pointers are not copied into the EPQ estate's es_param_exec_vals. But it's possible for that assumption to fail, if the initplan is only reached conditionally. For example, a sub-select inside a CASE expression could be reached during a recheck when it had not been previously, if the CASE test depends on a column that was just updated. This bug is old, appearing to date back to my rewrite of EvalPlanQual in commit 9f2ee8f28, but was not detected until Kyle Samson reported a case. To fix, force all not-yet-evaluated initplans used within the EPQ plan subtree to be evaluated at the start of the recheck, before entering the EPQ environment. This could be inefficient, if such an initplan is expensive and goes unused again during the recheck --- but that's piling one layer of improbability atop another. It doesn't seem worth adding more complexity to prevent that, at least not in the back branches. It was convenient to use the new-in-v11 ExecEvalParamExecParams function to implement this, but I didn't like either its name or the specifics of its API, so revise that. Back-patch all the way. Rather than rewrite the patch to avoid depending on bms_next_member() in the oldest branches, I chose to back-patch that function into 9.4 and 9.3. (This isn't the first time back-patches have needed that, and it exhausted my patience.) I also chose to back-patch some test cases added by commits 71404af2a and 342a1ffa2 into 9.4 and 9.3, so that the 9.x versions of eval-plan-qual.spec are all the same. Andrew Gierth diagnosed the problem and contributed the added test cases, though the actual code changes are by me. Discussion: https://postgr.es/m/A033A40A-B234-4324-BE37-272279F7B627@tripadvisor.com
* Save/restore SPI's global variables in SPI_connect() and SPI_finish().Tom Lane2018-09-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch removes two sources of interference between nominally independent functions when one SPI-using function calls another, perhaps without knowing that it does so. Chapman Flack pointed out that xml.c's query_to_xml_internal() expects SPI_tuptable and SPI_processed to stay valid across datatype output function calls; but it's possible that such a call could involve re-entrant use of SPI. It seems likely that there are similar hazards elsewhere, if not in the core code then in third-party SPI users. Previously SPI_finish() reset SPI's API globals to zeroes/nulls, which would typically make for a crash in such a situation. Restoring them to the values they had at SPI_connect() seems like a considerably more useful behavior, and it still meets the design goal of not leaving any dangling pointers to tuple tables of the function being exited. Also, cause SPI_connect() to reset these variables to zeroes/nulls after saving them. This prevents interference in the opposite direction: it's possible that a SPI-using function that's only ever been tested standalone contains assumptions that these variables start out as zeroes. That was the case as long as you were the outermost SPI user, but not so much for an inner user. Now it's consistent. Report and fix suggestion by Chapman Flack, actual patch by me. Back-patch to all supported branches. Discussion: https://postgr.es/m/9fa25bef-2e4f-1c32-22a4-3ad0723c4a17@anastigmatix.net
* Set scan direction appropriately for SubPlans (bug #15336)Andrew Gierth2018-08-17
| | | | | | | | | | | | | | | | | | | | When executing a SubPlan in an expression, the EState's direction field was left alone, resulting in an attempt to execute the subplan backwards if it was encountered during a backwards scan of a cursor. Also, though much less likely, it was possible to reach the execution of an InitPlan while in backwards-scan state. Repair by saving/restoring estate->es_direction and forcing forward scan mode in the relevant places. Backpatch all the way, since this has been broken since 8.3 (prior to commit c7ff7663e, SubPlans had their own EStates rather than sharing the parent plan's, so there was no confusion over scan direction). Per bug #15336 reported by Vladimir Baranoff; analysis and patch by me, review by Tom Lane. Discussion: https://postgr.es/m/153449812167.1304.1741624125628126322@wrigleys.postgresql.org
* Adjust comment atop ExecShutdownNode.Amit Kapila2018-08-13
| | | | | | | | | After commits a315b967cc and b805b63ac2, part of the comment atop ExecShutdownNode is redundant. Adjust it. Author: Amit Kapila Backpatch-through: 10 where both the mentioned commits are present. Discussion: https://postgr.es/m/86137f17-1dfb-42f9-7421-82fd786b04a1@anayrat.info
* Prohibit shutting down resources if there is a possibility of back up.Amit Kapila2018-08-13
| | | | | | | | | | | | | | | Currently, we release the asynchronous resources as soon as it is evident that no more rows will be needed e.g. when a Limit is filled. This can be problematic especially for custom and foreign scans where we can scan backward. Fix that by disallowing the shutting down of resources in such cases. Reported-by: Robert Haas Analysed-by: Robert Haas and Amit Kapila Author: Amit Kapila Reviewed-by: Robert Haas Backpatch-through: 9.6 where this code was introduced Discussion: https://postgr.es/m/86137f17-1dfb-42f9-7421-82fd786b04a1@anayrat.info
* Avoid query-lifetime memory leaks in XMLTABLE (bug #15321)Andrew Gierth2018-08-13
| | | | | | | | | | | | | | | | | | | Multiple calls to XMLTABLE in a query (e.g. laterally applying it to a table with an xml column, an important use-case) were leaking large amounts of memory into the per-query context, blowing up memory usage. Repair by reorganizing memory context usage in nodeTableFuncscan; use the usual per-tuple context for row-by-row evaluations instead of perValueCxt, and use the explicitly created context -- renamed from perValueCxt to perTableCxt -- for arguments and state for each individual table-generation operation. Backpatch to PG10 where this code was introduced. Original report by IRC user begriffs; analysis and patch by me. Reviewed by Tom Lane and Pavel Stehule. Discussion: https://postgr.es/m/153394403528.10284.7530399040974170549@wrigleys.postgresql.org
* Fix buffer usage stats for parallel nodes.Amit Kapila2018-08-03
| | | | | | | | | | | | | | | | | | | | | | | | The buffer usage stats is accounted only for the execution phase of the node. For Gather and Gather Merge nodes, such stats are accumulated at the time of shutdown of workers which is done after execution of node due to which we missed to account them for such nodes. Fix it by treating nodes as running while we shut down them. We can also miss accounting for a Limit node when Gather or Gather Merge is beneath it, because it can finish the execution before shutting down such nodes. So we allow a Limit node to shut down the resources before it completes the execution. In the passing fix the gather node code to allow workers to shut down as soon as we find that all the tuples from the workers have been retrieved. The original code use to do that, but is accidently removed by commit 01edb5c7fc. Reported-by: Adrien Nayrat Author: Amit Kapila and Robert Haas Reviewed-by: Robert Haas and Andres Freund Backpatch-through: 9.6 where this code was introduced Discussion: https://postgr.es/m/86137f17-1dfb-42f9-7421-82fd786b04a1@anayrat.info
* Match the buffer usage tracking for leader and worker backends.Amit Kapila2018-08-03
| | | | | | | | | | | | | In the leader backend, we don't track the buffer usage for ExecutorStart phase whereas in worker backend we track it for ExecutorStart phase as well. This leads to different value for buffer usage stats for the parallel and non-parallel query. Change the code so that worker backend also starts tracking buffer usage after ExecutorStart. Author: Amit Kapila and Robert Haas Reviewed-by: Robert Haas and Andres Freund Backpatch-through: 9.6 where this code was introduced Discussion:https://postgr.es/m/86137f17-1dfb-42f9-7421-82fd786b04a1@anayrat.info
* Rephrase a few comments for clarity.Heikki Linnakangas2018-07-19
| | | | | | | | I was confused by what "intended to be parallel serially" meant, until Robert Haas and David G. Johnston explained it. Rephrase the comment to make it more clear, using David's suggested wording. Discussion: https://www.postgresql.org/message-id/1fec9022-41e8-e484-70ce-2179b08c2092%40iki.fi
* Fix misc typos, mostly in comments.Heikki Linnakangas2018-07-18
| | | | | | | | A collection of typos I happened to spot while reading code, as well as grepping for common mistakes. Backpatch to all supported versions, as applicable, to avoid conflicts when backporting other commits in the future.
* Fix thinko in comments.Amit Kapila2018-06-27
| | | | | | | | | A slot can not be stored in a tuple but it's vice versa. Reported-by: Ashutosh Bapat Author: Ashutosh Bapat Reviewed-by: Amit Kapila Discussion: https://postgr.es/m/CAFjFpRcHhNhXdegyJv3KKDWrwO1_NB_KYZM_ZSDeMOZaL1A5jQ@mail.gmail.com
* Fix typoPeter Eisentraut2018-06-08
|
* Fix obsolete comment.Heikki Linnakangas2018-06-07
| | | | | | | | The 'orig_slot' argument was removed in commit c0a8ae7be392, but that commit forgot to update the comment. Author: Amit Langote Discussion: https://www.postgresql.org/message-id/194ac4bf-7b4a-c887-bf26-bc1a85ea995a@lab.ntt.co.jp
* Fix type checking for support functions of parallel VARIADIC aggregates.Tom Lane2018-05-15
| | | | | | | | | | | | | | | | | | | | | The impact of VARIADIC on the combine/serialize/deserialize support functions of an aggregate wasn't thought through carefully. There is actually no impact, because variadicity isn't passed through to these functions (and it doesn't seem like it would need to be). However, lookup_agg_function was mistakenly told to check things as though it were passed through. The net result was that it was impossible to declare an aggregate that had both VARIADIC input and parallelism support functions. In passing, fix a runtime check in nodeAgg.c for the combine function's strictness to make its error message agree with the creation-time check. The previous message was actually backwards, and it doesn't seem like there's a good reason to have two versions of this message text anyway. Back-patch to 9.6 where parallel aggregation was introduced. Alexey Bashtanov; message fix by me Discussion: https://postgr.es/m/f86dde87-fef4-71eb-0480-62754aaca01b@imap.cc
* Fix state reversal after partition tuple routingAlvaro Herrera2018-03-19
| | | | | | | | | | | | | | | | | | | | | | | | | We make some changes to ModifyTableState and the EState it uses whenever we route tuples to partitions; but we weren't restoring properly in all cases, possibly causing crashes when partitions with different tuple descriptors are targeted by tuples inserted in the same command. Refactor some code, creating ExecPrepareTupleRouting, to encapsulate the needed state changing logic, and have it invoked one level above its current place (ie. put it in ExecModifyTable instead of ExecInsert); this makes it all more readable. Add a test case to exercise this. We don't support having views as partitions; and since only views can have INSTEAD OF triggers, there is no point in testing for INSTEAD OF when processing insertions into a partitioned table. Remove code that appears to support this (but which is actually never relevant.) In passing, fix location of some very confusing comments in ModifyTableState. Reported-by: Amit Langote Author: Etsuro Fujita, Amit Langote Discussion: https://postgr/es/m/0473bf5c-57b1-f1f7-3d58-455c2230bc5f@lab.ntt.co.jp
* Fix WHERE CURRENT OF when the referenced cursor uses an index-only scan.Tom Lane2018-03-17
| | | | | | | | | | | | | | | | | | | | | | | | | | "UPDATE/DELETE WHERE CURRENT OF cursor_name" failed, with an error message like "cannot extract system attribute from virtual tuple", if the cursor was using a index-only scan for the target table. Fix it by digging the current TID out of the indexscan state. It seems likely that the same failure could occur for CustomScan plans and perhaps some FDW plan types, so that leaving this to be treated as an internal error with an obscure message isn't as good an idea as it first seemed. Hence, add a bit of heaptuple.c infrastructure to let us deliver a more on-topic message. I chose to make the message match what you get for the case where execCurrentOf can't identify the target scan node at all, "cursor "foo" is not a simply updatable scan of table "bar"". Perhaps it should be different, but we can always adjust that later. In the future, it might be nice to provide hooks that would let custom scan providers and/or FDWs deal with this in other ways; but that's not a suitable topic for a back-patchable bug fix. It's been like this all along, so back-patch to all supported branches. Yugo Nagata and Tom Lane Discussion: https://postgr.es/m/20180201013349.937dfc5f.nagata@sraoss.co.jp
* Fix query-lifespan memory leakage in repeatedly executed hash joins.Tom Lane2018-03-16
| | | | | | | | | | | | | | | ExecHashTableCreate allocated some memory that wasn't freed by ExecHashTableDestroy, specifically the per-hash-key function information. That's not a huge amount of data, but if one runs a query that repeats a hash join enough times, it builds up. Fix by arranging for the data in question to be kept in the hashtable's hashCxt instead of leaving it "loose" in the query-lifespan executor context. (This ensures that we'll also clean up anything that the hash functions allocate in fn_mcxt.) Per report from Amit Khandekar. It's been like this forever, so back-patch to all supported branches. Discussion: https://postgr.es/m/CAJ3gD9cFofAWGvcxLOxDHC=B0hjtW8yGmUsF2hdGh97CM38=7g@mail.gmail.com
* Use the correct tuplestore read pointer in a NamedTuplestoreScan.Tom Lane2018-02-27
| | | | | | | | | | | Tom Kazimiers reported that transition tables don't work correctly when they are scanned by more than one executor node. That's because commit 18ce3a4ab allocated separate read pointers for each executor node, as it must, but failed to make them active at the appropriate times. Repair. Thomas Munro Discussion: https://postgr.es/m/20180224034748.bixarv6632vbxgeb%40dewberry.localdomain
* Fix misbehavior of CTE-used-in-a-subplan during EPQ rechecks.Tom Lane2018-02-19
| | | | | | | | | | | | | | | | | | An updating query that reads a CTE within an InitPlan or SubPlan could get incorrect results if it updates rows that are concurrently being modified. This is caused by CteScanNext supposing that nothing inside its recursive ExecProcNode call could change which read pointer is selected in the CTE's shared tuplestore. While that's normally true because of scoping considerations, it can break down if an EPQ plan tree gets built during the call, because EvalPlanQualStart builds execution trees for all subplans whether they're going to be used during the recheck or not. And it seems like a pretty shaky assumption anyway, so let's just reselect our own read pointer here. Per bug #14870 from Andrei Gorita. This has been broken since CTEs were implemented, so back-patch to all supported branches. Discussion: https://postgr.es/m/20171024155358.1471.82377@wrigleys.postgresql.org
* Improve bit perturbation in TupleHashTableHash.Andres Freund2018-01-29
| | | | | | | | | | | | | | | | The changes in b81b5a96f424531b97cdd1dba97d9d1b9c9d372e did not fully address the issue, because the bit-mixing of the IV into the final hash-key didn't prevent clustering in the input-data survive in the output data. This didn't cause a lot of problems because of the additional growth conditions added d4c62a6b623d6eef88218158e9fa3cf974c6c7e5. But as we want to rein those in due to explosive growth in some edges, this needs to be fixed. Author: Andres Freund Discussion: https://postgr.es/m/20171127185700.1470.20362@wrigleys.postgresql.org Backpatch: 10, where simplehash was introduced
* Avoid crash during EvalPlanQual recheck of an inner indexscan.Tom Lane2018-01-27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 09529a70b changed nodeIndexscan.c and nodeIndexonlyscan.c to postpone initialization of the indexscan proper until the first tuple fetch. It overlooked the question of mark/restore behavior, which means that if some caller attempts to mark the scan before the first tuple fetch, you get a null pointer dereference. The only existing user of mark/restore is nodeMergejoin.c, which (somewhat accidentally) will never attempt to set a mark before the first inner tuple unless the inner child node is a Material node. Hence the case can't arise normally, so it seems sufficient to document the assumption at both ends. However, during an EvalPlanQual recheck, ExecScanFetch doesn't call IndexNext but just returns the jammed-in test tuple. Therefore, if we're doing a recheck in a plan tree with a mergejoin with inner indexscan, it's possible to reach ExecIndexMarkPos with iss_ScanDesc still null, as reported by Guo Xiang Tan in bug #15032. Really, when there's a test tuple supplied during an EPQ recheck, touching the index at all is the wrong thing: rather, the behavior of mark/restore ought to amount to saving and restoring the es_epqScanDone flag. We can avoid finding a place to actually save the flag, for the moment, because given the assumption that no caller will set a mark before fetching a tuple, es_epqScanDone must always be set by the time we try to mark. So the actual behavior change required is just to not reach the index access if a test tuple is supplied. The set of plan node types that need to consider this issue are those that support EPQ test tuples (i.e., call ExecScan()) and also support mark/restore; which is to say, IndexScan, IndexOnlyScan, and perhaps CustomScan. It's tempting to try to fix the problem in one place by teaching ExecMarkPos() itself about EPQ; but ExecMarkPos supports some plan types that aren't Scans, and also it seems risky to make assumptions about what a CustomScan wants to do here. Also, the most likely future change here is to decide that we do need to support marks placed before the first tuple, which would require additional work in IndexScan and IndexOnlyScan in any case. Hence, fix the EPQ issue in nodeIndexscan.c and nodeIndexonlyscan.c, accepting the small amount of code duplicated thereby, and leave it to CustomScan providers to fix this bug if they have it. Back-patch to v10 where commit 09529a70b came in. In earlier branches, the index_markpos() call is a waste of cycles when EPQ is active, but no more than that, so it doesn't seem appropriate to back-patch further. Discussion: https://postgr.es/m/20180126074932.3098.97815@wrigleys.postgresql.org
* When passing query strings to workers, pass the terminating \0.Robert Haas2017-12-20
| | | | | | | | | Otherwise, when the query string is read, we might trailing garbage beyond the end, unless there happens to be a \0 there by good luck. Report and patch by Thomas Munro. Reviewed by Rafia Sabih. Discussion: http://postgr.es/m/CAEepm=2SJs7X+_vx8QoDu8d1SMEOxtLhxxLNzZun_BvNkuNhrw@mail.gmail.com
* Try again to fix accumulation of parallel worker instrumentation.Robert Haas2017-12-19
| | | | | | | | | | | | | | | | | When a Gather or Gather Merge node is started and stopped multiple times, accumulate instrumentation data only once, at the end, instead of after each execution, to avoid recording inflated totals. Commit 778e78ae9fa51e58f41cbdc72b293291d02d8984, the previous attempt at a fix, instead reset the state after every execution, which worked for the general instrumentation data but had problems for the additional instrumentation specific to Sort and Hash nodes. Report by hubert depesz lubaczewski. Analysis and fix by Amit Kapila, following a design proposal from Thomas Munro, with a comment tweak by me. Discussion: http://postgr.es/m/20171127175631.GA405@depesz.com
* Fix crashes on plans with multiple Gather (Merge) nodes.Robert Haas2017-12-18
| | | | | | | | | | | | | | | | es_query_dsa turns out to be broken by design, because it supposes that there is only one DSA for the whole query, whereas there is actually one per Gather (Merge) node. For now, work around that problem by setting and clearing the pointer around the sections of code that might need it. It's probably a better idea to get rid of es_query_dsa altogether in favor of having each node keep track individually of which DSA is relevant, but that seems like more than we would want to back-patch. Thomas Munro, reviewed and tested by Andreas Seltenreich, Amit Kapila, and by me. Discussion: http://postgr.es/m/CAEepm=1U6as=brnVvMNixEV2tpi8NuyQoTmO8Qef0-VV+=7MDA@mail.gmail.com
* Revert "Fix accumulation of parallel worker instrumentation."Robert Haas2017-12-13
| | | | | | | This reverts commit 778e78ae9fa51e58f41cbdc72b293291d02d8984. Per further discussion, that doesn't seem to be the best possible fix. Discussion: http://postgr.es/m/CAA4eK1LW2aFKzY3=vwvc=t-juzPPVWP2uT1bpx_MeyEqnM+p8g@mail.gmail.com
* Fix commentPeter Eisentraut2017-12-11
| | | | Reported-by: Noah Misch <noah@leadboat.com>
* Fix corner-case coredump in _SPI_error_callback().Tom Lane2017-12-11
| | | | | | | | | | | I noticed that _SPI_execute_plan initially sets spierrcontext.arg = NULL, and only fills it in some time later. If an error were to happen in between, _SPI_error_callback would try to dereference the null pointer. This is unlikely --- there's not much between those points except push-snapshot calls --- but it's clearly not impossible. Tweak the callback to do nothing if the pointer isn't set yet. It's been like this for awhile, so back-patch to all supported branches.
* Fix accumulation of parallel worker instrumentation.Robert Haas2017-12-05
| | | | | | | | | | | | | When a Gather or Gather Merge node is started and stopped multiple times, the old code wouldn't reset the shared state between executions, potentially resulting in dramatically inflated instrumentation data for nodes beneath it. (The per-worker instrumentation ended up OK, I think, but the overall totals were inflated.) Report by hubert depesz lubaczewski. Analysis and fix by Amit Kapila, reviewed and tweaked a bit by me. Discussion: http://postgr.es/m/20171127175631.GA405@depesz.com
* Teach bitmap heap scan to cope with absence of a DSA.Robert Haas2017-11-28
| | | | | | | | | | | | If we have a plan that uses parallelism but are unable to execute it using parallelism, for example due to a lack of available DSM segments, then the EState's es_query_dsa will be NULL. Parallel bitmap heap scan needs to fall back to a non-parallel scan in such cases. Patch by me, reviewed by Dilip Kumar Discussion: http://postgr.es/m/CAEepm=0kADK5inNf_KuemjX=HQ=PuTP0DykM--fO5jS5ePVFEA@mail.gmail.com
* Fix creation of resjunk tlist entries for inherited mixed UPDATE/DELETE.Tom Lane2017-11-27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | rewriteTargetListUD's processing is dependent on the relkind of the query's target table. That was fine at the time it was made to act that way, even for queries on inheritance trees, because all tables in an inheritance tree would necessarily be plain tables. However, the 9.5 feature addition allowing some members of an inheritance tree to be foreign tables broke the assumption that rewriteTargetListUD's output tlist could be applied to all child tables with nothing more than column-number mapping. This led to visible failures if foreign child tables had row-level triggers, and would also break in cases where child tables belonged to FDWs that used methods other than CTID for row identification. To fix, delay running rewriteTargetListUD until after the planner has expanded inheritance, so that it is applied separately to the (already mapped) tlist for each child table. We can conveniently call it from preprocess_targetlist. Refactor associated code slightly to avoid the need to heap_open the target relation multiple times during preprocess_targetlist. (The APIs remain a bit ugly, particularly around the point of which steps scribble on parse->targetList and which don't. But avoiding such scribbling would require a change in FDW callback APIs, which is more pain than it's worth.) Also fix ExecModifyTable to ensure that "tupleid" is reset to NULL when we transition from rows providing a CTID to rows that don't. (That's really an independent bug, but it manifests in much the same cases.) Add a regression test checking one manifestation of this problem, which was that row-level triggers on a foreign child table did not work right. Back-patch to 9.5 where the problem was introduced. Etsuro Fujita, reviewed by Ildus Kurbangaliev and Ashutosh Bapat Discussion: https://postgr.es/m/20170514150525.0346ba72@postgrespro.ru
* Repair failure with SubPlans in multi-row VALUES lists.Tom Lane2017-11-25
| | | | | | | | | | | | | | | | | | | | | | | When nodeValuesscan.c was written, it was impossible to have a SubPlan in VALUES --- any sub-SELECT there would have to be uncorrelated and thereby would produce an InitPlan instead. We therefore took a shortcut in the logic that throws away a ValuesScan's per-row expression evaluation data structures. This was broken by the introduction of LATERAL however; a sub-SELECT containing a lateral reference produces a correlated SubPlan. The cleanest fix for this would be to give up the optimization of discarding the expression eval state. But that still seems pretty unappetizing for long VALUES lists. It seems to work to just prevent the subexpressions from hooking into the ValuesScan node's subPlan list, so let's do that and see how well it works. (If this breaks, due to additional connections between the subexpressions and the outer query structures, we might consider compromises like throwing away data only for VALUES rows not containing SubPlans.) Per bug #14924 from Christian Duta. Back-patch to 9.3 where LATERAL was introduced. Discussion: https://postgr.es/m/20171124120836.1463.5310@wrigleys.postgresql.org
* Fix handling of NULLs returned by aggregate combine functions.Andres Freund2017-11-23
| | | | | | | | | | | | | | | | | | | | | | | When strict aggregate combine functions, used in multi-stage/parallel aggregation, returned NULL, we didn't check for that, invoking the combine function with NULL the next round, despite it being strict. The equivalent code invoking normal transition functions has a check for that situation, which did not get copied in a7de3dc5c346. Fix the bug by adding the equivalent check. Based on a quick look I could not find any strict combine functions in core actually returning NULL, and it doesn't seem very likely external users have done so. So this isn't likely to have caused issues in practice. Add tests verifying transition / combine functions returning NULL is tested. Reported-By: Andres Freund Author: Andres Freund Discussion: https://postgr.es/m/20171121033642.7xvmjqrl4jdaaat3@alap3.anarazel.de Backpatch: 9.6, where parallel aggregation was introduced
* Revert bogus fixes of HOT-freezing bugAlvaro Herrera2017-11-02
| | | | | | | | | | It turns out we misdiagnosed what the real problem was. Revert the previous changes, because they may have worse consequences going forward. A better fix is forthcoming. The simplistic test case is kept, though disabled. Discussion: https://postgr.es/m/20171102112019.33wb7g5wp4zpjelu@alap3.anarazel.de
* Fix mistaken failure to allow parallelism in corner case.Robert Haas2017-10-27
| | | | | | | | | | | | | | | If we try to run a parallel plan in serial mode because, for example, it's going to be scanned via a cursor, but for some reason we're already in parallel mode (for example because an outer query is running in parallel), we'd incorrectly try to launch workers. Fix by adding a flag to the EState, so that we can be certain that ExecutePlan() and ExecGather()/ExecGatherMerge() will have the same idea about whether we are executing serially or in parallel. Report and fix by Amit Kapila with help from Kuntal Ghosh. A few tweaks by me. Discussion: http://postgr.es/m/CAA4eK1+_BuZrmVCeua5Eqnm4Co9DAXdM5HPAOE2J19ePbR912Q@mail.gmail.com
* Repair breakage of aggregate FILTER option.Tom Lane2017-10-16
| | | | | | | | | | | | | | | | | An aggregate's input expression(s) are not supposed to be evaluated at all for a row where its FILTER test fails ... but commit 8ed3f11bb overlooked that requirement. Reshuffle so that aggregates having a filter clause evaluate their arguments separately from those without. This still gets the benefit of doing only one ExecProject in the common case of multiple Aggrefs, none of which have filters. While at it, arrange for filter clauses to be included in the common ExecProject evaluation, thus perhaps buying a little bit even when there are filters. Back-patch to v10 where the bug was introduced. Discussion: https://postgr.es/m/30065.1508161354@sss.pgh.pa.us
* Restore nodeAgg.c's ability to check for improperly-nested aggregates.Tom Lane2017-10-15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | While poking around in the aggregate logic, I noticed that commit 8ed3f11bb broke the logic in nodeAgg.c that purports to detect nested aggregates, by moving initialization of regular aggregate argument expressions out of the code segment that checks for that. You could argue that this check is unnecessary, but it's not much code so I'm inclined to keep it as a backstop against parser and planner bugs. However, there's certainly zero value in checking only some of the subexpressions. We can make the check complete again, and as a bonus make it a good deal more bulletproof against future mistakes of the same ilk, by moving it out to the outermost level of ExecInitAgg. This means we need to check only once per Agg node not once per aggregate, which also seems like a good thing --- if the check does find something wrong, it's not urgent that we report it before the plan node initialization finishes. Since this requires remembering the original length of the aggs list, I deleted a long-obsolete stanza that changed numaggs from 0 to 1. That's so old it predates our decision that palloc(0) is a valid operation, in (digs...) 2004, see commit 24a1e20f1. In passing improve a few comments. Back-patch to v10, just in case.
* Fix AggGetAggref() so it won't lie to aggregate final functions.Tom Lane2017-10-12
| | | | | | | | | | | | | | | | | | | | | | If we merge the transition calculations for two different aggregates, it's reasonable to assume that the transition function should not care which of those Aggref structs it gets from AggGetAggref(). It is not reasonable to make the same assumption about an aggregate final function, however. Commit 804163bc2 broke this, as it will pass whichever Aggref was first associated with the transition state in both cases. This doesn't create an observable bug so far as the core system is concerned, because the only existing uses of AggGetAggref() are in ordered-set aggregates that happen to not pay attention to anything but the input properties of the Aggref; and besides that, we disabled sharing of transition calculations for OSAs yesterday. Nonetheless, if some third-party code were using AggGetAggref() in a normal aggregate, they would be entitled to call this a bug. Hence, back-patch the fix to 9.6 where the problem was introduced. In passing, improve some of the comments about transition state sharing. Discussion: https://postgr.es/m/CAB4ELO5RZhOamuT9Xsf72ozbenDLLXZKSk07FiSVsuJNZB861A@mail.gmail.com
* Fix logical replication to fire BEFORE ROW DELETE triggers.Robert Haas2017-10-12
| | | | | | | | | | Before, that would fail to happen unless a BEFORE ROW UPDATE trigger was also present. Noted by me while reviewing a patch from Masahiko Sawada, who also wrote this patch. Reviewed by Petr Jelinek. Discussion: http://postgr.es/m/CA+TgmobAZvCxduG8y_mQKBK7nz-vhbdLvjM354KEFozpuzMN5A@mail.gmail.com
* Prevent sharing transition states between ordered-set aggregates.Tom Lane2017-10-11
| | | | | | | | | | | | | | | | | | | | | | | | This ought to work, but the built-in OSAs are not capable of coping, because their final-functions destructively modify their transition state (specifically, the contained tuplesort object). That was fine when those functions were written, but commit 804163bc2 moved the goalposts without telling orderedsetaggs.c. We should fix the built-in OSAs to support this, but it will take a little work, especially if we don't want to sacrifice performance in the normal non-shared-state case. Given that it took a year after 9.6 release for anyone to notice this bug, we should not prioritize sharable-state over nonsharable-state performance. And a proper fix is likely to be more complicated than we'd want to back-patch, too. Therefore, let's just put in this stop-gap patch to prevent nodeAgg.c from choosing to use shared state for OSAs. We can revert it in HEAD when we get a better fix. Report from Lukas Eder, diagnosis by me, patch by David Rowley. Back-patch to 9.6 where the problem was introduced. Discussion: https://postgr.es/m/CAB4ELO5RZhOamuT9Xsf72ozbenDLLXZKSk07FiSVsuJNZB861A@mail.gmail.com
* Fix mistakes in comments.Robert Haas2017-10-11
| | | | | | Masahiko Sawada Discussion: http://postgr.es/m/CAD21AoBsfYsMHD6_SL9iN3n_Foaa+oPbL5jG55DxU1ChaujqwQ@mail.gmail.com
* Fix crash when logical decoding is invoked from a PL function.Tom Lane2017-10-06
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The logical decoding functions do BeginInternalSubTransaction and RollbackAndReleaseCurrentSubTransaction to clean up after themselves. It turns out that AtEOSubXact_SPI has an unrecognized assumption that we always need to cancel the active SPI operation in the SPI context that surrounds the subtransaction (if there is one). That's true when the RollbackAndReleaseCurrentSubTransaction call is coming from the SPI-using function itself, but not when it's happening inside some unrelated function invoked by a SPI query. In practice the affected callers are the various PLs. To fix, record the current subtransaction ID when we begin a SPI operation, and clean up only if that ID is the subtransaction being canceled. Also, remove AtEOSubXact_SPI's assertion that it must have cleaned up the surrounding SPI context's active tuptable. That's proven wrong by the same test case. Also clarify (or, if you prefer, reinterpret) the calling conventions for _SPI_begin_call and _SPI_end_call. The memory context cleanup in the latter means that these have always had the flavor of a matched resource-management pair, but they weren't documented that way before. Per report from Ben Chobot. Back-patch to 9.4 where logical decoding came in. In principle, the SPI changes should go all the way back, since the problem dates back to commit 7ec1c5a86. But given the lack of field complaints it seems few people are using internal subtransactions in this way. So I don't feel a need to take any risks in 9.2/9.3. Discussion: https://postgr.es/m/73FBA179-C68C-4540-9473-71E865408B15@silentmedia.com
* Fix intra-query memory leakage in nodeProjectSet.c.Tom Lane2017-10-06
| | | | | | | | | | | | Both ExecMakeFunctionResultSet() and evaluation of simple expressions need to be done in the per-tuple memory context, not per-query, else we leak data until end of query. This is a consideration that was missed while refactoring code in the ProjectSet patch (note that in pre-v10, ExecMakeFunctionResult is called in the per-tuple context). Per bug #14843 from Ben M. Diagnosed independently by Andres and myself. Discussion: https://postgr.es/m/20171005230321.28561.15927@wrigleys.postgresql.org
* Fix traversal of half-frozen update chainsAlvaro Herrera2017-10-06
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When some tuple versions in an update chain are frozen due to them being older than freeze_min_age, the xmax/xmin trail can become broken. This breaks HOT (and probably other things). A subsequent VACUUM can break things in more serious ways, such as leaving orphan heap-only tuples whose root HOT redirect items were removed. This can be seen because index creation (or REINDEX) complain like ERROR: XX000: failed to find parent tuple for heap-only tuple at (0,7) in table "t" Because of relfrozenxid contraints, we cannot avoid the freezing of the early tuples, so we must cope with the results: whenever we see an Xmin of FrozenTransactionId, consider it a match for whatever the previous Xmax value was. This problem seems to have appeared in 9.3 with multixact changes, though strictly speaking it seems unrelated. Since 9.4 we have commit 37484ad2a "Change the way we mark tuples as frozen", so the fix is simple: just compare the raw Xmin (still stored in the tuple header, since freezing merely set an infomask bit) to the Xmax. But in 9.3 we rewrite the Xmin value to FrozenTransactionId, so the original value is lost and we have nothing to compare the Xmax with. To cope with that case we need to compare the Xmin with FrozenXid, assume it's a match, and hope for the best. Sadly, since you can pg_upgrade a 9.3 instance containing half-frozen pages to newer releases, we need to keep the old check in newer versions too, which seems a bit brittle; I hope we can somehow get rid of that. I didn't optimize the new function for performance. The new coding is probably a bit slower than before, since there is a function call rather than a straight comparison, but I'd rather have it work correctly than be fast but wrong. This is a followup after 20b655224249 fixed a few related problems. Apparently, in 9.6 and up there are more ways to get into trouble, but in 9.3 - 9.5 I cannot reproduce a problem anymore with this patch, so there must be a separate bug. Reported-by: Peter Geoghegan Diagnosed-by: Peter Geoghegan, Michael Paquier, Daniel Wood, Yi Wen Wong, Álvaro Discussion: https://postgr.es/m/CAH2-Wznm4rCrhFAiwKPWTpEw2bXDtgROZK7jWWGucXeH3D1fmA@mail.gmail.com