aboutsummaryrefslogtreecommitdiff
path: root/src/backend/executor
Commit message (Collapse)AuthorAge
* Fix failure to handle conflicts in non-arbiter exclusion constraints.Tom Lane2016-07-04
| | | | | | | | | | | | | | | | | | | ExecInsertIndexTuples treated an exclusion constraint as subject to noDupErr processing even when it was not listed in arbiterIndexes, and would therefore not error out for a conflict in such a constraint, instead returning it as an arbiter-index failure. That led to an infinite loop in ExecInsert, since ExecCheckIndexConstraints ignored the index as-intended and therefore didn't throw the expected error. To fix, make the exclusion constraint code path use the same condition as the index_insert call does to decide whether no-error-for-duplicates behavior is appropriate. While at it, refactor a little bit to avoid unnecessary list_member_oid calls. (That surely wouldn't save anything worth noticing, but I find the code a bit clearer this way.) Per bug report from Heikki Rauhala. Back-patch to 9.5 where ON CONFLICT was introduced. Report: <4C976D6B-76B4-434C-8052-D009F7B7AEDA@reaktor.fi>
* Rethink node-level representation of partial-aggregation modes.Tom Lane2016-06-26
| | | | | | | | | | | | | | | | | | | | | | | | | | | The original coding had three separate booleans representing partial aggregation behavior, which was confusing, unreadable, and error-prone, not least because the booleans weren't always listed in the same order. It was also inadequate for the allegedly-desirable future extension to support intermediate partial aggregation, because we'd need separate markers for serialization and deserialization in such a case. Merge these bools into an enum "AggSplit" to provide symbolic names for the supported operating modes (and document what those are). By assigning the values of the enum constants carefully, we can treat AggSplit values as options bitmasks so that tests of what to do aren't noticeably more expensive than before. While at it, get rid of Aggref.aggoutputtype. That's not needed since commit 59a3795c2 got rid of setrefs.c's special-purpose Aggref comparison code, and it likewise seemed more confusing than helpful. Assorted comment cleanup as well (there's still more that I want to do in that line). catversion bump for change in Aggref node contents. Should be the last one for partial-aggregation changes. Discussion: <29309.1466699160@sss.pgh.pa.us>
* Fix type-safety problem with parallel aggregate serial/deserialization.Tom Lane2016-06-22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The original specification for this called for the deserialization function to have signature "deserialize(serialtype) returns transtype", which is a security violation if transtype is INTERNAL (which it always would be in practice) and serialtype is not (which ditto). The patch blithely overrode the opr_sanity check for that, which was sloppy-enough work in itself, but the indisputable reason this cannot be allowed to stand is that CREATE FUNCTION will reject such a signature and thus it'd be impossible for extensions to create parallelizable aggregates. The minimum fix to make the signature type-safe is to add a second, dummy argument of type INTERNAL. But to lock it down a bit more and make misuse of INTERNAL-accepting functions less likely, let's get rid of the ability to specify a "serialtype" for an aggregate and just say that the only useful serialtype is BYTEA --- which, in practice, is the only interesting value anyway, due to the usefulness of the send/recv infrastructure for this purpose. That means we only have to allow "serialize(internal) returns bytea" and "deserialize(bytea, internal) returns internal" as the signatures for these support functions. In passing fix bogus signature of int4_avg_combine, which I found thanks to adding an opr_sanity check on combinefunc signatures. catversion bump due to removing pg_aggregate.aggserialtype and adjusting signatures of assorted built-in functions. David Rowley and Tom Lane Discussion: <27247.1466185504@sss.pgh.pa.us>
* Fix handling of argument and result datatypes for partial aggregation.Tom Lane2016-06-17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When doing partial aggregation, the args list of the upper (combining) Aggref node is replaced by a Var representing the output of the partial aggregation steps, which has either the aggregate's transition data type or a serialized representation of that. However, nodeAgg.c blindly continued to use the args list as an indication of the user-level argument types. This broke resolution of polymorphic transition datatypes at executor startup (though it accidentally failed to fail for the ANYARRAY case, which is likely the only one anyone had tested). Moreover, the constructed FuncExpr passed to the finalfunc contained completely wrong information, which would have led to bogus answers or crashes for any case where the finalfunc examined that information (which is only likely to be with polymorphic aggregates using a non-polymorphic transition type). As an independent bug, apply_partialaggref_adjustment neglected to resolve a polymorphic transition datatype before assigning it as the output type of the lower-level Aggref node. This again accidentally failed to fail for ANYARRAY but would be unlikely to work in other cases. To fix the first problem, record the user-level argument types in a separate OID-list field of Aggref, and look to that rather than the args list when asking what the argument types were. (It turns out to be convenient to include any "direct" arguments in this list too, although those are not currently subject to being overwritten.) Rather than adding yet another resolve_aggregate_transtype() call to fix the second problem, add an aggtranstype field to Aggref, and store the resolved transition type OID there when the planner first computes it. (By doing this in the planner and not the parser, we can allow the aggregate's transition type to change from time to time, although no DDL support yet exists for that.) This saves nothing of consequence for simple non-polymorphic aggregates, but for polymorphic transition types we save a catalog lookup during executor startup as well as several planner lookups that are new in 9.6 due to parallel query planning. In passing, fix an error that was introduced into count_agg_clauses_walker some time ago: it was applying exprTypmod() to something that wasn't an expression node at all, but a TargetEntry. exprTypmod silently returned -1 so that there was not an obvious failure, but this broke the intended sensitivity of aggregate space consumption estimates to the typmod of varchar and similar data types. This part needs to be back-patched. Catversion bump due to change of stored Aggref nodes. Discussion: <8229.1466109074@sss.pgh.pa.us>
* pgindent run for 9.6Robert Haas2016-06-09
|
* Stop the executor if no more tuples can be sent from worker to leader.Robert Haas2016-06-06
| | | | | | | | | | | | | | | | | | | | | If a Gather node has read as many tuples as it needs (for example, due to Limit) it may detach the queue connecting it to the worker before reading all of the worker's tuples. Rather than let the worker continue to generate and send all of the results, have it stop after sending the next tuple. More could be done here to stop the worker even quicker, but this is about as well as we can hope to do for 9.6. This is in response to a problem report from Andreas Seltenreich. Commit 44339b892a04e94bbb472235882dc6f7023bdc65 should be actually be sufficient to fix that example even without this change, but it seems better to do this, too, since we might otherwise waste quite a large amount of effort in one or more workers. Discussion: CAA4eK1KOKGqmz9bGu+Z42qhRwMbm4R5rfnqsLCNqFs9j14jzEA@mail.gmail.com Amit Kapila
* Properly initialize SortSupport for ORDER BY rechecks in nodeIndexscan.c.Tom Lane2016-06-05
| | | | | | | | | | | | | | | | | | Fix still another bug in commit 35fcb1b3d: it failed to fully initialize the SortSupport states it introduced to allow the executor to re-check ORDER BY expressions containing distance operators. That led to a null pointer dereference if the sortsupport code tried to use ssup_cxt. The problem only manifests in narrow cases, explaining the lack of previous field reports. It requires a GiST-indexable distance operator that lacks SortSupport and is on a pass-by-ref data type, which among core+contrib seems to be only btree_gist's interval opclass; and it requires the scan to be done as an IndexScan not an IndexOnlyScan, which explains how btree_gist's regression test didn't catch it. Per bug #14134 from Jihyun Yu. Peter Geoghegan Report: <20160511154904.2603.43889@wrigleys.postgresql.org>
* Mark read/write expanded values as read-only in ValuesNext(), too.Tom Lane2016-06-03
| | | | | | | | | | | | | | | | | Further thought about bug #14174 motivated me to try the case of a R/W datum being returned from a VALUES list, and sure enough it was broken. Fix that. Also add a regression test case exercising the same scenario for FunctionScan. That's not broken right now, because the function's result will get shoved into a tuplestore between generation and use; but it could easily become broken whenever we get around to optimizing FunctionScan better. There don't seem to be any other places where we put the result of expression evaluation into a virtual tuple slot that could then be the source for Vars of further expression evaluation, so I think this is the end of this bug.
* Mark read/write expanded values as read-only in ExecProject().Tom Lane2016-06-03
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If a plan node output expression returns an "expanded" datum, and that output column is referenced in more than one place in upper-level plan nodes, we need to ensure that what is returned is a read-only reference not a read/write reference. Otherwise one of the referencing sites could scribble on or even delete the expanded datum before we have evaluated the others. Commit 1dc5ebc9077ab742, which introduced this feature, supposed that it'd be sufficient to make SubqueryScan nodes force their output columns to read-only state. The folly of that was revealed by bug #14174 from Andrew Gierth, and really should have been immediately obvious considering that the planner will happily optimize SubqueryScan nodes out of the plan without any regard for this issue. The safest fix seems to be to make ExecProject() force its results into read-only state; that will cover every case where a plan node returns expression results. Actually we can delegate this to ExecTargetList() since we can recursively assume that plain Vars will not reference read-write datums. That should keep the extra overhead down to something minimal. We no longer need ExecMakeSlotContentsReadOnly(), which was introduced only in support of the idea that just a few plan node types would need to do this. In the future it would be nice to have the planner account for this problem and inject force-to-read-only expression evaluation nodes into only the places where there's a risk of multiple evaluation. That's not a suitable solution for 9.5 or even 9.6 at this point, though. Report: <20160603124628.9932.41279@wrigleys.postgresql.org>
* Fix latent crash in do_text_output_multiline().Tom Lane2016-05-23
| | | | | | | | | | | | | | | | | | | | | do_text_output_multiline() would fail (typically with a null pointer dereference crash) if its input string did not end with a newline. Such cases do not arise in our current sources; but it certainly could happen in future, or in extension code's usage of the function, so we should fix it. To fix, replace "eol += len" with "eol = text + len". While at it, make two cosmetic improvements: mark the input string const, and rename the argument from "text" to "txt" to dodge pgindent strangeness (since "text" is a typedef name). Even though this problem is only latent at present, it seems like a good idea to back-patch the fix, since it's a very simple/safe patch and it's not out of the realm of possibility that we might in future back-patch something that expects sane behavior from do_text_output_multiline(). Per report from Hao Lee. Report: <CAGoxFiFPAGyPAJLcFxTB5cGhTW2yOVBDYeqDugYwV4dEd1L_Ag@mail.gmail.com>
* Use mul_size when multiplying by the number of parallel workers.Robert Haas2016-05-06
| | | | | | | | | | That way, if the result overflows size_t, you'll get an error instead of undefined behavior, which seems like a plus. This also has the effect of casting the number of workers from int to Size, which is better because it's harder to overflow int than size_t. Dilip Kumar reported this issue and provided a patch upon which this patch is based, but his version did use mul_size.
* Tweak a few more things in preparation for upcoming pgindent run.Robert Haas2016-05-03
| | | | | | | | These adjustments adjust code and comments in minor ways to prevent pgindent from mangling them. Among other things, I tried to avoid situations where pgindent would emit "a +b" instead of "a + b", and I tried to avoid having it break up inline comments across multiple lines.
* Tighten up sanity checks for parallel aggregate in execQual.c.Robert Haas2016-04-27
| | | | David Rowley
* Clean up a few parallelism-related things that pgindent wants to mangle.Robert Haas2016-04-27
| | | | | | | | | | | | | | In nodeFuncs.c, pgindent wants to introduce spurious indentation into the definitions of planstate_tree_walker and planstate_walk_subplans. Fix that by spreading the definition out across several lines, similar to what is already done for other walker functions in that file. In execParallel.c, in the definition of SharedExecutorInstrumentation, pgindent wants to insert more whitespace between the type name and the member name. That causes it to mangle comments later on the line. Fix by moving the comments out of line. Now that we have a bit more room, add some more details that may be useful to the next person reading this code.
* Fix typo in commentMagnus Hagander2016-04-26
| | | | Author: Daniel Gustafsson
* Revert no-op changes to BufferGetPage()Kevin Grittner2016-04-20
| | | | | | | | | | | | | | | | | | The reverted changes were intended to force a choice of whether any newly-added BufferGetPage() calls needed to be accompanied by a test of the snapshot age, to support the "snapshot too old" feature. Such an accompanying test is needed in about 7% of the cases, where the page is being used as part of a scan rather than positioning for other purposes (such as DML or vacuuming). The additional effort required for back-patching, and the doubt whether the intended benefit would really be there, have indicated it is best just to rely on developers to do the right thing based on comments and existing usage, as we do with many other conventions. This change should have little or no effect on generated executable code. Motivated by the back-patching pain of Tom Lane and Robert Haas
* Tweak EXPLAIN for parallel query to show workers launched.Robert Haas2016-04-15
| | | | | | | | | The previous display was sort of confusing, because it didn't distinguish between the number of workers that we planned to launch and the number that actually got launched. This has already confused several people, so display both numbers and label them clearly. Julien Rouhaud, reviewed by me.
* Remove trailing commas in enums.Andres Freund2016-04-14
| | | | | These aren't valid C89. Found thanks to gcc's -Wc90-c99-compat. These exist in differing places in most supported branches.
* Fix _SPI_execute_plan() for CREATE TABLE IF NOT EXISTS foo AS ...Tom Lane2016-04-11
| | | | | | | | | When IF NOT EXISTS was added to CREATE TABLE AS, this logic didn't get the memo, possibly resulting in an Assert failure. It looks like there would have been no ill effects in a non-Assert build, though. Back-patch to 9.5 where the IF NOT EXISTS option was added. Stas Kelvich
* Modify BufferGetPage() to prepare for "snapshot too old" featureKevin Grittner2016-04-08
| | | | | | | | | | | This patch is a no-op patch which is intended to reduce the chances of failures of omission once the functional part of the "snapshot too old" patch goes in. It adds parameters for snapshot, relation, and an enum to specify whether the snapshot age check needs to be done for the page at this point. This initial patch passes NULL for the first two new parameters and BGP_NO_SNAPSHOT_TEST for the third. The follow-on patch will change the places where the test needs to be made.
* Revert CREATE INDEX ... INCLUDING ...Teodor Sigaev2016-04-08
| | | | | | It's not ready yet, revert two commits 690c543550b0d2852060c18d270cdb534d339d9a - unstable test output 386e3d7609c49505e079c40c65919d99feb82505 - patch itself
* CREATE INDEX ... INCLUDING (column[, ...])Teodor Sigaev2016-04-08
| | | | | | | | | | Now indexes (but only B-tree for now) can contain "extra" column(s) which doesn't participate in index structure, they are just stored in leaf tuples. It allows to use index only scan by using single index instead of two or more indexes. Author: Anastasia Lubennikova with minor editorializing by me Reviewers: David Rowley, Peter Geoghegan, Jeff Janes
* Copyedit comments and documentation.Noah Misch2016-04-01
|
* Fix bug in aggregate (de)serialization commit.Robert Haas2016-03-29
| | | | | | | | | resulttypeLen and resulttypeByVal must be set correctly when serializing aggregates, not just when finalizing them. This was in David's final patch but I downloaded the wrong version by mistake and failed to spot the error. David Rowley
* Allow aggregate transition states to be serialized and deserialized.Robert Haas2016-03-29
| | | | | | | | | This is necessary infrastructure for supporting parallel aggregation for aggregates whose transition type is "internal". Such values can't be passed between cooperating processes, because they are just pointers. David Rowley, reviewed by Tomas Vondra and by me.
* Fix EvalPlanQual bug when query contains both locked and not-locked rels.Tom Lane2016-03-22
| | | | | | | | | | | | | | | | | In commit afb9249d06f47d7a, we (probably I) made ExecLockRows assign null test tuples to all relations of the query while setting up to do an EvalPlanQual recheck for a newly-updated locked row. This was sheerest brain fade: we should only set test tuples for relations that are lockable by the LockRows node, and in particular empty test tuples are only sensible for inheritance child relations that weren't the source of the current tuple from their inheritance tree. Setting a null test tuple for an unrelated table causes it to return NULLs when it should not, as exhibited in bug #14034 from Bronislav Houdek. To add insult to injury, doing it the wrong way required two loops where one would suffice; so the corrected code is even a bit shorter and faster. Add a regression test case based on his example, and back-patch to 9.5 where the bug was introduced.
* Support parallel aggregation.Robert Haas2016-03-21
| | | | | | | | | Parallel workers can now partially aggregate the data and pass the transition values back to the leader, which can combine the partial results to produce the final answer. David Rowley, based on earlier work by Haribabu Kommi. Reviewed by Álvaro Herrera, Tomas Vondra, Amit Kapila, James Sewell, and me.
* Directly modify foreign tables.Robert Haas2016-03-18
| | | | | | | | | postgres_fdw can now sent an UPDATE or DELETE statement directly to the foreign server in simple cases, rather than sending a SELECT FOR UPDATE statement and then updating or deleting rows one-by-one. Etsuro Fujita, reviewed by Rushabh Lathia, Shigeru Hanada, Kyotaro Horiguchi, Albe Laurenz, Thom Brown, and me.
* Avoid incorrectly indicating exclusion constraint waitStephen Frost2016-03-15
| | | | | | | | | | | | | | | | | INSERT ... ON CONFLICT's precheck may have to wait on the outcome of another insertion, which may or may not itself be a speculative insertion. This wait is not necessarily associated with an exclusion constraint, but was always reported that way in log messages if the wait happened to involve a tuple that had no speculative token. Initially discovered through use of ON CONFLICT DO NOTHING, where spurious references to exclusion constraints in log messages were more likely. Patch by Peter Geoghegan. Reviewed by Julien Rouhaud. Back-patch to 9.5 where INSERT ... ON CONFLICT was added.
* Use repalloc_huge() to enlarge a SPITupleTable's tuple pointer array.Tom Lane2016-03-14
| | | | | | | | | | | Commit 23a27b039d94ba35 widened the rows-stored counters to uint64, but that's academic unless we allow the tuple pointer array to exceed 1GB. (It might be a good idea to provide some other limit on how much storage a SPITupleTable can eat. On the other hand, there are plenty of other ways to drive a backend into swap hell.) Dagfinn Ilmari Mannsåker
* Widen query numbers-of-tuples-processed counters to uint64.Tom Lane2016-03-12
| | | | | | | | | | | | | | | | | | | | | | | | | | | This patch widens SPI_processed, EState's es_processed field, PortalData's portalPos field, FuncCallContext's call_cntr and max_calls fields, ExecutorRun's count argument, PortalRunFetch's result, and the max number of rows in a SPITupleTable to uint64, and deals with (I hope) all the ensuing fallout. Some of these values were declared uint32 before, and others "long". I also removed PortalData's posOverflow field, since that logic seems pretty useless given that portalPos is now always 64 bits. The user-visible results are that command tags for SELECT etc will correctly report tuple counts larger than 4G, as will plpgsql's GET GET DIAGNOSTICS ... ROW_COUNT command. Queries processing more tuples than that are still not exactly the norm, but they're becoming more common. Most values associated with FETCH/MOVE distances, such as PortalRun's count argument and the count argument of most SPI functions that have one, remain declared as "long". It's not clear whether it would be worth promoting those to int64; but it would definitely be a large dollop of additional API churn on top of this, and it would only help 32-bit platforms which seem relatively less likely to see any benefit. Andreas Scherbaum, reviewed by Christian Ullrich, additional hacking by me
* Make the upper part of the planner work by generating and comparing Paths.Tom Lane2016-03-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | I've been saying we needed to do this for more than five years, and here it finally is. This patch removes the ever-growing tangle of spaghetti logic that grouping_planner() used to use to try to identify the best plan for post-scan/join query steps. Now, there is (nearly) independent consideration of each execution step, and entirely separate construction of Paths to represent each of the possible ways to do that step. We choose the best Path or set of Paths using the same add_path() logic that's been used inside query_planner() for years. In addition, this patch removes the old restriction that subquery_planner() could return only a single Plan. It now returns a RelOptInfo containing a set of Paths, just as query_planner() does, and the parent query level can use each of those Paths as the basis of a SubqueryScanPath at its level. This allows finding some optimizations that we missed before, wherein a subquery was capable of returning presorted data and thereby avoiding a sort in the parent level, making the overall cost cheaper even though delivering sorted output was not the cheapest plan for the subquery in isolation. (A couple of regression test outputs change in consequence of that. However, there is very little change in visible planner behavior overall, because the point of this patch is not to get immediate planning benefits but to create the infrastructure for future improvements.) There is a great deal left to do here. This patch unblocks a lot of planner work that was basically impractical in the old code structure, such as allowing FDWs to implement remote aggregation, or rewriting plan_set_operations() to allow consideration of multiple implementation orders for set operations. (The latter will likely require a full rewrite of plan_set_operations(); what I've done here is only to fix it to return Paths not Plans.) I have also left unfinished some localized refactoring in createplan.c and planner.c, because it was not necessary to get this patch to a working state. Thanks to Robert Haas, David Rowley, and Amit Kapila for review.
* Minor optimizations based on ParallelContext having nworkers_launched.Robert Haas2016-03-04
| | | | | | | | | | | Originally, we didn't have nworkers_launched, so code that used parallel contexts had to be preprared for the possibility that not all of the workers requested actually got launched. But now we can count on knowing the number of workers that were successfully launched, which can shave off a few cycles and simplify some code slightly. Amit Kapila, reviewed by Haribabu Kommi, per a suggestion from Peter Geoghegan.
* Change the format of the VM fork to add a second bit per page.Robert Haas2016-03-01
| | | | | | | | | | | | | | | | | | | | | | | The new bit indicates whether every tuple on the page is already frozen. It is cleared only when the all-visible bit is cleared, and it can be set only when we vacuum a page and find that every tuple on that page is both visible to every transaction and in no need of any future vacuuming. A future commit will use this new bit to optimize away full-table scans that would otherwise be triggered by XID wraparound considerations. A page which is merely all-visible must still be scanned in that case, but a page which is all-frozen need not be. This commit does not attempt that optimization, although that optimization is the goal here. It seems better to get the basic infrastructure in place first. Per discussion, it's very desirable for pg_upgrade to automatically migrate existing VM forks from the old format to the new format. That, too, will be handled in a follow-on patch. Masahiko Sawada, reviewed by Kyotaro Horiguchi, Fujii Masao, Amit Kapila, Simon Riggs, Andres Freund, and others, and substantially revised by me.
* Reuse abbreviated keys in ordered [set] aggregates.Robert Haas2016-02-17
| | | | | | | | | When processing ordered aggregates following a sort that could make use of the abbreviated key optimization, only call the equality operator to compare successive pairs of tuples when their abbreviated keys were not equal. Peter Geoghegan, reviewd by Andreas Karlsson and by me.
* ExecHashRemoveNextSkewBucket must physically copy tuples to main hashtable.Tom Lane2016-02-07
| | | | | | | | | | | | | | | | | | | | | | Commit 45f6240a8fa9d355 added an assumption in ExecHashIncreaseNumBatches and ExecHashIncreaseNumBuckets that they could find all tuples in the main hash table by iterating over the "dense storage" introduced by that patch. However, ExecHashRemoveNextSkewBucket continued its old practice of simply re-linking deleted skew tuples into the main table's hashchains. Hence, such tuples got lost during any subsequent increase in nbatch or nbuckets, and would never get joined, as reported in bug #13908 from Seth P. I (tgl) think that the aforesaid commit has got multiple design issues and should be reworked rather completely; but there is no time for that right now, so band-aid the problem by making ExecHashRemoveNextSkewBucket physically copy deleted skew tuples into the "dense storage" arena. The added test case is able to exhibit the problem by means of fooling the planner with a WHERE condition that it will underestimate the selectivity of, causing the initial nbatch estimate to be too small. Tomas Vondra and Tom Lane. Thanks to David Johnston for initial investigation into the bug report.
* Fix comment block trashed by pgindent.Tom Lane2016-02-06
| | | | Looks like I put the protective dashes in the wrong place in f4e4b32743.
* Improve HJDEBUG code a bit.Tom Lane2016-02-06
| | | | | | | | | | | | | | | Commit 30d7ae3c76d2de144232ae6ab328ca86b70e72c3 introduced an HJDEBUG stanza that probably didn't compile at the time, and definitely doesn't compile now, because it refers to a nonexistent variable. It doesn't seem terribly useful anyway, so just get rid of it. While I'm fooling with it, use %z modifier instead of the obsolete hack of casting size_t to unsigned long, and include the HashJoinTable's address in each printout so that it's possible to distinguish the activities of multiple hashjoins occurring in one query. Noted while trying to use HJDEBUG to investigate bug #13908. Back-patch to 9.5, because code that doesn't compile is certainly not very helpful.
* When modifying a foreign table, initialize tableoid field properly.Robert Haas2016-02-04
| | | | | | | Failure to do this can cause AFTER ROW triggers or RETURNING expressions that reference this field to misbehave. Etsuro Fujita, reviewed by Thom Brown
* Allow parallel custom and foreign scans.Robert Haas2016-02-03
| | | | | | | | | | This patch doesn't put the new infrastructure to use anywhere, and indeed it's not clear how it could ever be used for something like postgres_fdw which has to send an SQL query and wait for a reply, but there might be FDWs or custom scan providers that are CPU-bound, so let's give them a way to join club parallel. KaiGai Kohei, reviewed by me.
* Only try to push down foreign joins if the user mapping OIDs match.Robert Haas2016-01-28
| | | | | | | | | | | | | Previously, the foreign join pushdown infrastructure left the question of security entirely up to individual FDWs, but it would be easy for a foreign data wrapper to inadvertently open up subtle security holes that way. So, make it the core code's job to determine which user mapping OID is relevant, and don't attempt join pushdown unless it's the same for all relevant relations. Per a suggestion from Tom Lane. Shigeru Hanada and Ashutosh Bapat, reviewed by Etsuro Fujita and KaiGai Kohei, with some further changes by me.
* Support parallel joins, and make related improvements.Robert Haas2016-01-20
| | | | | | | | | | | | | | | | | | | | | | | | | | The core innovation of this patch is the introduction of the concept of a partial path; that is, a path which if executed in parallel will generate a subset of the output rows in each process. Gathering a partial path produces an ordinary (complete) path. This allows us to generate paths for parallel joins by joining a partial path for one side (which at the baserel level is currently always a Partial Seq Scan) to an ordinary path on the other side. This is subject to various restrictions at present, especially that this strategy seems unlikely to be sensible for merge joins, so only nested loops and hash joins paths are generated. This also allows an Append node to be pushed below a Gather node in the case of a partitioned table. Testing revealed that early versions of this patch made poor decisions in some cases, which turned out to be caused by the fact that the original cost model for Parallel Seq Scan wasn't very good. So this patch tries to make some modest improvements in that area. There is much more to be done in the area of generating good parallel plans in all cases, but this seems like a useful step forward. Patch by me, reviewed by Dilip Kumar and Amit Kapila.
* Support multi-stage aggregation.Robert Haas2016-01-20
| | | | | | | | | | | | | | | | | | | Aggregate nodes now have two new modes: a "partial" mode where they output the unfinalized transition state, and a "finalize" mode where they accept unfinalized transition states rather than individual values as input. These new modes are not used anywhere yet, but they will be necessary for parallel aggregation. The infrastructure also figures to be useful for cases where we want to aggregate local data and remote data via the FDW interface, and want to bring back partial aggregates from the remote side that can then be combined with locally generated partial aggregates to produce the final value. It may also be useful even when neither FDWs nor parallelism are in play, as explained in the comments in nodeAgg.c. David Rowley and Simon Riggs, reviewed by KaiGai Kohei, Heikki Linnakangas, Haribabu Kommi, and me.
* Restructure index access method API to hide most of it at the C level.Tom Lane2016-01-17
| | | | | | | | | | | | | | | | | | | | | | | | This patch reduces pg_am to just two columns, a name and a handler function. All the data formerly obtained from pg_am is now provided in a C struct returned by the handler function. This is similar to the designs we've adopted for FDWs and tablesample methods. There are multiple advantages. For one, the index AM's support functions are now simple C functions, making them faster to call and much less error-prone, since the C compiler can now check function signatures. For another, this will make it far more practical to define index access methods in installable extensions. A disadvantage is that SQL-level code can no longer see attributes of index AMs; in particular, some of the crosschecks in the opr_sanity regression test are no longer possible from SQL. We've addressed that by adding a facility for the index AM to perform such checks instead. (Much more could be done in that line, but for now we're content if the amvalidate functions more or less replace what opr_sanity used to do.) We might also want to expose some sort of reporting functionality, but this patch doesn't do that. Alexander Korotkov, reviewed by Petr Jelínek, and rather heavily editorialized on by me.
* Fix spelling mistakes.Robert Haas2016-01-14
| | | | Same patch submitted independently by David Rowley and Peter Geoghegan.
* Update copyright for 2016Bruce Momjian2016-01-02
| | | | Backpatch certain files through 9.1
* Read from the same worker repeatedly until it returns no tuple.Robert Haas2015-12-23
| | | | | | | | | | | | | The original coding read tuples from workers in round-robin fashion, but performance testing shows that it works much better to read enough to empty one queue before moving on to the next. I believe the reason for this is that, with the old approach, we could easily wake up a worker repeatedly to write only one new tuple into the shm_mq each time. With this approach, by the time the process gets scheduled, it has a decent chance of being able to fill the entire buffer in one go. Patch by me. Dilip Kumar helped with performance testing.
* Allow omitting one or both boundaries in an array slice specifier.Tom Lane2015-12-22
| | | | | | | | | | Omitted boundaries represent the upper or lower limit of the corresponding array subscript. This allows simpler specification of many common use-cases. (Revised version of commit 9246af6799819847faa33baf441251003acbb8fe) YUriy Zhuravlev
* Revert 9246af6799819847faa33baf441251003acbb8fe becauseTeodor Sigaev2015-12-18
| | | | I miss too much. Patch is returned to commitfest process.
* Fix TupleQueueReaderNext not to ignore its nowait argument.Robert Haas2015-12-18
| | | | | | This was a silly goof on my (rhaas's) part. Report and fix by Rushabh Lathia.