aboutsummaryrefslogtreecommitdiff
path: root/src/backend/executor
Commit message (Collapse)AuthorAge
* Improve index-only scans to avoid repeated access to the index page.Tom Lane2011-10-09
| | | | | | | | | | | | We copy all the matched tuples off the page during _bt_readpage, instead of expensively re-locking the page during each subsequent tuple fetch. This costs a bit more local storage, but not more than 2*BLCKSZ worth, and the reduction in LWLock traffic is certainly worth that. What's more, this lets us get rid of the API wart in the original patch that said an index AM could randomly decline to supply an index tuple despite having asserted pg_am.amcanreturn. That will be important for future improvements in the index-only-scan feature, since the executor will now be able to rely on having the index data available.
* Support index-only scans using the visibility map to avoid heap fetches.Tom Lane2011-10-07
| | | | | | | | | | | | | When a btree index contains all columns required by the query, and the visibility map shows that all tuples on a target heap page are visible-to-all, we don't need to fetch that heap page. This patch depends on the previous patches that made the visibility map reliable. There's a fair amount left to do here, notably trying to figure out a less chintzy way of estimating the cost of an index-only scan, but the core functionality seems ready to commit. Robert Haas and Ibrar Ahmed, with some previous work by Heikki Linnakangas.
* Update obsolete comments.Robert Haas2011-09-26
| | | | | | | This was partially fixed by 57fdb2b0d835fe201434fc28bf5dabf83ada26d1, back in 2005, but it missed a couple of spots. YAMAMOTO Takashi
* Make EXPLAIN ANALYZE report the numbers of rows rejected by filter steps.Tom Lane2011-09-22
| | | | | | | | | | | | | | | This provides information about the numbers of tuples that were visited but not returned by table scans, as well as the numbers of join tuples that were considered and discarded within a join plan node. There is still some discussion going on about the best way to report counts for outer-join situations, but I think most of what's in the patch would not change if we revise that, so I'm going to go ahead and commit it as-is. Documentation changes to follow (they weren't in the submitted patch either). Marko Tiikkaja, reviewed by Marc Cousin, somewhat revised by Tom
* Redesign the plancache mechanism for more flexibility and efficiency.Tom Lane2011-09-16
| | | | | | | | | | | | | | | | | | | Rewrite plancache.c so that a "cached plan" (which is rather a misnomer at this point) can support generation of custom, parameter-value-dependent plans, and can make an intelligent choice between using custom plans and the traditional generic-plan approach. The specific choice algorithm implemented here can probably be improved in future, but this commit is all about getting the mechanism in place, not the policy. In addition, restructure the API to greatly reduce the amount of extraneous data copying needed. The main compromise needed to make that possible was to split the initial creation of a CachedPlanSource into two steps. It's worth noting in particular that SPI_saveplan is now deprecated in favor of SPI_keepplan, which accomplishes the same end result with zero data copying, and no need to then spend even more cycles throwing away the original SPIPlan. The risk of long-term memory leaks while manipulating SPIPlans has also been greatly reduced. Most of this improvement is based on use of the recently-added MemoryContextSetParent primitive.
* Clean up the #include mess a little.Tom Lane2011-09-04
| | | | | | | | | | | | | | | | | walsender.h should depend on xlog.h, not vice versa. (Actually, the inclusion was circular until a couple hours ago, which was even sillier; but Bruce broke it in the expedient rather than logically correct direction.) Because of that poor decision, plus blind application of pgrminclude, we had a situation where half the system was depending on xlog.h to include such unrelated stuff as array.h and guc.h. Clean up the header inclusion, and manually revert a lot of what pgrminclude had done so things build again. This episode reinforces my feeling that pgrminclude should not be run without adult supervision. Inclusion changes in header files in particular need to be reviewed with great care. More generally, it'd be good if we had a clearer notion of module layering to dictate which headers can sanely include which others ... but that's a big task for another day.
* Rearrange planner to save the whole PlannerInfo (subroot) for a subquery.Tom Lane2011-09-03
| | | | | | | | | | | | | | | | | | | | | | | | | Formerly, set_subquery_pathlist and other creators of plans for subqueries saved only the rangetable and rowMarks lists from the lower-level PlannerInfo. But there's no reason not to remember the whole PlannerInfo, and indeed this turns out to simplify matters in a number of places. The immediate reason for doing this was so that the subroot will still be accessible when we're trying to extract column statistics out of an already-planned subquery. But now that I've done it, it seems like a good code-beautification effort in its own right. I also chose to get rid of the transient subrtable and subrowmark fields in SubqueryScan nodes, in favor of having setrefs.c look up the subquery's RelOptInfo. That required changing all the APIs in setrefs.c to pass PlannerInfo not PlannerGlobal, which was a large but quite mechanical transformation. One side-effect not foreseen at the beginning is that this finally broke inheritance_planner's assumption that replanning the same subquery RTE N times would necessarily give interchangeable results each time. That assumption was always pretty risky, but now we really have to make a separate RTE for each instance so that there's a place to carry the separate subroots.
* Remove unnecessary #include references, per pgrminclude script.Bruce Momjian2011-09-01
|
* Fix trigger WHEN conditions when both BEFORE and AFTER triggers exist.Tom Lane2011-08-21
| | | | | | | | | Due to tuple-slot mismanagement, evaluation of WHEN conditions for AFTER ROW UPDATE triggers could crash if there had been a BEFORE ROW trigger fired for the same update. Fix by not trying to overload the use of estate->es_trig_tuple_slot. Per report from Yoran Heling. Back-patch to 9.0, when trigger WHEN conditions were introduced.
* Avoid integer overflow when LIMIT + OFFSET >= 2^63.Heikki Linnakangas2011-08-02
| | | | This fixes bug #6139 reported by Hitoshi Harada.
* Move Trigger and TriggerDesc structs out of rel.h into a new reltrigger.hAlvaro Herrera2011-07-04
| | | | | This lets us stop including rel.h into execnodes.h, which is a widely used header.
* Fix bugs in relpersistence handling during table creation.Robert Haas2011-07-03
| | | | | | | | | | | | | | | | | Unlike the relistemp field which it replaced, relpersistence must be set correctly quite early during the table creation process, as we rely on it quite early on for a number of purposes, including security checks. Normally, this is set based on whether the user enters CREATE TABLE, CREATE UNLOGGED TABLE, or CREATE TEMPORARY TABLE, but a relation may also be made implicitly temporary by creating it in pg_temp. This patch fixes the handling of that case, and also disables creation of unlogged tables in temporary tablespace (such table indeed skip WAL-logging, but we reject an explicit specification) and creation of relations in the temporary schemas of other sessions (which is not very sensible, and didn't work right anyway). Report by Amit Khandekar.
* Move the PredicateLockRelation() call from nodeSeqscan.c to heapam.c. It'sHeikki Linnakangas2011-06-29
| | | | | | | | | | | | | | | | | | | | more consistent that way, since all the other PredicateLock* calls are made in various heapam.c and index AM functions. The call in nodeSeqscan.c was unnecessarily aggressive anyway, there's no need to try to lock the relation every time a tuple is fetched, it's enough to do it once. This has the user-visible effect that if a seq scan is initialized in the executor, but never executed, we now acquire the predicate lock on the heap relation anyway. We could avoid that by taking the lock on the first heap_getnext() call instead, but it doesn't seem worth the trouble given that it feels more natural to do it in heap_beginscan(). Also, remove the retail PredicateLockTuple() calls from heap_getnext(). In a seqscan, started with heap_begin(), we're holding a whole-relation predicate lock on the heap so there's no need to lock the tuples individually. Kevin Grittner and me
* Grab predicate locks on matching tuples in a lossy bitmap heap scan.Heikki Linnakangas2011-06-29
| | | | | | Non-lossy case was already handled correctly. Kevin Grittner
* Avoid having two copies of the HOT-chain search logic.Robert Haas2011-06-27
| | | | | | | | | | | | | It's been like this since HOT was originally introduced, but the logic is complex enough that this is a recipe for bugs, as we've already found out with SSI. So refactor heap_hot_search_buffer() so that it can satisfy the needs of index_getnext(), and make index_getnext() use that rather than duplicating the logic. This change was originally proposed by Heikki Linnakangas as part of a larger refactoring oriented towards allowing index-only scans. I extracted and adjusted this part, since it seems to have independent merit. Review by Jeff Davis.
* Remove extra copying of TupleDescs for heap_create_with_catalogAlvaro Herrera2011-06-20
| | | | | | | | | Some callers were creating copies of tuple descriptors to pass to that function, stating in code comments that it was necessary because it modified the passed descriptor. Code inspection reveals this not to be true, and indeed not all callers are passing copies in the first place. So remove the extra ones and the misleading comments about this behavior as well.
* Remove another no-longer-needed inclusion of predicate.h.Tom Lane2011-06-16
|
* Make non-MVCC snapshots exempt from predicate locking. Scans with non-MVCCHeikki Linnakangas2011-06-15
| | | | | | | | snapshots, like in REINDEX, are basically non-transactional operations. The DDL operation itself might participate in SSI, but there's separate functions for that. Kevin Grittner and Dan Ports, with some changes by me.
* Pgindent run before 9.1 beta2.Bruce Momjian2011-06-09
|
* Disallow SELECT FOR UPDATE/SHARE on sequences.Tom Lane2011-06-02
| | | | | | | | | | | | | | | | We can't allow this because such an operation stores its transaction XID into the sequence tuple's xmax. Because VACUUM doesn't process sequences (and we don't want it to start doing so), such an xmax value won't get frozen, meaning it will eventually refer to nonexistent pg_clog storage, and even wrap around completely. Since the row lock is ignored by nextval and setval, the usefulness of the operation is highly debatable anyway. Per reports of trouble with pgpool 3.0, which had ill-advisedly started using such commands as a form of locking. In HEAD, also disallow SELECT FOR UPDATE/SHARE on toast tables. Although this does work safely given the current implementation, there seems no good reason to allow it. I refrained from changing that behavior in back branches, however.
* Allow hash joins to be interrupted while searching hash table for match.Tom Lane2011-06-01
| | | | | | | | | | | Per experimentation with a recent example, in which unreasonable amounts of time could elapse before the backend would respond to a query-cancel. This might be something to back-patch, but the patch doesn't apply cleanly because this code was rewritten for 9.1. Given the lack of field complaints I won't bother for now. Cédric Villemain
* Install defenses against overflow in BuildTupleHashTable().Tom Lane2011-05-23
| | | | | | | | | | | | | | | | | | | | | The planner can sometimes compute very large values for numGroups, and in cases where we have no alternative to building a hashtable, such a value will get fed directly to BuildTupleHashTable as its nbuckets parameter. There were two ways in which that could go bad. First, BuildTupleHashTable declared the parameter as "int" but most callers were passing "long"s, so on 64-bit machines undetected overflow could occur leading to a bogus negative value. The obvious fix for that is to change the parameter to "long", which is what I've done in HEAD. In the back branches that seems a bit risky, though, since third-party code might be calling this function. So for them, just put in a kluge to treat negative inputs as INT_MAX. Second, hash_create can go nuts with extremely large requested table sizes (notably, my_log2 becomes an infinite loop for inputs larger than LONG_MAX/2). What seems most appropriate to avoid that is to bound the initial table size request to work_mem. This fixes bug #6035 reported by Daniel Schreiber. Although the reported case only occurs back to 8.4 since it involves WITH RECURSIVE, I think it's a good idea to install the defenses in all supported branches.
* Reset per-tuple memory context between every row in a scan node, even whenHeikki Linnakangas2011-05-21
| | | | | | there's no quals or projections. Currently this only matters for foreign scans, as none of the other scan nodes litter the per-tuple memory context when there's no quals or projections.
* Refactor broken CREATE TABLE IF NOT EXISTS support.Robert Haas2011-04-25
| | | | | | | | | | | | | | | | Per bug #5988, reported by Marko Tiikkaja, and further analyzed by Tom Lane, the previous coding was broken in several respects: even if the target table already existed, a subsequent CREATE TABLE IF NOT EXISTS might try to add additional constraints or sequences-for-serial specified in the new CREATE TABLE statement. In passing, this also fixes a minor information leak: it's no longer possible to figure out whether a schema to which you don't have CREATE access contains a sequence named like "x_y_seq" by attempting to create a table in that schema called "x" with a serial column called "y". Some more refactoring of this code in the future might be warranted, but that will need to wait for a later major release.
* Make a code-cleanup pass over the collations patch.Tom Lane2011-04-22
| | | | | | | This patch is almost entirely cosmetic --- mostly cleaning up a lot of neglected comments, and fixing code layout problems in places where the patch made lines too long and then pgindent did weird things with that. I did find a bug-of-omission in equalTupleDescs().
* Pass collations to functions in FunctionCallInfoData, not FmgrInfo.Tom Lane2011-04-12
| | | | | | | | | | | Since collation is effectively an argument, not a property of the function, FmgrInfo is really the wrong place for it; and this becomes critical in cases where a cached FmgrInfo is used for varying purposes that might need different collation settings. Fix by passing it in FunctionCallInfoData instead. In particular this allows a clean fix for bug #5970 (record_cmp not working). This requires touching a bit more code than the original method, but nobody ever thought that collations would not be an invasive patch...
* Clean up most -Wunused-but-set-variable warnings from gcc 4.6Peter Eisentraut2011-04-11
| | | | | | This warning is new in gcc 4.6 and part of -Wall. This patch cleans up most of the noise, but there are some still warnings that are trickier to remove.
* pgindent run before PG 9.1 beta 1.Bruce Momjian2011-04-10
|
* Fix check_exclusion_constraint() to insert correct collations in ScanKeys.Tom Lane2011-03-27
|
* Clean up cruft around collation initialization for tupdescs and scankeys.Tom Lane2011-03-26
| | | | | I found actual bugs in GiST and plpgsql; the rest of this is cosmetic but meant to decrease the odds of future bugs of omission.
* Pass collation to makeConst() instead of looking it up internally.Tom Lane2011-03-25
| | | | | | | | | In nearly all cases, the caller already knows the correct collation, and in a number of places, the value the caller has handy is more correct than the default for the type would be. (In particular, this patch makes it significantly less likely that eval_const_expressions will result in changing the exposed collation of an expression.) So an internal lookup is both expensive and wrong.
* Fix handling of collation in SQL-language functions.Tom Lane2011-03-24
| | | | | | | | | | Ensure that parameter symbols receive collation from the function's resolved input collation, and fix inlining to behave properly. BTW, this commit lays about 90% of the infrastructure needed to support use of argument names in SQL functions. Parsing of parameters is now done via the parser-hook infrastructure ... we'd just need to supply a column-ref hook ...
* Revise collation derivation method and expression-tree representation.Tom Lane2011-03-19
| | | | | | | | | | | | | | | | | | | All expression nodes now have an explicit output-collation field, unless they are known to only return a noncollatable data type (such as boolean or record). Also, nodes that can invoke collation-aware functions store a separate field that is the collation value to pass to the function. This avoids confusion that arises when a function has collatable inputs and noncollatable output type, or vice versa. Also, replace the parser's on-the-fly collation assignment method with a post-pass over the completed expression tree. This allows us to use a more complex (and hopefully more nearly spec-compliant) assignment rule without paying for it in extra storage in every expression node. Fix assorted bugs in the planner's handling of collations by making collation one of the defining properties of an EquivalenceClass and by converting CollateExprs into discardable RelabelType nodes during expression preprocessing.
* Split CollateClause into separate raw and analyzed node types.Tom Lane2011-03-11
| | | | | | | | | | | CollateClause is now used only in raw grammar output, and CollateExpr after parse analysis. This is for clarity and to avoid carrying collation names in post-analysis parse trees: that's both wasteful and possibly misleading, since the collation's name could be changed while the parsetree still exists. Also, clean up assorted infelicities and omissions in processing of the node type.
* Rearrange snapshot handling to make rule expansion more consistent.Tom Lane2011-02-28
| | | | | | | | | | | | | | | | | | | | | With this patch, portals, SQL functions, and SPI all agree that there should be only a CommandCounterIncrement between the queries that are generated from a single SQL command by rule expansion. Fetching a whole new snapshot now happens only between original queries. This is equivalent to the existing behavior of EXPLAIN ANALYZE, and it was judged to be the best choice since it eliminates one source of concurrency hazards for rules. The patch should also make things marginally faster by reducing the number of snapshot push/pop operations. The patch removes pg_parse_and_rewrite(), which is no longer used anywhere. There was considerable discussion about more aggressive refactoring of the query-processing functions exported by postgres.c, but for the moment nothing more has been done there. I also took the opportunity to refactor snapmgr.c's API slightly: the former PushUpdatedSnapshot() has been split into two functions. Marko Tiikkaja, reviewed by Steve Singer and Tom Lane
* Refactor the executor's API to support data-modifying CTEs better.Tom Lane2011-02-27
| | | | | | | | | | | | | | | | | | | | | | The originally committed patch for modifying CTEs didn't interact well with EXPLAIN, as noted by myself, and also had corner-case problems with triggers, as noted by Dean Rasheed. Those problems show it is really not practical for ExecutorEnd to call any user-defined code; so split the cleanup duties out into a new function ExecutorFinish, which must be called between the last ExecutorRun call and ExecutorEnd. Some Asserts have been added to these functions to help verify correct usage. It is no longer necessary for callers of the executor to call AfterTriggerBeginQuery/AfterTriggerEndQuery for themselves, as this is now done by ExecutorStart/ExecutorFinish respectively. If you really need to suppress that and do it for yourself, pass EXEC_FLAG_SKIP_TRIGGERS to ExecutorStart. Also, refactor portal commit processing to allow for the possibility that PortalDrop will invoke user-defined code. I think this is not actually necessary just yet, since the portal-execution-strategy logic forces any non-pure-SELECT query to be run to completion before we will consider committing. But it seems like good future-proofing.
* Fix order of shutdown processing when CTEs contain inter-references.Tom Lane2011-02-25
| | | | | | We need ExecutorEnd to run the ModifyTable nodes to completion in reverse order of initialization, not forward order. Easily done by constructing the list back-to-front.
* Support data-modifying commands (INSERT/UPDATE/DELETE) in WITH.Tom Lane2011-02-25
| | | | | | | | | | | | | | This patch implements data-modifying WITH queries according to the semantics that the updates all happen with the same command counter value, and in an unspecified order. Therefore one WITH clause can't see the effects of another, nor can the outer query see the effects other than through the RETURNING values. And attempts to do conflicting updates will have unpredictable results. We'll need to document all that. This commit just fixes the code; documentation updates are waiting on author. Marko Tiikkaja and Hitoshi Harada
* Remove ExecRemoveJunk(), which is no longer used anywhere.Tom Lane2011-02-21
| | | | | | | This was a leftover from the pre-8.1 design of junkfilters. It doesn't seem to have any reason to live, since it's merely a combination of two easy function calls, and not a well-designed combination at that (it encourages callers to leak the result tuple).
* Fix dangling-pointer problem in before-row update trigger processing.Tom Lane2011-02-21
| | | | | | | | | | | | | | | | | | | | | | ExecUpdate checked for whether ExecBRUpdateTriggers had returned a new tuple value by seeing if the returned tuple was pointer-equal to the old one. But the "old one" was in estate->es_junkFilter's result slot, which would be scribbled on if we had done an EvalPlanQual update in response to a concurrent update of the target tuple; therefore we were comparing a dangling pointer to a live one. Given the right set of circumstances we could get a false match, resulting in not forcing the tuple to be stored in the slot we thought it was stored in. In the case reported by Maxim Boguk in bug #5798, this led to "cannot extract system attribute from virtual tuple" failures when trying to do "RETURNING ctid". I believe there is a very-low-probability chance of more serious errors, such as generating incorrect index entries based on the original rather than the trigger-modified version of the row. In HEAD, change all of ExecBRInsertTriggers, ExecIRInsertTriggers, ExecBRUpdateTriggers, and ExecIRUpdateTriggers so that they continue to have similar APIs. In the back branches I just changed ExecBRUpdateTriggers, since there is no bug in the ExecBRInsertTriggers case.
* Implement an API to let foreign-data wrappers actually be functional.Tom Lane2011-02-20
| | | | | | | This commit provides the core code and documentation needed. A contrib module test case will follow shortly. Shigeru Hanada, Jan Urbanski, Heikki Linnakangas
* Fix improper matching of resjunk column names for FOR UPDATE in subselect.Tom Lane2011-02-09
| | | | | | | | | | | | | | | | Flattening of subquery range tables during setrefs.c could lead to the rangetable indexes in PlanRowMark nodes not matching up with the column names previously assigned to the corresponding resjunk ctid (resp. tableoid or wholerow) columns. Typical symptom would be either a "cannot extract system attribute from virtual tuple" error or an Assert failure. This wasn't a problem before 9.0 because we didn't support FOR UPDATE below the top query level, and so the final flattening could never renumber an RTE that was relevant to FOR UPDATE. Fix by using a plan-tree-wide unique number for each PlanRowMark to label the associated resjunk columns, so that the number need not change during flattening. Per report from David Johnston (though I'm darned if I can see how this got past initial testing of the relevant code). Back-patch to 9.0.
* Per-column collation supportPeter Eisentraut2011-02-08
| | | | | | | | This adds collation support for columns and domains, a COLLATE clause to override it per expression, and B-tree index support. Peter Eisentraut reviewed by Pavel Stehule, Itagaki Takahiro, Robert Haas, Noah Misch
* Implement genuine serializable isolation level.Heikki Linnakangas2011-02-08
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Until now, our Serializable mode has in fact been what's called Snapshot Isolation, which allows some anomalies that could not occur in any serialized ordering of the transactions. This patch fixes that using a method called Serializable Snapshot Isolation, based on research papers by Michael J. Cahill (see README-SSI for full references). In Serializable Snapshot Isolation, transactions run like they do in Snapshot Isolation, but a predicate lock manager observes the reads and writes performed and aborts transactions if it detects that an anomaly might occur. This method produces some false positives, ie. it sometimes aborts transactions even though there is no anomaly. To track reads we implement predicate locking, see storage/lmgr/predicate.c. Whenever a tuple is read, a predicate lock is acquired on the tuple. Shared memory is finite, so when a transaction takes many tuple-level locks on a page, the locks are promoted to a single page-level lock, and further to a single relation level lock if necessary. To lock key values with no matching tuple, a sequential scan always takes a relation-level lock, and an index scan acquires a page-level lock that covers the search key, whether or not there are any matching keys at the moment. A predicate lock doesn't conflict with any regular locks or with another predicate locks in the normal sense. They're only used by the predicate lock manager to detect the danger of anomalies. Only serializable transactions participate in predicate locking, so there should be no extra overhead for for other transactions. Predicate locks can't be released at commit, but must be remembered until all the transactions that overlapped with it have completed. That means that we need to remember an unbounded amount of predicate locks, so we apply a lossy but conservative method of tracking locks for committed transactions. If we run short of shared memory, we overflow to a new "pg_serial" SLRU pool. We don't currently allow Serializable transactions in Hot Standby mode. That would be hard, because even read-only transactions can cause anomalies that wouldn't otherwise occur. Serializable isolation mode now means the new fully serializable level. Repeatable Read gives you the old Snapshot Isolation level that we have always had. Kevin Grittner and Dan Ports, reviewed by Jeff Davis, Heikki Linnakangas and Anssi Kääriäinen
* Fix wrong error reports in 'number of array dimensions exceeds theItagaki Takahiro2011-02-01
| | | | | | maximum allowed' messages, that have reported one-less dimensions. Alexey Klyukin
* Fix PlanRowMark/ExecRowMark structures to handle inheritance correctly.Tom Lane2011-01-12
| | | | | | | | | | | | | | | | | | In an inherited UPDATE/DELETE, each target table has its own subplan, because it might have a column set different from other targets. This means that the resjunk columns we add to support EvalPlanQual might be at different physical column numbers in each subplan. The EvalPlanQual rewrite I did for 9.0 failed to account for this, resulting in possible misbehavior or even crashes during concurrent updates to the same row, as seen in a recent report from Gordon Shannon. Revise the data structure so that we track resjunk column numbers separately for each subplan. I also chose to move responsibility for identifying the physical column numbers back to executor startup, instead of assuming that numbers derived during preprocess_targetlist would stay valid throughout subsequent massaging of the plan. That's a bit slower, so we might want to consider undoing it someday; but it would complicate the patch considerably and didn't seem justifiable in a bug fix that has to be back-patched to 9.0.
* Basic foreign table support.Robert Haas2011-01-01
| | | | | | | | | | | Foreign tables are a core component of SQL/MED. This commit does not provide a working SQL/MED infrastructure, because foreign tables cannot yet be queried. Support for foreign table scans will need to be added in a future patch. However, this patch creates the necessary system catalog structure, syntax support, and support for ancillary operations such as COMMENT and SECURITY LABEL. Shigeru Hanada, heavily revised by Robert Haas
* Stamp copyrights for year 2011.Bruce Momjian2011-01-01
|
* Move symbols for ExecMergeJoin's state machine into nodeMergejoin.c.Tom Lane2010-12-30
| | | | | There's no reason for these values to be known anywhere else. After doing this, executor/execdefs.h is vestigial and can be removed.
* Support RIGHT and FULL OUTER JOIN in hash joins.Tom Lane2010-12-30
| | | | | | | | | | | | | | | | | | | | | | This is advantageous first because it allows us to hash the smaller table regardless of the outer-join type, and second because hash join can be more flexible than merge join in dealing with arbitrary join quals in a FULL join. For merge join all the join quals have to be mergejoinable, but hash join will work so long as there's at least one hashjoinable qual --- the others can be any condition. (This is true essentially because we don't keep per-inner-tuple match flags in merge join, while hash join can do so.) To do this, we need a has-it-been-matched flag for each tuple in the hashtable, not just one for the current outer tuple. The key idea that makes this practical is that we can store the match flag in the tuple's infomask, since there are lots of bits there that are of no interest for a MinimalTuple. So we aren't increasing the size of the hashtable at all for the feature. To write this without turning the hash code into even more of a pile of spaghetti than it already was, I rewrote ExecHashJoin in a state-machine style, similar to ExecMergeJoin. Other than that decision, it was pretty straightforward.