aboutsummaryrefslogtreecommitdiff
path: root/src/backend/executor/nodeHashjoin.c
Commit message (Collapse)AuthorAge
...
* Improve implementation of pg_attribute_always_inline.Tom Lane2018-01-23
| | | | | | | | | | | | | | | | | | | | Avoid compiler warnings on MSVC (which doesn't want to see both __forceinline and inline) and ancient GCC (which doesn't have __attribute__((always_inline))). Don't force inline-ing when building at -O0, as the programmer is probably hoping for exact source-to-object-line correspondence in that case. (For the moment this only works for GCC; maybe we can extend it later.) Make pg_attribute_always_inline be syntactically a drop-in replacement for inline, rather than an additional wart. And improve the comments. Thomas Munro and Michail Nikolaev, small tweaks by me Discussion: https://postgr.es/m/32278.1514863068@sss.pgh.pa.us Discussion: https://postgr.es/m/CANtu0oiYp74brgntKOxgg1FK5+t8uQ05guSiFU6FYz_5KUhr6Q@mail.gmail.com
* Update copyright for 2018Bruce Momjian2018-01-02
| | | | Backpatch-through: certain files through 9.3
* Fix EXPLAIN ANALYZE output for Parallel Hash.Andres Freund2018-01-01
| | | | | | | | | | | | | | In a race case, EXPLAIN ANALYZE could fail to display correct nbatch and size information. Refactor so that participants report only on batches they worked on rather than trying to report on all of them, and teach explain.c to consider the HashInstrumentation object from all participants instead of picking the first one it can find. This should fix an occasional build farm failure in the "join" regression test. Author: Thomas Munro Reviewed-By: Andres Freund Discussion: https://postgr.es/m/30219.1514428346%40sss.pgh.pa.us
* Add parallel-aware hash joins.Andres Freund2017-12-21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Introduce parallel-aware hash joins that appear in EXPLAIN plans as Parallel Hash Join with Parallel Hash. While hash joins could already appear in parallel queries, they were previously always parallel-oblivious and had a partial subplan only on the outer side, meaning that the work of the inner subplan was duplicated in every worker. After this commit, the planner will consider using a partial subplan on the inner side too, using the Parallel Hash node to divide the work over the available CPU cores and combine its results in shared memory. If the join needs to be split into multiple batches in order to respect work_mem, then workers process different batches as much as possible and then work together on the remaining batches. The advantages of a parallel-aware hash join over a parallel-oblivious hash join used in a parallel query are that it: * avoids wasting memory on duplicated hash tables * avoids wasting disk space on duplicated batch files * divides the work of building the hash table over the CPUs One disadvantage is that there is some communication between the participating CPUs which might outweigh the benefits of parallelism in the case of small hash tables. This is avoided by the planner's existing reluctance to supply partial plans for small scans, but it may be necessary to estimate synchronization costs in future if that situation changes. Another is that outer batch 0 must be written to disk if multiple batches are required. A potential future advantage of parallel-aware hash joins is that right and full outer joins could be supported, since there is a single set of matched bits for each hashtable, but that is not yet implemented. A new GUC enable_parallel_hash is defined to control the feature, defaulting to on. Author: Thomas Munro Reviewed-By: Andres Freund, Robert Haas Tested-By: Rafia Sabih, Prabhat Sahu Discussion: https://postgr.es/m/CAEepm=2W=cOkiZxcg6qiFQP-dHUe09aqTrEMM7yJDrHMhDv_RA@mail.gmail.com https://postgr.es/m/CAEepm=37HKyJ4U6XOLi=JgfSHM3o6B-GaeO-6hkOmneTDkH+Uw@mail.gmail.com
* Move ExecProcNode from dispatch to function pointer based model.Andres Freund2017-07-30
| | | | | | | | | | | | | | | | | | | | | | This allows us to add stack-depth checks the first time an executor node is called, and skip that overhead on following calls. Additionally it yields a nice speedup. While it'd probably have been a good idea to have that check all along, it has become more important after the new expression evaluation framework in b8d7f053c5c2bf2a7e - there's no stack depth check in common paths anymore now. We previously relied on ExecEvalExpr() being executed somewhere. We should move towards that model for further routines, but as this is required for v10, it seems better to only do the necessary (which already is quite large). Author: Andres Freund, Tom Lane Reported-By: Julien Rouhaud Discussion: https://postgr.es/m/22833.1490390175@sss.pgh.pa.us https://postgr.es/m/b0af9eaa-130c-60d0-9e4e-7a135b1e0c76@dalibo.com
* Move interrupt checking from ExecProcNode() to executor nodes.Andres Freund2017-07-30
| | | | | | | | | | | | | | | | | In a followup commit ExecProcNode(), and especially the large switch it contains, will largely be replaced by a function pointer directly to the correct node. The node functions will then get invoked by a thin inline function wrapper. To avoid having to include miscadmin.h in headers - CHECK_FOR_INTERRUPTS() - move the interrupt checks into the individual executor routines. While looking through all executor nodes, I noticed a number of arguably missing interrupt checks, add these too. Author: Andres Freund, Tom Lane Reviewed-By: Tom Lane Discussion: https://postgr.es/m/22833.1490390175@sss.pgh.pa.us
* Phase 3 of pgindent updates.Tom Lane2017-06-21
| | | | | | | | | | | | | | | | | | | | | | | | | Don't move parenthesized lines to the left, even if that means they flow past the right margin. By default, BSD indent lines up statement continuation lines that are within parentheses so that they start just to the right of the preceding left parenthesis. However, traditionally, if that resulted in the continuation line extending to the right of the desired right margin, then indent would push it left just far enough to not overrun the margin, if it could do so without making the continuation line start to the left of the current statement indent. That makes for a weird mix of indentations unless one has been completely rigid about never violating the 80-column limit. This behavior has been pretty universally panned by Postgres developers. Hence, disable it with indent's new -lpl switch, so that parenthesized lines are always lined up with the preceding left paren. This patch is much less interesting than the first round of indent changes, but also bulkier, so I thought it best to separate the effects. Discussion: https://postgr.es/m/E1dAmxK-0006EE-1r@gemulon.postgresql.org Discussion: https://postgr.es/m/30527.1495162840@sss.pgh.pa.us
* Phase 2 of pgindent updates.Tom Lane2017-06-21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Change pg_bsd_indent to follow upstream rules for placement of comments to the right of code, and remove pgindent hack that caused comments following #endif to not obey the general rule. Commit e3860ffa4dd0dad0dd9eea4be9cc1412373a8c89 wasn't actually using the published version of pg_bsd_indent, but a hacked-up version that tried to minimize the amount of movement of comments to the right of code. The situation of interest is where such a comment has to be moved to the right of its default placement at column 33 because there's code there. BSD indent has always moved right in units of tab stops in such cases --- but in the previous incarnation, indent was working in 8-space tab stops, while now it knows we use 4-space tabs. So the net result is that in about half the cases, such comments are placed one tab stop left of before. This is better all around: it leaves more room on the line for comment text, and it means that in such cases the comment uniformly starts at the next 4-space tab stop after the code, rather than sometimes one and sometimes two tabs after. Also, ensure that comments following #endif are indented the same as comments following other preprocessor commands such as #else. That inconsistency turns out to have been self-inflicted damage from a poorly-thought-through post-indent "fixup" in pgindent. This patch is much less interesting than the first round of indent changes, but also bulkier, so I thought it best to separate the effects. Discussion: https://postgr.es/m/E1dAmxK-0006EE-1r@gemulon.postgresql.org Discussion: https://postgr.es/m/30527.1495162840@sss.pgh.pa.us
* Initial pgindent run with pg_bsd_indent version 2.0.Tom Lane2017-06-21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The new indent version includes numerous fixes thanks to Piotr Stefaniak. The main changes visible in this commit are: * Nicer formatting of function-pointer declarations. * No longer unexpectedly removes spaces in expressions using casts, sizeof, or offsetof. * No longer wants to add a space in "struct structname *varname", as well as some similar cases for const- or volatile-qualified pointers. * Declarations using PG_USED_FOR_ASSERTS_ONLY are formatted more nicely. * Fixes bug where comments following declarations were sometimes placed with no space separating them from the code. * Fixes some odd decisions for comments following case labels. * Fixes some cases where comments following code were indented to less than the expected column 33. On the less good side, it now tends to put more whitespace around typedef names that are not listed in typedefs.list. This might encourage us to put more effort into typedef name collection; it's not really a bug in indent itself. There are more changes coming after this round, having to do with comment indentation and alignment of lines appearing within parentheses. I wanted to limit the size of the diffs to something that could be reviewed without one's eyes completely glazing over, so it seemed better to split up the changes as much as practical. Discussion: https://postgr.es/m/E1dAmxK-0006EE-1r@gemulon.postgresql.org Discussion: https://postgr.es/m/30527.1495162840@sss.pgh.pa.us
* Improve castNode notation by introducing list-extraction-specific variants.Tom Lane2017-04-10
| | | | | | | | | | | | | | | | | This extends the castNode() notation introduced by commit 5bcab1114 to provide, in one step, extraction of a list cell's pointer and coercion to a concrete node type. For example, "lfirst_node(Foo, lc)" is the same as "castNode(Foo, lfirst(lc))". Almost half of the uses of castNode that have appeared so far include a list extraction call, so this is pretty widely useful, and it saves a few more keystrokes compared to the old way. As with the previous patch, back-patch the addition of these macros to pg_list.h, so that the notation will be available when back-patching. Patch by me, after an idea of Andrew Gierth's. Discussion: https://postgr.es/m/14197.1491841216@sss.pgh.pa.us
* Optimize joins when the inner relation can be proven unique.Tom Lane2017-04-07
| | | | | | | | | | | | | | | | | | | | | | | If there can certainly be no more than one matching inner row for a given outer row, then the executor can move on to the next outer row as soon as it's found one match; there's no need to continue scanning the inner relation for this outer row. This saves useless scanning in nestloop and hash joins. In merge joins, it offers the opportunity to skip mark/restore processing, because we know we have not advanced past the first possible match for the next outer row. Of course, the devil is in the details: the proof of uniqueness must depend only on joinquals (not otherquals), and if we want to skip mergejoin mark/restore then it must depend only on merge clauses. To avoid adding more planning overhead than absolutely necessary, the present patch errs in the conservative direction: there are cases where inner_unique or skip_mark_restore processing could be used, but it will not do so because it's not sure that the uniqueness proof depended only on "safe" clauses. This could be improved later. David Rowley, reviewed and rather heavily editorialized on by me Discussion: https://postgr.es/m/CAApHDvqF6Sw-TK98bW48TdtFJ+3a7D2mFyZ7++=D-RyPsL76gw@mail.gmail.com
* Faster expression evaluation and targetlist projection.Andres Freund2017-03-25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This replaces the old, recursive tree-walk based evaluation, with non-recursive, opcode dispatch based, expression evaluation. Projection is now implemented as part of expression evaluation. This both leads to significant performance improvements, and makes future just-in-time compilation of expressions easier. The speed gains primarily come from: - non-recursive implementation reduces stack usage / overhead - simple sub-expressions are implemented with a single jump, without function calls - sharing some state between different sub-expressions - reduced amount of indirect/hard to predict memory accesses by laying out operation metadata sequentially; including the avoidance of nearly all of the previously used linked lists - more code has been moved to expression initialization, avoiding constant re-checks at evaluation time Future just-in-time compilation (JIT) has become easier, as demonstrated by released patches intended to be merged in a later release, for primarily two reasons: Firstly, due to a stricter split between expression initialization and evaluation, less code has to be handled by the JIT. Secondly, due to the non-recursive nature of the generated "instructions", less performance-critical code-paths can easily be shared between interpreted and compiled evaluation. The new framework allows for significant future optimizations. E.g.: - basic infrastructure for to later reduce the per executor-startup overhead of expression evaluation, by caching state in prepared statements. That'd be helpful in OLTPish scenarios where initialization overhead is measurable. - optimizing the generated "code". A number of proposals for potential work has already been made. - optimizing the interpreter. Similarly a number of proposals have been made here too. The move of logic into the expression initialization step leads to some backward-incompatible changes: - Function permission checks are now done during expression initialization, whereas previously they were done during execution. In edge cases this can lead to errors being raised that previously wouldn't have been, e.g. a NULL array being coerced to a different array type previously didn't perform checks. - The set of domain constraints to be checked, is now evaluated once during expression initialization, previously it was re-built every time a domain check was evaluated. For normal queries this doesn't change much, but e.g. for plpgsql functions, which caches ExprStates, the old set could stick around longer. The behavior around might still change. Author: Andres Freund, with significant changes by Tom Lane, changes by Heikki Linnakangas Reviewed-By: Tom Lane, Heikki Linnakangas Discussion: https://postgr.es/m/20161206034955.bh33paeralxbtluv@alap3.anarazel.de
* Make sure that hash join's bulk-tuple-transfer loops are interruptible.Tom Lane2017-02-15
| | | | | | | | | | | | | | | | | | The loops in ExecHashJoinNewBatch(), ExecHashIncreaseNumBatches(), and ExecHashRemoveNextSkewBucket() are all capable of iterating over many tuples without ever doing a CHECK_FOR_INTERRUPTS, so that the backend might fail to respond to SIGINT or SIGTERM for an unreasonably long time. Fix that. In the case of ExecHashJoinNewBatch(), it seems useful to put the added CHECK_FOR_INTERRUPTS into ExecHashJoinGetSavedTuple() rather than directly in the loop, because that will also ensure that both principal code paths through ExecHashJoinOuterGetTuple() will do a CHECK_FOR_INTERRUPTS, which seems like a good idea to avoid surprises. Back-patch to all supported branches. Tom Lane and Thomas Munro Discussion: https://postgr.es/m/6044.1487121720@sss.pgh.pa.us
* Use the new castNode() macro in a number of places.Andres Freund2017-01-26
| | | | | | | | | This is far from a pervasive conversion, but it's a good starting point. Author: Peter Eisentraut, with some minor changes by me Reviewed-By: Tom Lane Discussion: https://postgr.es/m/c5d387d9-3440-f5e0-f9d4-71d53b9fbe52@2ndquadrant.com
* Remove obsoleted code relating to targetlist SRF evaluation.Andres Freund2017-01-19
| | | | | | | | | | | | | Since 69f4b9c plain expression evaluation (and thus normal projection) can't return sets of tuples anymore. Thus remove code dealing with that possibility. This will require adjustments in external code using ExecEvalExpr()/ExecProject() - that should neither be hard nor very common. Author: Andres Freund and Tom Lane Discussion: https://postgr.es/m/20160822214023.aaxz5l4igypowyri@alap3.anarazel.de
* Update copyright via script for 2017Bruce Momjian2017-01-03
|
* Update copyright for 2016Bruce Momjian2016-01-02
| | | | Backpatch certain files through 9.1
* Update copyright for 2015Bruce Momjian2015-01-06
| | | | Backpatch certain files through 9.0
* pgindent run for 9.4Bruce Momjian2014-05-06
| | | | | This includes removing tabs after periods in C comments, which was applied to back branches, so this change should not effect backpatching.
* Update copyright for 2014Bruce Momjian2014-01-07
| | | | | Update all files in head, and files COPYRIGHT and legal.sgml in all back branches.
* Update copyrights for 2013Bruce Momjian2013-01-01
| | | | | Fully update git head, and update back branches in ./COPYRIGHT and legal.sgml files.
* Split tuple struct defs from htup.h to htup_details.hAlvaro Herrera2012-08-30
| | | | | | | | | | | | This reduces unnecessary exposure of other headers through htup.h, which is very widely included by many files. I have chosen to move the function prototypes to the new file as well, because that means htup.h no longer needs to include tupdesc.h. In itself this doesn't have much effect in indirect inclusion of tupdesc.h throughout the tree, because it's also required by execnodes.h; but it's something to explore in the future, and it seemed best to do the htup.h change now while I'm busy with it.
* Update copyright notices for year 2012.Bruce Momjian2012-01-01
|
* Make EXPLAIN ANALYZE report the numbers of rows rejected by filter steps.Tom Lane2011-09-22
| | | | | | | | | | | | | | | This provides information about the numbers of tuples that were visited but not returned by table scans, as well as the numbers of join tuples that were considered and discarded within a join plan node. There is still some discussion going on about the best way to report counts for outer-join situations, but I think most of what's in the patch would not change if we revise that, so I'm going to go ahead and commit it as-is. Documentation changes to follow (they weren't in the submitted patch either). Marko Tiikkaja, reviewed by Marc Cousin, somewhat revised by Tom
* Pgindent run before 9.1 beta2.Bruce Momjian2011-06-09
|
* Allow hash joins to be interrupted while searching hash table for match.Tom Lane2011-06-01
| | | | | | | | | | | Per experimentation with a recent example, in which unreasonable amounts of time could elapse before the backend would respond to a query-cancel. This might be something to back-patch, but the patch doesn't apply cleanly because this code was rewritten for 9.1. Given the lack of field complaints I won't bother for now. Cédric Villemain
* Clean up most -Wunused-but-set-variable warnings from gcc 4.6Peter Eisentraut2011-04-11
| | | | | | This warning is new in gcc 4.6 and part of -Wall. This patch cleans up most of the noise, but there are some still warnings that are trickier to remove.
* pgindent run before PG 9.1 beta 1.Bruce Momjian2011-04-10
|
* Stamp copyrights for year 2011.Bruce Momjian2011-01-01
|
* Support RIGHT and FULL OUTER JOIN in hash joins.Tom Lane2010-12-30
| | | | | | | | | | | | | | | | | | | | | | This is advantageous first because it allows us to hash the smaller table regardless of the outer-join type, and second because hash join can be more flexible than merge join in dealing with arbitrary join quals in a FULL join. For merge join all the join quals have to be mergejoinable, but hash join will work so long as there's at least one hashjoinable qual --- the others can be any condition. (This is true essentially because we don't keep per-inner-tuple match flags in merge join, while hash join can do so.) To do this, we need a has-it-been-matched flag for each tuple in the hashtable, not just one for the current outer tuple. The key idea that makes this practical is that we can store the match flag in the tuple's infomask, since there are lots of bits there that are of no interest for a MinimalTuple. So we aren't increasing the size of the hashtable at all for the feature. To write this without turning the hash code into even more of a pile of spaghetti than it already was, I rewrote ExecHashJoin in a state-machine style, similar to ExecMergeJoin. Other than that decision, it was pretty straightforward.
* Remove cvs keywords from all files.Magnus Hagander2010-09-20
|
* Make NestLoop plan nodes pass outer-relation variables into their innerTom Lane2010-07-12
| | | | | | | | | | | | relation using the general PARAM_EXEC executor parameter mechanism, rather than the ad-hoc kluge of passing the outer tuple down through ExecReScan. The previous method was hard to understand and could never be extended to handle parameters coming from multiple join levels. This patch doesn't change the set of possible plans nor have any significant performance effect, but it's necessary infrastructure for future generalization of the concept of an inner indexscan plan. ExecReScan's second parameter is now unused, so it's removed.
* Update copyright for the year 2010.Bruce Momjian2010-01-02
|
* Remove no-longer-needed ExecCountSlots infrastructure.Tom Lane2009-09-27
|
* 8.4 pgindent run, with new combined Linux/FreeBSD/MinGW typedef listBruce Momjian2009-06-11
| | | | provided by Andrew.
* Revert DTrace patch from Robert LorBruce Momjian2009-04-02
|
* Add support for additional DTrace probes.Bruce Momjian2009-04-02
| | | | Robert Lor
* Optimize multi-batch hash joins when the outer relation has a nonuniformTom Lane2009-03-21
| | | | | | | | | distribution, by creating a special fast path for the (first few) most common values of the outer relation. Tuples having hashvalues matching the MCVs are effectively forced to be in the first batch, so that we never write them out to the batch temp files. Bryce Cutt and Ramon Lawrence, with some editorialization by me.
* Update copyright for 2009.Bruce Momjian2009-01-01
|
* Remove useless ps_OuterTupleSlot field from PlanState. I suppose this wasTom Lane2008-10-23
| | | | | used long ago, but in the current code the ecxt_outertuple field of ExprContext is doing all the work. Spotted by Ran Tang.
* Performance fix for new anti-join code in nodeMergejoin.c: after finding aTom Lane2008-08-15
| | | | | | | | | | | | | | | | | | match in antijoin mode, we should advance to next outer tuple not next inner. We know we don't want to return this outer tuple, and there is no point in advancing over matching inner tuples now, because we'd just have to do it again if the next outer tuple has the same merge key. This makes a noticeable difference if there are lots of duplicate keys in both inputs. Similarly, after finding a match in semijoin mode, arrange to advance to the next outer tuple after returning the current match; or immediately, if it fails the extra quals. The rationale is the same. (This is a performance bug in existing releases; perhaps worth back-patching? The planner tries to avoid using mergejoin with lots of duplicates, so it may not be a big issue in practice.) Nestloop and hash got this right to start with, but I made some cosmetic adjustments there to make the corresponding bits of logic look more similar.
* Implement SEMI and ANTI joins in the planner and executor. (Semijoins replaceTom Lane2008-08-14
| | | | | | | | | | | | | | the old JOIN_IN code, but antijoins are new functionality.) Teach the planner to convert appropriate EXISTS and NOT EXISTS subqueries into semi and anti joins respectively. Also, LEFT JOINs with suitable upper-level IS NULL filters are recognized as being anti joins. Unify the InClauseInfo and OuterJoinInfo infrastructure into "SpecialJoinInfo". With that change, it becomes possible to associate a SpecialJoinInfo with every join attempt, which permits some cleanup of join selectivity estimation. That needs to be taken much further than this patch does, but the next step is to change the API for oprjoin selectivity functions, which seems like material for a separate patch. So for the moment the output size estimates for semi and especially anti joins are quite bogus.
* Update copyrights in source tree to 2008.Bruce Momjian2008-01-01
|
* pgindent run for 8.3.Bruce Momjian2007-11-15
|
* Rework temp_tablespaces patch so that temp tablespaces are assigned separatelyTom Lane2007-06-07
| | | | | | | | | for each temp file, rather than once per sort or hashjoin; this allows spreading the data of a large sort or join across multiple tablespaces. (I remain dubious that this will make any difference in practice, but certain people insisted.) Arrange to cache the results of parsing the GUC variable instead of recomputing from scratch on every demand, and push usage of the cache down to the bottommost fd.c level.
* Create a GUC parameter temp_tablespaces that allows selection of theTom Lane2007-06-03
| | | | | | | | | | tablespace(s) in which to store temp tables and temporary files. This is a list to allow spreading the load across multiple tablespaces (a random list element is chosen each time a temp object is to be created). Temp files are not stored in per-database pgsql_tmp/ directories anymore, but per-tablespace directories. Jaime Casanova and Albert Cervera, with review by Bernd Helmle and Tom Lane.
* Repair failure to check that a table is still compatible with a previouslyTom Lane2007-02-02
| | | | | | | | | | | | | | | | | | | | | | made query plan. Use of ALTER COLUMN TYPE creates a hazard for cached query plans: they could contain Vars that claim a column has a different type than it now has. Fix this by checking during plan startup that Vars at relation scan level match the current relation tuple descriptor. Since at that point we already have at least AccessShareLock, we can be sure the column type will not change underneath us later in the query. However, since a backend's locks do not conflict against itself, there is still a hole for an attacker to exploit: he could try to execute ALTER COLUMN TYPE while a query is in progress in the current backend. Seal that hole by rejecting ALTER TABLE whenever the target relation is already open in the current backend. This is a significant security hole: not only can one trivially crash the backend, but with appropriate misuse of pass-by-reference datatypes it is possible to read out arbitrary locations in the server process's memory, which could allow retrieving database content the user should not be able to see. Our thanks to Jeff Trout for the initial report. Security: CVE-2007-0556
* Add support for cross-type hashing in hash index searches and hash joins.Tom Lane2007-01-30
| | | | | | Hashing for aggregation purposes still needs work, so it's not time to mark any cross-type operators as hashable for general use, but these cases work if the operators are so marked by hand in the system catalogs.
* Improve hash join to discard input tuples immediately if they can'tTom Lane2007-01-28
| | | | | | match because they contain a null join key (and the join operator is known strict). Improves performance significantly when the inner relation contains a lot of nulls, as per bug #2930.
* Update CVS HEAD for 2007 copyright. Back branches are typically notBruce Momjian2007-01-05
| | | | back-stamped for this.