aboutsummaryrefslogtreecommitdiff
path: root/src/backend/optimizer/path/joinpath.c
Commit message (Collapse)AuthorAge
...
* Implement SQL-standard LATERAL subqueries.Tom Lane2012-08-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | This patch implements the standard syntax of LATERAL attached to a sub-SELECT in FROM, and also allows LATERAL attached to a function in FROM, since set-returning function calls are expected to be one of the principal use-cases. The main change here is a rewrite of the mechanism for keeping track of which relations are visible for column references while the FROM clause is being scanned. The parser "namespace" lists are no longer lists of bare RTEs, but are lists of ParseNamespaceItem structs, which carry an RTE pointer as well as some visibility-controlling flags. Aside from supporting LATERAL correctly, this lets us get rid of the ancient hacks that required rechecking subqueries and JOIN/ON and function-in-FROM expressions for invalid references after they were initially parsed. Invalid column references are now always correctly detected on sight. In passing, remove assorted parser error checks that are now dead code by virtue of our having gotten rid of add_missing_from, as well as some comments that are obsolete for the same reason. (It was mainly add_missing_from that caused so much fudging here in the first place.) The planner support for this feature is very minimal, and will be improved in future patches. It works well enough for testing purposes, though. catversion bump forced due to new field in RangeTblEntry.
* Run pgindent on 9.2 source tree in preparation for first 9.3Bruce Momjian2012-06-10
| | | | commit-fest.
* Revise parameterized-path mechanism to fix assorted issues.Tom Lane2012-04-19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adjusts the treatment of parameterized paths so that all paths with the same parameterization (same set of required outer rels) for the same relation will have the same rowcount estimate. We cache the rowcount estimates to ensure that property, and hopefully save a few cycles too. Doing this makes it practical for add_path_precheck to operate without a rowcount estimate: it need only assume that paths with different parameterizations never dominate each other, which is close enough to true anyway for coarse filtering, because normally a more-parameterized path should yield fewer rows thanks to having more join clauses to apply. In add_path, we do the full nine yards of comparing rowcount estimates along with everything else, so that we can discard parameterized paths that don't actually have an advantage. This fixes some issues I'd found with add_path rejecting parameterized paths on the grounds that they were more expensive than not-parameterized ones, even though they yielded many fewer rows and hence would be cheaper once subsequent joining was considered. To make the same-rowcounts assumption valid, we have to require that any parameterized path enforce *all* join clauses that could be obtained from the particular set of outer rels, even if not all of them are useful for indexing. This is required at both base scans and joins. It's a good thing anyway since the net impact is that join quals are checked at the lowest practical level in the join tree. Hence, discard the original rather ad-hoc mechanism for choosing parameterization joinquals, and build a better one that has a more principled rule for when clauses can be moved. The original rule was actually buggy anyway for lack of knowledge about which relations are part of an outer join's outer side; getting this right requires adding an outer_relids field to RestrictInfo.
* Use parameterized paths to generate inner indexscans more flexibly.Tom Lane2012-01-27
| | | | | | | | | | | | | | | | | | | This patch fixes the planner so that it can generate nestloop-with- inner-indexscan plans even with one or more levels of joining between the indexscan and the nestloop join that is supplying the parameter. The executor was fixed to handle such cases some time ago, but the planner was not ready. This should improve our plans in many situations where join ordering restrictions formerly forced complete table scans. There is probably a fair amount of tuning work yet to be done, because of various heuristics that have been added to limit the number of parameterized paths considered. However, we are not going to find out what needs to be adjusted until the code gets some real-world use, so it's time to get it in there where it can be tested easily. Note API change for index AM amcostestimate functions. I'm not aware of any non-core index AMs, but if there are any, they will need minor adjustments.
* Update copyright notices for year 2012.Bruce Momjian2012-01-01
|
* pgindent run before PG 9.1 beta 1.Bruce Momjian2011-04-10
|
* Stamp copyrights for year 2011.Bruce Momjian2011-01-01
|
* Invert and rename flag variable to improve code readability.Tom Lane2010-12-31
| | | | No change in functionality. Per discussion with Robert.
* Support RIGHT and FULL OUTER JOIN in hash joins.Tom Lane2010-12-30
| | | | | | | | | | | | | | | | | | | | | | This is advantageous first because it allows us to hash the smaller table regardless of the outer-join type, and second because hash join can be more flexible than merge join in dealing with arbitrary join quals in a FULL join. For merge join all the join quals have to be mergejoinable, but hash join will work so long as there's at least one hashjoinable qual --- the others can be any condition. (This is true essentially because we don't keep per-inner-tuple match flags in merge join, while hash join can do so.) To do this, we need a has-it-been-matched flag for each tuple in the hashtable, not just one for the current outer tuple. The key idea that makes this practical is that we can store the match flag in the tuple's infomask, since there are lots of bits there that are of no interest for a MinimalTuple. So we aren't increasing the size of the hashtable at all for the feature. To write this without turning the hash code into even more of a pile of spaghetti than it already was, I rewrote ExecHashJoin in a state-machine style, similar to ExecMergeJoin. Other than that decision, it was pretty straightforward.
* Avoid creation of useless EquivalenceClasses during planning.Tom Lane2010-10-29
| | | | | | | | | | | | | | | | | | | | | | Zoltan Boszormenyi exhibited a test case in which planning time was dominated by construction of EquivalenceClasses and PathKeys that had no actual relevance to the query (and in fact got discarded immediately). This happened because we generated PathKeys describing the sort ordering of every index on every table in the query, and only after that checked to see if the sort ordering was relevant. The EC/PK construction code is O(N^2) in the number of ECs, which is all right for the intended number of such objects, but it gets out of hand if there are ECs for lots of irrelevant indexes. To fix, twiddle the handling of mergeclauses a little bit to ensure that every interesting EC is created before we begin path generation. (This doesn't cost anything --- in fact I think it's a bit cheaper than before --- since we always eventually created those ECs anyway.) Then, if an index column can't be found in any pre-existing EC, we know that that sort ordering is irrelevant for the query. Instead of creating a useless EC, we can just not build a pathkey for the index column in the first place. The index will still be considered if it's useful for non-order-related reasons, but we will think of its output as unsorted.
* Remove cvs keywords from all files.Magnus Hagander2010-09-20
|
* Add an 'enable_material' GUC.Robert Haas2010-04-19
| | | | | | | | | | | The logic for determining whether to materialize has been significantly overhauled for 9.0. In case there should be any doubt about whether materialization is a win in any particular case, this should provide a convenient way of seeing what happens without it; but even with enable_material turned off, we still materialize in cases where it is required for correctness. Thanks to Tom Lane for the review.
* Rework join-removal logic as per recent discussion. In particular thisTom Lane2010-03-28
| | | | | fixes things so that it works for cases where nested removals are possible. The overhead of the optimization should be significantly less, as well.
* Fix an oversight in join-removal optimization: we have to check not only forTom Lane2010-03-22
| | | | | plain Vars that are generated in the inner rel and used above the join, but also for PlaceHolderVars. Per report from Oleg K.
* pgindent run for 9.0Bruce Momjian2010-02-26
|
* Add support for doing FULL JOIN ON FALSE. While this is really a ratherTom Lane2010-01-05
| | | | | | | | | | peculiar variant of UNION ALL, and so wouldn't likely get written directly as-is, it's possible for it to arise as a result of simplification of less-obviously-silly queries. In particular, now that we can do flattening of subqueries that have constant outputs and are underneath an outer join, it's possible for the case to result from simplification of queries of the type exhibited in bug #5263. Back-patch to 8.4 to avoid a functionality regression for this type of query.
* Update copyright for the year 2010.Bruce Momjian2010-01-02
|
* Fix brain fade in join-removal patch: a pushed-down clause in the outer join'sTom Lane2009-12-25
| | | | | restrict list is not just something to ignore, it's actually grounds to abandon the optimization entirely. Per bug #5255 from Matteo Beccati.
* Rename new subroutine, per discussion with Robert Haas.Tom Lane2009-09-19
|
* Marginal code cleanup in joinpath.c: factor out clause variable-membershipTom Lane2009-09-18
| | | | | tests into a small common subroutine, and eliminate an unnecessary difference in the order in which conditions are tested. Per a comment from Robert Haas.
* Implement "join removal" for cases where the inner side of a left joinTom Lane2009-09-17
| | | | | | | | | | | | | | is unique and is not referenced above the join. In this case the inner side doesn't affect the query result and can be thrown away entirely. Although perhaps nobody would ever write such a thing by hand, it's a reasonably common case in machine-generated SQL. The current implementation only recognizes the case where the inner side is a simple relation with a unique index matching the query conditions. This is enough for the use-cases that have been shown so far, but we might want to try to handle other cases later. Robert Haas, somewhat rewritten by Tom
* Rewrite the planner's handling of materialized plan types so that there isTom Lane2009-09-12
| | | | | | | | | | | | | | | | an explicit model of rescan costs being different from first-time costs. The costing of Material nodes in particular now has some visible relationship to the actual runtime behavior, where before it was essentially fantasy. This also fixes up a couple of places where different materialized plan types were treated differently for no very good reason (probably just oversights). A couple of the regression tests are affected, because the planner now chooses to put the other relation on the inside of a nestloop-with-materialize. So far as I can see both changes are sane, and the planner is now more consistently following the expectation that it should prefer to materialize the smaller of two relations. Per a recent discussion with Robert Haas.
* 8.4 pgindent run, with new combined Linux/FreeBSD/MinGW typedef listBruce Momjian2009-06-11
| | | | provided by Andrew.
* Fix an old corner-case error in match_unsorted_outer(): don't considerTom Lane2009-02-05
| | | | | | | | | | | | the cheapest-total inner path as a new candidate while truncating the sort key list, if it already matched the full sort key list. This is too much of a corner case to be worth back-patching, since it's unusual for the cheapest total path to be sorted, and anyway no real harm is done (except in JOIN_SEMI/ANTI cases where cost_mergejoin is a bit broken at the moment). But it wasn't behaving as intended, so fix it. Noted while examining a test case from Kevin Grittner. This error doesn't explain his issue, but it does explain why "set enable_seqscan = off" seemed to reproduce it for me.
* Update copyright for 2009.Bruce Momjian2009-01-01
|
* Switch the planner over to treating qualifications of a JOIN_SEMI join asTom Lane2008-11-22
| | | | | | | | | | | | | | | | | | | though it is an inner rather than outer join type. This essentially means that we don't bother to separate "pushed down" qual conditions from actual join quals at a semijoin plan node; which is okay because the restrictions of SQL syntax make it impossible to have a pushed-down qual that references the inner side of a semijoin. This allows noticeably better optimization of IN/EXISTS cases than we had before, since the equivalence-class machinery can now use those quals. Also fix a couple of other mistakes that had essentially disabled the ability to unique-ify the inner relation and then join it to just a subset of the left-hand relations. An example case using the regression database is select * from tenk1 a, tenk1 b where (a.unique1,b.unique2) in (select unique1,unique2 from tenk1 c); which is planned reasonably well by 8.3 and earlier but had been forcing a cartesian join of a/b in CVS HEAD.
* Implement SQL-standard WITH clauses, including WITH RECURSIVE.Tom Lane2008-10-04
| | | | | | | | | | | | | There are some unimplemented aspects: recursive queries must use UNION ALL (should allow UNION too), and we don't have SEARCH or CYCLE clauses. These might or might not get done for 8.4, but even without them it's a pretty useful feature. There are also a couple of small loose ends and definitional quibbles, which I'll send a memo about to pgsql-hackers shortly. But let's land the patch now so we can get on with other development. Yoshiyuki Asaba, with lots of help from Tatsuo Ishii and Tom Lane
* Implement SEMI and ANTI joins in the planner and executor. (Semijoins replaceTom Lane2008-08-14
| | | | | | | | | | | | | | the old JOIN_IN code, but antijoins are new functionality.) Teach the planner to convert appropriate EXISTS and NOT EXISTS subqueries into semi and anti joins respectively. Also, LEFT JOINs with suitable upper-level IS NULL filters are recognized as being anti joins. Unify the InClauseInfo and OuterJoinInfo infrastructure into "SpecialJoinInfo". With that change, it becomes possible to associate a SpecialJoinInfo with every join attempt, which permits some cleanup of join selectivity estimation. That needs to be taken much further than this patch does, but the next step is to change the API for oprjoin selectivity functions, which seems like material for a separate patch. So for the moment the output size estimates for semi and especially anti joins are quite bogus.
* When a relation has been proven empty by constraint exclusion, propagate thatTom Lane2008-03-24
| | | | | | | | knowledge up through any joins it participates in. We were doing that already in some special cases but not in the general case. Also, defend against zero row estimates for the input relations in cost_mergejoin --- this fix may have eliminated the only scenario in which that can happen, but be safe. Per report from Alex Solovey.
* Fix some planner issues found while investigating Kevin Grittner's reportTom Lane2008-01-09
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | of poorer planning in 8.3 than 8.2: 1. After pushing a constant across an outer join --- ie, given "a LEFT JOIN b ON (a.x = b.y) WHERE a.x = 42", we can deduce that b.y is sort of equal to 42, in the sense that we needn't fetch any b rows where it isn't 42 --- loop to see if any additional deductions can be made. Previous releases did that by recursing, but I had mistakenly thought that this was no longer necessary given the EquivalenceClass machinery. 2. Allow pushing constants across outer join conditions even if the condition is outerjoin_delayed due to a lower outer join. This is safe as long as the condition is strict and we re-test it at the upper join. 3. Keep the outer-join clause even if we successfully push a constant across it. This is *necessary* in the outerjoin_delayed case, but even in the simple case, it seems better to do this to ensure that the join search order heuristics will consider the join as reasonable to make. Mark such a clause as having selectivity 1.0, though, since it's not going to eliminate very many rows after application of the constant condition. 4. Tweak have_relevant_eclass_joinclause to report that two relations are joinable when they have vars that are equated to the same constant. We won't actually generate any joinclause from such an EquivalenceClass, but again it seems that in such a case it's a good idea to consider the join as worth costing out. 5. Fix a bug in select_mergejoin_clauses that was exposed by these changes: we have to reject candidate mergejoin clauses if either side was equated to a constant, because we can't construct a canonical pathkey list for such a clause. This is an implementation restriction that might be worth fixing someday, but it doesn't seem critical to get it done for 8.3.
* Update copyrights in source tree to 2008.Bruce Momjian2008-01-01
|
* pgindent run for 8.3.Bruce Momjian2007-11-15
|
* Fix best_inner_indexscan to return both the cheapest-total-cost andTom Lane2007-05-22
| | | | | | | | | | | | cheapest-startup-cost innerjoin indexscans, and make joinpath.c consider both of these (when different) as the inside of a nestloop join. The original design was based on the assumption that indexscan paths always have negligible startup cost, and so total cost is the only important figure of merit; an assumption that's obviously broken by bitmap indexscans. This oversight could lead to choosing poor plans in cases where fast-start behavior is more important than total cost, such as LIMIT and IN queries. 8.1-vintage brain fade exposed by an example from Chuck D.
* Refactor planner's pathkeys data structure to create a separate, explicitTom Lane2007-01-20
| | | | | | | | | | | | | | representation of equivalence classes of variables. This is an extensive rewrite, but it brings a number of benefits: * planner no longer fails in the presence of "incomplete" operator families that don't offer operators for every possible combination of datatypes. * avoid generating and then discarding redundant equality clauses. * remove bogus assumption that derived equalities always use operators named "=". * mergejoins can work with a variety of sort orders (e.g., descending) now, instead of tying each mergejoinable operator to exactly one sort order. * better recognition of redundant sort columns. * can make use of equalities appearing underneath an outer join.
* Change the planner-to-executor API so that the planner tells the executorTom Lane2007-01-10
| | | | | | | | | | | | | | | | which comparison operators to use for plan nodes involving tuple comparison (Agg, Group, Unique, SetOp). Formerly the executor looked up the default equality operator for the datatype, which was really pretty shaky, since it's possible that the data being fed to the node is sorted according to some nondefault operator class that could have an incompatible idea of equality. The planner knows what it has sorted by and therefore can provide the right equality operator to use. Also, this change moves a couple of catalog lookups out of the executor and into the planner, which should help startup time for pre-planned queries by some small amount. Modify the planner to remove some other cavalier assumptions about always being able to use the default operators. Also add "nulls first/last" info to the Plan node for a mergejoin --- neither the executor nor the planner can cope yet, but at least the API is in place.
* Update CVS HEAD for 2007 copyright. Back branches are typically notBruce Momjian2007-01-05
| | | | back-stamped for this.
* Restructure operator classes to allow improved handling of cross-data-typeTom Lane2006-12-23
| | | | | | | | | | | | | | | | cases. Operator classes now exist within "operator families". While most families are equivalent to a single class, related classes can be grouped into one family to represent the fact that they are semantically compatible. Cross-type operators are now naturally adjunct parts of a family, without having to wedge them into a particular opclass as we had done originally. This commit restructures the catalogs and cleans up enough of the fallout so that everything still works at least as well as before, but most of the work needed to actually improve the planner's behavior will come later. Also, there are not yet CREATE/DROP/ALTER OPERATOR FAMILY commands; the only way to create a new family right now is to allow CREATE OPERATOR CLASS to make one by default. I owe some more documentation work, too. But that can all be done in smaller pieces once this infrastructure is in place.
* pgindent run for 8.2.Bruce Momjian2006-10-04
|
* Fix an oversight in mergejoin planning: the planner would reject aTom Lane2006-08-17
| | | | | | | | | mergejoin possibility where the inner rel was less well sorted than the outer (ie, it matches some but not all of the merge clauses that can work with the outer), if the inner path in question is also the overall cheapest path for its rel. This is an old bug, but I'm not sure it's worth back-patching, because it's such a corner case. Noted while investigating a test case from Peter Hardman.
* Remove 576 references of include files that were not needed.Bruce Momjian2006-07-14
|
* Make the planner estimate costs for nestloop inner indexscans on the basisTom Lane2006-06-06
| | | | | | | | | | | | | | | | | | | | | that the Mackert-Lohmann formula applies across all the repetitions of the nestloop, not just each scan independently. We use the M-L formula to estimate the number of pages fetched from the index as well as from the table; that isn't what it was designed for, but it seems reasonably applicable anyway. This makes large numbers of repetitions look much cheaper than before, which accords with many reports we've received of overestimation of the cost of a nestloop. Also, change the index access cost model to charge random_page_cost per index leaf page touched, while explicitly not counting anything for access to metapage or upper tree pages. This may all need tweaking after we get some field experience, but in simple tests it seems to be giving saner results than before. The main thing is to get the infrastructure in place to let cost_index() and amcostestimate functions take repeated scans into account at all. Per my recent proposal. Note: this patch changes pg_proc.h, but I did not force initdb because the changes are basically cosmetic --- the system does not look into pg_proc to decide how to call an index amcostestimate function, and there's no way to call such a function from SQL at all.
* Update copyright for 2006. Update scripts.Bruce Momjian2006-03-05
|
* Improve my initial, rather hacky implementation of joins to appendTom Lane2006-02-05
| | | | | | | | relations: fix the executor so that we can have an Append plan on the inside of a nestloop and still pass down outer index keys to index scans within the Append, then generate such plans as if they were regular inner indexscans. This avoids the need to evaluate the outer relation multiple times.
* Fix constraint exclusion to work in inherited UPDATE/DELETE queriesTom Lane2006-02-04
| | | | | | | | | ... in fact, it will be applied now in any query whatsoever. I'm still a bit concerned about the cycles that might be expended in failed proof attempts, but given that CE is turned off by default, it's the user's choice whether to expend those cycles or not. (Possibly we should change the simple bool constraint_exclusion parameter to something more fine-grained?)
* Teach planner to convert simple UNION ALL subqueries into append relations,Tom Lane2006-02-03
| | | | | | | | | thereby sharing code with the inheritance case. This puts the UNION-ALL-view approach to partitioned tables on par with inheritance, so far as constraint exclusion is concerned: it works either way. (Still need to update the docs to say so.) The definition of "simple UNION ALL" is a little simpler than I would like --- basically the union arms can only be SELECT * FROM foo --- but it's good enough for partitioned-table cases.
* Restructure planner's handling of inheritance. Rather than processingTom Lane2006-01-31
| | | | | | | | | | | | | inheritance trees on-the-fly, which pretty well constrained us to considering only one way of planning inheritance, expand inheritance sets during the planner prep phase, and build a side data structure that can be consulted later to find which RTEs are members of which inheritance sets. As proof of concept, use the data structure to plan joins against inheritance sets more efficiently: we can now use indexes on the set members in inner-indexscan joins. (The generated plans could be improved further, but it'll take some executor changes.) This data structure will also support handling UNION ALL subqueries in the same way as inheritance sets, but that aspect of it isn't finished yet.
* Re-run pgindent, fixing a problem where comment lines after a blankBruce Momjian2005-11-22
| | | | | | | | | comment line where output as too long, and update typedefs for /lib directory. Also fix case where identifiers were used as variable names in the backend, but as typedefs in ecpg (favor the backend for indenting). Backpatch to 8.1.X.
* Fix longstanding bug that would sometimes let the planner generate a bad planTom Lane2005-10-25
| | | | | | | | | | for an outer join; symptom is bogus error "RIGHT JOIN is only supported with merge-joinable join conditions". Problem was that select_mergejoin_clauses did its tests in the wrong order. We need to force left join not right join for a merge join when there are non-mergeable join clauses; but the test for this only accounted for mergejoinability of the clause operator, and not whether the left and right Vars were of the proper relations. Per report from Jean-Pierre Pelletier.
* Standard pgindent run for 8.1.Bruce Momjian2005-10-15
|
* Remove planner's private fields from Query struct, and put them intoTom Lane2005-06-05
| | | | | | | | a new PlannerInfo struct, which is passed around instead of the bare Query in all the planning code. This commit is essentially just a code-beautification exercise, but it does open the door to making larger changes to the planner data structures without having to muck with the widely-known Query struct.