path: root/src/backend/executor
* Further twiddling of nodeHash.c hashtable sizing calculation. (Tom Lane, 2015-10-04)

  On reflection, the submitted patch didn't really work to prevent the request size from exceeding MaxAllocSize, because we'd happily round nbuckets up to the next power of 2 after we'd limited it to max_pointers. The simplest way to enforce the limit correctly is to round max_pointers down to a power of 2 when it isn't one already. (Note that the constraint to INT_MAX / 2, if it were doing anything useful at all, is properly applied after that.)

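  A standalone sketch of the idea (names hypothetical; this is not the committed nodeHash.c code): once the limit itself is a power of 2, a later round-up-to-the-next-power-of-2 step can never exceed it.

      #include <stdio.h>

      /* Hypothetical illustration: round a limit down to a power of 2 so
       * that a later "round nbuckets up to the next power of 2" step
       * cannot push the allocation request past the limit. */
      static size_t
      round_down_pow2(size_t n)
      {
          size_t p = 1;

          while (p <= n / 2)
              p *= 2;
          return p;               /* largest power of 2 that is <= n (or 1) */
      }

      int
      main(void)
      {
          size_t max_pointers = 100000000;  /* assumed cap, e.g. MaxAllocSize / sizeof(void *) */

          printf("%zu -> %zu\n", max_pointers, round_down_pow2(max_pointers));
          return 0;
      }
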
* Fix possible "invalid memory alloc request size" failure in nodeHash.c.Tom Lane2015-10-04
| | | | | | | | | | Limit the size of the hashtable pointer array to not more than MaxAllocSize. We've seen reports of failures due to this in HEAD/9.5, and it seems possible in older branches as well. The change in NTUP_PER_BUCKET in 9.5 may have made the problem more likely, but surely it didn't introduce it. Tomas Vondra, slightly modified by me
* Avoid O(N^2) behavior when enlarging SPI tuple table in spi_printtup(). (Tom Lane, 2015-08-21)

  For no obvious reason, spi_printtup() was coded to enlarge the tuple pointer table by just 256 slots at a time, rather than doubling the size at each reallocation, as is our usual habit. For very large SPI results, this makes for O(N^2) time spent in repalloc(), which of course soon comes to dominate the runtime. Use the standard doubling approach instead.

  This is a longstanding performance bug, so back-patch to all active branches.

  Neil Conway

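  A standalone sketch of the two growth policies (hypothetical names; not the SPI code): growing by a fixed 256 slots copies O(N^2) pointers over the life of the table, while doubling copies O(N) with amortized O(1) per append.

      #include <stdlib.h>

      /* Hypothetical illustration of the doubling policy. */
      typedef struct
      {
          void  **vals;
          size_t  alloced;        /* slots allocated */
          size_t  used;           /* slots in use */
      } PtrTable;

      static void
      ptrtable_append(PtrTable *t, void *item)
      {
          if (t->used == t->alloced)
          {
              t->alloced = t->alloced ? t->alloced * 2 : 64;  /* double, not "+= 256" */
              t->vals = realloc(t->vals, t->alloced * sizeof(void *));
              if (t->vals == NULL)
                  abort();
          }
          t->vals[t->used++] = item;
      }

      int
      main(void)
      {
          PtrTable t = {0};

          for (int i = 0; i < 1000000; i++)   /* only ~15 reallocations happen */
              ptrtable_append(&t, NULL);
          return 0;
      }
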
* Avoid some zero-divide hazards in the planner. (Tom Lane, 2015-07-30)

  Although I think on all modern machines floating division by zero results in Infinity not SIGFPE, we still don't want infinities running around in the planner's costing estimates; too much risk of that leading to insane behavior.

  grouping_planner() failed to consider the possibility that final_rel might be known dummy and hence have zero rowcount. (I wonder if it would be better to set a rows estimate of 1 for dummy relations? But at least in the back branches, changing this convention seems like a bad idea, so I'll leave that for another day.)

  Make certain that get_variable_numdistinct() produces a nonzero result. The case that can be shown to be broken is with stadistinct < 0.0 and small ntuples; we did not prevent the result from rounding to zero. For good luck I applied clamp_row_est() to all the nonconstant return values.

  In ExecChooseHashTableSize(), Assert that we compute positive nbuckets and nbatch. I know of no reason to think this isn't the case, but it seems like a good safety check.

  Per reports from Piotr Stefaniak. Back-patch to all active branches.

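  A standalone analogue of the clamping idea (illustrative only; the real clamp_row_est() lives in the planner): force every row estimate to be a finite value of at least 1 before it can reach a division.

      #include <math.h>

      /* Illustrative clamp: a row estimate of at least 1 can never
       * produce a zero divide or an Infinity in downstream arithmetic. */
      static double
      clamp_rows(double nrows)
      {
          if (isnan(nrows) || nrows <= 1.0)
              return 1.0;
          return rint(nrows);     /* also make it integral */
      }

      int
      main(void)
      {
          return clamp_rows(0.0) >= 1.0 ? 0 : 1;  /* a zero estimate becomes 1 */
      }
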
* Fix ExecOpenScanRelation to take a lock on a ROW_MARK_COPY relation. (Tom Lane, 2015-03-24)

  ExecOpenScanRelation assumed that any relation listed in the ExecRowMark list has been locked by InitPlan; but this is not true if the rel's markType is ROW_MARK_COPY, which is possible if it's a foreign table.

  In most (possibly all) cases, failure to acquire a lock here isn't really problematic because the parser, planner, or plancache would have taken the appropriate lock already. In principle though it might leave us vulnerable to working with a relation that we hold no lock on, and in any case if the executor isn't depending on previously-taken locks otherwise then it should not do so for ROW_MARK_COPY relations.

  Noted by Etsuro Fujita. Back-patch to all active versions, since the inconsistency has been there a long time. (It's almost certainly irrelevant in 9.0, since that predates foreign tables, but the code's still wrong on its own terms.)

* Ensure tableoid reads correctly in EvalPlanQual-manufactured tuples. (Tom Lane, 2015-03-12)

  The ROW_MARK_COPY path in EvalPlanQualFetchRowMarks() was just setting tableoid to InvalidOid, I think on the assumption that the referenced RTE must be a subquery or other case without a meaningful OID. However, foreign tables also use this code path, and they do have meaningful table OIDs; so failure to set the tuple field can lead to user-visible misbehavior. Fix that by fetching the appropriate OID from the range table.

  There's still an issue about whether CTID can ever have a meaningful value in this case; at least with postgres_fdw foreign tables, it does. But that is a different problem that seems to require a significantly different patch --- it's debatable whether postgres_fdw really wants to use this code path at all.

  Simplified version of a patch by Etsuro Fujita, who also noted the problem to begin with. The issue can be demonstrated in all versions having FDWs, so back-patch to 9.1.

* Minor cleanup of column-level priv fix (Stephen Frost, 2015-02-17)

  Commit 9406884af19e2620a14059e64d4eb6ab430ab328 cleaned up column-privilege related leaks in various error-message paths, but ended up including a few more things than it should have in the back branches.

  Specifically, there's no need for the GetModifiedColumns macro in execMain.c as 9.1 and older didn't include the row in check constraint violations. Further, the regression tests added to check those cases aren't necessary.

  This patch removes the GetModifiedColumns macro from execMain.c, removes the comment which was added to trigger.c related to the duplicate macro definition, and removes the check-constraint-related regression tests.

  Pointed out by Robert. Back-patched to 9.1 and 9.0.

* Fix column-privilege leak in error-message paths (Stephen Frost, 2015-01-28)

  While building error messages to return to the user, BuildIndexValueDescription and ri_ReportViolation would happily include the entire key or entire row in the result returned to the user, even if the user didn't have access to view all of the columns being included.

  Instead, include only those columns which the user is providing or which the user has select rights on. If the user does not have any rights to view the table or any of the columns involved then no detail is provided and a NULL value is returned from BuildIndexValueDescription. Note that, for key cases, the user must have access to all of the columns for the key to be shown; a partial key will not be returned.

  Back-patch all the way, as column-level privileges are now in all supported versions.

  This has been assigned CVE-2014-8161, but since the issue and the patch have already been publicized on pgsql-hackers, there's no point in trying to hide this commit.

* Fix use-of-already-freed-memory problem in EvalPlanQual processing. (Tom Lane, 2015-01-15)

  Up to now, the "child" executor state trees generated for EvalPlanQual rechecks have simply shared the ResultRelInfo arrays used for the original execution tree. However, this leads to dangling-pointer problems, because ExecInitModifyTable() is all too willing to scribble on some fields of the ResultRelInfo(s) even when it's being run in one of those child trees. This trashes those fields from the perspective of the parent tree, because even if the generated subtree is logically identical to what was in use in the parent, it's in a memory context that will go away when we're done with the child state tree.

  We do however want to share information in the direction from the parent down to the children; in particular, fields such as es_instrument *must* be shared or we'll lose the stats arising from execution of the children. So the simplest fix is to make a copy of the parent's ResultRelInfo array, but not copy any fields back at end of child execution.

  Per report from Manuel Kniep. The added isolation test is based on his example. In an unpatched memory-clobber-enabled build it will reliably fail with "ctid is NULL" errors in all branches back to 9.1, as a consequence of junkfilter->jf_junkAttNo being overwritten with $7f7f. This test cannot be run as-is before that for lack of WITH syntax; but I have no doubt that some variant of this problem can arise in older branches, so apply the code change all the way back.

* Fix corner case where SELECT FOR UPDATE could return a row twice. (Tom Lane, 2014-12-11)

  In READ COMMITTED mode, if a SELECT FOR UPDATE discovers it has to redo WHERE-clause checking on rows that have been updated since the SELECT's snapshot, it invokes EvalPlanQual processing to do that. If this first occurs within a non-first child table of an inheritance tree, the previous coding could accidentally re-return a matching row from an earlier, already-scanned child table. (And, to add insult to injury, I think this could make it miss returning a row that should have been returned, if the updated row that this happens on should still have passed the WHERE qual.)

  Per report from Kyotaro Horiguchi; the added isolation test is based on his test case.

  This has been broken for quite awhile, so back-patch to all supported branches.

* Fix bug with whole-row references to append subplans. (Tom Lane, 2014-07-11)

  ExecEvalWholeRowVar incorrectly supposed that it could "bless" the source TupleTableSlot just once per query. But if the input is coming from an Append (or, perhaps, other cases?) more than one slot might be returned over the query run. This led to "record type has not been registered" errors when a composite datum was extracted from a non-blessed slot.

  This bug has been there a long time; I guess it escaped notice because when dealing with subqueries the planner tends to expand whole-row Vars into RowExprs, which don't have the same problem. It is possible to trigger the problem in all active branches, though, as illustrated by the added regression test.

* Avoid leaking memory while evaluating arguments for a table function. (Tom Lane, 2014-06-19)

  ExecMakeTableFunctionResult evaluated the arguments for a function-in-FROM in the query-lifespan memory context. This is insignificant in simple cases where the function relation is scanned only once; but if the function is in a sub-SELECT or is on the inside of a nested loop, any memory consumed during argument evaluation can add up quickly. (The potential for trouble here had been foreseen long ago, per existing comments; but we'd not previously seen a complaint from the field about it.) To fix, create an additional temporary context just for this purpose.

  Per an example from MauMau. Back-patch to all active branches.

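  A hedged sketch of the pattern (the memory-context calls are PostgreSQL's real API; the wrapper and names are illustrative, not the committed diff): evaluate arguments in a dedicated context that is reset on each call, instead of in the query-lifespan context.

      #include "postgres.h"
      #include "utils/memutils.h"

      /* Illustrative per-call evaluation wrapper. */
      static void
      eval_args_per_call(MemoryContext argcontext)
      {
          MemoryContext oldcontext;

          MemoryContextReset(argcontext);       /* free the previous call's garbage */
          oldcontext = MemoryContextSwitchTo(argcontext);
          /* ... evaluate the function's argument expressions here; anything
           * they allocate now lives in argcontext, not for the whole query ... */
          MemoryContextSwitchTo(oldcontext);
      }
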
* Remove tabs after spaces in C comments (Bruce Momjian, 2014-05-06)

  This was not changed in HEAD, but will be done later as part of a pgindent run. Future pgindent runs will also do this.

  Report by Tom Lane

  Backpatch through all supported branches, but not HEAD

* Fix failure to detoast fields in composite elements of structured types. (Tom Lane, 2014-05-01)

  If we have an array of records stored on disk, the individual record fields cannot contain out-of-line TOAST pointers: the tuptoaster.c mechanisms are only prepared to deal with TOAST pointers appearing in top-level fields of a stored row. The same applies for ranges over composite types, nested composites, etc. However, the existing code only took care of expanding sub-field TOAST pointers for the case of nested composites, not for other structured types containing composites. For example, given a command such as

      UPDATE tab SET arraycol = ARRAY[ROW(x, 42)::mycompositetype] ...

  where x is a direct reference to a field of an on-disk tuple, if that field is long enough to be toasted out-of-line then the TOAST pointer would be inserted as-is into the array column. If the source record for x is later deleted, the array field value would become a dangling pointer, leading to errors along the line of "missing chunk number 0 for toast value ..." when the value is referenced. A reproducible test case for this was provided by Jan Pecek, but it seems likely that some of the "missing chunk number" reports we've heard in the past were caused by similar issues.

  Code-wise, the problem is that PG_DETOAST_DATUM() is not adequate to produce a self-contained Datum value if the Datum is of composite type. Seen in this light, the problem is not just confined to arrays and ranges, but could also affect some other places where detoasting is done in that way, for example index_form_tuple().

  I tried teaching the array code to apply toast_flatten_tuple_attribute() along with PG_DETOAST_DATUM() when the array element type is composite, but this was messy and imposed extra cache lookup costs whether or not any TOAST pointers were present, indeed sometimes when the array element type isn't even composite (since sometimes it takes a typcache lookup to find that out). The idea of extending that approach to all the places that currently use PG_DETOAST_DATUM() wasn't attractive at all.

  This patch instead solves the problem by decreeing that composite Datum values must not contain any out-of-line TOAST pointers in the first place; that is, we expand out-of-line fields at the point of constructing a composite Datum, not at the point where we're about to insert it into a larger tuple. This rule is applied only to true composite Datums, not to tuples that are being passed around the system as tuples, so it's not as invasive as it might sound at first. With this approach, the amount of code that has to be touched for a full solution is greatly reduced, and added cache lookup costs are avoided except when there actually is a TOAST pointer that needs to be inlined.

  The main drawback of this approach is that we might sometimes dereference a TOAST pointer that will never actually be used by the query, imposing a rather large cost that wasn't there before. On the other side of the coin, if the field value is used multiple times then we'll come out ahead by avoiding repeat detoastings. Experimentation suggests that common SQL coding patterns are unaffected either way, though. Applications that are very negatively affected could be advised to modify their code to not fetch columns they won't be using.

  In future, we might consider reverting this solution in favor of detoasting only at the point where data is about to be stored to disk, using some method that can drill down into multiple levels of nested structured types. That will require defining new APIs for structured types, though, so it doesn't seem feasible as a back-patchable fix.

  Note that this patch changes HeapTupleGetDatum() from a macro to a function call; this means that any third-party code using that macro will not get protection against creating TOAST-pointer-containing Datums until it's recompiled. The same applies to any uses of PG_RETURN_HEAPTUPLEHEADER(). It seems likely that this is not a big problem in practice: most of the tuple-returning functions in core and contrib produce outputs that could not possibly be toasted anyway, and the same probably holds for third-party extensions.

  This bug has existed since TOAST was invented, so back-patch to all supported branches.

* Fix "cannot accept a set" error when only some arms of a CASE return a set.Tom Lane2014-01-08
| | | | | | | | | | | | | | | In commit c1352052ef1d4eeb2eb1d822a207ddc2d106cb13, I implemented an optimization that assumed that a function's argument expressions would either always return a set (ie multiple rows), or always not. This is wrong however: we allow CASE expressions in which some arms return a set of some type and others just return a scalar of that type. There may be other examples as well. To fix, replace the run-time test of whether an argument returned a set with a static precheck (expression_returns_set). This adds a little bit of query startup overhead, but it seems barely measurable. Per bug #8228 from David Johnston. This has been broken since 8.0, so patch all supported branches.
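  The shape of the fix, as a hedged sketch (expression_returns_set() is the real helper; the surrounding function is illustrative): compute a static answer at setup time rather than inferring set-ness from whatever the first evaluation happens to return.

      #include "postgres.h"
      #include "nodes/nodeFuncs.h"

      /* Illustrative setup-time precheck; not the committed diff. */
      static bool
      any_arg_returns_set(List *args)
      {
          ListCell   *lc;

          foreach(lc, args)
          {
              if (expression_returns_set((Node *) lfirst(lc)))
                  return true;    /* static answer, fixed for the whole query */
          }
          return false;
      }
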
* Fix race condition in DELETE RETURNING. (Tom Lane, 2013-03-10)

  When RETURNING is specified, ExecDelete would return a virtual-tuple slot that could contain pointers into an already-unpinned disk buffer. Another process could change the buffer contents before we get around to using the data, resulting in garbage results or even a crash. This seems of fairly low probability, which may explain why there are no known field reports of the problem, but it's definitely possible. Fix by forcing the result slot to be "materialized" before we release pin on the disk buffer.

  Back-patch to 9.0; in earlier branches there is no bug because ExecProcessReturning sent the tuple to the destination immediately. Also, this is already fixed in HEAD as part of the writable-foreign-tables patch (where the fix is necessary for DELETE RETURNING to work at all with postgres_fdw).

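  A hedged sketch of the fix's shape (ExecMaterializeSlot() and ReleaseBuffer() are real executor/bufmgr routines; the wrapper is illustrative, not the committed diff): copy the tuple out of the shared buffer before the pin is dropped.

      #include "postgres.h"
      #include "executor/tuptable.h"
      #include "storage/bufmgr.h"

      /* Illustrative wrapper showing the required ordering. */
      static TupleTableSlot *
      return_tuple_safely(TupleTableSlot *slot, Buffer buffer)
      {
          ExecMaterializeSlot(slot);  /* slot now owns a private copy */
          ReleaseBuffer(buffer);      /* safe: slot no longer points into the page */
          return slot;
      }
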
* Fix SPI documentation for new handling of ExecutorRun's count parameter. (Tom Lane, 2013-01-24)

  Since 9.0, the count parameter has only limited the number of tuples actually returned by the executor. It doesn't affect the behavior of INSERT/UPDATE/DELETE unless RETURNING is specified, because without RETURNING, the ModifyTable plan node doesn't return control to execMain.c for each tuple. And we only check the limit at the top level.

  While this behavioral change was unintentional at the time, discussion of bug #6572 led us to the conclusion that we prefer the new behavior anyway, and so we should just adjust the docs to match rather than change the code. Accordingly, do that. Back-patch as far as 9.0 so that the docs match the code in each branch.

* Add defenses against integer overflow in dynahash numbuckets calculations. (Tom Lane, 2012-12-11)

  The dynahash code requires the number of buckets in a hash table to fit in an int; but since we calculate the desired hash table size dynamically, there are various scenarios where we might calculate too large a value. The resulting overflow can lead to infinite loops, division-by-zero crashes, etc. I (tgl) had previously installed some defenses against that in commit 299d1716525c659f0e02840e31fbe4dea3, but that covered only one call path. Moreover it worked by limiting the request size to work_mem, but in a 64-bit machine it's possible to set work_mem high enough that the problem appears anyway. So let's fix the problem at the root by installing limits in the dynahash.c functions themselves.

  Trouble report and patch by Jeff Davis.

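  A standalone sketch of the defense (hypothetical names; not dynahash.c itself): clamp the dynamically computed bucket count before it is narrowed to int.

      #include <limits.h>
      #include <stdio.h>

      /* Illustrative clamp: whatever size the caller asks for, the value
       * that reaches the hash table code must fit comfortably in an int. */
      static int
      clamp_nbuckets(long requested)
      {
          if (requested < 1)
              return 1;
          if (requested > INT_MAX / 2)
              return INT_MAX / 2;
          return (int) requested;
      }

      int
      main(void)
      {
          printf("%d\n", clamp_nbuckets(LONG_MAX));   /* huge 64-bit request */
          return 0;
      }
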
* Fix assorted bugs in CREATE INDEX CONCURRENTLY. (Tom Lane, 2012-11-29)

  This patch changes CREATE INDEX CONCURRENTLY so that the pg_index flag changes it makes without exclusive lock on the index are made via heap_inplace_update() rather than a normal transactional update. The latter is not very safe because moving the pg_index tuple could result in concurrent SnapshotNow scans finding it twice or not at all, thus possibly resulting in index corruption.

  In addition, fix various places in the code that ought to check to make sure that the indexes they are manipulating are valid and/or ready as appropriate. These represent bugs that have existed since 8.2, since a failed CREATE INDEX CONCURRENTLY could leave a corrupt or invalid index behind, and we ought not try to do anything that might fail with such an index.

  Also fix RelationReloadIndexInfo to ensure it copies all the pg_index columns that are allowed to change after initial creation. Previously we could have been left with stale values of some fields in an index relcache entry. It's not clear whether this actually had any user-visible consequences, but it's at least a bug waiting to happen.

  This is a subset of a patch already applied in 9.2 and HEAD. Back-patch into all earlier supported branches.

  Tom Lane and Andres Freund

* Fix cross-type case in partial row matching for hashed subplans. (Tom Lane, 2012-10-11)

  When hashing a subplan like "WHERE (a, b) NOT IN (SELECT x, y FROM ...)", findPartialMatch() attempted to match rows using the hashtable's internal equality operators, which of course are for x and y's datatypes. What we need to use are the potentially cross-type operators for a=x, b=y, etc. Failure to do that leads to wrong answers or even crashes. The scope for problems is limited to cases where we have different types with compatible hash functions (else we'd not be using a hashed subplan), but for example int4 vs int8 can cause the problem.

  Per bug #7597 from Bo Jensen. This has been wrong since the hashed-subplan code was written, so patch all the way back.

* Fix rescan logic in nodeCtescan. (Tom Lane, 2012-08-15)

  The previous coding essentially assumed that nodes would be rescanned in the same order they were initialized in; or at least that the "leader" of a group of CTEscans would be rescanned before any others were required to execute. Unfortunately, that isn't even a little bit true. It's possible to devise queries in which the leader isn't rescanned until other CTEscans on the same CTE have run to completion, or even in which the leader never gets a rescan call at all.

  The fix makes the leader specially responsible only for initial creation and final destruction of the tuplestore; rescan resets are now a symmetrically shared responsibility. This means that we might reset the tuplestore multiple times when restarting a plan subtree containing multiple CTEscans; but resetting an already-empty tuplestore is cheap enough that that doesn't seem like a problem.

  Per report from Adam Mackler; the new regression test cases are based on his example query. Back-patch to 8.4 where CTE scans were introduced.

* Fix whole-row Var evaluation to cope with resjunk columns (again). (Tom Lane, 2012-07-20)

  When a whole-row Var is reading the result of a subquery, we need it to ignore any "resjunk" columns that the subquery might have evaluated for GROUP BY or ORDER BY purposes. We've hacked this area before, in commit 68e40998d058c1f6662800a648ff1e1ce5d99cba, but that fix only covered whole-row Vars of named composite types, not those of RECORD type; and it was mighty klugy anyway, since it just assumed without checking that any extra columns in the result must be resjunk. A proper fix requires getting hold of the subquery's targetlist so we can actually see which columns are resjunk (whereupon we can use a JunkFilter to get rid of them). So bite the bullet and add some infrastructure to make that possible.

  Per report from Andrew Dunstan and additional testing by Merlin Moncure. Back-patch to all supported branches. In 8.3, also back-patch commit 292176a118da6979e5d368a4baf27f26896c99a5, which for some reason I had not done at the time, but it's a prerequisite for this change.

* Fix memory leak in ARRAY(SELECT ...) subqueries. (Tom Lane, 2012-06-21)

  Repeated execution of an uncorrelated ARRAY_SUBLINK sub-select (which I think can only happen if the sub-select is embedded in a larger, correlated subquery) would leak memory for the duration of the query, due to not reclaiming the array generated in the previous execution. Per bug #6698 from Armando Miraglia. Diagnosis and fix idea by Heikki, patch itself by me.

  This has been like this all along, so back-patch to all supported versions.

* Don't allow CREATE TABLE AS to put relations in pg_global. (Robert Haas, 2012-03-21)

  This was never intended to be allowed, and is blocked for an ordinary CREATE TABLE, but CREATE TABLE AS slipped through the cracks. This commit won't do anything to fix existing cases where this loophole has been exploited, but it still seems prudent to lock it down going forward.

  Back-branch commit only, as this problem has been refactored away on the master branch.

  Andres Freund

* Fix handling of data-modifying CTE subplans in EvalPlanQual. (Tom Lane, 2012-01-28)

  We can't just skip initializing such subplans, because the referencing CTE node will expect to find the subplan available when it initializes. That in turn means that ExecInitModifyTable must allow the case (which actually it needed to do anyway, since there's no guarantee that ModifyTable is exactly at the top of the CTE plan tree). So move the complaint about not being allowed in EvalPlanQual mode to execution instead of initialization.

  Testing turned up yet another problem, which is that we'd try to re-initialize the result relation's index list, leading to leaks and dangling pointers.

  Per report from Phil Sorber. Back-patch to 9.1 where data-modifying CTEs were introduced.

* Make executor's SELECT INTO code save and restore original tuple receiver. (Tom Lane, 2012-01-04)

  As previously coded, the QueryDesc's dest pointer was left dangling (pointing at an already-freed receiver object) after ExecutorEnd. It's a bit astonishing that it took us this long to notice, and I'm not sure that the known problem case with SQL functions is the only one. Fix it by saving and restoring the original receiver pointer, which seems the most bulletproof way of ensuring any related bugs are also covered.

  Per bug #6379 from Paul Ramsey. Back-patch to 8.4 where the current handling of SELECT INTO was introduced.

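  The save-and-restore pattern, in a self-contained sketch (the QueryDesc struct below is a stand-in for the executor's real type, and run_query() is a hypothetical stub):

      /* Stand-ins for the executor types involved; illustrative only. */
      typedef struct DestReceiver DestReceiver;
      typedef struct QueryDesc { DestReceiver *dest; } QueryDesc;

      static void run_query(QueryDesc *qd) { (void) qd; /* stub */ }

      static void
      run_into_temp_receiver(QueryDesc *qd, DestReceiver *tmp)
      {
          DestReceiver *saved = qd->dest; /* remember the caller's receiver */

          qd->dest = tmp;
          run_query(qd);                  /* tuples go to tmp */
          qd->dest = saved;               /* restore; no dangling pointer left */
      }
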
* Fix handling of PlaceHolderVars in nestloop parameter management. (Tom Lane, 2011-11-03)

  If we use a PlaceHolderVar from the outer relation in an inner indexscan, we need to reference the PlaceHolderVar as such as the value to be passed in from the outer relation. The previous code effectively tried to reconstruct the PHV from its component expression, which doesn't work since (a) the Vars therein aren't necessarily bubbled up far enough, and (b) it would be the wrong semantics anyway because of the possibility that the PHV is supposed to have gone to null at some point before the current join. Point (a) led to "variable not found in subplan target list" planner errors, but point (b) would have led to silently wrong answers.

  Per report from Roger Niederland.

* Fix trigger WHEN conditions when both BEFORE and AFTER triggers exist. (Tom Lane, 2011-08-21)

  Due to tuple-slot mismanagement, evaluation of WHEN conditions for AFTER ROW UPDATE triggers could crash if there had been a BEFORE ROW trigger fired for the same update. Fix by not trying to overload the use of estate->es_trig_tuple_slot. Per report from Yoran Heling.

  Back-patch to 9.0, when trigger WHEN conditions were introduced.

* Avoid integer overflow when LIMIT + OFFSET >= 2^63. (Heikki Linnakangas, 2011-08-02)

  This fixes bug #6139 reported by Hitoshi Harada.

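  A standalone sketch of the hazard and a guard against it (hypothetical names; the committed fix in the executor may differ in detail), assuming both values are non-negative: computing the sum as a signed 64-bit value wraps negative once it reaches 2^63, so test before adding.

      #include <stdint.h>
      #include <stdio.h>

      /* Illustrative guard: clamp instead of letting the addition wrap. */
      static int64_t
      add_limit_offset(int64_t limit, int64_t offset)
      {
          if (limit > INT64_MAX - offset)
              return INT64_MAX;           /* saturate rather than overflow */
          return limit + offset;
      }

      int
      main(void)
      {
          printf("%lld\n", (long long) add_limit_offset(INT64_MAX, 1));
          return 0;
      }
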
* Fix bugs in relpersistence handling during table creation. (Robert Haas, 2011-07-03)

  Unlike the relistemp field which it replaced, relpersistence must be set correctly quite early during the table creation process, as we rely on it quite early on for a number of purposes, including security checks. Normally, this is set based on whether the user enters CREATE TABLE, CREATE UNLOGGED TABLE, or CREATE TEMPORARY TABLE, but a relation may also be made implicitly temporary by creating it in pg_temp. This patch fixes the handling of that case, and also disables creation of unlogged tables in temporary tablespace (such tables indeed skip WAL-logging, but we reject an explicit specification) and creation of relations in the temporary schemas of other sessions (which is not very sensible, and didn't work right anyway).

  Report by Amit Khandekar.

* Move the PredicateLockRelation() call from nodeSeqscan.c to heapam.c. (Heikki Linnakangas, 2011-06-29)

  It's more consistent that way, since all the other PredicateLock* calls are made in various heapam.c and index AM functions. The call in nodeSeqscan.c was unnecessarily aggressive anyway; there's no need to try to lock the relation every time a tuple is fetched, it's enough to do it once.

  This has the user-visible effect that if a seq scan is initialized in the executor, but never executed, we now acquire the predicate lock on the heap relation anyway. We could avoid that by taking the lock on the first heap_getnext() call instead, but it doesn't seem worth the trouble given that it feels more natural to do it in heap_beginscan().

  Also, remove the retail PredicateLockTuple() calls from heap_getnext(). In a seqscan, started with heap_beginscan(), we're holding a whole-relation predicate lock on the heap so there's no need to lock the tuples individually.

  Kevin Grittner and me

* Grab predicate locks on matching tuples in a lossy bitmap heap scan. (Heikki Linnakangas, 2011-06-29)

  Non-lossy case was already handled correctly.

  Kevin Grittner

* Remove another no-longer-needed inclusion of predicate.h. (Tom Lane, 2011-06-16)

* Make non-MVCC snapshots exempt from predicate locking. (Heikki Linnakangas, 2011-06-15)

  Scans with non-MVCC snapshots, like in REINDEX, are basically non-transactional operations. The DDL operation itself might participate in SSI, but there are separate functions for that.

  Kevin Grittner and Dan Ports, with some changes by me.

* Pgindent run before 9.1 beta2. (Bruce Momjian, 2011-06-09)

* Disallow SELECT FOR UPDATE/SHARE on sequences. (Tom Lane, 2011-06-02)

  We can't allow this because such an operation stores its transaction XID into the sequence tuple's xmax. Because VACUUM doesn't process sequences (and we don't want it to start doing so), such an xmax value won't get frozen, meaning it will eventually refer to nonexistent pg_clog storage, and even wrap around completely. Since the row lock is ignored by nextval and setval, the usefulness of the operation is highly debatable anyway. Per reports of trouble with pgpool 3.0, which had ill-advisedly started using such commands as a form of locking.

  In HEAD, also disallow SELECT FOR UPDATE/SHARE on toast tables. Although this does work safely given the current implementation, there seems no good reason to allow it. I refrained from changing that behavior in back branches, however.

* Allow hash joins to be interrupted while searching hash table for match. (Tom Lane, 2011-06-01)

  Per experimentation with a recent example, in which unreasonable amounts of time could elapse before the backend would respond to a query-cancel.

  This might be something to back-patch, but the patch doesn't apply cleanly because this code was rewritten for 9.1. Given the lack of field complaints I won't bother for now.

  Cédric Villemain

* Install defenses against overflow in BuildTupleHashTable(). (Tom Lane, 2011-05-23)

  The planner can sometimes compute very large values for numGroups, and in cases where we have no alternative to building a hashtable, such a value will get fed directly to BuildTupleHashTable as its nbuckets parameter. There were two ways in which that could go bad. First, BuildTupleHashTable declared the parameter as "int" but most callers were passing "long"s, so on 64-bit machines undetected overflow could occur leading to a bogus negative value. The obvious fix for that is to change the parameter to "long", which is what I've done in HEAD. In the back branches that seems a bit risky, though, since third-party code might be calling this function. So for them, just put in a kluge to treat negative inputs as INT_MAX. Second, hash_create can go nuts with extremely large requested table sizes (notably, my_log2 becomes an infinite loop for inputs larger than LONG_MAX/2). What seems most appropriate to avoid that is to bound the initial table size request to work_mem.

  This fixes bug #6035 reported by Daniel Schreiber. Although the reported case only occurs back to 8.4 since it involves WITH RECURSIVE, I think it's a good idea to install the defenses in all supported branches.

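  A standalone sketch of the back-branch kluge (hypothetical names; not the committed diff): a huge long narrowed through an int parameter can arrive negative on 64-bit builds, so a negative input is treated as a saturated request.

      #include <limits.h>

      /* Illustrative defense against a wrapped bucket count. */
      static int
      sanitize_nbuckets(int nbuckets)
      {
          if (nbuckets <= 0)
              return INT_MAX;     /* assume a huge request overflowed the int */
          return nbuckets;
      }

      int
      main(void)
      {
          long huge = LONG_MAX;           /* oversized group estimate */
          int  narrowed = (int) huge;     /* implementation-defined; often negative */

          return sanitize_nbuckets(narrowed) > 0 ? 0 : 1;
      }
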
* Reset per-tuple memory context between every row in a scan node, even when there are no quals or projections. (Heikki Linnakangas, 2011-05-21)

  Currently this only matters for foreign scans, as none of the other scan nodes litter the per-tuple memory context when there are no quals or projections.

* Refactor broken CREATE TABLE IF NOT EXISTS support. (Robert Haas, 2011-04-25)

  Per bug #5988, reported by Marko Tiikkaja, and further analyzed by Tom Lane, the previous coding was broken in several respects: even if the target table already existed, a subsequent CREATE TABLE IF NOT EXISTS might try to add additional constraints or sequences-for-serial specified in the new CREATE TABLE statement.

  In passing, this also fixes a minor information leak: it's no longer possible to figure out whether a schema to which you don't have CREATE access contains a sequence named like "x_y_seq" by attempting to create a table in that schema called "x" with a serial column called "y".

  Some more refactoring of this code in the future might be warranted, but that will need to wait for a later major release.

* Make a code-cleanup pass over the collations patch. (Tom Lane, 2011-04-22)

  This patch is almost entirely cosmetic --- mostly cleaning up a lot of neglected comments, and fixing code layout problems in places where the patch made lines too long and then pgindent did weird things with that. I did find a bug-of-omission in equalTupleDescs().

* Pass collations to functions in FunctionCallInfoData, not FmgrInfo. (Tom Lane, 2011-04-12)

  Since collation is effectively an argument, not a property of the function, FmgrInfo is really the wrong place for it; and this becomes critical in cases where a cached FmgrInfo is used for varying purposes that might need different collation settings. Fix by passing it in FunctionCallInfoData instead. In particular this allows a clean fix for bug #5970 (record_cmp not working). This requires touching a bit more code than the original method, but nobody ever thought that collations would not be an invasive patch...

* Clean up most -Wunused-but-set-variable warnings from gcc 4.6 (Peter Eisentraut, 2011-04-11)

  This warning is new in gcc 4.6 and part of -Wall. This patch cleans up most of the noise, but there are still some warnings that are trickier to remove.

* pgindent run before PG 9.1 beta 1. (Bruce Momjian, 2011-04-10)

* Fix check_exclusion_constraint() to insert correct collations in ScanKeys. (Tom Lane, 2011-03-27)

* Clean up cruft around collation initialization for tupdescs and scankeys. (Tom Lane, 2011-03-26)

  I found actual bugs in GiST and plpgsql; the rest of this is cosmetic but meant to decrease the odds of future bugs of omission.

* Pass collation to makeConst() instead of looking it up internally. (Tom Lane, 2011-03-25)

  In nearly all cases, the caller already knows the correct collation, and in a number of places, the value the caller has handy is more correct than the default for the type would be. (In particular, this patch makes it significantly less likely that eval_const_expressions will result in changing the exposed collation of an expression.) So an internal lookup is both expensive and wrong.

* Fix handling of collation in SQL-language functions. (Tom Lane, 2011-03-24)

  Ensure that parameter symbols receive collation from the function's resolved input collation, and fix inlining to behave properly.

  BTW, this commit lays about 90% of the infrastructure needed to support use of argument names in SQL functions. Parsing of parameters is now done via the parser-hook infrastructure ... we'd just need to supply a column-ref hook ...

* Revise collation derivation method and expression-tree representation. (Tom Lane, 2011-03-19)

  All expression nodes now have an explicit output-collation field, unless they are known to only return a noncollatable data type (such as boolean or record). Also, nodes that can invoke collation-aware functions store a separate field that is the collation value to pass to the function. This avoids confusion that arises when a function has collatable inputs and noncollatable output type, or vice versa.

  Also, replace the parser's on-the-fly collation assignment method with a post-pass over the completed expression tree. This allows us to use a more complex (and hopefully more nearly spec-compliant) assignment rule without paying for it in extra storage in every expression node.

  Fix assorted bugs in the planner's handling of collations by making collation one of the defining properties of an EquivalenceClass and by converting CollateExprs into discardable RelabelType nodes during expression preprocessing.

* Split CollateClause into separate raw and analyzed node types. (Tom Lane, 2011-03-11)

  CollateClause is now used only in raw grammar output, and CollateExpr after parse analysis. This is for clarity and to avoid carrying collation names in post-analysis parse trees: that's both wasteful and possibly misleading, since the collation's name could be changed while the parsetree still exists.

  Also, clean up assorted infelicities and omissions in processing of the node type.