| Commit message | Author | Age |
| |
While the in-core authentication mechanism doesn't need to access
pg_shseclabel at all, it's reasonable to think that an authentication
hook will want to look at the label for the role logging in, or for rows
in other catalogs used during the authentication phase of startup.
Catalog version bumped, because this changes the "is nailed" status for
pg_shseclabel.
Author: Adam Brightwell
| |
Some time back we agreed that row_security=off should not be a way to
bypass RLS entirely, but only a way to get an error if it was being
applied. However, the code failed to act that way for table owners.
Per discussion, this is a must-fix bug for 9.5.0.
Adjust the logic in rls.c to behave as expected; also, modify the
error message to be more consistent with the new interpretation.
The regression tests need minor corrections as well. Also update
the comments about row_security in ddl.sgml to be correct. (The
official description of the GUC in config.sgml is already correct.)
I failed to resist the temptation to do some other very minor
cleanup as well, such as getting rid of a duplicate extern declaration.
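As a sketch of the new behavior (hypothetical table and role names), a
non-owner without BYPASSRLS now gets an error instead of a silent bypass:
    SET row_security = off;
    SELECT * FROM rls_protected_table;
    -- ERROR:  query would be affected by row-level security policy
    --         for table "rls_protected_table"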
| |
We discussed this but somehow failed to implement it...
| |
Aside from any consistency arguments, this is logically necessary because
the I/O functions for these types also handle numeric OID values. Without
a quoting rule it is impossible to distinguish numeric OIDs from role or
namespace names that happen to contain only digits.
Also change the to_regrole and to_regnamespace functions to dequote their
arguments. While not logically essential, this seems like a good idea
since the other to_reg* functions do it. Anyone who really wants raw
lookup of an uninterpreted name can fall back on the time-honored solution
of (SELECT oid FROM pg_namespace WHERE nspname = whatever).
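For illustration (hypothetical role name), the quoting rules now work
roughly like this:
    CREATE ROLE "12345";           -- a role name consisting only of digits
    SELECT oid::regrole FROM pg_authid WHERE rolname = '12345';
    -- prints "12345", quoted, so it can't be mistaken for a numeric OID
    SELECT to_regrole('"12345"');  -- now dequotes, finding the role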
Report and patch by Jim Nasby, reviewed by Michael Paquier
| |
Can't release the AccessExclusiveLock on the target table until commit.
Otherwise there is a race condition whereby other backends might service
our cache invalidation signals before they can actually see the updated
catalog rows.
Just to add insult to injury, RemovePolicyById was closing the rel (with
incorrect lock drop) and then passing the now-dangling rel pointer to
CacheInvalidateRelcache. Probably the reason this doesn't fall over on
CLOBBER_CACHE buildfarm members is that some outer level of the DROP logic
is still holding the rel open ... but it'd have bitten us on the arse
eventually, no doubt.
| |
The CHECK_IS_BINARY_UPGRADE macro is not sufficient security protection
if we're going to dereference pass-by-reference arguments before it.
But in any case we really need to explicitly check PG_ARGISNULL for all
the arguments of a non-strict function, not only the ones we expect null
values for.
Oversight in commits 30982be4e5019684e1772dd9170aaa53f5a8e894 and
f92fc4c95ddcc25978354a8248d3df22269201bc. Found by Andreas Seltenreich.
(The other usages in pg_upgrade_support.c seem safe.)
| |
pgwin32_recv() has treated a non-error return of zero bytes from WSARecv()
as being a reason to block ever since the current implementation was
introduced in commit a4c40f140d23cefb. However, so far as one can tell
from Microsoft's documentation, that is just wrong: what it means is
graceful connection closure (in stream protocols) or receipt of a
zero-length message (in message protocols), and neither case should result
in blocking here. The only reason the code worked at all was that control
then fell into the retry loop, which did *not* treat zero bytes specially,
so we'd get out after only wasting some cycles. But as of 9.5 we do not
normally reach the retry loop and so the bug is exposed, as reported by
Shay Rojansky and diagnosed by Andres Freund.
Remove the unnecessary test on the byte count, and rearrange the code
in the retry loop so that it looks identical to the initial sequence.
Back-patch to 9.5. The code is wrong all the way back, AFAICS, but
since it's relatively harmless in earlier branches we'll leave it alone.
| |
spg_text_inner_consistent is capable of reconstructing an empty string
to pass down to the next index level; this happens if we have an empty
string coming in, no prefix, and a dummy node label. (In practice, what
is needed to trigger that is insertion of a whole bunch of empty-string
values.) Then, we will arrive at the next level with in->level == 0
and a non-NULL (but zero length) in->reconstructedValue, which is valid
but the Assert tests weren't expecting it.
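A reproduction sketch (hypothetical names) that drives many empty strings
into an SP-GiST text index:
    CREATE TABLE t (s text);
    CREATE INDEX t_s_idx ON t USING spgist (s);
    INSERT INTO t SELECT '' FROM generate_series(1, 10000);
    -- in assert-enabled builds, this could trip the overly tight Asserts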
Per report from Andreas Seltenreich. This has no impact in non-Assert
builds, so should not be a problem in production, but back-patch to
all affected branches anyway.
In passing, remove a couple of useless variable initializations and
shorten the code by not duplicating DatumGetPointer() calls.
| |
Backpatch certain files through 9.1
| |
Jeff Janes, reviewed by Jim Nasby.
| |
flatten_reloptions() supposed that it didn't really need to do anything
beyond inserting commas between reloption array elements. However, in
principle the value of a reloption could be nearly anything, since the
grammar allows a quoted string there. Any restrictions on it would come
from validity checking appropriate to the particular option, if any.
A reloption value that isn't a simple identifier or number could thus lead
to dump/reload failures due to syntax errors in CREATE statements issued
by pg_dump. We've gotten away with not worrying about this so far with
the core-supported reloptions, but extensions might allow reloption values
that cause trouble, as in bug #13840 from Kouhei Sutou.
To fix, split the reloption array elements explicitly, and then convert
any value that doesn't look like a safe identifier to a string literal.
(The details of the quoting rule could be debated, but this way is safe
and requires little code.) While we're at it, also quote reloption names
if they're not safe identifiers; that may not be a likely problem in the
field, but we might as well try to be bulletproof here.
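For instance, with a hypothetical extension-defined reloption whose value
isn't a bare identifier:
    CREATE INDEX idx ON tbl USING some_am (col)
        WITH (tokenizer = 'foo bar');
    -- pg_dump now emits WITH (tokenizer='foo bar') rather than an
    -- unquoted value that would cause a syntax error on reload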
It's been like this for a long time, so back-patch to all supported
branches.
Kouhei Sutou, adjusted some by me
| |
A report from Andy Colson showed that gincostestimate() was not being
nearly paranoid enough about whether to believe the statistics it finds in
the index metapage. The problem is that the metapage stats (other than the
pending-pages count) are only updated by VACUUM, and in the worst case
could still reflect the index's original empty state even when it has grown
to many entries. We attempted to deal with that by scaling up the stats to
match the current index size, but if nEntries is zero then scaling it up
still gives zero. Moreover, the proportion of pages that are entry pages
vs. data pages vs. pending pages is unlikely to be estimated very well by
scaling if the index is now orders of magnitude larger than before.
We can improve matters by expanding the use of the rule-of-thumb estimates
I introduced in commit 7fb008c5ee59b040: if the index has grown by more
than a cutoff amount (here set at 4X growth) since VACUUM, then use the
rule-of-thumb numbers instead of scaling. This might not be exactly right
but it seems much less likely to produce insane estimates.
I also improved both the scaling estimate and the rule-of-thumb estimate
to account for numPendingPages, since it's reasonable to expect that that
is accurate in any case, and certainly pages that are in the pending list
are not either entry or data pages.
As a somewhat separate issue, adjust the estimation equations that are
concerned with extra fetches for partial-match searches. These equations
suppose that a fraction partialEntries / numEntries of the entry and data
pages will be visited as a consequence of a partial-match search. Now,
it's physically impossible for that fraction to exceed one, but our
estimate of partialEntries is mostly bunk, and our estimate of numEntries
isn't exactly gospel either, so we could arrive at a silly value. In the
example presented by Andy we were coming out with a value of 100, leading
to insane cost estimates. Clamp the fraction to one to avoid that.
Like the previous patch, back-patch to all supported branches; this
problem can be demonstrated in one form or another in all of them.
| |
Recovery does not achieve its goal of zeroing all pg_multixact entries
whose accompanying WAL records never reached disk. Remove that claim
and justify its expendability. Detail the need for TrimMultiXact(),
which has little in common with the TrimCLOG() rationale. Merge two
tightly-related comments. Stop presenting pg_multixact as specific to
heap_lock_tuple(); PostgreSQL 9.3 extended its use to heap_update().
Noticed while investigating a report from Andres Freund.
| |
Fix an oversight in commit 321eed5f0f7563a0: replacing an operator's
selectivity functions needs to result in a corresponding update in
pg_depend. We have a function that can handle that, but it was not
called by AlterOperator().
To fix this without enlarging pg_operator.h's #include list beyond
what clients can safely include, split off the function definitions
into a new file pg_operator_fn.h, similarly to what we've done for
some other catalog header files. It's not entirely clear whether
any client-side code needs to include pg_operator.h, but it seems
prudent to assume that there is some such code somewhere.
| |
In the extreme edge case where contended pages are the only ones that
escape being scanned, the previous commit would have allowed us to think
that relfrozenxid could be advanced, which is exactly wrong.
| |
VACUUM can skip heap pages altogether when there's a run of consecutive
pages that are all-visible according to the visibility map. This causes it
to not update its nonempty_pages count, just as if those pages were empty,
which means that at the end we will think they are candidates for deletion.
Thus, we may take the table's AccessExclusive lock only to find that no
pages are really truncatable. This usually causes no real problems on a
master server, thanks to the lock being acquired only conditionally; but on
hot-standby servers, the same lock must be acquired unconditionally which
can result in unnecessary query cancellations.
To improve matters, force examination of the table's last page whenever
we reach there with a nonempty_pages count that would allow a truncation
attempt. If it's not empty, we'll advance nonempty_pages and thereby
prevent the truncation attempt.
If we are unable to acquire cleanup lock on that page, there's no need to
force it, unless we're doing an anti-wraparound vacuum. We can just check
for tuples with a shared buffer lock and then give up. (When we are doing
an anti-wraparound vacuum, and decide it's okay to skip the page because it
contains no freezable tuples, this patch still improves matters because
nonempty_pages is properly updated, which it was not before.)
Since only the last page is special-cased in this way, we might attempt a
truncation that will release many fewer pages than the normal heuristic
would suggest; at worst, only one page would be truncated. But that seems
all right, because the situation won't repeat during the next vacuum.
The real problem with the old logic is that the useless truncation attempt
happens every time we vacuum, so long as the state of the last few dozen
pages doesn't change.
This is a longstanding deficiency, but since the consequences aren't very
severe in most scenarios, I'm not going to risk a back-patch.
Jeff Janes and Tom Lane
| |
The rationale for the way targetlist processing is done wasn't clearly
stated anywhere, and I for one had forgotten some of the details. Having
just painfully re-learned them, add some breadcrumbs for the next person.
| |
Commit 6f8cb1e23485bd6d tried to centralize rewriteTargetView's copying
of a target view's Query struct. However, it ignored the fact that the
jointree->quals field was used twice. This only accidentally failed to
fail immediately because the same ChangeVarNodes mutation is applied in
both cases, so that we end up with logically identical expression trees
for both uses (and, as the code stands, the second ChangeVarNodes call
actually does nothing). However, we end up linking *physically*
identical expression trees into both an RTE's securityQuals list and
the WithCheckOption list. That's pretty dangerous, mainly because
prepsecurity.c is utterly cavalier about further munging such structures
without copying them first.
There may be no live bug in HEAD as a consequence of the fact that we apply
preprocess_expression in between here and prepsecurity.c, and that will
make a copy of the tree anyway. Or it may just be that the regression
tests happen to not trip over it. (I noticed this only because things
fell over pretty badly when I tried to relocate the planner's call of
expand_security_quals to before expression preprocessing.) In any case
it's very fragile because if anyone tried to make the securityQuals and
WithCheckOption trees diverge before we reach preprocess_expression, it
would not work. The fact that the current code will preprocess
securityQuals and WithCheckOptions lists at completely different times in
different query levels does nothing to increase my trust that that can't
happen.
In view of the fact that 9.5.0 is almost upon us and the aforesaid commit
has seen exactly zero field testing, the prudent course is to make an extra
copy of the quals so that the behavior is not different from what has been
in the field during beta.
| |
The variables newestCommitTs and oldestCommitTs sound as if they are
timestamps, but in fact they are the transaction Ids that correspond
to the newest and oldest timestamps rather than the actual timestamps.
Rename these variables to reflect that they are actually xids: to wit,
newestCommitTsXid and oldestCommitTsXid, respectively. Also modify
related code in a similar fashion, particularly the user-facing output
emitted by pg_controldata and pg_resetxlog.
Complaint and patch by me, review by Tom Lane and Alvaro Herrera.
Backpatch to 9.5 where these variables were first introduced.
| |
MergeAttributes() rejects cases where columns to be merged have the same
type but different typmod, which is correct; but the error message it
printed didn't show either typmod, which is unhelpful. Changing this
requires using format_type_with_typemod() in place of TypeNameToString(),
which will have some minor side effects on the way some type names are
printed, but on balance this is an improvement: the old code sometimes
printed one type according to one set of rules and the other type according
to the other set, which could be confusing in its own way.
Oddly, there were no regression test cases covering any of this behavior,
so add some.
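For example (sketch), the message now shows both typmods:
    CREATE TABLE p1 (a varchar(10));
    CREATE TABLE p2 (a varchar(20));
    CREATE TABLE c () INHERITS (p1, p2);
    -- ERROR:  inherited column "a" has a type conflict
    -- DETAIL:  character varying(10) versus character varying(20)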
Complaint and fix by Amit Langote
| |
brin_summarize_new_values() did not check that the passed OID was for
an index at all, much less that it was a BRIN index, and would fail in
obscure ways if it wasn't (possibly damaging data first?). It also
lacked any permissions test; by analogy to VACUUM, we should only allow
the table's owner to summarize.
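A sketch of the tightened behavior (hypothetical index names):
    SELECT brin_summarize_new_values('my_brin_idx'::regclass);
    -- allowed only for the table's owner
    SELECT brin_summarize_new_values('my_btree_idx'::regclass);
    -- now fails cleanly, roughly: ERROR:  "my_btree_idx" is not a BRIN index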
Noted by Jeff Janes, fix by Michael Paquier and me
| |
The original coding read tuples from workers in round-robin fashion,
but performance testing shows that it works much better to read enough
to empty one queue before moving on to the next. I believe the
reason for this is that, with the old approach, we could easily wake
up a worker repeatedly to write only one new tuple into the shm_mq
each time. With this approach, by the time the process gets scheduled,
it has a decent chance of being able to fill the entire buffer in
one go.
Patch by me. Dilip Kumar helped with performance testing.
| |
This should have been part of the original commit, but was missed.
Pushing data between processes is expensive, so we definitely want
to project away unneeded columns here, just as we do for other nodes
like Sort and Hash that care about the volume of data.
| |
'\"' is more commonly written simply as '"'.
| |
Omitted boundaries represent the upper or lower limit of the corresponding
array subscript. This allows simpler specification of many common
use-cases.
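For example:
    SELECT ('{1,2,3,4,5}'::int[])[2:];  -- {2,3,4,5}: upper bound omitted
    SELECT ('{1,2,3,4,5}'::int[])[:3];  -- {1,2,3}: lower bound omitted
    SELECT ('{1,2,3,4,5}'::int[])[:];   -- {1,2,3,4,5}: both omitted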
(Revised version of commit 9246af6799819847faa33baf441251003acbb8fe)
YUriy Zhuravlev
| |
Peter Geoghegan and Robert Haas
| |
When use_remote_estimate is enabled, consider adding ORDER BY to the
query we send to the remote server so that we can use that ordered
data for a merge join. Commit f18c944b6137329ac4a6b2dce5745c5dc21a8578
arranges to push down the query pathkeys, which seems like the case
most likely to be a win, but testing shows this can sometimes win,
too.
For a regular table, we know which indexes are present and therefore
test whether the ordering provided by each such index is useful. Here,
we take the opposite approach: guess what orderings would be useful if
they could be generated cheaply, and then ask the remote side what those
will cost.
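A sketch (hypothetical tables) of the kind of plan this enables:
    EXPLAIN (VERBOSE, COSTS OFF)
        SELECT * FROM foreign_tab f JOIN local_tab l ON f.a = l.a;
    --  Merge Join
    --    ...
    --    ->  Foreign Scan on public.foreign_tab f
    --          Remote SQL: SELECT a, b FROM public.tab
    --                      ORDER BY a ASC NULLS LAST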
Ashutosh Bapat, with very substantial cosmetic revisions by me. Also
reviewed by Rushabh Lathia.
| |
Rather than expect the Query returned by get_view_query() to be
read-only and then copy bits and pieces of it out, simply copy the
entire structure when we get it. This addresses an issue where
AcquireRewriteLocks, which is called by acquireLocksOnSubLinks(),
scribbles on the parsetree passed in, which was actually an entry
in the relcache, leading to segfaults with certain view definitions.
This also future-proofs us a bit for anyone adding more code to this
path.
The acquireLocksOnSubLinks() call was added in commit c3e0ddd40.
Back-patch to 9.3 as that commit was.
| |
I missed too much. The patch is returned to the commitfest process.
| |
Kyotaro Horiguchi
| |
This was a silly goof on my (rhaas's) part.
Report and fix by Rushabh Lathia.
| |
This could result in the error context misidentifying where the error
actually occurred.
Craig Ringer
| |
Amit Langote
| |
Allow omitting the lower or upper boundary, or both, in an array
subscript, for selecting a slice of an array.
Author: YUriy Zhuravlev
| |
Previously, each subroutine in initdb fired up its own standalone backend
session. Over time we'd grown as many as fifteen of these sessions,
and the cumulative startup and shutdown work for them was getting pretty
noticeable. Combining things so that all these steps share a single
backend session cuts a good 10% off the total runtime of initdb, more
if you're not fsync'ing.
The main stumbling block to doing this before was that some of the sessions
were run with -j and some not. The improved definition of -j mode
implemented by my previous commit makes it possible to fix that by running
all the post-bootstrap steps with -j; we just have to use double instead of
single newlines to end command strings. (This is only absolutely necessary
around the VACUUM and CREATE DATABASE steps, since those can't be run in a
transaction block. But it seems best to make them all use double newlines
so that the commands remain separate for error-reporting purposes.)
A minor disadvantage is that since initdb can't tell how much of its
output the backend has executed, we can no longer have the per-step
progress reporting initdb used to print. But things are fast enough
nowadays that that's not really all that useful anyway.
In passing, add more const decoration to some of the static arrays in
initdb.c.
| |
Previously, -j caused the entire input file to be read in and executed as
a single command string. That's undesirable, not least because any error
causes the entire file to be regurgitated as the "failing query". Some
experimentation suggests a better rule: end the command string when we see
a semicolon immediately followed by two newlines, ie, an empty line after
a query. This serves nicely to break up the existing examples such as
information_schema.sql and system_views.sql. A limitation is that it's
no longer possible to write such a sequence within a string literal or
multiline comment in a file meant to be read with -j; but there are no
instances of such a problem within the data currently used by initdb.
(If someone does make such a mistake in future, it'll be obvious because
they'll get an unterminated-literal or unterminated-comment syntax error.)
Other than that, there shouldn't be any negative consequences; you're not
forced to end statements that way, it's just a better idea in most cases.
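A sketch of how a -j input file is now split:
    -- each semicolon followed by an empty line ends one command string;
    -- newlines within a statement no longer matter
    CREATE VIEW v1 AS
        SELECT 1 AS x;

    CREATE VIEW v2 AS
        SELECT 2 AS y;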
In passing, remove src/include/tcop/tcopdebug.h, which is dead code
because it's not included anywhere, and hasn't been for more than
ten years. One of the debug-support symbols it purported to describe
has been unreferenced for at least the same amount of time, and the
other is removed by this commit on the grounds that it was useless:
forcing -j mode all the time would have broken initdb. The lack of
complaints about that, or about the missing inclusion, shows that
no one has tried to use TCOP_DONTUSENEWLINE in many years.
| |
This is necessary so that REASSIGN OWNED does the right thing with
composite types, to wit, that it also alters ownership of the type's
pg_class entry -- previously, the pg_class entry remained owned by the
original user, which later caused other failures, such as the new owner's
inability to use ALTER TYPE to rename an attribute of the affected
composite. Also, if the original owner is later dropped, the pg_class
entry becomes owned by a non-existent user, which is bogus.
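A sketch of the failure mode (hypothetical names):
    CREATE TYPE comp AS (a int, b text);    -- created by old_role
    REASSIGN OWNED BY old_role TO new_role;
    -- previously pg_class still recorded old_role as owner, so this
    -- failed for new_role with an ownership error:
    ALTER TYPE comp RENAME ATTRIBUTE b TO c;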
To fix, create a new routine AlterTypeOwner_oid which knows whether to
pass the request to ATExecChangeOwner or deal with it directly, and use
that in shdepReassignOwner rather than calling AlterTypeOwnerInternal
directly. AlterTypeOwnerInternal is now simpler in that it only
modifies the pg_type entry and recurses to handle a possible array type;
higher-level tasks are handled by either AlterTypeOwner directly or
AlterTypeOwner_oid.
I took the opportunity to add a few more objects to the test rig for
REASSIGN OWNED, so that more cases are exercised. Additional ones could
be added for superuser-only-ownable objects (such as FDWs and event
triggers) but I didn't want to push my luck by adding a new superuser to
the tests on a backpatchable bug fix.
Per bug #13666 reported by Chris Pacejo.
Backpatch to 9.5.
(I would back-patch this all the way back, except that it doesn't apply
cleanly in 9.4 and earlier because 59367fdf9 wasn't backpatched. If we
decide that we need this in earlier branches too, we should backpatch
both.)
| |
Encode TIDs as 64-bit integers to speed up comparisons. This seems to
speed things up on all platforms, but is even more beneficial when
8-byte integers are passed by value.
Peter Geoghegan. Design suggestions and review by Tom Lane. Review
also by Simon Riggs and by me.
| |
FOREIGN KEY constraints have behaved this way for a long time, but for
some reason the behavior of CHECK constraints has been inconsistent up
until now.
Amit Langote and Amul Sul, with assorted tweaks by me.
| |
It's entirely surprising that mdnblocks() has the side effect of
creating new files on disk, so let's make it not do that. One
consequence of the old behavior is that, if running on a damaged
cluster that is missing a file, mdnblocks() can recreate the file
and allow a subsequent _mdfd_getseg() for a higher segment to succeed.
This happens because, while mdnblocks() stops when it finds a segment
that is shorter than 1GB, _mdfd_getseg() has no such check, and thus
the empty file created by mdnblocks() can allow it to continue its
traversal and find higher-numbered segments which remain.
It might be a good idea for _mdfd_getseg() to actually verify that
each segment it finds is exactly 1GB before proceeding to the next
one, but that would involve some additional system calls, so for
now I'm just doing this much.
Patch by me, per off-list analysis by Kevin Grittner and Rahila Syed.
Review by Andres Freund.
| |
Move the content lock directly into the BufferDesc, so that locking and
pinning a buffer touches only one cache line rather than two. Adjust
the definition of BufferDesc slightly so that this doesn't make the
BufferDesc any larger than one cache line (at least on platforms where
a spinlock is only 1 or 2 bytes).
We can't fit the I/O locks into the BufferDesc and stay within one
cache line, so move those to a completely separate tranche. This
leaves a relatively limited number of LWLocks in the main tranche, so
increase the padding of those remaining locks to a full cache line,
rather than allowing adjacent locks to share a cache line, hopefully
reducing false sharing.
Performance testing shows that these changes make little difference
on laptop-class machines, but help significantly on larger servers,
especially those with more than 2 sockets.
Andres Freund, originally based on an earlier patch by Simon Riggs.
Review and cosmetic adjustments (including heavy rewriting of the
comments) by me.
| |
It's a bit cumbersome to use LWLockNewTrancheId(), because the returned
value needs to be shared between backends so that each backend can call
LWLockRegisterTranche() with the correct ID. So, for built-in tranches,
use a hard-coded value instead.
This is motivated by an upcoming patch adding further built-in tranches.
Andres Freund and Robert Haas
| |
We carry around information about whether a given query has row security
or not, to allow the plancache to use that information to invalidate a
planned query in the event that the environment changes.
Previously, the flag of one of the subqueries was simply being copied
into place to indicate whether the query overall included RLS components.
That's wrong, as we need the global OR of all subqueries. Fix by
changing the code to match how fireRIRules works, which results
in OR'ing all of the flags.
Noted by Tom.
Back-patch to 9.5 where RLS was introduced.
| |
Apparently, there are bugs in this code that cause it to loop endlessly.
That bug still needs more research, but in the meantime it's clear that
the loop is missing a check for interrupts that would allow it to be
cancelled in a timely manner.
Backpatch to 9.1 -- this has been missing since 49475aab8d0d.
| |
Previously, if find_multixact_start() failed, SetOffsetVacuumLimit() would
install 0 into MultiXactState->offsetStopLimit if it had previously succeeded.
Luckily, there are no known cases where find_multixact_start() will return
an error in 9.5 and above. But if it were to happen, for example due to
filesystem permission issues, it'd be somewhat bad: GetNewMultiXactId()
could continue allocating mxids even if close to a wraparound, or it could
erroneously stop allocating mxids, even if no wraparound is looming. The
wrong value would be corrected the next time SetOffsetVacuumLimit() is
called, or by a restart.
Reported-By: Noah Misch, although this is not his preferred fix
Discussion: 20151210140450.GA22278@alap3.anarazel.de
Backpatch: 9.5, where the bug was introduced as part of 4f627f
| |
e3f4cfc7 introduced an LWLockHeldByMe() call, without the corresponding
Assert() surrounding it.
Spotted by Coverity.
Backpatch: 9.1+, like the previous commit
| |
Previously the "sent" field would be set to 0 and all other xlog
pointers would be set to NULL if there were no valid values (such as
in a walsender that is sending a base backup).
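For example (sketch), for a walsender that is only sending a base backup:
    SELECT sent_location, write_location, flush_location, replay_location
    FROM pg_stat_replication;
    -- with no valid values, these now consistently read as NULL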
| |
These would leak random xlog positions if a walsender used for a backup
reused a walsender slot previously used by a replication walsender.
In passing also fix a couple of cases where the xlog pointer is directly
compared to zero instead of using XLogRecPtrIsInvalid, noted by
Michael Paquier.
| |
Changing the tablespace of an unlogged relation did not WAL-log the
creation and content of the init fork. Thus, after a standby is
promoted, the unlogged relation cannot be accessed anymore, with errors
like:
ERROR: 58P01: could not open file "pg_tblspc/...": No such file or directory
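A reproduction sketch (hypothetical names), on a primary with a streaming
standby:
    CREATE TABLESPACE ts1 LOCATION '/some/path';
    CREATE UNLOGGED TABLE u1 (a int);
    ALTER TABLE u1 SET TABLESPACE ts1;
    -- promote the standby, then on the promoted server:
    SELECT * FROM u1;   -- failed as above before this fix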
Additionally, the init fork was not synced to disk, independent of the
configured wal_level, which is a relatively small durability risk.
Investigation of that problem also brought to light that, even for
permanent relations, the creation of !main forks was not WAL-logged,
i.e. no XLOG_SMGR_CREATE records were emitted. That mostly turns out not
to be a problem, because these files are created when the actual
relation data is copied; nonexistent files are not treated as an error
condition during replay. But that doesn't work for empty files, and
generally feels a bit haphazard. Luckily, outside the init and main
forks, empty forks either don't occur often or are not a problem.
Add the required WAL logging and syncing to disk.
Reported-By: Michael Paquier
Author: Michael Paquier and Andres Freund
Discussion: 20151210163230.GA11331@alap3.anarazel.de
Backpatch: 9.1, where unlogged relations were introduced
| |
As reported in bug #13809 by Alexander Ashurkov, the code for REASSIGN
OWNED hadn't gotten word about user mappings. Deal with them in the
same way default ACLs do, which is to ignore them altogether; they are
handled just fine by DROP OWNED. The other foreign object cases are
already handled correctly by both commands.
Also add a REASSIGN OWNED statement to the foreign_data test to exercise
the foreign data objects. (The changes are just before the "cleanup"
phase, so they shouldn't remove any existing live test.)
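A sketch (hypothetical names) of the now-working sequence:
    CREATE USER MAPPING FOR old_role SERVER some_server;
    REASSIGN OWNED BY old_role TO new_role;  -- now ignores the user mapping
    DROP OWNED BY old_role;                  -- still drops the user mapping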
Reported by Alexander Ashurkov, then independently by Jaime Casanova.