aboutsummaryrefslogtreecommitdiff
path: root/src/backend/access
Commit message (Collapse)AuthorAge
* Fix start WAL filename for concurrent backups from standbyMagnus Hagander2016-07-11
| | | | | | | | | | On a standby, ThisTimelineID is always 0, so we would generate a filename in timeline 0 even for other timelines. Instead, use starttli which we have retreived from the controlfile. Report by: Francesco Canovai in bug #14230 Author: Marco Nenciarini Reviewed by: Michael Paquier and Amit Kapila
* Fix several mistakes around parallel workers and client_encoding.Robert Haas2016-06-30
| | | | | | | | | | | | | | | | | | | | | | | | | Previously, workers sent data to the leader using the client encoding. That mostly worked, but the leader the converted the data back to the server encoding. Since not all encoding conversions are reversible, that could provoke failures. Fix by using the database encoding for all communication between worker and leader. Also, while temporary changes to GUC settings, as from the SET clause of a function, are in general OK for parallel query, changing client_encoding this way inside of a parallel worker is not OK. Previously, that would have confused the leader; with these changes, it would not confuse the leader, but it wouldn't do anything either. So refuse such changes in parallel workers. Also, the previous code naively assumed that when it received a NotifyResonse from the worker, it could pass that directly back to the user. But now that worker-to-leader communication always uses the database encoding, that's clearly no longer correct - though, actually, the old way was always broken for V2 clients. So disassemble and reconstitute the message instead. Issues reported by Peter Eisentraut. Patch by me, reviewed by Peter Eisentraut.
* Remove unused arguments in two GiST subroutinesAlvaro Herrera2016-06-28
| | | | | | | These arguments became unused in commit 2c03216d8311. Noticed while skimming code for unrelated development. This is cosmetic, so no backpatch.
* Fix handling of multixacts predating pg_upgradeAlvaro Herrera2016-06-24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | After pg_upgrade, it is possible that some tuples' Xmax have multixacts corresponding to the old installation; such multixacts cannot have running members anymore. In many code sites we already know not to read them and clobber them silently, but at least when VACUUM tries to freeze a multixact or determine whether one needs freezing, there's an attempt to resolve it to its member transactions by calling GetMultiXactIdMembers, and if the multixact value is "in the future" with regards to the current valid multixact range, an error like this is raised: ERROR: MultiXactId 123 has not been created yet -- apparent wraparound and vacuuming fails. Per discussion with Andrew Gierth, it is completely bogus to try to resolve multixacts coming from before a pg_upgrade, regardless of where they stand with regards to the current valid multixact range. It's possible to get from under this problem by doing SELECT FOR UPDATE of the problem tuples, but if tables are large, this is slow and tedious, so a more thorough solution is desirable. To fix, we realize that multixacts in xmax created in 9.2 and previous have a specific bit pattern that is never used in 9.3 and later (we already knew this, per comments and infomask tests sprinkled in various places, but we weren't leveraging this knowledge appropriately). Whenever the infomask of the tuple matches that bit pattern, we just ignore the multixact completely as if Xmax wasn't set; or, in the case of tuple freezing, we act as if an unwanted value is set and clobber it without decoding. This guarantees that no errors will be raised, and that the values will be progressively removed until all tables are clean. Most callers of GetMultiXactIdMembers are patched to recognize directly that the value is a removable "empty" multixact and avoid calling GetMultiXactIdMembers altogether. To avoid changing the signature of GetMultiXactIdMembers() in back branches, we keep the "allow_old" boolean flag but rename it to "from_pgupgrade"; if the flag is true, we always return an empty set instead of looking up the multixact. (I suppose we could remove the argument in the master branch, but I chose not to do so in this commit). This was broken all along, but the error-facing message appeared first because of commit 8e9a16ab8f7f and was partially fixed in a25c2b7c4db3. This fix, backpatched all the way back to 9.3, goes approximately in the same direction as a25c2b7c4db3 but should cover all cases. Bug analysis by Andrew Gierth and Álvaro Herrera. A number of public reports match this bug: https://www.postgresql.org/message-id/20140330040029.GY4582@tamriel.snowman.net https://www.postgresql.org/message-id/538F3D70.6080902@publicrelay.com https://www.postgresql.org/message-id/556439CF.7070109@pscs.co.uk https://www.postgresql.org/message-id/SG2PR06MB0760098A111C88E31BD4D96FB3540@SG2PR06MB0760.apcprd06.prod.outlook.com https://www.postgresql.org/message-id/20160615203829.5798.4594@wrigleys.postgresql.org
* Fix building of large (bigger than shared_buffers) hash indexes.Tom Lane2016-06-24
| | | | | | | | | | | | | | | | When the index is predicted to need more than NBuffers buckets, CREATE INDEX attempts to sort the index entries by hash key before insertion, so as to reduce thrashing. This code path got broken by commit 9f03ca915196dfc8, which overlooked that _hash_form_tuple() is not just an alias for index_form_tuple(). The index got built anyway, but with garbage data, so that searches for pre-existing tuples always failed. Fix by refactoring to separate construction of the indexable data from calling index_form_tuple(). Per bug #14210 from Daniel Newman. Back-patch to 9.5 where the bug was introduced. Report: <20160623162507.17237.39471@wrigleys.postgresql.org>
* Finish up XLOG_HINT renamingAlvaro Herrera2016-06-17
| | | | | | | Commit b8fd1a09f3 renamed XLOG_HINT to XLOG_FPI, but neglected two places. Backpatch to 9.3, like that commit.
* pg_visibility: Add pg_truncate_visibility_map function.Robert Haas2016-06-17
| | | | | | | | This requires some core changes as well so that we can properly WAL-log the truncation. Specifically, it changes the format of the XLOG_SMGR_TRUNCATE WAL record, so bump XLOG_PAGE_MAGIC. Patch by me, reviewed but not fully endorsed by Andres Freund.
* Fix typo.Robert Haas2016-06-17
| | | | Thomas Munro
* Remove PID from 'parallel worker' context message.Robert Haas2016-06-17
| | | | Discussion: <bfd204ab-ab1a-792a-b345-0274a09a4b5f@2ndquadrant.com>
* Fix fuzzy thinking in ReinitializeParallelDSM().Tom Lane2016-06-16
| | | | | | | | | The fact that no workers were successfully launched in the previous iteration does not excuse us from setting up properly to try again. This appears to explain crashes I saw in parallel regression testing due to error_mqh being NULL when it shouldn't be. Minor other cosmetic fixes too.
* Fix lazy_scan_heap so that it won't mark pages all-frozen too soon.Robert Haas2016-06-15
| | | | | | | | | | | | | Commit a892234f830e832110f63fc0a2afce2fb21d1584 added a new bit per page to the visibility map fork indicating whether the page is all-frozen, but incorrectly assumed that if lazy_scan_heap chose to freeze a tuple then that tuple would not need to later be frozen again. This turns out to be false, because xmin and xmax (and conceivably xvac, if dealing with tuples from very old releases) could be frozen at separate times. Thanks to Andres Freund for help in uncovering and tracking down this issue.
* Improve the situation for parallel query versus temp relations.Tom Lane2016-06-09
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Transmit the leader's temp-namespace state to workers. This is important because without it, the workers do not really have the same search path as the leader. For example, there is no good reason (and no extant code either) to prevent a worker from executing a temp function that the leader created previously; but as things stood it would fail to find the temp function, and then either fail or execute the wrong function entirely. We still prohibit a worker from creating a temp namespace on its own. In effect, a worker can only see the session's temp namespace if the leader had created it before starting the worker, which seems like the right semantics. Also, transmit the leader's BackendId to workers, and arrange for workers to use that when determining the physical file path of a temp relation belonging to their session. While the original intent was to prevent such accesses entirely, there were a number of holes in that, notably in places like dbsize.c which assume they can safely access temp rels of other sessions anyway. We might as well get this right, as a small down payment on someday allowing workers to access the leader's temp tables. (With this change, directly using "MyBackendId" as a relation or buffer backend ID is deprecated; you should use BackendIdForTempRelations() instead. I left a couple of such uses alone though, as they're not going to be reachable in parallel workers until we do something about localbuf.c.) Move the thou-shalt-not-access-thy-leader's-temp-tables prohibition down into localbuf.c, which is where it actually matters, instead of having it in relation_open(). This amounts to recognizing that access to temp tables' catalog entries is perfectly safe in a worker, it's only the data in local buffers that is problematic. Having done all that, we can get rid of the test in has_parallel_hazard() that says that use of a temp table's rowtype is unsafe in parallel workers. That test was unduly expensive, and if we really did need such a prohibition, that was not even close to being a bulletproof guard for it. (For example, any user-defined function executed in a parallel worker might have attempted such access.)
* pgindent run for 9.6Robert Haas2016-06-09
|
* Eliminate "parallel degree" terminology.Robert Haas2016-06-09
| | | | | | | | | | | | This terminology provoked widespread complaints. So, instead, rename the GUC max_parallel_degree to max_parallel_workers_per_gather (leaving room for a possible future GUC max_parallel_workers that acts as a system-wide limit), and rename the parallel_degree reloption to parallel_workers. Rename structure members to match. These changes create a dump/restore hazard for users of PostgreSQL 9.6beta1 who have set the reloption (or applied the GUC using ALTER USER or ALTER DATABASE).
* Message style and wording fixesPeter Eisentraut2016-06-07
|
* Stop the executor if no more tuples can be sent from worker to leader.Robert Haas2016-06-06
| | | | | | | | | | | | | | | | | | | | | If a Gather node has read as many tuples as it needs (for example, due to Limit) it may detach the queue connecting it to the worker before reading all of the worker's tuples. Rather than let the worker continue to generate and send all of the results, have it stop after sending the next tuple. More could be done here to stop the worker even quicker, but this is about as well as we can hope to do for 9.6. This is in response to a problem report from Andreas Seltenreich. Commit 44339b892a04e94bbb472235882dc6f7023bdc65 should be actually be sufficient to fix that example even without this change, but it seems better to do this, too, since we might otherwise waste quite a large amount of effort in one or more workers. Discussion: CAA4eK1KOKGqmz9bGu+Z42qhRwMbm4R5rfnqsLCNqFs9j14jzEA@mail.gmail.com Amit Kapila
* Fix typo.Robert Haas2016-06-06
| | | | Jim Nasby
* Fix various common mispellings.Greg Stark2016-06-03
| | | | | | | | | | Mostly these are just comments but there are a few in documentation and a handful in code and tests. Hopefully this doesn't cause too much unnecessary pain for backpatching. I relented from some of the most common like "thru" for that reason. The rest don't seem numerous enough to cause problems. Thanks to Kevin Lyda's tool https://pypi.python.org/pypi/misspellings
* Cosmetic improvements to freeze map code.Robert Haas2016-06-03
| | | | | | | Per post-commit review comments from Andres Freund, improve variable names, comments, and in one place, slightly improve the code structure. Masahiko Sawada
* Fix btree mark/restore bug.Kevin Grittner2016-06-02
| | | | | | | | | | | | | | | | | | | | | Commit 2ed5b87f96d473962ec5230fd820abfeaccb2069 introduced a bug in mark/restore, in an attempt to optimize repeated restores to the same page. This caused an assertion failure during a merge join which fed directly from an index scan, although the impact would not be limited to that case. Revert the bad chunk of code from that commit. While investigating this bug it was discovered that a particular "paranoia" set of the mark position field would not prevent bad behavior; it would just make it harder to diagnose. Change that into an assertion, which will draw attention to any future problem in that area more directly. Backpatch to 9.5, where the bug was introduced. Bug #14169 reported by Shinta Koyanagi. Preliminary analysis by Tom Lane identified which commit caused the bug.
* Fix PageAddItem BRIN bugAlvaro Herrera2016-05-30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | BRIN was relying on the ability to remove a tuple from an index page, then putting another tuple in the same line pointer. But PageAddItem refuses to add a tuple beyond the first free item past the last used item, and in particular, it rejects an attempt to add an item to an empty page anywhere other than the first line pointer. PageAddItem issues a WARNING and indicates to the caller that it failed, which in turn causes the BRIN calling code to issue a PANIC, so the whole sequence looks like this: WARNING: specified item offset is too large PANIC: failed to add BRIN tuple To fix, create a new function PageAddItemExtended which is like PageAddItem except that the two boolean arguments become a flags bitmap; the "overwrite" and "is_heap" boolean flags in PageAddItem become PAI_OVERWITE and PAI_IS_HEAP flags in the new function, and a new flag PAI_ALLOW_FAR_OFFSET enables the behavior required by BRIN. PageAddItem() retains its original signature, for compatibility with third-party modules (other callers in core code are not modified, either). Also, in the belt-and-suspenders spirit, I added a new sanity check in brinGetTupleForHeapBlock to raise an error if an TID found in the revmap is not marked as live by the page header. This causes it to react with "ERROR: corrupted BRIN index" to the bug at hand, rather than a hard crash. Backpatch to 9.5. Bug reported by Andreas Seltenreich as detected by his handy sqlsmith fuzzer. Discussion: https://www.postgresql.org/message-id/87mvni77jh.fsf@elite.ansel.ydns.eu
* Fix BTREE_BUILD_STATS build.Tom Lane2016-05-23
| | | | | | | | | | Commit 65c5fcd353a859da9e61bfb2b92a99f12937de3b broke this by removing a header include directive that is conditionally required. Add that back to nbtree.c, with annotation to keep pgrminclude from re-breaking it. Peter Geoghegan Report: <CAM3SWZTNjHFYW_UG8bu0BnogqQ2HfsTgkzXLueuUhfTcYbu5HA@mail.gmail.com>
* Allocate all page images at once in generic wal interfaceTeodor Sigaev2016-05-17
| | | | | | That reduces number of allocation. Per gripe from Michael Paquier and Tom Lane suggestion.
* Correctly align page's images in generic wal APITeodor Sigaev2016-05-17
| | | | | | | Page image should be MAXALIGN'ed because existing code could directly align pointers in page instead of align offset from beginning of page. Found during play with indexes as extenstion, Alexander Korotkov and me
* Fix bogus commentsAlvaro Herrera2016-05-12
| | | | | | Some comments mentioned XLogReplayBuffer, but there's no such function: that was an interim name for a function that got renamed to XLogReadBufferForRedo, before commit 2c03216d831160 was pushed.
* Fix obsolete commentAlvaro Herrera2016-05-12
|
* Fix poorly-worded log message.Tom Lane2016-05-08
| | | | Euler Taveira
* Limit maximum parallel degree to 1024.Robert Haas2016-05-06
| | | | | | | | | | | | | | This new limit affects both the max_parallel_degree GUC and the parallel_degree reloption. There may some day be a use case for using more than 1024 CPUs for a single query, but that's surely not the case right now. Not only do not very many people have that many CPUs, but the code hasn't been tested at that kind of scale and is very unlikely to perform well, or even work at all, without a lot more work. The issue addressed by commit 06bd458cb812623c3f1fdd55216c4c08b06a8447 is probably just one problem of many. The idea of a more reasonable limit here was suggested by Tom Lane; the value of 1024 was suggested by Amit Kapila.
* Use mul_size when multiplying by the number of parallel workers.Robert Haas2016-05-06
| | | | | | | | | | That way, if the result overflows size_t, you'll get an error instead of undefined behavior, which seems like a plus. This also has the effect of casting the number of workers from int to Size, which is better because it's harder to overflow int than size_t. Dilip Kumar reported this issue and provided a patch upon which this patch is based, but his version did use mul_size.
* Fix hash index vs "snapshot too old" problemmsKevin Grittner2016-05-06
| | | | | | | | | | | | | | | | Hash indexes are not WAL-logged, and so do not maintain the LSN of index pages. Since the "snapshot too old" feature counts on detecting error conditions using the LSN of a table and all indexes on it, this makes it impossible to safely do early vacuuming on any table with a hash index, so add this to the tests for whether the xid used to vacuum a table can be adjusted based on old_snapshot_threshold. While at it, add a paragraph to the docs for old_snapshot_threshold which specifically mentions this and other aspects of the feature which may otherwise surprise users. Problem reported and patch reviewed by Amit Kapila
* Revert timeline following in replication slotsAlvaro Herrera2016-05-04
| | | | | | | | | This reverts commits f07d18b6e94d, 82c83b337202, 3a3b309041b0, and 24c5f1a103ce. This feature has shown enough immaturity that it was deemed better to rip it out before rushing some more fixes at the last minute. There are discussions on larger changes in this area for the next release.
* Prevent to use magic constantsTeodor Sigaev2016-04-28
| | | | | | | Use macroses for definition amstrategies/amsupport fields instead of hardcoded values. Author: Nikolay Shaplov with addition for contrib/bloom
* Prevent multiple cleanup process for pending list in GIN.Teodor Sigaev2016-04-28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | Previously, ginInsertCleanup could exit early if it detects that someone else is cleaning up the pending list, without waiting for that someone else to finish the job. But in this case vacuum could miss tuples to be deleted. Cleanup process now locks metapage with a help of heavyweight LockPage(ExclusiveLock), and it guarantees that there is no another cleanup process at the same time. Lock is taken differently depending on caller of cleanup process: any vacuums and gin_clean_pending_list() will be blocked until lock becomes available, ordinary insert uses conditional lock to prevent indefinite waiting on lock. Insert into pending list doesn't use this lock, so insertion isn't blocked. Also, patch adds stopping of cleanup process when at-start-cleanup-tail is reached in order to prevent infinite cleanup in case of massive insertion. But it will stop only for automatic maintenance tasks like autovacuum. Patch introduces choice of limit of memory to use: autovacuum_work_mem, maintenance_work_mem or work_mem depending on call path. Patch for previous releases should be reworked due to changes between 9.6 and previous ones in this area. Discover and diagnostics by Jeff Janes and Tomas Vondra Patch by me with some ideas of Jeff Janes
* Emit invalidations to standby for transactions without xid.Andres Freund2016-04-26
| | | | | | | | | | | | | | | | | | | | | | | | | | | So far, when a transaction with pending invalidations, but without an assigned xid, committed, we simply ignored those invalidation messages. That's problematic, because those are actually sent for a reason. Known symptoms of this include that existing sessions on a hot-standby replica sometimes fail to notice new concurrently built indexes and visibility map updates. The solution is to WAL log such invalidations in transactions without an xid. We considered to alternatively force-assign an xid, but that'd be problematic for vacuum, which might be run in systems with few xids. Important: This adds a new WAL record, but as the patch has to be back-patched, we can't bump the WAL page magic. This means that standbys have to be updated before primaries; otherwise "PANIC: standby_redo: unknown op code 32" errors can be encountered. XXX: Reported-By: Васильев Дмитрий, Masahiko Sawada Discussion: CAB-SwXY6oH=9twBkXJtgR4UC1NqT-vpYAtxCseME62ADwyK5OA@mail.gmail.com CAD21AoDpZ6Xjg=gFrGPnSn4oTRRcwK1EBrWCq9OqOHuAcMMC=w@mail.gmail.com
* Fix memory leak and other bugs in ginPlaceToPage() & subroutines.Tom Lane2016-04-20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 36a35c550ac114ca turned the interface between ginPlaceToPage and its subroutines in gindatapage.c and ginentrypage.c into a royal mess: page-update critical sections were started in one place and finished in another place not even in the same file, and the very same subroutine might return having started a critical section or not. Subsequent patches band-aided over some of the problems with this design by making things even messier. One user-visible resulting problem is memory leaks caused by the need for the subroutines to allocate storage that would survive until ginPlaceToPage calls XLogInsert (as reported by Julien Rouhaud). This would not typically be noticeable during retail index updates. It could be visible in a GIN index build, in the form of memory consumption swelling to several times the commanded maintenance_work_mem. Another rather nasty problem is that in the internal-page-splitting code path, we would clear the child page's GIN_INCOMPLETE_SPLIT flag well before entering the critical section that it's supposed to be cleared in; a failure in between would leave the index in a corrupt state. There were also assorted coding-rule violations with little immediate consequence but possible long-term hazards, such as beginning an XLogInsert sequence before entering a critical section, or calling elog(DEBUG) inside a critical section. To fix, redefine the API between ginPlaceToPage() and its subroutines by splitting the subroutines into two parts. The "beginPlaceToPage" subroutine does what can be done outside a critical section, including full computation of the result pages into temporary storage when we're going to split the target page. The "execPlaceToPage" subroutine is called within a critical section established by ginPlaceToPage(), and it handles the actual page update in the non-split code path. The critical section, as well as the XLOG insertion call sequence, are both now always started and finished in ginPlaceToPage(). Also, make ginPlaceToPage() create and work in a short-lived memory context to eliminate the leakage problem. (Since a short-lived memory context had been getting created in the most common code path in the subroutines, this shouldn't cause any noticeable performance penalty; we're just moving the overhead up one call level.) In passing, fix a bunch of comments that had gone unmaintained throughout all this klugery. Report: <571276DD.5050303@dalibo.com>
* Revert no-op changes to BufferGetPage()Kevin Grittner2016-04-20
| | | | | | | | | | | | | | | | | | The reverted changes were intended to force a choice of whether any newly-added BufferGetPage() calls needed to be accompanied by a test of the snapshot age, to support the "snapshot too old" feature. Such an accompanying test is needed in about 7% of the cases, where the page is being used as part of a scan rather than positioning for other purposes (such as DML or vacuuming). The additional effort required for back-patching, and the doubt whether the intended benefit would really be there, have indicated it is best just to rely on developers to do the right thing based on comments and existing usage, as we do with many other conventions. This change should have little or no effect on generated executable code. Motivated by the back-patching pain of Tom Lane and Robert Haas
* Fix memory leak in GIN index scans.Tom Lane2016-04-15
| | | | | | | | | | | The code had a query-lifespan memory leak when encountering GIN entries that have posting lists (rather than posting trees, ie, there are a relatively small number of heap tuples containing this index key value). With a suitable data distribution this could add up to a lot of leakage. Problem seems to have been introduced by commit 36a35c550, so back-patch to 9.4. Julien Rouhaud
* Improve API of GenericXLogRegister().Tom Lane2016-04-12
| | | | | | | | | | Rename this function to GenericXLogRegisterBuffer() to make it clearer what it does, and leave room for other sorts of "register" actions in future. Also, replace its "bool isNew" argument with an integer flags argument, so as to allow adding more flags in future without an API break. Alexander Korotkov, adjusted slightly by me
* In generic WAL application and replay, ensure page "hole" is always zero.Tom Lane2016-04-12
| | | | | | | | | | | The previous coding could allow the contents of the "hole" between pd_lower and pd_upper to diverge during replay from what it had been when the update was originally applied. This would pose a problem if checksums were in use, and in any case would complicate forensic comparisons between master and slave servers. So force the "hole" to contain zeroes, both at initial application of a generically-logged action, and at replay. Alexander Korotkov, adjusted slightly by me
* Correct copyright for newly added genericdesc.cStephen Frost2016-04-12
| | | | | | It's 2016 these days (no, not entirely sure how we got here either). Pointed out by Amit Langote
* Further minor improvement in generic_xlog.c: always say REGBUF_STANDARD.Tom Lane2016-04-10
| | | | | | | | | | | Since we're requiring pages handled by generic_xlog.c to be standard format, specify REGBUF_STANDARD when doing a full-page image, so that xloginsert.c can compress out the "hole" between pd_lower and pd_upper. Given the current API in which this path will be taken only for a newly initialized page, the hole is likely to be particularly large in such cases, so that this oversight could easily be performance-significant. I don't notice any particular change in the runtime of contrib/bloom's regression test, though.
* Micro-optimize GenericXLogFinish().Tom Lane2016-04-09
| | | | | | | | | | | | | | | | | | | Make the inner comparison loops of computeDelta() as tight as possible by pulling considerations of valid and invalid ranges out of the inner loops, and extending a match or non-match detection as far as possible before deciding what to do next. To keep this tractable, give up the possibility of merging fragments across the pd_lower to pd_upper gap. The fraction of pages where that could happen (ie, there are 4 or fewer bytes in the gap, *and* data changes immediately adjacent to it on both sides) is too small to be worth spending cycles on. Also, avoid two BLCKSZ-length memcpy()s by computing the delta before moving data into the target buffer, instead of after. This doesn't save nearly as many cycles as being tenser about computeDelta(), but it still seems worth doing. On my machine, this patch cuts a full 40% off the runtime of contrib/bloom's regression test.
* Get rid of GenericXLogUnregister().Tom Lane2016-04-09
| | | | | | | | | | | This routine is unsafe as implemented, because it invalidates the page image pointers returned by previous GenericXLogRegister() calls. Rather than complicate the API or the implementation to avoid that, let's just get rid of it; the use-case for having it seems much too thin to justify a lot of work here. While at it, do some wordsmithing on the SGML docs for generic WAL.
* Code review/prettification for generic_xlog.c.Tom Lane2016-04-09
| | | | | | | | | Improve commentary, use more specific names for the delta fields, const-ify pointer arguments where possible, avoid assuming that initializing only the first element of a local array will guarantee that the remaining elements end up as we need them. (I think that code in generic_redo actually worked, but only because InvalidBuffer is zero; this is a particularly ugly way of depending on that ...)
* Run pgindent on generic_xlog.c.Tom Lane2016-04-09
| | | | | This code desperately needs some micro-optimization, and I'd like it to be formatted a bit more nicely while I work on it.
* Add the "snapshot too old" featureKevin Grittner2016-04-08
| | | | | | | | | | | | | | | | This feature is controlled by a new old_snapshot_threshold GUC. A value of -1 disables the feature, and that is the default. The value of 0 is just intended for testing. Above that it is the number of minutes a snapshot can reach before pruning and vacuum are allowed to remove dead tuples which the snapshot would otherwise protect. The xmin associated with a transaction ID does still protect dead tuples. A connection which is using an "old" snapshot does not get an error unless it accesses a page modified recently enough that it might not be able to produce accurate results. This is similar to the Oracle feature, and we use the same SQLSTATE and error message for compatibility.
* Modify BufferGetPage() to prepare for "snapshot too old" featureKevin Grittner2016-04-08
| | | | | | | | | | | This patch is a no-op patch which is intended to reduce the chances of failures of omission once the functional part of the "snapshot too old" patch goes in. It adds parameters for snapshot, relation, and an enum to specify whether the snapshot age check needs to be done for the page at this point. This initial patch passes NULL for the first two new parameters and BGP_NO_SNAPSHOT_TEST for the third. The follow-on patch will change the places where the test needs to be made.
* Revert CREATE INDEX ... INCLUDING ...Teodor Sigaev2016-04-08
| | | | | | It's not ready yet, revert two commits 690c543550b0d2852060c18d270cdb534d339d9a - unstable test output 386e3d7609c49505e079c40c65919d99feb82505 - patch itself
* CREATE INDEX ... INCLUDING (column[, ...])Teodor Sigaev2016-04-08
| | | | | | | | | | Now indexes (but only B-tree for now) can contain "extra" column(s) which doesn't participate in index structure, they are just stored in leaf tuples. It allows to use index only scan by using single index instead of two or more indexes. Author: Anastasia Lubennikova with minor editorializing by me Reviewers: David Rowley, Peter Geoghegan, Jeff Janes
* Increase maximum number of clog buffers.Andres Freund2016-04-08
| | | | | | | | | | | | | | | | | | | | | | Benchmarking has shown that the current number of clog buffers limits scalability. We've previously increased the number in 33aaa139, but that's not sufficient with a large number of clients. We've benchmarked the cost of increasing the limit by benchmarking worst case scenarios; testing showed that 128 buffers don't cause a regression, even in contrived scenarios, whereas 256 does There are a number of more complex patches flying around to address various clog scalability problems, but this is simple enough that we can get it into 9.6; and is beneficial even after those patches have been applied. It is a bit unsatisfactory to increase this in small steps every few releases, but a better solution seems to require a rewrite of slru.c; not something done quickly. Author: Amit Kapila and Andres Freund Discussion: CAA4eK1+-=18HOrdqtLXqOMwZDbC_15WTyHiFruz7BvVArZPaAw@mail.gmail.com