aboutsummaryrefslogtreecommitdiff
path: root/src/backend/access/transam/xloginsert.c
Commit message (Collapse)AuthorAge
* Initial pgindent and pgperltidy run for v14.Tom Lane2021-05-12
| | | | | | | | Also "make reformat-dat-files". The only change worthy of note is that pgindent messed up the formatting of launcher.c's struct LogicalRepWorkerId, which led me to notice that that struct wasn't used at all anymore, so I just took it out.
* Update copyright for 2021Bruce Momjian2021-01-02
| | | | Backpatch-through: 9.5
* Add support for building GiST index by sorting.Heikki Linnakangas2020-09-17
| | | | | | | | | | | | | | | | | | | | | | This adds a new optional support function to the GiST access method: sortsupport. If it is defined, the GiST index is built by sorting all data to the order defined by the sortsupport's comparator function, and packing the tuples in that order to GiST pages. This is similar to how B-tree index build works, and is much faster than inserting the tuples one by one. The resulting index is smaller too, because the pages are packed more tightly, upto 'fillfactor'. The normal build method works by splitting pages, which tends to lead to more wasted space. The quality of the resulting index depends on how good the opclass-defined sort order is. A good order preserves locality of the input data. As the first user of this facility, add 'sortsupport' function to the point_ops opclass. It sorts the points in Z-order (aka Morton Code), by interleaving the bits of the X and Y coordinates. Author: Andrey Borodin Reviewed-by: Pavel Borisov, Thomas Munro Discussion: https://www.postgresql.org/message-id/1A36620E-CAD8-4267-9067-FB31385E7C0D%40yandex-team.ru
* Immediately WAL-log subtransaction and top-level XID association.Amit Kapila2020-07-20
| | | | | | | | | | | | | | | | | | | | The logical decoding infrastructure needs to know which top-level transaction the subxact belongs to, in order to decode all the changes. Until now that might be delayed until commit, due to the caching (GPROC_MAX_CACHED_SUBXIDS), preventing features requiring incremental decoding. So we also write the assignment info into WAL immediately, as part of the next WAL record (to minimize overhead) only when wal_level=logical. We can not remove the existing XLOG_XACT_ASSIGNMENT WAL as that is required for avoiding overflow in the hot standby snapshot. Bump XLOG_PAGE_MAGIC, since this introduces XLR_BLOCK_ID_TOPLEVEL_XID. Author: Tomas Vondra, Dilip Kumar, Amit Kapila Reviewed-by: Amit Kapila Tested-by: Neha Sharma and Mahendra Singh Thalor Discussion: https://postgr.es/m/688b0b7f-2f6c-d827-c27b-216a8e3ea700@2ndquadrant.com
* Change the display of WAL usage statistics in Explain.Amit Kapila2020-05-05
| | | | | | | | | | | | | | | | In commit 33e05f89c5, we have added the option to display WAL usage statistics in Explain and auto_explain. The display format used two spaces between each field which is inconsistent with Buffer usage statistics which is using one space between each field. Change the format to make WAL usage statistics consistent with Buffer usage statistics. This commit also changed the usage of "full page writes" to "full page images" for WAL usage statistics to make it consistent with other parts of code and docs. Author: Julien Rouhaud, Amit Kapila Reviewed-by: Justin Pryzby, Kyotaro Horiguchi and Amit Kapila Discussion: https://postgr.es/m/CAB-hujrP8ZfUkvL5OYETipQwA=e3n7oqHFU=4ZLxWS_Cza3kQQ@mail.gmail.com
* snapshot scalability: Move delayChkpt from PGXACT to PGPROC.Andres Freund2020-04-07
| | | | | | | | | | | | | | | | | | | | | The goal of separating hotly accessed per-backend data from PGPROC into PGXACT is to make accesses fast (GetSnapshotData() in particular). But delayChkpt is not actually accessed frequently; only when starting a checkpoint. As it is frequently modified (multiple times in the course of a single transaction), storing it in the same cacheline as hotly accessed data unnecessarily dirties a contended cacheline. Therefore move delayChkpt to PGPROC. This is part of a larger series of patches intending to improve GetSnapshotData() scalability. It is committed and pushed separately, as it is independently beneficial (small but measurable win, limited by the other frequent modifications of PGXACT). Author: Andres Freund Reviewed-By: Robert Haas, Thomas Munro, David Rowley Discussion: https://postgr.es/m/20200301083601.ews6hz5dduc3w2se@alap3.anarazel.de
* Add infrastructure to track WAL usage.Amit Kapila2020-04-04
| | | | | | | | | | | | | | | | | | | | | | This allows gathering the WAL generation statistics for each statement execution. The three statistics that we collect are the number of WAL records, the number of full page writes and the amount of WAL bytes generated. This helps the users who have write-intensive workload to see the impact of I/O due to WAL. This further enables us to see approximately what percentage of overall WAL is due to full page writes. In the future, we can extend this functionality to allow us to compute the the exact amount of WAL data due to full page writes. This patch in itself is just an infrastructure to compute WAL usage data. The upcoming patches will expose this data via explain, auto_explain, pg_stat_statements and verbose (auto)vacuum output. Author: Kirill Bychik, Julien Rouhaud Reviewed-by: Dilip Kumar, Fujii Masao and Amit Kapila Discussion: https://postgr.es/m/CAB-hujrP8ZfUkvL5OYETipQwA=e3n7oqHFU=4ZLxWS_Cza3kQQ@mail.gmail.com
* In log_newpage_range(), heed forkNum and page_std arguments.Noah Misch2020-03-21
| | | | | | | | | The function assumed forkNum=MAIN_FORKNUM and page_std=true, ignoring the actual arguments. Existing callers passed exactly those values, so there's no live bug. Back-patch to v12, where the function first appeared, because another fix needs this. Discussion: https://postgr.es/m/20191118045434.GA1173436@rfd.leadboat.com
* Update copyrights for 2020Bruce Momjian2020-01-01
| | | | Backpatch-through: update all files in master, backpatch legal files through 9.4
* Make the order of the header file includes consistent in backend modules.Amit Kapila2019-11-12
| | | | | | | | | | | Similar to commits 7e735035f2 and dddf4cdc33, this commit makes the order of header file inclusion consistent for backend modules. In the passing, removed a couple of duplicate inclusions. Author: Vignesh C Reviewed-by: Kuntal Ghosh and Amit Kapila Discussion: https://postgr.es/m/CALDaNm2Sznv8RR6Ex-iJO6xAdsxgWhCoETkaYX=+9DW3q0QCfA@mail.gmail.com
* Fix assorted inconsistencies.Amit Kapila2019-06-08
| | | | | | | | | | There were a number of issues in the recent commits which include typos, code and comments mismatch, leftover function declarations. Fix them. Reported-by: Alexander Lakhin Author: Alexander Lakhin, Amit Kapila and Amit Langote Reviewed-by: Amit Kapila Discussion: https://postgr.es/m/ef0c0232-0c1d-3a35-63d4-0ebd06e31387@gmail.com
* Phase 2 pgindent run for v12.Tom Lane2019-05-22
| | | | | | | | | Switch to 2.1 version of pg_bsd_indent. This formats multiline function declarations "correctly", that is with additional lines of parameter declarations indented to match where the first line's left parenthesis is. Discussion: https://postgr.es/m/CAEepm=0P3FeTXRcU5B2W3jv3PgRVZ-kGUXLGfd42FFhUROO3ug@mail.gmail.com
* Generate less WAL during GiST, GIN and SP-GiST index build.Heikki Linnakangas2019-04-03
| | | | | | | | | | | | | | | | | | | | Instead of WAL-logging every modification during the build separately, first build the index without any WAL-logging, and make a separate pass through the index at the end, to write all pages to the WAL. This significantly reduces the amount of WAL generated, and is usually also faster, despite the extra I/O needed for the extra scan through the index. WAL generated this way is also faster to replay. For GiST, the LSN-NSN interlock makes this a little tricky. All pages must be marked with a valid (i.e. non-zero) LSN, so that the parent-child LSN-NSN interlock works correctly. We now use magic value 1 for that during index build. Change the fake LSN counter to begin from 1000, so that 1 is safely smaller than any real or fake LSN. 2 would've been enough for our purposes, but let's reserve a bigger range, in case we need more special values in the future. Author: Anastasia Lubennikova, Andrey V. Lepikhov Reviewed-by: Heikki Linnakangas, Dmitry Dolgov
* Update copyright for 2019Bruce Momjian2019-01-02
| | | | Backpatch-through: certain files through 9.4
* Cleanup comments in xlog compressionStephen Frost2018-12-06
| | | | | | | | | | | | Skipping over the "hole" in full page images in the XLOG code was described as being a form of compression, but this got a bit confusing since we now have PGLZ-based compression happening, so adjust the wording to discuss "removing" the "hole" and keeping the talk about compression to where we're talking about using PGLZ-based compression of the full page images. Reviewed-By: Kyotaro Horiguchi Discussion: https://postgr.es/m/20181127234341.GM3415@tamriel.snowman.net
* Avoid using potentially-under-aligned page buffers.Tom Lane2018-09-01
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | There's a project policy against using plain "char buf[BLCKSZ]" local or static variables as page buffers; preferred style is to palloc or malloc each buffer to ensure it is MAXALIGN'd. However, that policy's been ignored in an increasing number of places. We've apparently got away with it so far, probably because (a) relatively few people use platforms on which misalignment causes core dumps and/or (b) the variables chance to be sufficiently aligned anyway. But this is not something to rely on. Moreover, even if we don't get a core dump, we might be paying a lot of cycles for misaligned accesses. To fix, invent new union types PGAlignedBlock and PGAlignedXLogBlock that the compiler must allocate with sufficient alignment, and use those in place of plain char arrays. I used these types even for variables where there's no risk of a misaligned access, since ensuring proper alignment should make kernel data transfers faster. I also changed some places where we had been palloc'ing short-lived buffers, for coding style uniformity and to save palloc/pfree overhead. Since this seems to be a live portability hazard (despite the lack of field reports), back-patch to all supported versions. Patch by me; thanks to Michael Paquier for review. Discussion: https://postgr.es/m/1535618100.1286.3.camel@credativ.de
* Silence assorted "variable may be used uninitialized" warnings.Tom Lane2018-02-14
| | | | | | | | | | | | | | | All of these are false positives, but in each case a fair amount of analysis is needed to see that, and it's not too surprising that not all compilers are smart enough. (In particular, in the logtape.c case, a compiler lacking the knowledge provided by the Assert would almost surely complain, so that this warning will be seen in any non-assert build.) Some of these are of long standing while others are pretty recent, but it only seems worth fixing them in HEAD. Jaime Casanova, tweaked a bit by me Discussion: https://postgr.es/m/CAJGNTeMcYAMJdPAom52dppLMtF-UnEZi0dooj==75OEv1EoBZA@mail.gmail.com
* Update copyright for 2018Bruce Momjian2018-01-02
| | | | Backpatch-through: certain files through 9.3
* Change TRUE/FALSE to true/falsePeter Eisentraut2017-11-08
| | | | | | | | | | | | | | The lower case spellings are C and C++ standard and are used in most parts of the PostgreSQL sources. The upper case spellings are only used in some files/modules. So standardize on the standard spellings. The APIs for ICU, Perl, and Windows define their own TRUE and FALSE, so those are left as is when using those APIs. In code comments, we use the lower-case spelling for the C concepts and keep the upper-case spelling for the SQL concepts. Reviewed-by: Michael Paquier <michael.paquier@gmail.com>
* Phase 3 of pgindent updates.Tom Lane2017-06-21
| | | | | | | | | | | | | | | | | | | | | | | | | Don't move parenthesized lines to the left, even if that means they flow past the right margin. By default, BSD indent lines up statement continuation lines that are within parentheses so that they start just to the right of the preceding left parenthesis. However, traditionally, if that resulted in the continuation line extending to the right of the desired right margin, then indent would push it left just far enough to not overrun the margin, if it could do so without making the continuation line start to the left of the current statement indent. That makes for a weird mix of indentations unless one has been completely rigid about never violating the 80-column limit. This behavior has been pretty universally panned by Postgres developers. Hence, disable it with indent's new -lpl switch, so that parenthesized lines are always lined up with the preceding left paren. This patch is much less interesting than the first round of indent changes, but also bulkier, so I thought it best to separate the effects. Discussion: https://postgr.es/m/E1dAmxK-0006EE-1r@gemulon.postgresql.org Discussion: https://postgr.es/m/30527.1495162840@sss.pgh.pa.us
* Phase 2 of pgindent updates.Tom Lane2017-06-21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Change pg_bsd_indent to follow upstream rules for placement of comments to the right of code, and remove pgindent hack that caused comments following #endif to not obey the general rule. Commit e3860ffa4dd0dad0dd9eea4be9cc1412373a8c89 wasn't actually using the published version of pg_bsd_indent, but a hacked-up version that tried to minimize the amount of movement of comments to the right of code. The situation of interest is where such a comment has to be moved to the right of its default placement at column 33 because there's code there. BSD indent has always moved right in units of tab stops in such cases --- but in the previous incarnation, indent was working in 8-space tab stops, while now it knows we use 4-space tabs. So the net result is that in about half the cases, such comments are placed one tab stop left of before. This is better all around: it leaves more room on the line for comment text, and it means that in such cases the comment uniformly starts at the next 4-space tab stop after the code, rather than sometimes one and sometimes two tabs after. Also, ensure that comments following #endif are indented the same as comments following other preprocessor commands such as #else. That inconsistency turns out to have been self-inflicted damage from a poorly-thought-through post-indent "fixup" in pgindent. This patch is much less interesting than the first round of indent changes, but also bulkier, so I thought it best to separate the effects. Discussion: https://postgr.es/m/E1dAmxK-0006EE-1r@gemulon.postgresql.org Discussion: https://postgr.es/m/30527.1495162840@sss.pgh.pa.us
* Post-PG 10 beta1 pgindent runBruce Momjian2017-05-17
| | | | perltidy run not included.
* Suppress implicit-conversion warnings seen with newer clang versions.Tom Lane2017-03-28
| | | | | | | | | | | | | | | We were assigning values near 255 through "char *" pointers. On machines where char is signed, that's not entirely kosher, and it's reasonable for compilers to warn about it. A better solution would be to change the pointer type to "unsigned char *", but that would be vastly more invasive. For the moment, let's just apply this simple backpatchable solution. Aleksander Alekseev Discussion: https://postgr.es/m/20170220141239.GD12278@e733.localdomain Discussion: https://postgr.es/m/2839.1490714708@sss.pgh.pa.us
* Rename dtrace probes for ongoing xlog -> wal conversion.Robert Haas2017-02-09
| | | | xlog-switch becomes wal-switch, and xlog-insert becomes wal-insert.
* Add WAL consistency checking facility.Robert Haas2017-02-08
| | | | | | | | | | | | | | When the new GUC wal_consistency_checking is set to a non-empty value, it triggers recording of additional full-page images, which are compared on the standby against the results of applying the WAL record (without regard to those full-page images). Allowable differences such as hints are masked out, and the resulting pages are compared; any difference results in a FATAL error on the standby. Kuntal Ghosh, based on earlier patches by Michael Paquier and Heikki Linnakangas. Extensively reviewed and revised by Michael Paquier and by me, with additional reviews and comments from Amit Kapila, Álvaro Herrera, Simon Riggs, and Peter Eisentraut.
* Update copyright via script for 2017Bruce Momjian2017-01-03
|
* Skip checkpoints, archiving on idle systems.Andres Freund2016-12-22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Some background activity (like checkpoints, archive timeout, standby snapshots) is not supposed to happen on an idle system. Unfortunately so far it was not easy to determine when a system is idle, which defeated some of the attempts to avoid redundant activity on an idle system. To make that easier, allow to make individual WAL insertions as not being "important". By checking whether any important activity happened since the last time an activity was performed, it now is easy to check whether some action needs to be repeated. Use the new facility for checkpoints, archive timeout and standby snapshots. The lack of a facility causes some issues in older releases, but in my opinion the consequences (superflous checkpoints / archived segments) aren't grave enough to warrant backpatching. Author: Michael Paquier, editorialized by Andres Freund Reviewed-By: Andres Freund, David Steele, Amit Kapila, Kyotaro HORIGUCHI Bug: #13685 Discussion: https://www.postgresql.org/message-id/20151016203031.3019.72930@wrigleys.postgresql.org https://www.postgresql.org/message-id/CAB7nPqQcPqxEM3S735Bd2RzApNqSNJVietAC=6kfkYv_45dKwA@mail.gmail.com Backpatch: -
* Add macros to make AllocSetContextCreate() calls simpler and safer.Tom Lane2016-08-27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | I found that half a dozen (nearly 5%) of our AllocSetContextCreate calls had typos in the context-sizing parameters. While none of these led to especially significant problems, they did create minor inefficiencies, and it's now clear that expecting people to copy-and-paste those calls accurately is not a great idea. Let's reduce the risk of future errors by introducing single macros that encapsulate the common use-cases. Three such macros are enough to cover all but two special-purpose contexts; those two calls can be left as-is, I think. While this patch doesn't in itself improve matters for third-party extensions, it doesn't break anything for them either, and they can gradually adopt the simplified notation over time. In passing, change TopMemoryContext to use the default allocation parameters. Formerly it could only be extended 8K at a time. That was probably reasonable when this code was written; but nowadays we create many more contexts than we did then, so that it's not unusual to have a couple hundred K in TopMemoryContext, even without considering various dubious code that sticks other things there. There seems no good reason not to let it use growing blocks like most other contexts. Back-patch to 9.6, mostly because that's still close enough to HEAD that it's easy to do so, and keeping the branches in sync can be expected to avoid some future back-patching pain. The bugs fixed by these changes don't seem to be significant enough to justify fixing them further back. Discussion: <21072.1472321324@sss.pgh.pa.us>
* Revert no-op changes to BufferGetPage()Kevin Grittner2016-04-20
| | | | | | | | | | | | | | | | | | The reverted changes were intended to force a choice of whether any newly-added BufferGetPage() calls needed to be accompanied by a test of the snapshot age, to support the "snapshot too old" feature. Such an accompanying test is needed in about 7% of the cases, where the page is being used as part of a scan rather than positioning for other purposes (such as DML or vacuuming). The additional effort required for back-patching, and the doubt whether the intended benefit would really be there, have indicated it is best just to rely on developers to do the right thing based on comments and existing usage, as we do with many other conventions. This change should have little or no effect on generated executable code. Motivated by the back-patching pain of Tom Lane and Robert Haas
* Modify BufferGetPage() to prepare for "snapshot too old" featureKevin Grittner2016-04-08
| | | | | | | | | | | This patch is a no-op patch which is intended to reduce the chances of failures of omission once the functional part of the "snapshot too old" patch goes in. It adds parameters for snapshot, relation, and an enum to specify whether the snapshot age check needs to be done for the page at this point. This initial patch passes NULL for the first two new parameters and BGP_NO_SNAPSHOT_TEST for the third. The follow-on patch will change the places where the test needs to be made.
* Fix misspelled function name in comment.Heikki Linnakangas2016-02-01
|
* Update copyright for 2016Bruce Momjian2016-01-02
| | | | Backpatch certain files through 9.1
* Fix "sesssion" typoAlvaro Herrera2015-09-28
| | | | | | | It was introduced alongside replication origins, by commit 5aa2350426c, so backpatch to 9.5. Pointed out by Fujii Masao
* Address points made in post-commit review of replication origins.Andres Freund2015-08-07
| | | | | | | | | | Amit reviewed the replication origins patch and made some good points. Address them. This fixes typos in error messages, docs and comments and adds a missing error check (although in a should-never-happen scenario). Discussion: CAA4eK1JqUBVeWWKwUmBPryFaje4190ug0y-OAUHWQ6tD83V4xg@mail.gmail.com Backpatch: 9.5, where replication origins were introduced.
* Promote the assertion that XLogBeginInsert() is not called twice into ERROR.Heikki Linnakangas2015-06-28
| | | | | | | | | Seems like cheap insurance for WAL bugs. A spurious call to XLogBeginInsert() in itself would be fairly harmless, but if there is any data registered and the insertion is not completed/cancelled properly, there is a risk that the data ends up in a wrong WAL record. Per Jeff Janes's suggestion.
* pgindent run for 9.5Bruce Momjian2015-05-23
|
* Fix the same-rel optimization when creating WAL records.Heikki Linnakangas2015-05-04
| | | | | | | prev_regbuf was never set, and therefore the same-rel flag was never set on WAL records. Report and fix by Zhanq Zq
* Introduce replication progress tracking infrastructure.Andres Freund2015-04-29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When implementing a replication solution ontop of logical decoding, two related problems exist: * How to safely keep track of replication progress * How to change replication behavior, based on the origin of a row; e.g. to avoid loops in bi-directional replication setups The solution to these problems, as implemented here, consist out of three parts: 1) 'replication origins', which identify nodes in a replication setup. 2) 'replication progress tracking', which remembers, for each replication origin, how far replay has progressed in a efficient and crash safe manner. 3) The ability to filter out changes performed on the behest of a replication origin during logical decoding; this allows complex replication topologies. E.g. by filtering all replayed changes out. Most of this could also be implemented in "userspace", e.g. by inserting additional rows contain origin information, but that ends up being much less efficient and more complicated. We don't want to require various replication solutions to reimplement logic for this independently. The infrastructure is intended to be generic enough to be reusable. This infrastructure also replaces the 'nodeid' infrastructure of commit timestamps. It is intended to provide all the former capabilities, except that there's only 2^16 different origins; but now they integrate with logical decoding. Additionally more functionality is accessible via SQL. Since the commit timestamp infrastructure has also been introduced in 9.5 (commit 73c986add) changing the API is not a problem. For now the number of origins for which the replication progress can be tracked simultaneously is determined by the max_replication_slots GUC. That GUC is not a perfect match to configure this, but there doesn't seem to be sufficient reason to introduce a separate new one. Bumps both catversion and wal page magic. Author: Andres Freund, with contributions from Petr Jelinek and Craig Ringer Reviewed-By: Heikki Linnakangas, Petr Jelinek, Robert Haas, Steve Singer Discussion: 20150216002155.GI15326@awork2.anarazel.de, 20140923182422.GA15776@alap3.anarazel.de, 20131114172632.GE7522@alap2.anarazel.de
* Reorganize our CRC source files again.Heikki Linnakangas2015-04-14
| | | | | | | | | | Now that we use CRC-32C in WAL and the control file, the "traditional" and "legacy" CRC-32 variants are not used in any frontend programs anymore. Move the code for those back from src/common to src/backend/utils/hash. Also move the slicing-by-8 implementation (back) to src/port. This is in preparation for next patch that will add another implementation that uses Intel SSE 4.2 instructions to calculate CRC-32C, where available.
* Suppress maybe-uninitialized compiler warnings.Fujii Masao2015-03-15
| | | | | | | Previously some compilers were thinking that the variables that 57aa5b2 added maybe-uninitialized. Spotted by Andres Freund
* Add GUC to enable compression of full page images stored in WAL.Fujii Masao2015-03-11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When newly-added GUC parameter, wal_compression, is on, the PostgreSQL server compresses a full page image written to WAL when full_page_writes is on or during a base backup. A compressed page image will be decompressed during WAL replay. Turning this parameter on can reduce the WAL volume without increasing the risk of unrecoverable data corruption, but at the cost of some extra CPU spent on the compression during WAL logging and on the decompression during WAL replay. This commit changes the WAL format (so bumping WAL version number) so that the one-byte flag indicating whether a full page image is compressed or not is included in its header information. This means that the commit increases the WAL volume one-byte per a full page image even if WAL compression is not used at all. We can save that one-byte by borrowing one-bit from the existing field like hole_offset in the header and using it as the flag, for example. But which would reduce the code readability and the extensibility of the feature. Per discussion, it's not worth paying those prices to save only one-byte, so we decided to add the one-byte flag to the header. This commit doesn't introduce any new compression algorithm like lz4. Currently a full page image is compressed using the existing PGLZ algorithm. Per discussion, we decided to use it at least in the first version of the feature because there were no performance reports showing that its compression ratio is unacceptably lower than that of other algorithm. Of course, in the future, it's worth considering the support of other compression algorithm for the better compression. Rahila Syed and Michael Paquier, reviewed in various versions by myself, Andres Freund, Robert Haas, Abhijit Menon-Sen and many others.
* Fix typo in comment.Fujii Masao2015-03-05
|
* Update copyright for 2015Bruce Momjian2015-01-06
| | | | Backpatch certain files through 9.0
* Fix typosAlvaro Herrera2014-12-03
|
* Fix assertion failure at end of PITR.Heikki Linnakangas2014-11-28
| | | | | | | | | | | | | InitXLogInsert() cannot be called in a critical section, because it allocates memory. But CreateCheckPoint() did that, when called for the end-of-recovery checkpoint by the startup process. In the passing, fix the scratch space allocation in InitXLogInsert to go to the right memory context. Also update the comment at InitXLOGAccess, which hasn't been totally accurate since hot standby was introduced (in a hot standby backend, InitXLOGAccess isn't called at backend startup). Reported by Michael Paquier
* Distinguish XLOG_FPI records generated for hint-bit updates.Heikki Linnakangas2014-11-24
| | | | | | | Add a new XLOG_FPI_FOR_HINT record type, and use that for full-page images generated for hint bit updates, when checksums are enabled. The new record type is replayed exactly the same as XLOG_FPI, but allows them to be tallied separately e.g. in pg_xlogdump.
* Fix bogus comments in XLogRecordAssembleHeikki Linnakangas2014-11-21
| | | | Pointed out by Michael Paquier
* Revamp the WAL record format.Heikki Linnakangas2014-11-20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Each WAL record now carries information about the modified relation and block(s) in a standardized format. That makes it easier to write tools that need that information, like pg_rewind, prefetching the blocks to speed up recovery, etc. There's a whole new API for building WAL records, replacing the XLogRecData chains used previously. The new API consists of XLogRegister* functions, which are called for each buffer and chunk of data that is added to the record. The new API also gives more control over when a full-page image is written, by passing flags to the XLogRegisterBuffer function. This also simplifies the XLogReadBufferForRedo() calls. The function can dig the relation and block number from the WAL record, so they no longer need to be passed as arguments. For the convenience of redo routines, XLogReader now disects each WAL record after reading it, copying the main data part and the per-block data into MAXALIGNed buffers. The data chunks are not aligned within the WAL record, but the redo routines can assume that the pointers returned by XLogRecGet* functions are. Redo routines are now passed the XLogReaderState, which contains the record in the already-disected format, instead of the plain XLogRecord. The new record format also makes the fixed size XLogRecord header smaller, by removing the xl_len field. The length of the "main data" portion is now stored at the end of the WAL record, and there's a separate header after XLogRecord for it. The alignment padding at the end of XLogRecord is also removed. This compansates for the fact that the new format would otherwise be more bulky than the old format. Reviewed by Andres Freund, Amit Kapila, Michael Paquier, Alvaro Herrera, Fujii Masao.
* Move the backup-block logic from XLogInsert to a new file, xloginsert.c.Heikki Linnakangas2014-11-06
xlog.c is huge, this makes it a little bit smaller, which is nice. Functions related to putting together the WAL record are in xloginsert.c, and the lower level stuff for managing WAL buffers and such are in xlog.c. Also move the definition of XLogRecord to a separate header file. This causes churn in the #includes of all the files that write WAL records, and redo routines, but it avoids pulling in xlog.h into most places. Reviewed by Michael Paquier, Alvaro Herrera, Andres Freund and Amit Kapila.