path: root/src/backend/access/transam/xlogreader.c
Commit message / Author / Date
...
* Revert timeline following in replication slots (Alvaro Herrera, 2016-05-04)

  This reverts commits f07d18b6e94d, 82c83b337202, 3a3b309041b0, and 24c5f1a103ce. This feature has shown enough immaturity that it was deemed better to rip it out before rushing some more fixes at the last minute. There are discussions on larger changes in this area for the next release.
* Silence compiler warning (Alvaro Herrera, 2016-04-04)

  Reported by Peter Eisentraut to occur on 32bit systems.
* Enable logical slots to follow timeline switches (Alvaro Herrera, 2016-03-30)

  When decoding from a logical slot, it's necessary for xlog reading to be able to read xlog from historical (i.e. not current) timelines; otherwise, decoding fails after failover, because the archives are in the historical timeline. This is required to make "failover logical slots" possible; it currently has no other use, although theoretically it could be used by an extension that creates a slot on a standby and continues to replay from the slot when the standby is promoted.

  This commit includes a module in src/test/modules with functions to manipulate the slots (which is not otherwise possible in SQL code) in order to enable testing, and a new test in src/test/recovery to ensure that the behavior is as expected.

  Author: Craig Ringer
  Reviewed-By: Oleksii Kliukin, Andres Freund, Petr Jelínek
* XLogReader general code cleanup (Alvaro Herrera, 2016-03-30)

  Some minor tweaks and comment additions, for cleanliness' sake and to avoid having the upcoming timeline-following patch be polluted with unrelated cleanup.

  Extracted from a larger patch by Craig Ringer, reviewed by Andres Freund, with some additions by myself.
* Update copyright for 2016 (Bruce Momjian, 2016-01-02)

  Backpatch certain files through 9.1.
* Fix message punctuation according to style guide (Peter Eisentraut, 2015-10-01)
* Another attempt at fixing memory leak in xlogreader (Heikki Linnakangas, 2015-07-28)

  max_block_id is also reset between reading records.

  Michael Paquier
* Fix memory leak in xlogreader facility (Heikki Linnakangas, 2015-07-27)

  XLogReaderFree failed to free the per-block data buffers, when they happened to not be used by the latest read WAL record.

  Michael Paquier. Backpatch to 9.5, where the per-block buffers were added.
* pgindent run for 9.5 (Bruce Momjian, 2015-05-23)
* Introduce replication progress tracking infrastructure (Andres Freund, 2015-04-29)

  When implementing a replication solution on top of logical decoding, two related problems exist:

  * How to safely keep track of replication progress
  * How to change replication behavior based on the origin of a row, e.g. to avoid loops in bi-directional replication setups

  The solution to these problems, as implemented here, consists of three parts:

  1) 'replication origins', which identify nodes in a replication setup.
  2) 'replication progress tracking', which remembers, for each replication origin, how far replay has progressed in an efficient and crash-safe manner.
  3) The ability to filter out changes performed at the behest of a replication origin during logical decoding; this allows complex replication topologies, e.g. by filtering all replayed changes out.

  Most of this could also be implemented in "userspace", e.g. by inserting additional rows containing origin information, but that ends up being much less efficient and more complicated. We don't want to require various replication solutions to reimplement logic for this independently. The infrastructure is intended to be generic enough to be reusable.

  This infrastructure also replaces the 'nodeid' infrastructure of commit timestamps. It is intended to provide all the former capabilities, except that there are only 2^16 different origins; but now they integrate with logical decoding. Additionally, more functionality is accessible via SQL. Since the commit timestamp infrastructure has also been introduced in 9.5 (commit 73c986add), changing the API is not a problem.

  For now, the number of origins for which the replication progress can be tracked simultaneously is determined by the max_replication_slots GUC. That GUC is not a perfect match to configure this, but there doesn't seem to be sufficient reason to introduce a separate new one.

  Bumps both catversion and WAL page magic.

  Author: Andres Freund, with contributions from Petr Jelinek and Craig Ringer
  Reviewed-By: Heikki Linnakangas, Petr Jelinek, Robert Haas, Steve Singer
  Discussion: 20150216002155.GI15326@awork2.anarazel.de, 20140923182422.GA15776@alap3.anarazel.de, 20131114172632.GE7522@alap2.anarazel.de
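  The filtering part of this infrastructure is exposed to logical decoding output plugins as a filter_by_origin_cb callback. A minimal sketch, in which the function name and the "skip anything replayed from elsewhere" policy are illustrative rather than taken from the commit, could look like this:

    #include "postgres.h"
    #include "replication/logical.h"
    #include "replication/origin.h"

    /*
     * Hypothetical output-plugin callback: ask the decoder to skip any change
     * that carries a non-local replication origin, i.e. one that was itself
     * replayed from another node.  This is the usual way to avoid loops in
     * bi-directional replication.  Returning true filters the change out.
     */
    static bool
    my_filter_by_origin_cb(LogicalDecodingContext *ctx, RepOriginId origin_id)
    {
        return origin_id != InvalidRepOriginId;
    }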
* Reorganize our CRC source files again (Heikki Linnakangas, 2015-04-14)

  Now that we use CRC-32C in WAL and the control file, the "traditional" and "legacy" CRC-32 variants are not used in any frontend programs anymore. Move the code for those back from src/common to src/backend/utils/hash.

  Also move the slicing-by-8 implementation (back) to src/port. This is in preparation for the next patch that will add another implementation that uses Intel SSE 4.2 instructions to calculate CRC-32C, where available.
* Fix error handling of XLogReaderAllocate in case of OOM (Fujii Masao, 2015-04-03)

  Similarly to previous fix 9b8d478, commit 2c03216 has switched XLogReaderAllocate() to use a set of palloc calls instead of malloc, causing any callers of this function to fail with an error instead of receiving a NULL pointer in case of out-of-memory error. Fix this by using palloc_extended with MCXT_ALLOC_NO_OOM that will safely return NULL in case of an OOM.

  Michael Paquier, slightly modified by me.
* Rework handling of OOM when allocating record buffer in XLOG reader (Fujii Masao, 2015-04-03)

  Commit 2c03216 changed allocate_recordbuf() so that it uses a palloc to allocate the read buffer and fails immediately when an out-of-memory error shows up, even though its callers still expect that NULL is returned in that case. This bug is fixed by making allocate_recordbuf() use palloc_extended with the MCXT_ALLOC_NO_OOM flag and return NULL in the OOM case.

  Michael Paquier
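  Both of these fixes rely on the same allocation pattern. A minimal sketch follows; the wrapper function is hypothetical, while palloc_extended() and MCXT_ALLOC_NO_OOM are the real backend interfaces the commits name:

    #include "postgres.h"

    /*
     * Grow a read buffer without throwing an error on out-of-memory, so the
     * caller can report the failure through its own path, as xlogreader does.
     */
    static bool
    resize_readbuf(char **readbuf, Size newsize)
    {
        char   *newbuf = palloc_extended(newsize, MCXT_ALLOC_NO_OOM);

        /* Unlike plain palloc(), a NULL result here simply means "no memory". */
        if (newbuf == NULL)
            return false;

        if (*readbuf != NULL)
            pfree(*readbuf);
        *readbuf = newbuf;
        return true;
    }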
* Tweak __attribute__-wrapping macros for better pgindent results (Tom Lane, 2015-03-26)

  This improves on commit bbfd7edae5aa5ad5553d3c7e102f2e450d4380d4 by making two simple changes:

  * pg_attribute_noreturn now takes parentheses, ie pg_attribute_noreturn(). Likewise pg_attribute_unused(), pg_attribute_packed(). This reduces pgindent's tendency to misformat declarations involving them.

  * attributes are now always attached to function declarations, not definitions. Previously some places were taking creative shortcuts, which were not merely candidates for bad misformatting by pgindent but often were outright wrong anyway. (It does little good to put a noreturn annotation where callers can't see it.) In any case, if we would like to believe that these macros can be used with non-gcc compilers, we should avoid gratuitous variance in usage patterns.

  I also went through and manually improved the formatting of a lot of declarations, and got rid of excessively repetitive (and now obsolete anyway) comments informing the reader what pg_attribute_printf is for.
* Add macros wrapping all usage of gcc's __attribute__ (Andres Freund, 2015-03-11)

  Until now __attribute__() was defined to be empty for all compilers but gcc. That's problematic because it prevents using it in other compilers; which is necessary e.g. for atomics portability. It's also just generally dubious to do so in a header as widely included as c.h.

  Instead add pg_attribute_format_arg, pg_attribute_printf, pg_attribute_noreturn macros which are implemented in the compilers that understand them. Also add pg_attribute_noreturn and pg_attribute_packed, but don't provide fallbacks, since they can affect functionality.

  This means that external code that, possibly unwittingly, relied on __attribute__ defined to be empty on !gcc compilers may now run into warnings or errors on those compilers. But there shouldn't be many occurrences of that and it's hard to work around...

  Discussion: 54B58BA3.8040302@ohmu.fi
  Author: Oskari Saarenmaa, with some minor changes by me.
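  A minimal sketch of the usage pattern these two entries converge on, with my_fatal_exit() as a hypothetical function rather than anything in the tree: the macro takes parentheses and is attached to the declaration, where callers can see it, not to the definition.

    #include "postgres.h"

    /* Declaration carries the attribute so callers know control never returns. */
    extern void my_fatal_exit(int code) pg_attribute_noreturn();

    /* Definition stays plain, per the pgindent-friendly convention above. */
    void
    my_fatal_exit(int code)
    {
        exit(code);
    }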
* Add GUC to enable compression of full page images stored in WAL (Fujii Masao, 2015-03-11)

  When the newly-added GUC parameter, wal_compression, is on, the PostgreSQL server compresses a full page image written to WAL when full_page_writes is on or during a base backup. A compressed page image will be decompressed during WAL replay. Turning this parameter on can reduce the WAL volume without increasing the risk of unrecoverable data corruption, but at the cost of some extra CPU spent on the compression during WAL logging and on the decompression during WAL replay.

  This commit changes the WAL format (so bumping the WAL version number) so that the one-byte flag indicating whether a full page image is compressed or not is included in its header information. This means that the commit increases the WAL volume by one byte per full page image even if WAL compression is not used at all. We could save that byte by borrowing one bit from an existing header field like hole_offset and using it as the flag, but that would reduce the code readability and the extensibility of the feature. Per discussion, it's not worth paying those prices to save only one byte, so we decided to add the one-byte flag to the header.

  This commit doesn't introduce any new compression algorithm like lz4. Currently a full page image is compressed using the existing PGLZ algorithm. Per discussion, we decided to use it at least in the first version of the feature because there were no performance reports showing that its compression ratio is unacceptably lower than that of other algorithms. Of course, in the future, it's worth considering support for other compression algorithms for better compression.

  Rahila Syed and Michael Paquier, reviewed in various versions by myself, Andres Freund, Robert Haas, Abhijit Menon-Sen and many others.
* Add missing "goto err" statements in xlogreader.c.Fujii Masao2015-03-09
| | | | Spotted by Andres Freund.
* Update copyright for 2015 (Bruce Momjian, 2015-01-06)

  Backpatch certain files through 9.0.
* Remove superfluous variable from xlogreader's XLogFindNextRecord() (Andres Freund, 2015-01-04)

  Pointed out by Coverity. Since this is mere (and debatable) cosmetics, I'm not backpatching this.
* Revamp the WAL record format (Heikki Linnakangas, 2014-11-20)

  Each WAL record now carries information about the modified relation and block(s) in a standardized format. That makes it easier to write tools that need that information, like pg_rewind, prefetching the blocks to speed up recovery, etc.

  There's a whole new API for building WAL records, replacing the XLogRecData chains used previously. The new API consists of XLogRegister* functions, which are called for each buffer and chunk of data that is added to the record. The new API also gives more control over when a full-page image is written, by passing flags to the XLogRegisterBuffer function.

  This also simplifies the XLogReadBufferForRedo() calls. The function can dig the relation and block number from the WAL record, so they no longer need to be passed as arguments.

  For the convenience of redo routines, XLogReader now dissects each WAL record after reading it, copying the main data part and the per-block data into MAXALIGNed buffers. The data chunks are not aligned within the WAL record, but the redo routines can assume that the pointers returned by XLogRecGet* functions are. Redo routines are now passed the XLogReaderState, which contains the record in the already-dissected format, instead of the plain XLogRecord.

  The new record format also makes the fixed size XLogRecord header smaller, by removing the xl_len field. The length of the "main data" portion is now stored at the end of the WAL record, and there's a separate header after XLogRecord for it. The alignment padding at the end of XLogRecord is also removed. This compensates for the fact that the new format would otherwise be more bulky than the old format.

  Reviewed by Andres Freund, Amit Kapila, Michael Paquier, Alvaro Herrera, Fujii Masao.
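  As an illustration of the new insertion API (not code from the commit; the record struct, function name, and the rmgr/info values are placeholders), building a record now looks roughly like this:

    #include "postgres.h"
    #include "access/xloginsert.h"

    typedef struct xl_my_record
    {
        uint32      counter;    /* whatever the redo routine needs to replay */
    } xl_my_record;

    static XLogRecPtr
    log_my_change(Buffer buf, uint32 counter)
    {
        xl_my_record xlrec;

        xlrec.counter = counter;

        XLogBeginInsert();
        /* main data; its length is no longer kept in the fixed XLogRecord header */
        XLogRegisterData((char *) &xlrec, sizeof(xlrec));
        /* register the modified buffer; flags influence full-page image handling */
        XLogRegisterBuffer(0, buf, REGBUF_STANDARD);

        /* a real caller passes its own resource manager and record type here */
        return XLogInsert(RM_XLOG_ID, 0);
    }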
* Move the backup-block logic from XLogInsert to a new file, xloginsert.c (Heikki Linnakangas, 2014-11-06)

  xlog.c is huge; this makes it a little bit smaller, which is nice. Functions related to putting together the WAL record are in xloginsert.c, and the lower level stuff for managing WAL buffers and such are in xlog.c.

  Also move the definition of XLogRecord to a separate header file. This causes churn in the #includes of all the files that write WAL records, and redo routines, but it avoids pulling xlog.h into most places.

  Reviewed by Michael Paquier, Alvaro Herrera, Andres Freund and Amit Kapila.
* Switch to CRC-32C in WAL and other places (Heikki Linnakangas, 2014-11-04)

  The old algorithm was found to not be the usual CRC-32 algorithm, used by Ethernet et al. We were using a non-reflected lookup table with code meant for a reflected lookup table. That's a strange combination that AFAICS does not correspond to any bit-wise CRC calculation, which makes it difficult to reason about its properties. Although it has worked well in practice, it seems safer to use a well-known algorithm.

  Since we're changing the algorithm anyway, we might as well choose a different polynomial. The Castagnoli polynomial has better error-correcting properties than the traditional CRC-32 polynomial, even if we had implemented it correctly. Another reason for picking that is that some new CPUs have hardware support for calculating CRC-32C, but not CRC-32, let alone our strange variant of it. This patch doesn't add any support for such hardware, but a future patch could now do that.

  The old algorithm is kept around for tsquery and pg_trgm, which use the values in indexes that need to remain compatible so that pg_upgrade works. While we're at it, share the old lookup table for CRC-32 calculation between hstore, ltree and core. They all use the same table, so might as well.
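  For reference, computing a checksum with the CRC-32C interface used for WAL looks roughly like the sketch below; the function is hypothetical, and the header path is the one used in recent source trees (it has moved between releases, as the "Reorganize our CRC source files" entry above shows).

    #include "postgres.h"
    #include "port/pg_crc32c.h"

    /* Compute the CRC-32C of an arbitrary buffer. */
    static pg_crc32c
    checksum_buffer(const char *data, size_t len)
    {
        pg_crc32c   crc;

        INIT_CRC32C(crc);
        COMP_CRC32C(crc, data, len);
        FIN_CRC32C(crc);

        return crc;
    }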
* pgindent run for 9.4 (Bruce Momjian, 2014-05-06)

  This includes removing tabs after periods in C comments, which was applied to back branches, so this change should not affect backpatching.
* Update copyright for 2014 (Bruce Momjian, 2014-01-07)

  Update all files in head, and files COPYRIGHT and legal.sgml in all back branches.
* Fix typos in comments (Heikki Linnakangas, 2013-10-24)
* Minor spelling fixes (Stephen Frost, 2013-06-01)

  Fix a few spelling mistakes. Per bug report #8193 from Lajos Veres.
* pgindent run for release 9.3 (Bruce Momjian, 2013-05-29)

  This is the first run of the Perl-based pgindent script. Also update pgindent instructions.
* Remove duplicate initialization in XLogReadRecord (Robert Haas, 2013-04-09)

  Per a note from Dickson S. Guedes.
* Use the right timeline when beginning to stream from master (Heikki Linnakangas, 2013-01-18)

  The xlogreader refactoring broke the logic to decide which timeline to start streaming from. XLogPageRead() uses the timeline history to check which timeline the requested WAL position falls into. However, after the refactoring, XLogPageRead() is always first called with the first page in the segment, to verify the segment header, and only then with the actual WAL position we're interested in. That first read of the segment's header made XLogPageRead() always start streaming from the old timeline containing the segment header, not the timeline containing the actual record, if there was a timeline switch within the segment.

  I thought I fixed this yesterday, but that fix was too narrow and only fixed this for the corner case that the timeline switch happened in the first page of the segment. To fix this more robustly, pass explicitly the position of the record we're actually interested in to XLogPageRead, and use that to decide which timeline to read from, rather than deduce it from the page and offset.

  Per report from Fujii Masao.
* When xlogreader asks the callback function to read a page, make sure we get a large enough part of the page to include the beginning of the next record we're interested in (Heikki Linnakangas, 2013-01-17)

  The XLogPageRead callback uses the requested length to decide which timeline to stream WAL from, and if the first call is short, and the page contains a timeline switch, we'll repeatedly try to stream that page from the old timeline, and never get across the timeline switch.
* Fix a couple of error-handling bugs in the xlogreader patch (Heikki Linnakangas, 2013-01-17)

  XLogReadRecord should reset its state on every error, to make sure it re-reads the page on next call. It was inconsistent in that some errors did that, but some did not.

  In ReadRecord(), don't give up on an error if we're in standby mode. The loop was set up to retry, but the checks within the loop broke out of the loop on any error.

  Andres Freund, with some tweaking by me.
* Split out XLog reading as an independent facility (Alvaro Herrera, 2013-01-16)

  This new facility can not only be used by xlog.c to carry out crash recovery, but also by external programs. By supplying a function to read XLog pages from somewhere, all the WAL reading can be used for completely different purposes.

  For the standard backend use, the behavior should be pretty much the same as previously. As for non-backend programs, a hypothetical pg_xlogdump program is now closer to reality, but some more backend support is still necessary.

  This patch was originally submitted by Andres Freund in a different form, but Heikki Linnakangas opted for and authored another design of the concept. Andres has advanced the patch since Heikki's initial version. Review and some (mostly cosmetic) changes by me.
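  To give a flavor of the facility this entry introduces, here is a minimal sketch of how an external reader drives it: supply a page-read callback, then loop over records. The callback body, my_read_page, and dump_records are hypothetical, and the exact function signatures have changed across releases; this follows the original shape of the API.

    #include "postgres.h"
    #include "access/xlogreader.h"

    /*
     * Page-read callback: fill readBuf with at least reqLen bytes of the page
     * starting at targetPagePtr, reading from wherever the WAL lives (segment
     * files, an archive, a walsender connection, ...).  Return the number of
     * bytes read, or -1 on failure.
     */
    static int
    my_read_page(XLogReaderState *state, XLogRecPtr targetPagePtr, int reqLen,
                 XLogRecPtr targetRecPtr, char *readBuf, TimeLineID *pageTLI)
    {
        return -1;              /* stub: no real data source in this sketch */
    }

    static void
    dump_records(XLogRecPtr startpoint)
    {
        XLogReaderState *reader;
        XLogRecord *record;
        char       *errormsg = NULL;

        reader = XLogReaderAllocate(&my_read_page, NULL);

        /* first call names a start position; later calls continue from there */
        record = XLogReadRecord(reader, startpoint, &errormsg);
        while (record != NULL)
        {
            /* inspect the record here */
            record = XLogReadRecord(reader, InvalidXLogRecPtr, &errormsg);
        }

        XLogReaderFree(reader);
    }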