aboutsummaryrefslogtreecommitdiff
path: root/src/backend/access/transam/xlogutils.c
Commit message (Collapse)AuthorAge
* Revert recovery prefetching feature.Thomas Munro2021-05-10
| | | | | | | | | | | | | | | | | | This set of commits has some bugs with known fixes, but at this late stage in the release cycle it seems best to revert and resubmit next time, along with some new automated test coverage for this whole area. Commits reverted: dc88460c: Doc: Review for "Optionally prefetch referenced data in recovery." 1d257577: Optionally prefetch referenced data in recovery. f003d9f8: Add circular WAL decoding buffer. 323cbe7c: Remove read_page callback from XLogReader. Remove the new GUC group WAL_RECOVERY recently added by a55a9847, as the corresponding section of config.sgml is now reverted. Discussion: https://postgr.es/m/CAOuzzgrn7iKnFRsB4MHp3UisEQAGgZMbk_ViTN4HV4-Ksq8zCg%40mail.gmail.com
* Optionally prefetch referenced data in recovery.Thomas Munro2021-04-08
| | | | | | | | | | | | | | | | | | | | | | | | Introduce a new GUC recovery_prefetch, disabled by default. When enabled, look ahead in the WAL and try to initiate asynchronous reading of referenced data blocks that are not yet cached in our buffer pool. For now, this is done with posix_fadvise(), which has several caveats. Better mechanisms will follow in later work on the I/O subsystem. The GUC maintenance_io_concurrency is used to limit the number of concurrent I/Os we allow ourselves to initiate, based on pessimistic heuristics used to infer that I/Os have begun and completed. The GUC wal_decode_buffer_size is used to limit the maximum distance we are prepared to read ahead in the WAL to find uncached blocks. Reviewed-by: Alvaro Herrera <alvherre@2ndquadrant.com> (parts) Reviewed-by: Andres Freund <andres@anarazel.de> (parts) Reviewed-by: Tomas Vondra <tomas.vondra@2ndquadrant.com> (parts) Tested-by: Tomas Vondra <tomas.vondra@2ndquadrant.com> Tested-by: Jakub Wartak <Jakub.Wartak@tomtom.com> Tested-by: Dmitry Dolgov <9erthalion6@gmail.com> Tested-by: Sait Talha Nisanci <Sait.Nisanci@microsoft.com> Discussion: https://postgr.es/m/CA%2BhUKGJ4VJN8ttxScUFM8dOKX0BrBiboo5uz1cq%3DAovOddfHpA%40mail.gmail.com
* Add circular WAL decoding buffer.Thomas Munro2021-04-08
| | | | | | | | | | | | | | | | | | | | | | | | | Teach xlogreader.c to decode its output into a circular buffer, to support optimizations based on looking ahead. * XLogReadRecord() works as before, consuming records one by one, and allowing them to be examined via the traditional XLogRecGetXXX() macros. * An alternative new interface XLogNextRecord() is added that returns pointers to DecodedXLogRecord structs that can be examined directly. * XLogReadAhead() provides a second cursor that lets you see further ahead, as long as data is available and there is enough space in the decoding buffer. This returns DecodedXLogRecord pointers to the caller, but also adds them to a queue of records that will later be consumed by XLogNextRecord()/XLogReadRecord(). The buffer's size is controlled with wal_decode_buffer_size. The buffer could potentially be placed into shared memory, for future projects. Large records that don't fit in the circular buffer are called "oversized" and allocated separately with palloc(). Discussion: https://postgr.es/m/CA+hUKGJ4VJN8ttxScUFM8dOKX0BrBiboo5uz1cq=AovOddfHpA@mail.gmail.com
* Remove read_page callback from XLogReader.Thomas Munro2021-04-08
| | | | | | | | | | | | | | | | | | | | | Previously, the XLogReader module would fetch new input data using a callback function. Redesign the interface so that it tells the caller to insert more data with a special return value instead. This API suits later patches for prefetching, encryption and maybe other future projects that would otherwise require continually extending the callback interface. As incidental cleanup work, move global variables readOff, readLen and readSegNo inside XlogReaderState. Author: Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> Author: Heikki Linnakangas <hlinnaka@iki.fi> (parts of earlier version) Reviewed-by: Antonin Houska <ah@cybertec.at> Reviewed-by: Alvaro Herrera <alvherre@2ndquadrant.com> Reviewed-by: Takashi Menjo <takashi.menjo@gmail.com> Reviewed-by: Andres Freund <andres@anarazel.de> Reviewed-by: Thomas Munro <thomas.munro@gmail.com> Discussion: https://postgr.es/m/20190418.210257.43726183.horiguchi.kyotaro%40lab.ntt.co.jp
* Fix some typos, grammar and style in docs and commentsMichael Paquier2021-02-24
| | | | | | | | The portions fixing the documentation are backpatched where needed. Author: Justin Pryzby Discussion: https://postgr.es/m/20210210235557.GQ20012@telsasoft.com backpatch-through: 9.6
* Simplify printing of LSNsPeter Eisentraut2021-02-23
| | | | | | | | | | Add a macro LSN_FORMAT_ARGS for use in printf-style printing of LSNs. Convert all applicable code to use it. Reviewed-by: Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com> Reviewed-by: Michael Paquier <michael@paquier.xyz> Discussion: https://www.postgresql.org/message-id/flat/CAExHW5ub5NaTELZ3hJUCE6amuvqAtsSxc7O+uK7y4t9Rrk23cw@mail.gmail.com
* Update copyright for 2021Bruce Momjian2021-01-02
| | | | Backpatch-through: 9.5
* Improve hash_create()'s API for some added robustness.Tom Lane2020-12-15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Invent a new flag bit HASH_STRINGS to specify C-string hashing, which was formerly the default; and add assertions insisting that exactly one of the bits HASH_STRINGS, HASH_BLOBS, and HASH_FUNCTION be set. This is in hopes of preventing recurrences of the type of oversight fixed in commit a1b8aa1e4 (i.e., mistakenly omitting HASH_BLOBS). Also, when HASH_STRINGS is specified, insist that the keysize be more than 8 bytes. This is a heuristic, but it should catch accidental use of HASH_STRINGS for integer or pointer keys. (Nearly all existing use-cases set the keysize to NAMEDATALEN or more, so there's little reason to think this restriction should be problematic.) Tweak hash_create() to insist that the HASH_ELEM flag be set, and remove the defaults it had for keysize and entrysize. Since those defaults were undocumented and basically useless, no callers omitted HASH_ELEM anyway. Also, remove memset's zeroing the HASHCTL parameter struct from those callers that had one. This has never been really necessary, and while it wasn't a bad coding convention it was confusing that some callers did it and some did not. We might as well save a few cycles by standardizing on "not". Also improve the documentation for hash_create(). In passing, improve reinit.c's usage of a hash table by storing the key as a binary Oid rather than a string; and, since that's a temporary hash table, allocate it in CurrentMemoryContext for neatness. Discussion: https://postgr.es/m/590625.1607878171@sss.pgh.pa.us
* Centralize logic for skipping useless ereport/elog calls.Tom Lane2020-11-23
| | | | | | | | | | | | | | | | | | | | | While ereport() and elog() themselves are quite cheap when the error message level is too low to be printed, some places need to do substantial work before they can call those macros at all. To allow optimizing away such setup work when nothing is to be printed, make elog.c export a new function message_level_is_interesting(elevel) that reports whether ereport/elog will do anything. Make use of that in various places that had ad-hoc direct tests of log_min_messages etc. Also teach ProcSleep to use it to avoid some work. (There may well be other places that could usefully use this; I didn't search hard.) Within elog.c, refactor a little bit to avoid having duplicate copies of the policy-setting logic. When that code was written, we weren't relying on the availability of inline functions; so it had some duplications in the name of efficiency, which I got rid of. Alvaro Herrera and Tom Lane Discussion: https://postgr.es/m/129515.1606166429@sss.pgh.pa.us
* Fix obsolete comment in xlogutils.c.Peter Geoghegan2020-08-14
| | | | Oversight in commit 2c03216d831.
* code: replace 'master' with 'primary' where appropriate.Andres Freund2020-07-08
| | | | | | | | | Also changed "in the primary" to "on the primary", and added a few "the" before "primary". Author: Andres Freund Reviewed-By: David Steele Discussion: https://postgr.es/m/20200615182235.x7lch5n6kcjq4aue@alap3.anarazel.de
* Adjust walsender usage of xlogreader, simplify APIsAlvaro Herrera2020-05-13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * Have both physical and logical walsender share a 'xlogreader' state struct for tracking state. This replaces the existing globals sendSeg and sendCxt. * Change WALRead not to receive XLogReaderState->seg and ->segcxt as separate arguments anymore; just use the ones from 'state'. This is made possible by the above change. * have the XLogReader segment_open contract require the callbacks to install the file descriptor in the state struct themselves instead of returning it. xlogreader was already ignoring any possible failed return from the callbacks, relying solely on them never returning. (This point is not altogether excellent, as it means the callbacks have to know more of XLogReaderState; but to really improve on that we would have to pass back error info from the callbacks to xlogreader. And the complexity would not be saved but instead just transferred to the callers of WALRead, which would have to learn how to throw errors from the open_segment callback in addition of, as currently, from pg_pread.) * segment_open no longer receives the 'segcxt' as a separate argument, since it's part of the XLogReaderState argument. Per comments from Kyotaro Horiguchi. Author: Álvaro Herrera <alvherre@alvh.no-ip.org> Discussion: https://postgr.es/m/20200511203336.GA9913@alvherre.pgsql
* Fix comment in xlogutils.cMichael Paquier2020-05-12
| | | | | | | | | | | The existing callers of XLogReadDetermineTimeline() performing recovery need to check a replay LSN position when determining on which timeline to read a WAL page. A portion of the comment describing this function said exactly that, while referring to a routine for fetching a write LSN, something not available in recovery. Author: Kyotaro Horiguchi Discussion: https://postgr.es/m/20200511.101619.2043820539323292957.horikyota.ntt@gmail.com
* Rework XLogReader callback systemAlvaro Herrera2020-05-08
| | | | | | | | | | | | | | | | | | | Code review for 0dc8ead46363, prompted by a bug closed by 91c40548d5f7. XLogReader's system for opening and closing segments had gotten too complicated, with callbacks being passed at both the XLogReaderAllocate level (read_page) as well as at the WALRead level (segment_open). This was confusing and hard to follow, so restructure things so that these callbacks are passed together at XLogReaderAllocate time, and add another callback to the set (segment_close) to make it a coherent whole. Also, ensure XLogReaderState is an argument to all the callbacks, so that they can grab at the ->private data if necessary. Document the whole arrangement more clearly. Author: Álvaro Herrera <alvherre@alvh.no-ip.org> Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com> Discussion: https://postgr.es/m/20200422175754.GA19858@alvherre.pgsql
* Skip WAL for new relfilenodes, under wal_level=minimal.Noah Misch2020-04-04
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Until now, only selected bulk operations (e.g. COPY) did this. If a given relfilenode received both a WAL-skipping COPY and a WAL-logged operation (e.g. INSERT), recovery could lose tuples from the COPY. See src/backend/access/transam/README section "Skipping WAL for New RelFileNode" for the new coding rules. Maintainers of table access methods should examine that section. To maintain data durability, just before commit, we choose between an fsync of the relfilenode and copying its contents to WAL. A new GUC, wal_skip_threshold, guides that choice. If this change slows a workload that creates small, permanent relfilenodes under wal_level=minimal, try adjusting wal_skip_threshold. Users setting a timeout on COMMIT may need to adjust that timeout, and log_min_duration_statement analysis will reflect time consumption moving to COMMIT from commands like COPY. Internally, this requires a reliable determination of whether RollbackAndReleaseCurrentSubTransaction() would unlink a relation's current relfilenode. Introduce rd_firstRelfilenodeSubid. Amend the specification of rd_createSubid such that the field is zero when a new rel has an old rd_node. Make relcache.c retain entries for certain dropped relations until end of transaction. Bump XLOG_PAGE_MAGIC, since this introduces XLOG_GIST_ASSIGN_LSN. Future servers accept older WAL, so this bump is discretionary. Kyotaro Horiguchi, reviewed (in earlier, similar versions) by Robert Haas. Heikki Linnakangas and Michael Paquier implemented earlier designs that materially clarified the problem. Reviewed, in earlier designs, by Andrew Dunstan, Andres Freund, Alvaro Herrera, Tom Lane, Fujii Masao, and Simon Riggs. Reported by Martijn van Oosterhout. Discussion: https://postgr.es/m/20150702220524.GA9392@svana.org
* Revert "Skip WAL for new relfilenodes, under wal_level=minimal."Noah Misch2020-03-22
| | | | | | | | This reverts commit cb2fd7eac285b1b0a24eeb2b8ed4456b66c5a09f. Per numerous buildfarm members, it was incompatible with parallel query, and a test case assumed LP64. Back-patch to 9.5 (all supported versions). Discussion: https://postgr.es/m/20200321224920.GB1763544@rfd.leadboat.com
* Skip WAL for new relfilenodes, under wal_level=minimal.Noah Misch2020-03-21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Until now, only selected bulk operations (e.g. COPY) did this. If a given relfilenode received both a WAL-skipping COPY and a WAL-logged operation (e.g. INSERT), recovery could lose tuples from the COPY. See src/backend/access/transam/README section "Skipping WAL for New RelFileNode" for the new coding rules. Maintainers of table access methods should examine that section. To maintain data durability, just before commit, we choose between an fsync of the relfilenode and copying its contents to WAL. A new GUC, wal_skip_threshold, guides that choice. If this change slows a workload that creates small, permanent relfilenodes under wal_level=minimal, try adjusting wal_skip_threshold. Users setting a timeout on COMMIT may need to adjust that timeout, and log_min_duration_statement analysis will reflect time consumption moving to COMMIT from commands like COPY. Internally, this requires a reliable determination of whether RollbackAndReleaseCurrentSubTransaction() would unlink a relation's current relfilenode. Introduce rd_firstRelfilenodeSubid. Amend the specification of rd_createSubid such that the field is zero when a new rel has an old rd_node. Make relcache.c retain entries for certain dropped relations until end of transaction. Back-patch to 9.5 (all supported versions). This introduces a new WAL record type, XLOG_GIST_ASSIGN_LSN, without bumping XLOG_PAGE_MAGIC. As always, update standby systems before master systems. This changes sizeof(RelationData) and sizeof(IndexStmt), breaking binary compatibility for affected extensions. (The most recent commit to affect the same class of extensions was 089e4d405d0f3b94c74a2c6a54357a84a681754b.) Kyotaro Horiguchi, reviewed (in earlier, similar versions) by Robert Haas. Heikki Linnakangas and Michael Paquier implemented earlier designs that materially clarified the problem. Reviewed, in earlier designs, by Andrew Dunstan, Andres Freund, Alvaro Herrera, Tom Lane, Fujii Masao, and Simon Riggs. Reported by Martijn van Oosterhout. Discussion: https://postgr.es/m/20150702220524.GA9392@svana.org
* Add GUC ignore_invalid_pages.Fujii Masao2020-01-22
| | | | | | | | | | | | | | | Detection of WAL records having references to invalid pages during recovery causes PostgreSQL to raise a PANIC-level error, aborting the recovery. Setting ignore_invalid_pages to on causes the system to ignore those WAL records (but still report a warning), and continue recovery. This behavior may cause crashes, data loss, propagate or hide corruption, or other serious problems. However, it may allow you to get past the PANIC-level error, to finish the recovery, and to cause the server to start up. Author: Fujii Masao Reviewed-by: Michael Paquier Discussion: https://www.postgresql.org/message-id/CAHGQGwHCK6f77yeZD4MHOnN+PaTf6XiJfEB+Ce7SksSHjeAWtg@mail.gmail.com
* Update copyrights for 2020Bruce Momjian2020-01-01
| | | | Backpatch-through: update all files in master, backpatch legal files through 9.4
* Remove XLogFileNameP() from the treeMichael Paquier2019-12-03
| | | | | | | | | | | | | | | | | | | | XLogFileNameP() is a wrapper routine able to build a palloc'd string for a WAL segment name, which is used for error string generation. There were several code paths where it gets called in a critical section, where memory allocation is not allowed. This results in triggering an assertion failure instead of generating the wanted error message. Another, more annoying, problem is that if the allocation to generate the WAL segment name fails on OOM, then the failure would be escalated to a PANIC. This removes the routine and all its callers are replaced with a logic using a fixed-size buffer. This way, all the existing mistakes are fixed and future ones are prevented. Author: Masahiko Sawada Reviewed-by: Michael Paquier, Álvaro Herrera Discussion: https://postgr.es/m/CA+fd4k5gC9H4uoWMLg9K_QfNrnkkdEw+-AFveob9YX7z8JnKTA@mail.gmail.com
* Refactor WAL file-reading code into WALRead()Alvaro Herrera2019-11-25
| | | | | | | | | | | | | | | | | | | | | XLogReader, walsender and pg_waldump all had their own routines to read data from WAL files to memory, with slightly different approaches according to the particular conditions of each environment. There's a lot of commonality, so we can refactor that into a single routine WALRead in XLogReader, and move the differences to a separate (simpler) callback that just opens the next WAL-segment. This results in a clearer (ahem) code flow. The error reporting needs are covered by filling in a new error-info struct, WALReadError, and it's the caller's responsibility to act on it. The backend has WALReadRaiseError() to do so. We no longer ever need to seek in this interface; switch to using pg_pread(). Author: Antonin Houska, with contributions from Álvaro Herrera Reviewed-by: Michaël Paquier, Kyotaro Horiguchi Discussion: https://postgr.es/m/14984.1554998742@spoje.net
* Rework WAL-reading supporting structsAlvaro Herrera2019-09-24
| | | | | | | | | | | | | | The state-tracking of WAL reading in various places was pretty messy, mostly because the ancient physical-replication WAL reading code wasn't using the XLogReader abstraction. This led to some untidy code. Make it prettier by creating two additional supporting structs, WALSegmentContext and WALOpenSegment which keep track of WAL-reading state. This makes code cleaner, as well as supports more future cleanup. Author: Antonin Houska Reviewed-by: Álvaro Herrera and (older versions) Robert Haas Discussion: https://postgr.es/m/14984.1554998742@spoje.net
* Fix inconsistencies and typos in the tree, take 11Michael Paquier2019-08-19
| | | | | | | | This fixes various typos in docs and comments, and removes some orphaned definitions. Author: Alexander Lakhin Discussion: https://postgr.es/m/5da8e325-c665-da95-21e0-c8a99ea61fbf@gmail.com
* Update copyright for 2019Bruce Momjian2019-01-02
| | | | Backpatch-through: certain files through 9.4
* Address set of issues with errno handlingMichael Paquier2018-06-25
| | | | | | | | | | | | | | | | | | | | System calls mixed up in error code paths are causing two issues which several code paths have not correctly handled: 1) For write() calls, sometimes the system may return less bytes than what has been written without errno being set. Some paths were careful enough to consider that case, and assumed that errno should be set to ENOSPC, other calls missed that. 2) errno generated by a system call is overwritten by other system calls which may succeed once an error code path is taken, causing what is reported to the user to be incorrect. This patch uses the brute-force approach of correcting all those code paths. Some refactoring could happen in the future, but this is let as future work, which is not targeted for back-branches anyway. Author: Michael Paquier Reviewed-by: Ashutosh Sharma Discussion: https://postgr.es/m/20180622061535.GD5215@paquier.xyz
* Further cleanup of client dependencies on src/include/catalog headers.Tom Lane2018-04-09
| | | | | | | | | | | | | | | | | In commit 9c0a0de4c, I'd failed to notice that catalog/catalog.h should also be considered a frontend-unsafe header, because it includes (and needs) the full form of pg_class.h, not to mention relcache.h. However, various frontend code was depending on it to get TABLESPACE_VERSION_DIRECTORY, so refactoring of some sort is called for. The cleanest answer seems to be to move TABLESPACE_VERSION_DIRECTORY, as well as the OIDCHARS symbol, to common/relpath.h. Do that, and mop up inclusions as necessary. (I found that quite a few current users of catalog/catalog.h don't seem to need it at all anymore, apparently as a result of the refactorings that created common/relpath.[hc]. And initdb.c needed it only as a route to pg_class_d.h.) Discussion: https://postgr.es/m/6629.1523294509@sss.pgh.pa.us
* Update copyright for 2018Bruce Momjian2018-01-02
| | | | Backpatch-through: certain files through 9.3
* Update typedefs.list and re-run pgindentRobert Haas2017-11-29
| | | | Discussion: http://postgr.es/m/CA+TgmoaA9=1RWKtBWpDaj+sF3Stgc8sHgf5z=KGtbjwPLQVDMA@mail.gmail.com
* Refactor new file permission handlingPeter Eisentraut2017-09-23
| | | | | | | | | | | | | | | | | | | The file handling functions from fd.c were called with a diverse mix of notations for the file permissions when they were opening new files. Almost all files created by the server should have the same permissions set. So change the API so that e.g. OpenTransientFile() automatically uses the standard permissions set, and OpenTransientFilePerm() is a new function that takes an explicit permissions set for the few cases where it is needed. This also saves an unnecessary argument for call sites that are just opening an existing file. While we're reviewing these APIs, get rid of the FileName typedef and use the standard const char * for the file name and mode_t for the file mode. This makes these functions match other file handling functions and removes an unnecessary layer of mysteriousness. We can also get rid of a few casts that way. Author: David Steele <david@pgmasters.net>
* Make WAL segment size configurable at initdb time.Andres Freund2017-09-19
| | | | | | | | | | | | | | | | | | | | | | | For performance reasons a larger segment size than the default 16MB can be useful. A larger segment size has two main benefits: Firstly, in setups using archiving, it makes it easier to write scripts that can keep up with higher amounts of WAL, secondly, the WAL has to be written and synced to disk less frequently. But at the same time large segment size are disadvantageous for smaller databases. So far the segment size had to be configured at compile time, often making it unrealistic to choose one fitting to a particularly load. Therefore change it to a initdb time setting. This includes a breaking changes to the xlogreader.h API, which now requires the current segment size to be configured. For that and similar reasons a number of binaries had to be taught how to recognize the current segment size. Author: Beena Emerson, editorialized by Andres Freund Reviewed-By: Andres Freund, David Steele, Kuntal Ghosh, Michael Paquier, Peter Eisentraut, Robert Hass, Tushar Ahuja Discussion: https://postgr.es/m/CAOG9ApEAcQ--1ieKbhFzXSQPw_YLmepaa4hNdnY5+ZULpt81Mw@mail.gmail.com
* Phase 3 of pgindent updates.Tom Lane2017-06-21
| | | | | | | | | | | | | | | | | | | | | | | | | Don't move parenthesized lines to the left, even if that means they flow past the right margin. By default, BSD indent lines up statement continuation lines that are within parentheses so that they start just to the right of the preceding left parenthesis. However, traditionally, if that resulted in the continuation line extending to the right of the desired right margin, then indent would push it left just far enough to not overrun the margin, if it could do so without making the continuation line start to the left of the current statement indent. That makes for a weird mix of indentations unless one has been completely rigid about never violating the 80-column limit. This behavior has been pretty universally panned by Postgres developers. Hence, disable it with indent's new -lpl switch, so that parenthesized lines are always lined up with the preceding left paren. This patch is much less interesting than the first round of indent changes, but also bulkier, so I thought it best to separate the effects. Discussion: https://postgr.es/m/E1dAmxK-0006EE-1r@gemulon.postgresql.org Discussion: https://postgr.es/m/30527.1495162840@sss.pgh.pa.us
* Post-PG 10 beta1 pgindent runBruce Momjian2017-05-17
| | | | perltidy run not included.
* Replace "transaction log" with "write-ahead log"Peter Eisentraut2017-05-12
| | | | | This makes documentation and error messages match the renaming of "xlog" to "wal" in APIs and file naming.
* Fix typo in commentPeter Eisentraut2017-04-16
| | | | Author: Masahiko Sawada <sawada.mshk@gmail.com>
* Teach xlogreader to follow timeline switchesSimon Riggs2017-03-22
| | | | | | | | | | Uses page-based mechanism to ensure we’re using the correct timeline. Tests are included to exercise the functionality using a cold disk-level copy of the master that's started up as a replica with slots intact, but the intended use of the functionality is with later features. Craig Ringer, reviewed by Simon Riggs and Andres Freund
* Create and use wait events for read, write, and fsync operations.Robert Haas2017-03-18
| | | | | | | | | | | | | Previous commits, notably 53be0b1add7064ca5db3cd884302dfc3268d884e and 6f3bd98ebfc008cbd676da777bb0b2376c4c4bfa, made it possible to see from pg_stat_activity when a backend was stuck waiting for another backend, but it's also fairly common for a backend to be stuck waiting for an I/O. Add wait events for those operations, too. Rushabh Lathia, with further hacking by me. Reviewed and tested by Michael Paquier, Amit Kapila, Rajkumar Raghuwanshi, and Rahila Syed. Discussion: http://postgr.es/m/CAGPqQf0LsYHXREPAZqYGVkDqHSyjf=KsD=k0GTVPAuzyThh-VQ@mail.gmail.com
* Rename user-facing tools with "xlog" in the name to say "wal".Robert Haas2017-02-09
| | | | | This means pg_receivexlog because pg_receivewal, pg_resetxlog becomes pg_resetwal, and pg_xlogdump becomes pg_waldump.
* Add WAL consistency checking facility.Robert Haas2017-02-08
| | | | | | | | | | | | | | When the new GUC wal_consistency_checking is set to a non-empty value, it triggers recording of additional full-page images, which are compared on the standby against the results of applying the WAL record (without regard to those full-page images). Allowable differences such as hints are masked out, and the resulting pages are compared; any difference results in a FATAL error on the standby. Kuntal Ghosh, based on earlier patches by Michael Paquier and Heikki Linnakangas. Extensively reviewed and revised by Michael Paquier and by me, with additional reviews and comments from Amit Kapila, Álvaro Herrera, Simon Riggs, and Peter Eisentraut.
* Update copyright via script for 2017Bruce Momjian2017-01-03
|
* Fix bogus commentsAlvaro Herrera2016-05-12
| | | | | | Some comments mentioned XLogReplayBuffer, but there's no such function: that was an interim name for a function that got renamed to XLogReadBufferForRedo, before commit 2c03216d831160 was pushed.
* Revert timeline following in replication slotsAlvaro Herrera2016-05-04
| | | | | | | | | This reverts commits f07d18b6e94d, 82c83b337202, 3a3b309041b0, and 24c5f1a103ce. This feature has shown enough immaturity that it was deemed better to rip it out before rushing some more fixes at the last minute. There are discussions on larger changes in this area for the next release.
* Revert no-op changes to BufferGetPage()Kevin Grittner2016-04-20
| | | | | | | | | | | | | | | | | | The reverted changes were intended to force a choice of whether any newly-added BufferGetPage() calls needed to be accompanied by a test of the snapshot age, to support the "snapshot too old" feature. Such an accompanying test is needed in about 7% of the cases, where the page is being used as part of a scan rather than positioning for other purposes (such as DML or vacuuming). The additional effort required for back-patching, and the doubt whether the intended benefit would really be there, have indicated it is best just to rely on developers to do the right thing based on comments and existing usage, as we do with many other conventions. This change should have little or no effect on generated executable code. Motivated by the back-patching pain of Tom Lane and Robert Haas
* Modify BufferGetPage() to prepare for "snapshot too old" featureKevin Grittner2016-04-08
| | | | | | | | | | | This patch is a no-op patch which is intended to reduce the chances of failures of omission once the functional part of the "snapshot too old" patch goes in. It adds parameters for snapshot, relation, and an enum to specify whether the snapshot age check needs to be done for the page at this point. This initial patch passes NULL for the first two new parameters and BGP_NO_SNAPSHOT_TEST for the third. The follow-on patch will change the places where the test needs to be made.
* Enable logical slots to follow timeline switchesAlvaro Herrera2016-03-30
| | | | | | | | | | | | | | | | | | When decoding from a logical slot, it's necessary for xlog reading to be able to read xlog from historical (i.e. not current) timelines; otherwise, decoding fails after failover, because the archives are in the historical timeline. This is required to make "failover logical slots" possible; it currently has no other use, although theoretically it could be used by an extension that creates a slot on a standby and continues to replay from the slot when the standby is promoted. This commit includes a module in src/test/modules with functions to manipulate the slots (which is not otherwise possible in SQL code) in order to enable testing, and a new test in src/test/recovery to ensure that the behavior is as expected. Author: Craig Ringer Reviewed-By: Oleksii Kliukin, Andres Freund, Petr Jelínek
* XLogReader general code cleanupAlvaro Herrera2016-03-30
| | | | | | | | | Some minor tweaks and comment additions, for cleanliness sake and to avoid having the upcoming timeline-following patch be polluted with unrelated cleanup. Extracted from a larger patch by Craig Ringer, reviewed by Andres Freund, with some additions by myself.
* Refactor to create generic WAL page read callbackSimon Riggs2016-01-20
| | | | | | | | | | Previously we didn’t have a generic WAL page read callback function, surprisingly. Logical decoding has logical_read_local_xlog_page(), which was actually generic, so move that to xlogfunc.c and rename to read_local_xlog_page(). Maintain logical_read_local_xlog_page() so existing callers still work. As requested by Michael Paquier, Alvaro Herrera and Andres Freund
* Update copyright for 2016Bruce Momjian2016-01-02
| | | | Backpatch certain files through 9.1
* Fix bug leading to restoring unlogged relations from empty files.Andres Freund2015-12-10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | At the end of crash recovery, unlogged relations are reset to the empty state, using their init fork as the template. The init fork is copied to the main fork without going through shared buffers. Unfortunately WAL replay so far has not necessarily flushed writes from shared buffers to disk at that point. In normal crash recovery, and before the introduction of 'fast promotions' in fd4ced523 / 9.3, the END_OF_RECOVERY checkpoint flushes the buffers out in time. But with fast promotions that's not the case anymore. To fix, force WAL writes targeting the init fork to be flushed immediately (using the new FlushOneBuffer() function). In 9.5+ that flush can centrally be triggered from the code dealing with restoring full page writes (XLogReadBufferForRedoExtended), in earlier releases that responsibility is in the hands of XLOG_HEAP_NEWPAGE's replay function. Backpatch to 9.1, even if this currently is only known to trigger in 9.3+. Flushing earlier is more robust, and it is advantageous to keep the branches similar. Typical symptoms of this bug are errors like 'ERROR: index "..." contains unexpected zero page at block 0' shortly after promoting a node. Reported-By: Thom Brown Author: Andres Freund and Michael Paquier Discussion: 20150326175024.GJ451@alap3.anarazel.de Backpatch: 9.1-
* Sanity-check that a page zeroed by redo routine is marked with WILL_INIT.Heikki Linnakangas2015-07-20
| | | | | | | | | | | | | | | There was already a sanity-check in the other direction: if a page was marked with WILL_INIT, it had to be initialized by the redo routine. It's not strictly necessary for correctness that a page is marked with WILL_INIT if it's going to be initialized at redo, but it's a missed optimization if nothing else. Fix a few instances of this issue in SP-GiST, where a block in WAL record was not marked with WILL_INIT, but was in fact always initialized at redo. We were creating a full-page image of the page unnecessarily in those cases. Backpatch to 9.5, where the new WILL_INIT flag was added.
* Update copyright for 2015Bruce Momjian2015-01-06
| | | | Backpatch certain files through 9.0