aboutsummaryrefslogtreecommitdiff
path: root/src/backend/access/rmgrdesc/gistdesc.c
Commit message (Collapse)AuthorAge
* Update copyright for 2021Bruce Momjian2021-01-02
| | | | Backpatch-through: 9.5
* Skip WAL for new relfilenodes, under wal_level=minimal.Noah Misch2020-04-04
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Until now, only selected bulk operations (e.g. COPY) did this. If a given relfilenode received both a WAL-skipping COPY and a WAL-logged operation (e.g. INSERT), recovery could lose tuples from the COPY. See src/backend/access/transam/README section "Skipping WAL for New RelFileNode" for the new coding rules. Maintainers of table access methods should examine that section. To maintain data durability, just before commit, we choose between an fsync of the relfilenode and copying its contents to WAL. A new GUC, wal_skip_threshold, guides that choice. If this change slows a workload that creates small, permanent relfilenodes under wal_level=minimal, try adjusting wal_skip_threshold. Users setting a timeout on COMMIT may need to adjust that timeout, and log_min_duration_statement analysis will reflect time consumption moving to COMMIT from commands like COPY. Internally, this requires a reliable determination of whether RollbackAndReleaseCurrentSubTransaction() would unlink a relation's current relfilenode. Introduce rd_firstRelfilenodeSubid. Amend the specification of rd_createSubid such that the field is zero when a new rel has an old rd_node. Make relcache.c retain entries for certain dropped relations until end of transaction. Bump XLOG_PAGE_MAGIC, since this introduces XLOG_GIST_ASSIGN_LSN. Future servers accept older WAL, so this bump is discretionary. Kyotaro Horiguchi, reviewed (in earlier, similar versions) by Robert Haas. Heikki Linnakangas and Michael Paquier implemented earlier designs that materially clarified the problem. Reviewed, in earlier designs, by Andrew Dunstan, Andres Freund, Alvaro Herrera, Tom Lane, Fujii Masao, and Simon Riggs. Reported by Martijn van Oosterhout. Discussion: https://postgr.es/m/20150702220524.GA9392@svana.org
* Revert "Skip WAL for new relfilenodes, under wal_level=minimal."Noah Misch2020-03-22
| | | | | | | | This reverts commit cb2fd7eac285b1b0a24eeb2b8ed4456b66c5a09f. Per numerous buildfarm members, it was incompatible with parallel query, and a test case assumed LP64. Back-patch to 9.5 (all supported versions). Discussion: https://postgr.es/m/20200321224920.GB1763544@rfd.leadboat.com
* Skip WAL for new relfilenodes, under wal_level=minimal.Noah Misch2020-03-21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Until now, only selected bulk operations (e.g. COPY) did this. If a given relfilenode received both a WAL-skipping COPY and a WAL-logged operation (e.g. INSERT), recovery could lose tuples from the COPY. See src/backend/access/transam/README section "Skipping WAL for New RelFileNode" for the new coding rules. Maintainers of table access methods should examine that section. To maintain data durability, just before commit, we choose between an fsync of the relfilenode and copying its contents to WAL. A new GUC, wal_skip_threshold, guides that choice. If this change slows a workload that creates small, permanent relfilenodes under wal_level=minimal, try adjusting wal_skip_threshold. Users setting a timeout on COMMIT may need to adjust that timeout, and log_min_duration_statement analysis will reflect time consumption moving to COMMIT from commands like COPY. Internally, this requires a reliable determination of whether RollbackAndReleaseCurrentSubTransaction() would unlink a relation's current relfilenode. Introduce rd_firstRelfilenodeSubid. Amend the specification of rd_createSubid such that the field is zero when a new rel has an old rd_node. Make relcache.c retain entries for certain dropped relations until end of transaction. Back-patch to 9.5 (all supported versions). This introduces a new WAL record type, XLOG_GIST_ASSIGN_LSN, without bumping XLOG_PAGE_MAGIC. As always, update standby systems before master systems. This changes sizeof(RelationData) and sizeof(IndexStmt), breaking binary compatibility for affected extensions. (The most recent commit to affect the same class of extensions was 089e4d405d0f3b94c74a2c6a54357a84a681754b.) Kyotaro Horiguchi, reviewed (in earlier, similar versions) by Robert Haas. Heikki Linnakangas and Michael Paquier implemented earlier designs that materially clarified the problem. Reviewed, in earlier designs, by Andrew Dunstan, Andres Freund, Alvaro Herrera, Tom Lane, Fujii Masao, and Simon Riggs. Reported by Martijn van Oosterhout. Discussion: https://postgr.es/m/20150702220524.GA9392@svana.org
* Update copyrights for 2020Bruce Momjian2020-01-01
| | | | Backpatch-through: update all files in master, backpatch legal files through 9.4
* Use full 64-bit XID for checking if a deleted GiST page is old enough.Heikki Linnakangas2019-07-24
| | | | | | | | | | | Otherwise, after a deleted page gets even older, it becomes unrecyclable again. B-tree has the same problem, and has had since time immemorial, but let's at least fix this in GiST, where this is new. Backpatch to v12, where GiST page deletion was introduced. Reviewed-by: Andrey Borodin Discussion: https://www.postgresql.org/message-id/835A15A5-F1B4-4446-A711-BF48357EB602%40yandex-team.ru
* Convert gist to compute page level xid horizon on primary.Andres Freund2019-04-22
| | | | | | | | | | | | | | | | | Due to parallel development, gist added the missing conflict information in c952eae52a3, while 558a9165e08 moved that computation to the primary for the index types that already had it. Thus adapt gist to also compute on the primary, using index_compute_xid_horizon_for_tuples() instead of its own copy of the logic. This also adds pg_waldump support for XLOG_GIST_DELETE records, which previously was not properly present. Bumps WAL version. Author: Andres Freund Discussion: https://postgr.es/m/20190406050243.bszosdg4buvabfrt@alap3.anarazel.de
* Generate less WAL during GiST, GIN and SP-GiST index build.Heikki Linnakangas2019-04-03
| | | | | | | | | | | | | | | | | | | | Instead of WAL-logging every modification during the build separately, first build the index without any WAL-logging, and make a separate pass through the index at the end, to write all pages to the WAL. This significantly reduces the amount of WAL generated, and is usually also faster, despite the extra I/O needed for the extra scan through the index. WAL generated this way is also faster to replay. For GiST, the LSN-NSN interlock makes this a little tricky. All pages must be marked with a valid (i.e. non-zero) LSN, so that the parent-child LSN-NSN interlock works correctly. We now use magic value 1 for that during index build. Change the fake LSN counter to begin from 1000, so that 1 is safely smaller than any real or fake LSN. 2 would've been enough for our purposes, but let's reserve a bigger range, in case we need more special values in the future. Author: Anastasia Lubennikova, Andrey V. Lepikhov Reviewed-by: Heikki Linnakangas, Dmitry Dolgov
* Delete empty pages during GiST VACUUM.Heikki Linnakangas2019-03-22
| | | | | | | | | | | | | | | To do this, we scan GiST two times. In the first pass we make note of empty leaf pages and internal pages. At second pass we scan through internal pages, looking for downlinks to the empty pages. Deleting internal pages is still not supported, like in nbtree, the last child of an internal page is never deleted. That means that if you have a workload where new keys are always inserted to different area than where old keys are removed, the index will still grow without bound. But the rate of growth will be an order of magnitude slower than before. Author: Andrey Borodin Discussion: https://www.postgresql.org/message-id/B1E4DF12-6CD3-4706-BDBD-BF3283328F60@yandex-team.ru
* Update copyright for 2019Bruce Momjian2019-01-02
| | | | Backpatch-through: certain files through 9.4
* Check for conflicting queries during replay of gistvacuumpage()Alexander Korotkov2018-12-21
| | | | | | | | | | | | | | | | | | | | | | | 013ebc0a7b implements so-called GiST microvacuum. That is gistgettuple() marks index tuples as dead when kill_prior_tuple is set. Later, when new tuple insertion claims page space, those dead index tuples are physically deleted from page. When this deletion is replayed on standby, it might conflict with read-only queries. But 013ebc0a7b doesn't handle this. That may lead to disappearance of some tuples from read-only snapshots on standby. This commit implements resolving of conflicts between replay of GiST microvacuum and standby queries. On the master we implement new WAL record type XLOG_GIST_DELETE, which comprises necessary information. On stable releases we've to be tricky to keep WAL compatibility. Information required for conflict processing is just appended to data of XLOG_GIST_PAGE_UPDATE record. So, PostgreSQL version, which doesn't know about conflict processing, will just ignore that. Reported-by: Andres Freund Diagnosed-by: Andres Freund Discussion: https://postgr.es/m/20181212224524.scafnlyjindmrbe6%40alap3.anarazel.de Author: Alexander Korotkov Backpatch-through: 9.6
* Update copyright for 2018Bruce Momjian2018-01-02
| | | | Backpatch-through: certain files through 9.3
* Split index xlog headers from other private index headers.Robert Haas2017-02-14
| | | | | | | | | | | | | The xlog-specific headers need to be included in both frontend code - specifically, pg_waldump - and the backend, but the remainder of the private headers for each index are only needed by the backend. By splitting the xlog stuff out into separate headers, pg_waldump pulls in fewer backend headers, which is a good thing. Patch by me, reviewed by Michael Paquier and Andres Freund, per a complaint from Dilip Kumar. Discussion: http://postgr.es/m/CA+TgmoZ=F=GkxV0YEv-A8tb+AEGy_Qa7GSiJ8deBKFATnzfEug@mail.gmail.com
* Update copyright via script for 2017Bruce Momjian2017-01-03
|
* Update copyright for 2016Bruce Momjian2016-01-02
| | | | Backpatch certain files through 9.1
* Update copyright for 2015Bruce Momjian2015-01-06
| | | | Backpatch certain files through 9.0
* Revamp the WAL record format.Heikki Linnakangas2014-11-20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Each WAL record now carries information about the modified relation and block(s) in a standardized format. That makes it easier to write tools that need that information, like pg_rewind, prefetching the blocks to speed up recovery, etc. There's a whole new API for building WAL records, replacing the XLogRecData chains used previously. The new API consists of XLogRegister* functions, which are called for each buffer and chunk of data that is added to the record. The new API also gives more control over when a full-page image is written, by passing flags to the XLogRegisterBuffer function. This also simplifies the XLogReadBufferForRedo() calls. The function can dig the relation and block number from the WAL record, so they no longer need to be passed as arguments. For the convenience of redo routines, XLogReader now disects each WAL record after reading it, copying the main data part and the per-block data into MAXALIGNed buffers. The data chunks are not aligned within the WAL record, but the redo routines can assume that the pointers returned by XLogRecGet* functions are. Redo routines are now passed the XLogReaderState, which contains the record in the already-disected format, instead of the plain XLogRecord. The new record format also makes the fixed size XLogRecord header smaller, by removing the xl_len field. The length of the "main data" portion is now stored at the end of the WAL record, and there's a separate header after XLogRecord for it. The alignment padding at the end of XLogRecord is also removed. This compansates for the fact that the new format would otherwise be more bulky than the old format. Reviewed by Andres Freund, Amit Kapila, Michael Paquier, Alvaro Herrera, Fujii Masao.
* Improve code around the recently added rm_identify rmgr callback.Andres Freund2014-09-22
| | | | | | | | | | | | | | | There are four weaknesses in728f152e07f998d2cb4fe5f24ec8da2c3bda98f2: * append_init() in heapdesc.c was ugly and required that rm_identify return values are only valid till the next call. Instead just add a couple more switch() cases for the INIT_PAGE cases. Now the returned value will always be valid. * a couple rm_identify() callbacks missed masking xl_info with ~XLR_INFO_MASK. * pg_xlogdump didn't map a NULL rm_identify to UNKNOWN or a similar string. * append_init() was called when id=NULL - which should never actually happen. But it's better to be careful.
* Add rmgr callback to name xlog record types for display purposes.Andres Freund2014-09-19
| | | | | | | | | | | | | | | | | | | This is primarily useful for the upcoming pg_xlogdump --stats feature, but also allows to remove some duplicated code in the rmgr_desc routines. Due to the separation and harmonization, the output of dipsplayed records changes somewhat. But since this isn't enduser oriented content that's ok. It's potentially desirable to further change pg_xlogdump's display of records. It previously wasn't possible to show the record type separately from the description forcing it to be in the last column. But that's better done in a separate commit. Author: Abhijit Menon-Sen, slightly editorialized by me Reviewed-By: Álvaro Herrera, Andres Freund, and Heikki Linnakangas Discussion: 20140604104716.GA3989@toroid.org
* Change the signature of rm_desc so that it's passed a XLogRecord.Heikki Linnakangas2014-06-14
| | | | Just feels more natural, and is more consistent with rm_redo.
* Update copyright for 2014Bruce Momjian2014-01-07
| | | | | Update all files in head, and files COPYRIGHT and legal.sgml in all back branches.
* Use appendStringInfoString instead of appendStringInfo where possible.Robert Haas2013-10-31
| | | | | | | This shaves a few cycles, and generally seems like good programming practice. David Rowley
* pgindent run for release 9.3Bruce Momjian2013-05-29
| | | | | This is the first run of the Perl-based pgindent script. Also update pgindent instructions.
* Update copyrights for 2013Bruce Momjian2013-01-01
| | | | | Fully update git head, and update back branches in ./COPYRIGHT and legal.sgml files.
* Split out rmgr rm_desc functions into their own filesAlvaro Herrera2012-11-28
This is necessary (but not sufficient) to have them compilable outside of a backend environment.