aboutsummaryrefslogtreecommitdiff
path: root/src/backend/access/gin/ginxlog.c
Commit message (Collapse)AuthorAge
* Update copyright for 2025Bruce Momjian2025-01-01
| | | | Backpatch-through: 13
* Update copyright for 2024Bruce Momjian2024-01-03
| | | | | | | | Reported-by: Michael Paquier Discussion: https://postgr.es/m/ZZKTDPxBBMt3C0J9@paquier.xyz Backpatch-through: 12
* Update copyright for 2023Bruce Momjian2023-01-02
| | | | Backpatch-through: 11
* Revert 56-bit relfilenode change and follow-up commits.Robert Haas2022-09-28
| | | | | | | | There are still some alignment-related failures in the buildfarm, which might or might not be able to be fixed quickly, but I've also just realized that it increased the size of many WAL records by 4 bytes because a block reference contains a RelFileLocator. The effect of that hasn't been studied or discussed, so revert for now.
* Increase width of RelFileNumbers from 32 bits to 56 bits.Robert Haas2022-09-27
| | | | | | | | | | | | | | | | | | | | | | | | | | | RelFileNumbers are now assigned using a separate counter, instead of being assigned from the OID counter. This counter never wraps around: if all 2^56 possible RelFileNumbers are used, an internal error occurs. As the cluster is limited to 2^64 total bytes of WAL, this limitation should not cause a problem in practice. If the counter were 64 bits wide rather than 56 bits wide, we would need to increase the width of the BufferTag, which might adversely impact buffer lookup performance. Also, this lets us use bigint for pg_class.relfilenode and other places where these values are exposed at the SQL level without worrying about overflow. This should remove the need to keep "tombstone" files around until the next checkpoint when relations are removed. We do that to keep RelFileNumbers from being recycled, but now that won't happen anyway. However, this patch doesn't actually change anything in this area; it just makes it possible for a future patch to do so. Dilip Kumar, based on an idea from Andres Freund, who also reviewed some earlier versions of the patch. Further review and some wordsmithing by me. Also reviewed at various points by Ashutosh Sharma, Vignesh C, Amul Sul, Álvaro Herrera, and Tom Lane. Discussion: http://postgr.es/m/CA+Tgmobp7+7kmi4gkq7Y+4AM9fTvL+O1oQ4-5gFTT+6Ng-dQ=g@mail.gmail.com
* Change internal RelFileNode references to RelFileNumber or RelFileLocator.Robert Haas2022-07-06
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We have been using the term RelFileNode to refer to either (1) the integer that is used to name the sequence of files for a certain relation within the directory set aside for that tablespace/database combination; or (2) that value plus the OIDs of the tablespace and database; or occasionally (3) the whole series of files created for a relation based on those values. Using the same name for more than one thing is confusing. Replace RelFileNode with RelFileNumber when we're talking about just the single number, i.e. (1) from above, and with RelFileLocator when we're talking about all the things that are needed to locate a relation's files on disk, i.e. (2) from above. In the places where we refer to (3) as a relfilenode, instead refer to "relation storage". Since there is a ton of SQL code in the world that knows about pg_class.relfilenode, don't change the name of that column, or of other SQL-facing things that derive their name from it. On the other hand, do adjust closely-related internal terminology. For example, the structure member names dbNode and spcNode appear to be derived from the fact that the structure itself was called RelFileNode, so change those to dbOid and spcOid. Likewise, various variables with names like rnode and relnode get renamed appropriately, according to how they're being used in context. Hopefully, this is clearer than before. It is also preparation for future patches that intend to widen the relfilenumber fields from its current width of 32 bits. Variables that store a relfilenumber are now declared as type RelFileNumber rather than type Oid; right now, these are the same, but that can now more easily be changed. Dilip Kumar, per an idea from me. Reviewed also by Andres Freund. I fixed some whitespace issues, changed a couple of words in a comment, and made one other minor correction. Discussion: http://postgr.es/m/CA+TgmoamOtXbVAQf9hWFzonUo6bhhjS6toZQd7HZ-pmojtAmag@mail.gmail.com Discussion: http://postgr.es/m/CA+Tgmobp7+7kmi4gkq7Y+4AM9fTvL+O1oQ4-5gFTT+6Ng-dQ=g@mail.gmail.com Discussion: http://postgr.es/m/CAFiTN-vTe79M8uDH1yprOU64MNFE+R3ODRuA+JWf27JbhY4hJw@mail.gmail.com
* Update copyright for 2022Bruce Momjian2022-01-07
| | | | Backpatch-through: 10
* Update copyright for 2021Bruce Momjian2021-01-02
| | | | Backpatch-through: 9.5
* Update copyrights for 2020Bruce Momjian2020-01-01
| | | | Backpatch-through: update all files in master, backpatch legal files through 9.4
* Fix traversing to the deleted GIN page via downlinkAlexander Korotkov2019-11-20
| | | | | | | | | | | | | | | Current GIN code appears to don't handle traversing to the deleted page via downlink. This commit fixes that by stepping right from the delete page like we do in nbtree. This commit also fixes setting 'deleted' flag to the GIN pages. Now other page flags are not erased once page is deleted. That helps to keep our assertions true if we arrive deleted page via downlink. Discussion: https://postgr.es/m/CAPpHfdvMvsw-NcE5bRS7R1BbvA4BxoDnVVjkXC5W0Czvy9LVrg%40mail.gmail.com Author: Alexander Korotkov Reviewed-by: Peter Geoghegan Backpatch-through: 9.4
* Initial pgindent run for v12.Tom Lane2019-05-22
| | | | | | | | This is still using the 2.0 version of pg_bsd_indent. I thought it would be good to commit this separately, so as to document the differences between 2.0 and 2.1 behavior. Discussion: https://postgr.es/m/16296.1558103386@sss.pgh.pa.us
* Generate less WAL during GiST, GIN and SP-GiST index build.Heikki Linnakangas2019-04-03
| | | | | | | | | | | | | | | | | | | | Instead of WAL-logging every modification during the build separately, first build the index without any WAL-logging, and make a separate pass through the index at the end, to write all pages to the WAL. This significantly reduces the amount of WAL generated, and is usually also faster, despite the extra I/O needed for the extra scan through the index. WAL generated this way is also faster to replay. For GiST, the LSN-NSN interlock makes this a little tricky. All pages must be marked with a valid (i.e. non-zero) LSN, so that the parent-child LSN-NSN interlock works correctly. We now use magic value 1 for that during index build. Change the fake LSN counter to begin from 1000, so that 1 is safely smaller than any real or fake LSN. 2 would've been enough for our purposes, but let's reserve a bigger range, in case we need more special values in the future. Author: Anastasia Lubennikova, Andrey V. Lepikhov Reviewed-by: Heikki Linnakangas, Dmitry Dolgov
* Update copyright for 2019Bruce Momjian2019-01-02
| | | | Backpatch-through: certain files through 9.4
* Prevent GIN deleted pages from being reclaimed too earlyAlexander Korotkov2018-12-13
| | | | | | | | | | | | | | | | | | When GIN vacuum deletes a posting tree page, it assumes that no concurrent searchers can access it, thanks to ginStepRight() locking two pages at once. However, since 9.4 searches can skip parts of posting trees descending from the root. That leads to the risk that page is deleted and reclaimed before concurrent search can access it. This commit prevents the risk of above by waiting for every transaction, which might wait to reference this page, to finish. Due to binary compatibility we can't change GinPageOpaqueData to store corresponding transaction id. Instead we reuse page header pd_prune_xid field, which is unused in index pages. Discussion: https://postgr.es/m/31a702a.14dd.166c1366ac1.Coremail.chjischj%40163.com Author: Andrey Borodin, Alexander Korotkov Reviewed-by: Alexander Korotkov Backpatch-through: 9.4
* Prevent deadlock in ginRedoDeletePage()Alexander Korotkov2018-12-13
| | | | | | | | | | | | | | | | | | | | On standby ginRedoDeletePage() can work concurrently with read-only queries. Those queries can traverse posting tree in two ways. 1) Using rightlinks by ginStepRight(), which locks the next page before unlocking its left sibling. 2) Using downlinks by ginFindLeafPage(), which locks at most one page at time. Original lock order was: page, parent, left sibling. That lock order can deadlock with ginStepRight(). In order to prevent deadlock this commit changes lock order to: left sibling, page, parent. Note, that position of parent in locking order seems insignificant, because we only lock one page at time while traversing downlinks. Reported-by: Chen Huajun Diagnosed-by: Chen Huajun, Peter Geoghegan, Andrey Borodin Discussion: https://postgr.es/m/31a702a.14dd.166c1366ac1.Coremail.chjischj%40163.com Author: Alexander Korotkov Backpatch-through: 9.4
* Fix past pd_upper write in ginRedoRecompress()Alexander Korotkov2018-09-09
| | | | | | | | | | | | | | | | ginRedoRecompress() replays actions over compressed segments of posting list in-place. However, it might lead to write past pg_upper, because intermediate state during playing the changes can take more space than both original state and final state. This commit fixes that by refuse from in-place modification. Instead page tail is copied once modification is started, and then it's used as the source of original segments. Backpatch to 9.4 where posting list compression was introduced. Reported-by: Sivasubramanian Ramasubramanian Discussion: https://postgr.es/m/1536091151804.6588%40amazon.com Author: Alexander Korotkov based on patch from and ideas by Sivasubramanian Ramasubramanian Review: Sivasubramanian Ramasubramanian Backpatch-through: 9.4
* Fix handling of empty uncompressed posting list pages in GINAlexander Korotkov2018-07-19
| | | | | | | | | | | | | | PostgreSQL 9.4 introduces posting list compression in GIN. This feature supports online upgrade, so that after pg_upgrade uncompressed posting lists are compressed on-the-fly. Underlying code appears to always expect at least one item on uncompressed posting list page. But there could be completely empty pages, because VACUUM never deletes leftmost and rightmost pages from posting trees. This commit fixes that. Reported-by: Sivasubramanian Ramasubramanian Discussion: https://postgr.es/m/1531867212836.63354%40amazon.com Author: Sivasubramanian Ramasubramanian, Alexander Korotkov Backpatch-through: 9.4
* Update copyright for 2018Bruce Momjian2018-01-02
| | | | Backpatch-through: certain files through 9.3
* Set the metapage's pd_lower correctly in brin, gin, and spgist indexes.Tom Lane2017-11-02
| | | | | | | | | | | | | | | | | | | | | | | Previously, these index types left the pd_lower field set to the default SizeOfPageHeaderData, which is really a lie because it ought to point past whatever space is being used for metadata. The coding accidentally failed to fail because we never told xlog.c that the metapage is of standard format --- but that's not very good, because it impedes WAL consistency checking, and in some cases prevents compression of full-page images. To fix, ensure that we set pd_lower correctly, not only when creating a metapage but whenever we write it out (these apparently redundant steps are needed to cope with pg_upgrade'd indexes that don't yet contain the right value). This allows telling xlog.c that the page is of standard format. The WAL consistency check mask functions are made to mask only if pd_lower appears valid, which I think is likely unnecessary complication, since any metapage appearing in a v11 WAL stream should contain valid pd_lower. But it doesn't cost much to be paranoid. Amit Langote, reviewed by Michael Paquier and Amit Kapila Discussion: https://postgr.es/m/0d273805-0e9e-ec1a-cb84-d4da400b8f85@lab.ntt.co.jp
* For wal_consistency_checking, mask page checksum as well as page LSN.Robert Haas2017-09-22
| | | | | | | | If the LSN is different, the checksum will be different, too. Ashwin Agrawal, reviewed by Michael Paquier and Kuntal Ghosh Discussion: http://postgr.es/m/CALfoeis5iqrAU-+JAN+ZzXkpPr7+-0OAGv7QUHwFn=-wDy4o4Q@mail.gmail.com
* Split index xlog headers from other private index headers.Robert Haas2017-02-14
| | | | | | | | | | | | | The xlog-specific headers need to be included in both frontend code - specifically, pg_waldump - and the backend, but the remainder of the private headers for each index are only needed by the backend. By splitting the xlog stuff out into separate headers, pg_waldump pulls in fewer backend headers, which is a good thing. Patch by me, reviewed by Michael Paquier and Andres Freund, per a complaint from Dilip Kumar. Discussion: http://postgr.es/m/CA+TgmoZ=F=GkxV0YEv-A8tb+AEGy_Qa7GSiJ8deBKFATnzfEug@mail.gmail.com
* Add WAL consistency checking facility.Robert Haas2017-02-08
| | | | | | | | | | | | | | When the new GUC wal_consistency_checking is set to a non-empty value, it triggers recording of additional full-page images, which are compared on the standby against the results of applying the WAL record (without regard to those full-page images). Allowable differences such as hints are masked out, and the resulting pages are compared; any difference results in a FATAL error on the standby. Kuntal Ghosh, based on earlier patches by Michael Paquier and Heikki Linnakangas. Extensively reviewed and revised by Michael Paquier and by me, with additional reviews and comments from Amit Kapila, Álvaro Herrera, Simon Riggs, and Peter Eisentraut.
* Update copyright via script for 2017Bruce Momjian2017-01-03
|
* Add macros to make AllocSetContextCreate() calls simpler and safer.Tom Lane2016-08-27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | I found that half a dozen (nearly 5%) of our AllocSetContextCreate calls had typos in the context-sizing parameters. While none of these led to especially significant problems, they did create minor inefficiencies, and it's now clear that expecting people to copy-and-paste those calls accurately is not a great idea. Let's reduce the risk of future errors by introducing single macros that encapsulate the common use-cases. Three such macros are enough to cover all but two special-purpose contexts; those two calls can be left as-is, I think. While this patch doesn't in itself improve matters for third-party extensions, it doesn't break anything for them either, and they can gradually adopt the simplified notation over time. In passing, change TopMemoryContext to use the default allocation parameters. Formerly it could only be extended 8K at a time. That was probably reasonable when this code was written; but nowadays we create many more contexts than we did then, so that it's not unusual to have a couple hundred K in TopMemoryContext, even without considering various dubious code that sticks other things there. There seems no good reason not to let it use growing blocks like most other contexts. Back-patch to 9.6, mostly because that's still close enough to HEAD that it's easy to do so, and keeping the branches in sync can be expected to avoid some future back-patching pain. The bugs fixed by these changes don't seem to be significant enough to justify fixing them further back. Discussion: <21072.1472321324@sss.pgh.pa.us>
* Revert no-op changes to BufferGetPage()Kevin Grittner2016-04-20
| | | | | | | | | | | | | | | | | | The reverted changes were intended to force a choice of whether any newly-added BufferGetPage() calls needed to be accompanied by a test of the snapshot age, to support the "snapshot too old" feature. Such an accompanying test is needed in about 7% of the cases, where the page is being used as part of a scan rather than positioning for other purposes (such as DML or vacuuming). The additional effort required for back-patching, and the doubt whether the intended benefit would really be there, have indicated it is best just to rely on developers to do the right thing based on comments and existing usage, as we do with many other conventions. This change should have little or no effect on generated executable code. Motivated by the back-patching pain of Tom Lane and Robert Haas
* Modify BufferGetPage() to prepare for "snapshot too old" featureKevin Grittner2016-04-08
| | | | | | | | | | | This patch is a no-op patch which is intended to reduce the chances of failures of omission once the functional part of the "snapshot too old" patch goes in. It adds parameters for snapshot, relation, and an enum to specify whether the snapshot age check needs to be done for the page at this point. This initial patch passes NULL for the first two new parameters and BGP_NO_SNAPSHOT_TEST for the third. The follow-on patch will change the places where the test needs to be made.
* Update copyright for 2016Bruce Momjian2016-01-02
| | | | Backpatch certain files through 9.1
* Initialize GIN metapage correctly when replaying metapage-update WAL record.Heikki Linnakangas2015-06-30
| | | | | | | | | | | | | | | | | I broke this with my WAL format refactoring patch. Before that, the metapage was read from disk, and modified in-place regardless of the LSN. That was always a bit silly, as there's no need to read the old page version from disk disk when we're overwriting it anyway. So that was changed in 9.5, but I failed to add a GinInitPage call to initialize the page-headers correctly. Usually you wouldn't notice, because the metapage is already in the page cache and is not zeroed. One way to reproduce this is to perform a VACUUM on an already vacuumed table (so that the vacuum has no real work to do), immediately after a checkpoint, and then perform an immediate shutdown. After recovery, the page headers of the metapage will be incorrectly all-zeroes. Reported by Jeff Janes
* Collection of typo fixes.Heikki Linnakangas2015-05-20
| | | | | | | | | | | | | | | Use "a" and "an" correctly, mostly in comments. Two error messages were also fixed (they were just elogs, so no translation work required). Two function comments in pg_proc.h were also fixed. Etsuro Fujita reported one of these, but I found a lot more with grep. Also fix a few other typos spotted while grepping for the a/an typos. For example, "consists out of ..." -> "consists of ...". Plus a "though"/ "through" mixup reported by Euler Taveira. Many of these typos were in old code, which would be nice to backpatch to make future backpatching easier. But much of the code was new, and I didn't feel like crafting separate patches for each branch. So no backpatching.
* Update copyright for 2015Bruce Momjian2015-01-06
| | | | Backpatch certain files through 9.0
* Revamp the WAL record format.Heikki Linnakangas2014-11-20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Each WAL record now carries information about the modified relation and block(s) in a standardized format. That makes it easier to write tools that need that information, like pg_rewind, prefetching the blocks to speed up recovery, etc. There's a whole new API for building WAL records, replacing the XLogRecData chains used previously. The new API consists of XLogRegister* functions, which are called for each buffer and chunk of data that is added to the record. The new API also gives more control over when a full-page image is written, by passing flags to the XLogRegisterBuffer function. This also simplifies the XLogReadBufferForRedo() calls. The function can dig the relation and block number from the WAL record, so they no longer need to be passed as arguments. For the convenience of redo routines, XLogReader now disects each WAL record after reading it, copying the main data part and the per-block data into MAXALIGNed buffers. The data chunks are not aligned within the WAL record, but the redo routines can assume that the pointers returned by XLogRecGet* functions are. Redo routines are now passed the XLogReaderState, which contains the record in the already-disected format, instead of the plain XLogRecord. The new record format also makes the fixed size XLogRecord header smaller, by removing the xl_len field. The length of the "main data" portion is now stored at the end of the WAL record, and there's a separate header after XLogRecord for it. The alignment padding at the end of XLogRecord is also removed. This compansates for the fact that the new format would otherwise be more bulky than the old format. Reviewed by Andres Freund, Amit Kapila, Michael Paquier, Alvaro Herrera, Fujii Masao.
* Refactor per-page logic common to all redo routines to a new function.Heikki Linnakangas2014-09-02
| | | | | | | | | | | | Every redo routine uses the same idiom to determine what to do to a page: check if there's a backup block for it, and if not read, the buffer if the block exists, and check its LSN. Refactor that into a common function, XLogReadBufferForRedo, making all the redo routines shorter and more readable. This has no user-visible effect, and makes no changes to the WAL format. Reviewed by Andres Freund, Alvaro Herrera, Michael Paquier.
* Move log_newpage and log_newpage_buffer to xlog.c.Heikki Linnakangas2014-07-31
| | | | | | | | | | | log_newpage is used by many indexams, in addition to heap, but for historical reasons it's always been part of the heapam rmgr. Starting with 9.3, we have another WAL record type for logging an image of a page, XLOG_FPI. Simplify things by moving log_newpage and log_newpage_buffer to xlog.c, and switch to using the XLOG_FPI record type. Bump the WAL version number because the code to replay the old HEAP_NEWPAGE records is removed.
* Protect against torn pages when deleting GIN list pages.Heikki Linnakangas2014-05-08
| | | | | | | | | To-be-deleted list pages contain no useful information, as they are being deleted, but we must still protect the writes from being torn by a crash after a partial write. To do that, re-initialize the pages on WAL replay. Jeff Janes caught this with a test program to test partial writes. Backpatch to all supported versions.
* pgindent run for 9.4Bruce Momjian2014-05-06
| | | | | This includes removing tabs after periods in C comments, which was applied to back branches, so this change should not effect backpatching.
* Fix two bugs in WAL-logging of GIN pending-list pages.Heikki Linnakangas2014-04-28
| | | | | | | | | | | | | | | | | | In writeListPage, never take a full-page image of the page, because we have all the information required to re-initialize in the WAL record anyway. Before this fix, a full-page image was always generated, unless full_page_writes=off, because when the page is initialized its LSN is always 0. In stable-branches, keep the code to restore the backup blocks if they exist, in case that the WAL is generated with an older minor version, but in master Assert that there are no full-page images. In the redo routine, add missing "off++". Otherwise the tuples are added to the page in reverse order. That happens to be harmless because we always scan and remove all the tuples together, but it was clearly wrong. Also, it was masked by the first bug unless full_page_writes=off, because the page was always restored from a full-page image. Backpatch to all supported versions.
* Set pd_lower on internal GIN posting tree pages.Heikki Linnakangas2014-04-14
| | | | | | | | | | | | | | | | This allows squeezing out the unused space in full-page writes. And more importantly, it can be a useful debugging aid. In hindsight we should've done this back when GIN was added - we wouldn't need the 'maxoff' field in the page opaque struct if we had used pd_lower and pd_upper like on normal pages. But as long as there can be pages in the index that have been binary-upgraded from pre-9.4 versions, we can't rely on that, and have to continue using 'maxoff'. Most of the code churn comes from renaming some macros, now that they're used on internal pages, too. This change is completely backwards-compatible, no effect on pg_upgrade.
* Fix WAL replay bug in the new GIN incomplete-split code.Heikki Linnakangas2014-04-07
| | | | | | | | Forgot to set the incomplete-split flag on the left page half, in redo of a page split. Spotted this by comparing the page contents on master and standby, after inserting/applying each WAL record.
* Remove dead check for backup block, replace with Assert.Heikki Linnakangas2014-04-01
| | | | | We don't use backup blocks with GIN vacuum records anymore, the page is always recreated from scratch.
* Rewrite the way GIN posting lists are packed on a page, to reduce WAL volume.Heikki Linnakangas2014-03-31
| | | | | | | | | | | | | | Inserting (in retail) into the new 9.4 format GIN posting tree created much larger WAL records than in 9.3. The previous strategy to WAL logging was basically to log the whole page on each change, with the exception of completely unmodified segments up to the first modified one. That was not too bad when appending to the end of the page, as only the last segment had to be WAL-logged, but per Fujii Masao's testing, even that produced 2x the WAL volume that 9.3 did. The new strategy is to keep track of changes to the posting lists in a more fine-grained fashion, and also make the repacking" code smarter to avoid decoding and re-encoding segments unnecessarily.
* In WAL replay, restore GIN metapage unconditionally to avoid torn page.Heikki Linnakangas2014-03-12
| | | | | | | | | | | | | | | | We don't take a full-page image of the GIN metapage; instead, the WAL record contains all the information required to reconstruct it from scratch. But to avoid torn page hazards, we must re-initialize it from the WAL record every time, even if it already has a greater LSN, similar to how normal full page images are restored. This was highly unlikely to cause any problems in practice, because the GIN metapage is small. We rely on an update smaller than a 512 byte disk sector to be atomic elsewhere, at least in pg_control. But better safe than sorry, and this would be easy to overlook if more fields are added to the metapage so that it's no longer small. Reported by Noah Misch. Backpatch to all supported versions.
* Reset unused fields in GIN data leaf page footer.Heikki Linnakangas2014-01-24
| | | | | | | The maxoff field is not used in the new, compressed page format. Let's reset it when converting an old-format page to the new format. The code won't care either way, but this makes it possible to use the field for something else in the future.
* Compress GIN posting lists, for smaller index size.Heikki Linnakangas2014-01-22
| | | | | | | | | | | | | | | | | | | | | GIN posting lists are now encoded using varbyte-encoding, which allows them to fit in much smaller space than the straight ItemPointer array format used before. The new encoding is used for both the lists stored in-line in entry tree items, and in posting tree leaf pages. To maintain backwards-compatibility and keep pg_upgrade working, the code can still read old-style pages and tuples. Posting tree leaf pages in the new format are flagged with GIN_COMPRESSED flag, to distinguish old and new format pages. Likewise, entry tree tuples in the new format have a GIN_ITUP_COMPRESSED flag set in a bit that was previously unused. This patch bumps GIN_CURRENT_VERSION from 1 to 2. New indexes created with version 9.4 will therefore have version number 2 in the metapage, while old pg_upgraded indexes will have version 1. The code treats them the same, but it might be come handy in the future, if we want to drop support for the uncompressed format. Alexander Korotkov and me. Reviewed by Tomas Vondra and Amit Langote.
* Update copyright for 2014Bruce Momjian2014-01-07
| | | | | Update all files in head, and files COPYRIGHT and legal.sgml in all back branches.
* Get rid of the post-recovery cleanup step of GIN page splits.Heikki Linnakangas2013-11-27
| | | | | | | | | | | | | | | | | | Replace it with an approach similar to what GiST uses: when a page is split, the left sibling is marked with a flag indicating that the parent hasn't been updated yet. When the parent is updated, the flag is cleared. If an insertion steps on a page with the flag set, it will finish split before proceeding with the insertion. The post-recovery cleanup mechanism was never totally reliable, as insertion to the parent could fail e.g because of running out of memory or disk space, leaving the tree in an inconsistent state. This also divides the responsibility of WAL-logging more clearly between the generic ginbtree.c code, and the parts specific to entry and posting trees. There is now a common WAL record format for insertions and deletions, which is written by ginbtree.c, followed by tree-specific payload, which is returned by the placetopage- and split- callbacks.
* More GIN refactoring.Heikki Linnakangas2013-11-27
| | | | | | | | | | | | Separate the insertion payload from the more static portions of GinBtree. GinBtree now only contains information related to searching the tree, and the information of what to insert is passed separately. Add root block number to GinBtree, instead of passing it around all the functions as argument. Split off ginFinishSplit() from ginInsertValue(). ginFinishSplit is responsible for finding the parent and inserting the downlink to it.
* Refactor the internal GIN B-tree interface for forming a downlink.Heikki Linnakangas2013-11-20
| | | | | | This creates a new gin-btree callback function for creating a downlink for a page. Previously, ginxlog.c duplicated the logic used during normal operation.
* Minor GIN code refactoring.Heikki Linnakangas2013-10-03
| | | | | | | | | | | It makes for cleaner code to have separate Get/Add functions for PostingItems and ItemPointers. A few callsites that have to deal with both types need to be duplicated because of this, but all the callers have to know which one they're dealing with anyway. Overall, this reduces the amount of casting required. Extracted from Alexander Korotkov's larger patch to change the data page format.
* Remove PageSetTLI and rename pd_tli to pd_checksumSimon Riggs2013-03-18
| | | | | | | | | | | | | | Remove use of PageSetTLI() from all page manipulation functions and adjust README to indicate change in the way we make changes to pages. Repurpose those bytes into the pd_checksum field and explain how that works in comments about page header. Refactoring ahead of actual feature patch which would make use of the checksum field, arriving later. Jeff Davis, with comments and doc changes by Simon Riggs Direction suggested by Robert Haas; many others providing review comments.
* Update copyrights for 2013Bruce Momjian2013-01-01
| | | | | Fully update git head, and update back branches in ./COPYRIGHT and legal.sgml files.