aboutsummaryrefslogtreecommitdiff
path: root/src/backend/access/transam/xlogutils.c
Commit message (Collapse)AuthorAge
...
* Teach xlogreader to follow timeline switchesSimon Riggs2017-03-22
| | | | | | | | | | Uses page-based mechanism to ensure we’re using the correct timeline. Tests are included to exercise the functionality using a cold disk-level copy of the master that's started up as a replica with slots intact, but the intended use of the functionality is with later features. Craig Ringer, reviewed by Simon Riggs and Andres Freund
* Create and use wait events for read, write, and fsync operations.Robert Haas2017-03-18
| | | | | | | | | | | | | Previous commits, notably 53be0b1add7064ca5db3cd884302dfc3268d884e and 6f3bd98ebfc008cbd676da777bb0b2376c4c4bfa, made it possible to see from pg_stat_activity when a backend was stuck waiting for another backend, but it's also fairly common for a backend to be stuck waiting for an I/O. Add wait events for those operations, too. Rushabh Lathia, with further hacking by me. Reviewed and tested by Michael Paquier, Amit Kapila, Rajkumar Raghuwanshi, and Rahila Syed. Discussion: http://postgr.es/m/CAGPqQf0LsYHXREPAZqYGVkDqHSyjf=KsD=k0GTVPAuzyThh-VQ@mail.gmail.com
* Rename user-facing tools with "xlog" in the name to say "wal".Robert Haas2017-02-09
| | | | | This means pg_receivexlog because pg_receivewal, pg_resetxlog becomes pg_resetwal, and pg_xlogdump becomes pg_waldump.
* Add WAL consistency checking facility.Robert Haas2017-02-08
| | | | | | | | | | | | | | When the new GUC wal_consistency_checking is set to a non-empty value, it triggers recording of additional full-page images, which are compared on the standby against the results of applying the WAL record (without regard to those full-page images). Allowable differences such as hints are masked out, and the resulting pages are compared; any difference results in a FATAL error on the standby. Kuntal Ghosh, based on earlier patches by Michael Paquier and Heikki Linnakangas. Extensively reviewed and revised by Michael Paquier and by me, with additional reviews and comments from Amit Kapila, Álvaro Herrera, Simon Riggs, and Peter Eisentraut.
* Update copyright via script for 2017Bruce Momjian2017-01-03
|
* Fix bogus commentsAlvaro Herrera2016-05-12
| | | | | | Some comments mentioned XLogReplayBuffer, but there's no such function: that was an interim name for a function that got renamed to XLogReadBufferForRedo, before commit 2c03216d831160 was pushed.
* Revert timeline following in replication slotsAlvaro Herrera2016-05-04
| | | | | | | | | This reverts commits f07d18b6e94d, 82c83b337202, 3a3b309041b0, and 24c5f1a103ce. This feature has shown enough immaturity that it was deemed better to rip it out before rushing some more fixes at the last minute. There are discussions on larger changes in this area for the next release.
* Revert no-op changes to BufferGetPage()Kevin Grittner2016-04-20
| | | | | | | | | | | | | | | | | | The reverted changes were intended to force a choice of whether any newly-added BufferGetPage() calls needed to be accompanied by a test of the snapshot age, to support the "snapshot too old" feature. Such an accompanying test is needed in about 7% of the cases, where the page is being used as part of a scan rather than positioning for other purposes (such as DML or vacuuming). The additional effort required for back-patching, and the doubt whether the intended benefit would really be there, have indicated it is best just to rely on developers to do the right thing based on comments and existing usage, as we do with many other conventions. This change should have little or no effect on generated executable code. Motivated by the back-patching pain of Tom Lane and Robert Haas
* Modify BufferGetPage() to prepare for "snapshot too old" featureKevin Grittner2016-04-08
| | | | | | | | | | | This patch is a no-op patch which is intended to reduce the chances of failures of omission once the functional part of the "snapshot too old" patch goes in. It adds parameters for snapshot, relation, and an enum to specify whether the snapshot age check needs to be done for the page at this point. This initial patch passes NULL for the first two new parameters and BGP_NO_SNAPSHOT_TEST for the third. The follow-on patch will change the places where the test needs to be made.
* Enable logical slots to follow timeline switchesAlvaro Herrera2016-03-30
| | | | | | | | | | | | | | | | | | When decoding from a logical slot, it's necessary for xlog reading to be able to read xlog from historical (i.e. not current) timelines; otherwise, decoding fails after failover, because the archives are in the historical timeline. This is required to make "failover logical slots" possible; it currently has no other use, although theoretically it could be used by an extension that creates a slot on a standby and continues to replay from the slot when the standby is promoted. This commit includes a module in src/test/modules with functions to manipulate the slots (which is not otherwise possible in SQL code) in order to enable testing, and a new test in src/test/recovery to ensure that the behavior is as expected. Author: Craig Ringer Reviewed-By: Oleksii Kliukin, Andres Freund, Petr Jelínek
* XLogReader general code cleanupAlvaro Herrera2016-03-30
| | | | | | | | | Some minor tweaks and comment additions, for cleanliness sake and to avoid having the upcoming timeline-following patch be polluted with unrelated cleanup. Extracted from a larger patch by Craig Ringer, reviewed by Andres Freund, with some additions by myself.
* Refactor to create generic WAL page read callbackSimon Riggs2016-01-20
| | | | | | | | | | Previously we didn’t have a generic WAL page read callback function, surprisingly. Logical decoding has logical_read_local_xlog_page(), which was actually generic, so move that to xlogfunc.c and rename to read_local_xlog_page(). Maintain logical_read_local_xlog_page() so existing callers still work. As requested by Michael Paquier, Alvaro Herrera and Andres Freund
* Update copyright for 2016Bruce Momjian2016-01-02
| | | | Backpatch certain files through 9.1
* Fix bug leading to restoring unlogged relations from empty files.Andres Freund2015-12-10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | At the end of crash recovery, unlogged relations are reset to the empty state, using their init fork as the template. The init fork is copied to the main fork without going through shared buffers. Unfortunately WAL replay so far has not necessarily flushed writes from shared buffers to disk at that point. In normal crash recovery, and before the introduction of 'fast promotions' in fd4ced523 / 9.3, the END_OF_RECOVERY checkpoint flushes the buffers out in time. But with fast promotions that's not the case anymore. To fix, force WAL writes targeting the init fork to be flushed immediately (using the new FlushOneBuffer() function). In 9.5+ that flush can centrally be triggered from the code dealing with restoring full page writes (XLogReadBufferForRedoExtended), in earlier releases that responsibility is in the hands of XLOG_HEAP_NEWPAGE's replay function. Backpatch to 9.1, even if this currently is only known to trigger in 9.3+. Flushing earlier is more robust, and it is advantageous to keep the branches similar. Typical symptoms of this bug are errors like 'ERROR: index "..." contains unexpected zero page at block 0' shortly after promoting a node. Reported-By: Thom Brown Author: Andres Freund and Michael Paquier Discussion: 20150326175024.GJ451@alap3.anarazel.de Backpatch: 9.1-
* Sanity-check that a page zeroed by redo routine is marked with WILL_INIT.Heikki Linnakangas2015-07-20
| | | | | | | | | | | | | | | There was already a sanity-check in the other direction: if a page was marked with WILL_INIT, it had to be initialized by the redo routine. It's not strictly necessary for correctness that a page is marked with WILL_INIT if it's going to be initialized at redo, but it's a missed optimization if nothing else. Fix a few instances of this issue in SP-GiST, where a block in WAL record was not marked with WILL_INIT, but was in fact always initialized at redo. We were creating a full-page image of the page unnecessarily in those cases. Backpatch to 9.5, where the new WILL_INIT flag was added.
* Update copyright for 2015Bruce Momjian2015-01-06
| | | | Backpatch certain files through 9.0
* Improve hash_create's API for selecting simple-binary-key hash functions.Tom Lane2014-12-18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Previously, if you wanted anything besides C-string hash keys, you had to specify a custom hashing function to hash_create(). Nearly all such callers were specifying tag_hash or oid_hash; which is tedious, and rather error-prone, since a caller could easily miss the opportunity to optimize by using hash_uint32 when appropriate. Replace this with a design whereby callers using simple binary-data keys just specify HASH_BLOBS and don't need to mess with specific support functions. hash_create() itself will take care of optimizing when the key size is four bytes. This nets out saving a few hundred bytes of code space, and offers a measurable performance improvement in tidbitmap.c (which was not exploiting the opportunity to use hash_uint32 for its 4-byte keys). There might be some wins elsewhere too, I didn't analyze closely. In future we could look into offering a similar optimized hashing function for 8-byte keys. Under this design that could be done in a centralized and machine-independent fashion, whereas getting it right for keys of platform-dependent sizes would've been notationally painful before. For the moment, the old way still works fine, so as not to break source code compatibility for loadable modules. Eventually we might want to remove tag_hash and friends from the exported API altogether, since there's no real need for them to be explicitly referenced from outside dynahash.c. Teodor Sigaev and Tom Lane
* Revamp the WAL record format.Heikki Linnakangas2014-11-20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Each WAL record now carries information about the modified relation and block(s) in a standardized format. That makes it easier to write tools that need that information, like pg_rewind, prefetching the blocks to speed up recovery, etc. There's a whole new API for building WAL records, replacing the XLogRecData chains used previously. The new API consists of XLogRegister* functions, which are called for each buffer and chunk of data that is added to the record. The new API also gives more control over when a full-page image is written, by passing flags to the XLogRegisterBuffer function. This also simplifies the XLogReadBufferForRedo() calls. The function can dig the relation and block number from the WAL record, so they no longer need to be passed as arguments. For the convenience of redo routines, XLogReader now disects each WAL record after reading it, copying the main data part and the per-block data into MAXALIGNed buffers. The data chunks are not aligned within the WAL record, but the redo routines can assume that the pointers returned by XLogRecGet* functions are. Redo routines are now passed the XLogReaderState, which contains the record in the already-disected format, instead of the plain XLogRecord. The new record format also makes the fixed size XLogRecord header smaller, by removing the xl_len field. The length of the "main data" portion is now stored at the end of the WAL record, and there's a separate header after XLogRecord for it. The alignment padding at the end of XLogRecord is also removed. This compansates for the fact that the new format would otherwise be more bulky than the old format. Reviewed by Andres Freund, Amit Kapila, Michael Paquier, Alvaro Herrera, Fujii Masao.
* Fix race condition between hot standby and restoring a full-page image.Heikki Linnakangas2014-11-13
| | | | | | | | | | | | | | | | | | | There was a window in RestoreBackupBlock where a page would be zeroed out, but not yet locked. If a backend pinned and locked the page in that window, it saw the zeroed page instead of the old page or new page contents, which could lead to missing rows in a result set, or errors. To fix, replace RBM_ZERO with RBM_ZERO_AND_LOCK, which atomically pins, zeroes, and locks the page, if it's not in the buffer cache already. In stable branches, the old RBM_ZERO constant is renamed to RBM_DO_NOT_USE, to avoid breaking any 3rd party extensions that might use RBM_ZERO. More importantly, this avoids renumbering the other enum values, which would cause even bigger confusion in extensions that use ReadBufferExtended, but haven't been recompiled. Backpatch to all supported versions; this has been racy since hot standby was introduced.
* Fix XLogReadBufferForRedoExtended to get cleanup lock when asked to do so.Heikki Linnakangas2014-11-13
|
* Move the backup-block logic from XLogInsert to a new file, xloginsert.c.Heikki Linnakangas2014-11-06
| | | | | | | | | | | | xlog.c is huge, this makes it a little bit smaller, which is nice. Functions related to putting together the WAL record are in xloginsert.c, and the lower level stuff for managing WAL buffers and such are in xlog.c. Also move the definition of XLogRecord to a separate header file. This causes churn in the #includes of all the files that write WAL records, and redo routines, but it avoids pulling in xlog.h into most places. Reviewed by Michael Paquier, Alvaro Herrera, Andres Freund and Amit Kapila.
* Refactor per-page logic common to all redo routines to a new function.Heikki Linnakangas2014-09-02
| | | | | | | | | | | | Every redo routine uses the same idiom to determine what to do to a page: check if there's a backup block for it, and if not read, the buffer if the block exists, and check its LSN. Refactor that into a common function, XLogReadBufferForRedo, making all the redo routines shorter and more readable. This has no user-visible effect, and makes no changes to the WAL format. Reviewed by Andres Freund, Alvaro Herrera, Michael Paquier.
* Rationalize common/relpath.[hc].Tom Lane2014-04-30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit a73018392636ce832b09b5c31f6ad1f18a4643ea created rather a mess by putting dependencies on backend-only include files into include/common. We really shouldn't do that. To clean it up: * Move TABLESPACE_VERSION_DIRECTORY back to its longtime home in catalog/catalog.h. We won't consider this symbol part of the FE/BE API. * Push enum ForkNumber from relfilenode.h into relpath.h. We'll consider relpath.h as the source of truth for fork numbers, since relpath.c was already partially serving that function, and anyway relfilenode.h was kind of a random place for that enum. * So, relfilenode.h now includes relpath.h rather than vice-versa. This direction of dependency is fine. (That allows most, but not quite all, of the existing explicit #includes of relpath.h to go away again.) * Push forkname_to_number from catalog.c to relpath.c, just to centralize fork number stuff a bit better. * Push GetDatabasePath from catalog.c to relpath.c; it was rather odd that the previous commit didn't keep this together with relpath(). * To avoid needing relfilenode.h in common/, redefine the underlying function (now called GetRelationPath) as taking separate OID arguments, and make the APIs using RelFileNode or RelFileNodeBackend into macro wrappers. (The macros have a potential multiple-eval risk, but none of the existing call sites have an issue with that; one of them had such a risk already anyway.) * Fix failure to follow the directions when "init" fork type was added; specifically, the errhint in forkname_to_number wasn't updated, and neither was the SGML documentation for pg_relation_size(). * Fix tablespace-path-too-long check in CreateTableSpace() to account for fork-name component of maximum-length pathnames. This requires putting FORKNAMECHARS into a header file, but it was rather useless (and actually unreferenced) where it was. The last couple of items are potentially back-patchable bug fixes, if anyone is sufficiently excited about them; but personally I'm not. Per a gripe from Christoph Berg about how include/common wasn't self-contained.
* Fix dangling smgr_owner pointer when a fake relcache entry is freed.Heikki Linnakangas2014-03-07
| | | | | | | | | | | | A fake relcache entry can "own" a SmgrRelation object, like a regular relcache entry. But when it was free'd, the owner field in SmgrRelation was not cleared, so it was left pointing to free'd memory. Amazingly this apparently hasn't caused crashes in practice, or we would've heard about it earlier. Andres found this with Valgrind. Report and fix by Andres Freund, with minor modifications by me. Backpatch to all supported versions.
* In XLogReadBufferExtended, don't assume P_NEW yields consecutive pages.Tom Lane2014-02-12
| | | | | | | | | | | | | | | | | | | | | | | | | In a database that's not yet reached consistency, it's possible that some segments of a relation are not full-size but are not the last ones either. Because of the way smgrnblocks() works, asking for a new page with P_NEW will fill in the last not-full-size segment --- and if that makes it full size, the apparent EOF of the relation will increase by more than one page, so that the next P_NEW request will yield a page past the next consecutive one. This breaks the relation-extension logic in XLogReadBufferExtended, possibly allowing a page update to be applied to some page far past where it was intended to go. This appears to be the explanation for reports of table bloat on replication slaves compared to their masters, and probably explains some corrupted-slave reports as well. Fix the loop to check the page number it actually got, rather than merely Assert()'ing that dead reckoning got it to the desired place. AFAICT, there are no other places that make assumptions about exactly which page they'll get from P_NEW. Problem identified by Greg Stark, though this is not the same as his proposed patch. It's been like this for a long time, so back-patch to all supported branches.
* Fix multiple bugs in index page locking during hot-standby WAL replay.Tom Lane2014-01-14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In ordinary operation, VACUUM must be careful to take a cleanup lock on each leaf page of a btree index; this ensures that no indexscans could still be "in flight" to heap tuples due to be deleted. (Because of possible index-tuple motion due to concurrent page splits, it's not enough to lock only the pages we're deleting index tuples from.) In Hot Standby, the WAL replay process must likewise lock every leaf page. There were several bugs in the code for that: * The replay scan might come across unused, all-zero pages in the index. While btree_xlog_vacuum itself did the right thing (ie, nothing) with such pages, xlogutils.c supposed that such pages must be corrupt and would throw an error. This accounts for various reports of replication failures with "PANIC: WAL contains references to invalid pages". To fix, add a ReadBufferMode value that instructs XLogReadBufferExtended not to complain when we're doing this. * btree_xlog_vacuum performed the extra locking if standbyState == STANDBY_SNAPSHOT_READY, but that's not the correct test: we won't open up for hot standby queries until the database has reached consistency, and we don't want to do the extra locking till then either, for fear of reading corrupted pages (which bufmgr.c would complain about). Fix by exporting a new function from xlog.c that will report whether we're actually in hot standby replay mode. * To ensure full coverage of the index in the replay scan, btvacuumscan would emit a dummy WAL record for the last page of the index, if no vacuuming work had been done on that page. However, if the last page of the index is all-zero, that would result in corruption of said page, since the functions called on it weren't prepared to handle that case. There's no need to lock any such pages, so change the logic to target the last normal leaf page instead. The first two of these bugs were diagnosed by Andres Freund, the other one by me. Fixes based on ideas from Heikki Linnakangas and myself. This has been wrong since Hot Standby was introduced, so back-patch to 9.0.
* Update copyright for 2014Bruce Momjian2014-01-07
| | | | | Update all files in head, and files COPYRIGHT and legal.sgml in all back branches.
* Move relpath() to libpgcommonAlvaro Herrera2013-02-21
| | | | | | | This enables non-backend code, such as pg_xlogdump, to use it easily. The previous location, in src/backend/catalog/catalog.c, made that essentially impossible because that file depends on many backend-only facilities; so this needs to live separately.
* Update copyrights for 2013Bruce Momjian2013-01-01
| | | | | Fully update git head, and update back branches in ./COPYRIGHT and legal.sgml files.
* Properly set relpersistence for fake relcache entries.Robert Haas2012-09-14
| | | | | | | This can result in buffers failing to be properly flushed at checkpoint time, leading to data loss. Report, diagnosis, and patch by Jeff Davis.
* Run pgindent on 9.2 source tree in preparation for first 9.3Bruce Momjian2012-06-10
| | | | commit-fest.
* Update copyright notices for year 2012.Bruce Momjian2012-01-01
|
* Don't set reachedMinRecoveryPoint during crash recovery. In crash recovery,Heikki Linnakangas2011-12-09
| | | | | | | | | | | | | | | we don't reach consistency before replaying all of the WAL. Rename the variable to reachedConsistency, to make its intention clearer. In master, that was an active bug because of the recent patch to immediately PANIC if a reference to a missing page is found in WAL after reaching consistency, as Tom Lane's test case demonstrated. In 9.1 and 9.0, the only consequence was a misleading "consistent recovery state reached at %X/%X" message in the log at the beginning of crash recovery (the database is not consistent at that point yet). In 8.4, the log message was not printed in crash recovery, even though there was a similar reachedMinRecoveryPoint local variable that was also set early. So, backpatch to 9.1 and 9.0.
* During recovery, if we reach consistent state and still have entries in theHeikki Linnakangas2011-12-02
| | | | | | | | | | | | | | | invalid-page hash table, PANIC immediately. Immediate PANIC is much better than waiting for end-of-recovery, which is what we did before, because the end-of-recovery might not come until months later if this is a standby server. Also refrain from creating a restartpoint if there are invalid-page entries in the hash table. Restarting recovery from such a restartpoint would not see the invalid references, and wouldn't be able to cross-check them when consistency is reached. That wouldn't matter when things are going smoothly, but the more sanity checks you have the better. Fujii Masao
* Clean up the #include mess a little.Tom Lane2011-09-04
| | | | | | | | | | | | | | | | | walsender.h should depend on xlog.h, not vice versa. (Actually, the inclusion was circular until a couple hours ago, which was even sillier; but Bruce broke it in the expedient rather than logically correct direction.) Because of that poor decision, plus blind application of pgrminclude, we had a situation where half the system was depending on xlog.h to include such unrelated stuff as array.h and guc.h. Clean up the header inclusion, and manually revert a lot of what pgrminclude had done so things build again. This episode reinforces my feeling that pgrminclude should not be run without adult supervision. Inclusion changes in header files in particular need to be reviewed with great care. More generally, it'd be good if we had a clearer notion of module layering to dictate which headers can sanely include which others ... but that's a big task for another day.
* Remove unnecessary #include references, per pgrminclude script.Bruce Momjian2011-09-01
|
* Add missing includes after pgrminclude run.Bruce Momjian2011-08-26
|
* Stamp copyrights for year 2011.Bruce Momjian2011-01-01
|
* Remove cvs keywords from all files.Magnus Hagander2010-09-20
|
* Fix oversight in RelFileNodeBackend patch: CreateFakeRelcacheEntry needs toTom Lane2010-08-30
| | | | | | initialize the rd_backend field of a fake Relation entry correctly. Fortunately, that is easy, since only non-temp relations should ever be mentioned in the WAL stream.
* Include the backend ID in the relpath of temporary relations.Robert Haas2010-08-13
| | | | | | | | | | | | | | | | | This allows us to reliably remove all leftover temporary relation files on cluster startup without reference to system catalogs or WAL; therefore, we no longer include temporary relations in XLOG_XACT_COMMIT and XLOG_XACT_ABORT WAL records. Since these changes require including a backend ID in each SharedInvalSmgrMsg, the size of the SharedInvalidationMessage.id field has been reduced from two bytes to one, and the maximum number of connections has been reduced from INT_MAX / 4 to 2^23-1. It would be possible to remove these restrictions by increasing the size of SharedInvalidationMessage by 4 bytes, but right now that doesn't seem like a good trade-off. Review by Jaime Casanova and Tom Lane.
* Update obsolete comment. Noted by Josh Tolley.Tom Lane2010-07-08
|
* Fix up rickety handling of relation-truncation interlocks.Tom Lane2010-02-09
| | | | | | | | | | | | | | | | | | | | Move rd_targblock, rd_fsm_nblocks, and rd_vm_nblocks from relcache to the smgr relation entries, so that they will get reset to InvalidBlockNumber whenever an smgr-level flush happens. Because we now send smgr invalidation messages immediately (not at end of transaction) when a relation truncation occurs, this ensures that other backends will reset their values before they next access the relation. We no longer need the unreliable assumption that a VACUUM that's doing a truncation will hold its AccessExclusive lock until commit --- in fact, we can intentionally release that lock as soon as we've completed the truncation. This patch therefore reverts (most of) Alvaro's patch of 2009-11-10, as well as my marginal hacking on it yesterday. We can also get rid of assorted no-longer-needed relcache flushes, which are far more expensive than an smgr flush because they kill a lot more state. In passing this patch fixes smgr_redo's failure to perform visibility-map truncation, and cleans up some rather dubious assumptions in freespace.c and visibilitymap.c about when rd_fsm_nblocks and rd_vm_nblocks can be out of date.
* Update copyright for the year 2010.Bruce Momjian2010-01-02
|
* 8.4 pgindent run, with new combined Linux/FreeBSD/MinGW typedef listBruce Momjian2009-06-11
| | | | provided by Andrew.
* Add a new option to RestoreBkpBlocks() to indicate if a cleanup lock shouldHeikki Linnakangas2009-01-20
| | | | | | | | | be used instead of the normal exclusive lock, and make WAL redo functions responsible for calling RestoreBkpBlocks(). They know better what kind of a lock they need. At the moment, this just moves things around with no functional change, but makes the hot standby patch that's under review cleaner.
* Update copyright for 2009.Bruce Momjian2009-01-01
|
* Introduce visibility map. The visibility map is a bitmap with one bit perHeikki Linnakangas2008-12-03
| | | | | | | | | | | | | | | | | | heap page, where a set bit indicates that all tuples on the page are visible to all transactions, and the page therefore doesn't need vacuuming. It is stored in a new relation fork. Lazy vacuum uses the visibility map to skip pages that don't need vacuuming. Vacuum is also responsible for setting the bits in the map. In the future, this can hopefully be used to implement index-only-scans, but we can't currently guarantee that the visibility map is always 100% up-to-date. In addition to the visibility map, there's a new PD_ALL_VISIBLE flag on each heap page, also indicating that all tuples on the page are visible to all transactions. It's important that this flag is kept up-to-date. It is also used to skip visibility tests in sequential scans, which gives a small performance gain on seqscans.
* Rely on relcache invalidation to update the cached size of the FSM.Heikki Linnakangas2008-11-26
|
* Rethink the way FSM truncation works. Instead of WAL-logging FSMHeikki Linnakangas2008-11-19
| | | | | | | | | | | | | | | truncations in FSM code, call FreeSpaceMapTruncateRel from smgr_redo. To make that cleaner from modularity point of view, move the WAL-logging one level up to RelationTruncate, and move RelationTruncate and all the related WAL-logging to new src/backend/catalog/storage.c file. Introduce new RelationCreateStorage and RelationDropStorage functions that are used instead of calling smgrcreate/smgrscheduleunlink directly. Move the pending rel deletion stuff from smgrcreate/smgrscheduleunlink to the new functions. This leaves smgr.c as a thin wrapper around md.c; all the transactional stuff is now in storage.c. This will make it easier to add new forks with similar truncation logic, like the visibility map.