path: root/src/backend/access/transam/xlog.c
Commit message | Author | Date
...
* Fix crashing bug at the end of recovery in Streaming Replication, when restore_command is not given. Fujii Masao.
  [Heikki Linnakangas, 2010-01-28]
* Fix bug in walsender's xlogid boundary handling, reported by Erik Rijkers.

  LogwrtRqst.Write can be set to a non-existent FF log segment; we mustn't try to send that in XLogSend(). Also fix similar bug in ReadRecord(), which I just introduced in the ReadRecord() refactoring patch.

  [Heikki Linnakangas, 2010-01-27]
* Make standby server continuously retry restoring the next WAL segment with restore_command, if the connection to the primary server is lost. This ensures that the standby can recover automatically, if the connection is lost for a long time and standby falls behind so much that the required WAL segments have been archived and deleted in the master.

  This also makes standby_mode useful without streaming replication; the server will keep retrying restore_command every few seconds until the trigger file is found. That's the same basic functionality pg_standby offers, but without the bells and whistles.

  To implement that, refactor the ReadRecord/FetchRecord functions. The FetchRecord() function introduced in the original streaming replication patch is removed, and all the retry logic is now in a new function called XLogReadPage(). XLogReadPage() is now responsible for executing restore_command, launching walreceiver, and waiting for new WAL to arrive from primary, as required.

  This also changes the life cycle of walreceiver. When launched, it now only tries to connect to the master once, and exits if the connection fails, or is lost during streaming for any reason. The startup process detects the death, and re-launches walreceiver if necessary.

  [Heikki Linnakangas, 2010-01-27]
* Fix longstanding gripe that we check for 0000000001.history at start of archive recovery, even when we know it is never present.
  [Simon Riggs, 2010-01-26]
* In HS, Startup process sets SIGALRM when waiting for buffer pin. If woken by alarm we send SIGUSR1 to all backends requesting that they check to see if they are blocking Startup process. If so, they throw ERROR/FATAL as for other conflict resolutions. Deadlock stop gap removed. max_standby_delay = -1 option removed to prevent deadlock.
  [Simon Riggs, 2010-01-23]
* Write a WAL record whenever we perform an operation without WAL-logging that would've been WAL-logged if archiving was enabled. If we encounter such records in archive recovery anyway, we know that some data is missing from the log. A WARNING is emitted in that case.

  Original patch by Fujii Masao, with changes by me.

  [Heikki Linnakangas, 2010-01-20]
* Introduce Streaming Replication.

  This includes two new kinds of postmaster processes, walsenders and walreceiver. Walreceiver is responsible for connecting to the primary server and streaming WAL to disk, while walsender runs in the primary server and streams WAL from disk to the client.

  Documentation still needs work, but the basics are there. We will probably pull the replication section to a new chapter later on, as well as the sections describing file-based replication. But let's do that as a separate patch, so that it's easier to see what has been added/changed. This patch also adds a new section to the chapter about FE/BE protocol, documenting the protocol used by walsender/walreceiver.

  Bump catalog version because of two new functions, pg_last_xlog_receive_location() and pg_last_xlog_replay_location(), for monitoring the progress of replication.

  Fujii Masao, with additional hacking by me

  [Heikki Linnakangas, 2010-01-15]
* Write an end-of-backup WAL record at pg_stop_backup(), and wait for it at recovery instead of reading the backup history file. This is more robust, as it stops you from prematurely starting up an inconsistent cluster if the backup history file is lost for some reason, or if the base backup was never finished with pg_stop_backup().

  This also paves the way for a simpler streaming replication patch, which doesn't need to care about backup history files anymore.

  The backup history file is still created and archived as before, but it's not used by the system anymore. It's just for informational purposes now.

  Bump PG_CONTROL_VERSION as the location of the backup startpoint is now written to a new field in pg_control, and catversion because initdb is required.

  Original patch by Fujii Masao per Simon's idea, with further fixes by me.

  [Heikki Linnakangas, 2010-01-04]
* Update copyright for the year 2010.
  [Bruce Momjian, 2010-01-02]
* Reset minRecoveryPoint at checkpoints, so that we don't uselessly update it in the control file at crash recovery following an archive recovery. Per Fujii Masao and subsequent discussion.
  [Heikki Linnakangas, 2009-12-30]
* Allow read only connections during recovery, known as Hot Standby.

  Enabled by recovery_connections = on (default) and forcing archive recovery using a recovery.conf. Recovery processing now emulates the original transactions as they are replayed, providing full locking and MVCC behaviour for read only queries. Recovery must enter consistent state before connections are allowed, so there is a delay, typically short, before connections succeed. Replay of recovering transactions can conflict and in some cases deadlock with queries during recovery; these result in query cancellation after max_standby_delay seconds have expired. Infrastructure changes have minor effects on normal running, though introduce four new types of WAL record.

  New test mode "make standbycheck" allows regression tests of static command behaviour on a standby server while in recovery. Typical and extreme dynamic behaviours have been checked via code inspection and manual testing. Few port specific behaviours have been utilised, though primary testing has been on Linux only so far.

  This commit is the basic patch. Additional changes will follow in this release to enhance some aspects of behaviour, notably improved handling of conflicts, deadlock detection and query cancellation. Changes to VACUUM FULL are also required.

  Simon Riggs, with significant and lengthy review by Heikki Linnakangas, including streamlined redesign of snapshot creation and two-phase commit.

  Important contributions from Florian Pflug, Mark Kirkwood, Merlin Moncure, Greg Stark, Gianni Ciolli, Gabriele Bartolini, Hannu Krosing, Robert Haas, Tatsuo Ishii, Hiroyuki Yamada plus support and feedback from many other community members.

  [Simon Riggs, 2009-12-19]
* Don't error out if recycling or removing an old WAL segment fails at the end of checkpoint. Although the checkpoint has been written to WAL at that point already, so that all data is safe, and we'll retry removing the WAL segment at the next checkpoint, if such a failure persists we won't be able to remove any other old WAL segments either and will eventually run out of disk space. It's better to treat the failure as non-fatal, and move on to clean any other WAL segment and continue with any other end-of-checkpoint cleanup.

  We don't normally expect any such failures, but on Windows it can happen with some anti-virus or backup software that lock files without FILE_SHARE_DELETE flag.

  Also, the loop in pgrename() to retry when the file is locked was broken. If a file is locked on Windows, you get ERROR_SHARE_VIOLATION, not ERROR_ACCESS_DENIED, at least on modern versions. Fix that, although I left the check for ERROR_ACCESS_DENIED in there as well (presumably it was correct in some environment), and added ERROR_LOCK_VIOLATION to be consistent with similar checks in pgwin32_open(). Reduce the timeout on the loop from 30s to 10s, on the grounds that since it's been broken, we've effectively had a timeout of 0s and no-one has complained, so a smaller timeout is actually closer to the old behavior. A longer timeout would mean that if recycling a WAL file fails because it's locked for some reason, InstallXLogFileSegment() will hold ControlFileLock for longer, potentially blocking other backends, so a long timeout isn't totally harmless.

  While we're at it, set errno correctly in pgrename().

  Backpatch to 8.2, which is the oldest version supported on Windows. The xlog.c changes would make sense on other platforms and thus on older versions as well, but since there are no such locking issues on other platforms, it's not worth it.

  [Heikki Linnakangas, 2009-09-13]
* On Windows, when a file is deleted and another process still has an open file handle on it, the file goes into "pending deletion" state where it still shows up in directory listing, but isn't accessible otherwise. That confuses RemoveOldXLogFiles(), making it think that the file hasn't been archived yet, while it actually was, and it was deleted along with the .done file.

  Fix that by renaming the file with ".deleted" extension before deleting it. Also check the return value of rename() and unlink(), so that if the removal fails for any reason (e.g. another process is holding the file locked), we don't delete the .done file until the WAL file is really gone.

  Backpatch to 8.2, which is the oldest version supported on Windows.

  [Heikki Linnakangas, 2009-09-10]
* Remove flatfiles.c, which is now obsolete.

  Recent commits have removed the various uses it was supporting. It was a performance bottleneck, according to bug report #4919 by Lauris Ulmanis; seems it slowed down user creation after a billion users.

  [Alvaro Herrera, 2009-09-01]
* Track the current XID wrap limit (or more accurately, the oldest unfrozen XID) in checkpoint records. This eliminates the need to recompute the value from scratch during database startup, which is one of the two remaining reasons for the flatfile code to exist. It should also simplify life for hot-standby operation.

  To avoid bloating the checkpoint records unreasonably, I switched from tracking the oldest database by name to tracking it by OID. This turns out to save cycles in general (everywhere but the warning-generating paths, which we hardly care about) and also helps us deal with the case that the oldest database got dropped instead of being vacuumed. The prior coding might go for a long time without updating the wrap limit in that case, which is bad because it might result in a lot of useless autovacuum activity.

  [Tom Lane, 2009-08-31]
* In the checkpoint written at the end of archive recovery, the WAL page header was incorrectly initialized with timeline ID 0. That rendered the WAL page unrecoverable, making a subsequent archive recovery stop at that point. ThisTimeLineID needs to be initialized before calling AdvanceXLInsertBuffer().

  This fixes bug #5011 reported by James Bardin. Backpatch to 8.4, as the bug was introduced by the changes to use of bgwriter for writing the end-of-archive-recovery checkpoint.

  Patch by Tom Lane.

  [Heikki Linnakangas, 2009-08-27]
* Allow backends to start up without use of the flat-file copy of pg_database.

  To make this work in the base case, pg_database now has a nailed-in-cache relation descriptor that is initialized using hardwired knowledge in relcache.c. This means pg_database is added to the set of relations that need to have a Schema_pg_xxx macro maintained in pg_attribute.h. When this path is taken, we'll have to do a seqscan of pg_database to find the row we need.

  In the normal case, we are able to do an indexscan to find the database's row by name. This is made possible by storing a global relcache init file that describes only the shared catalogs and their indexes (and therefore is usable by all backends in any database). A new backend loads this cache file, finds its database OID after an indexscan on pg_database, and then loads the local relcache init file for that database.

  This change should effectively eliminate number of databases as a factor in backend startup time, even with large numbers of databases. However, the real reason for doing it is as a first step towards getting rid of the flat files altogether. There are still several other sub-projects to be tackled before that can happen.

  [Tom Lane, 2009-08-12]
* Document that LocalSetXLogInsertAllowed can be re-executed. Per comment from Simon.
  [Tom Lane, 2009-08-08]
* rm_cleanup functions need to be allowed to write WAL entries. This oversight appears to explain the recent reports of "PANIC: cannot make new WAL entries during recovery".
  [Tom Lane, 2009-08-07]
* Cleanup and code review for the patch that made bgwriter active during archive recovery. Invent a separate state variable and inquiry function for XLogInsertAllowed() to clarify some tests and make the management of writing the end-of-recovery checkpoint less klugy. Fix several places that were incorrectly testing InRecovery when they should be looking at RecoveryInProgress or XLogInsertAllowed (because they will now be executed in the bgwriter not startup process). Clarify handling of bad LSNs passed to XLogFlush during recovery. Use a spinlock for setting/testing SharedRecoveryInProgress. Improve quite a lot of comments.

  Heikki and Tom

  [Tom Lane, 2009-06-26]
* Fix some serious bugs in archive recovery, now that bgwriter is active during it:

  When bgwriter is active, the startup process can't perform mdsync() correctly because it won't see the fsync requests accumulated in bgwriter's private pendingOpsTable. Therefore make bgwriter responsible for the end-of-recovery checkpoint as well, when it's active.

  When bgwriter is active (= archive recovery), the startup process must not accumulate fsync requests to its own pendingOpsTable, since bgwriter won't see them there when it performs restartpoints. Make startup process drop its pendingOpsTable when bgwriter is launched to avoid that.

  Update minimum recovery point one last time when leaving archive recovery. It won't be updated by the end-of-recovery checkpoint because XLogFlush() sees us as out of recovery already.

  This fixes bug #4879 reported by Fujii Masao.

  [Heikki Linnakangas, 2009-06-25]
* 8.4 pgindent run, with new combined Linux/FreeBSD/MinGW typedef list provided by Andrew.
  [Bruce Momjian, 2009-06-11]
* Only recycle normal files in pg_xlog as WAL segments. pg_standby creates symbolic links with the -l option, and as Fujii Masao pointed out we ended up overwriting files in the archive directory before this patch.

  Patch by Aidan Van Dyk, Fujii Masao and me. Backpatch to 8.3, where pg_standby was introduced.

  [Heikki Linnakangas, 2009-06-02]
* When archiving is enabled, rotate the last WAL segment at shutdown so that all transactions are archived.

  Original patch by Guillaume Smet.

  [Heikki Linnakangas, 2009-05-28]
* Fix all the server-side SIGQUIT handlers (grumble ... why so many identical copies?) to ensure they really don't run proc_exit/shmem_exit callbacks, as was intended. I broke this behavior recently by installing atexit callbacks without thinking about the one case where we truly don't want to run those callback functions. Noted in an example from Dave Page.
  [Tom Lane, 2009-05-15]
* Improve a couple of comments.
  [Tom Lane, 2009-05-14]
* Add recovery_end_command option to recovery.conf. recovery_end_command is run at the end of archive recovery, providing a chance to do external cleanup. Modify pg_standby so that it no longer removes the trigger file, that is to be done using the recovery_end_command now.

  Provide a "smart" failover mode in pg_standby, where we don't fail over immediately, but only after recovering all unapplied WAL from the archive. That gives you zero data loss assuming all WAL was archived before failover, which is what most users of pg_standby actually want.

  recovery_end_command by Simon Riggs, pg_standby changes by Fujii Masao and myself.

  [Heikki Linnakangas, 2009-05-14]
* Request XLOG switch before writing checkpoint in pg_start_backup(). Otherwise you can end up with an unrecoverable backup if you start a new base backup right after finishing archive recovery. In that scenario, the redo pointer of the checkpoint that pg_start_backup() writes points to the XLOG segment where the timeline-changing end-of-archive-recovery checkpoint is. The beginning of that segment contains pages with the old timeline ID, and we don't accept that in recovery unless we find a history file covering the old timeline ID. If you omit pg_xlog from the base backup and clear the archive directory before starting the backup, there will be no such history file available.

  The bug is present in all versions since PITR was introduced in 8.0, but I'm back-patching only back to 8.2. Earlier versions didn't have XLOG switch records, making this fix unfeasible. Given the lack of reports until now, it doesn't seem worthwhile to spend more effort to fix 8.0 and 8.1.

  Per report and suggestion by Mikael Krantz

  [Heikki Linnakangas, 2009-05-07]
* After archive recovery, mark the last WAL segment from the parent timeline ready for archival. It was marked at the next checkpoint anyway, but waiting for the next checkpoint is an unnecessary delay.

  Fujii Masao

  [Heikki Linnakangas, 2009-04-22]
* Add an optional parameter to pg_start_backup() that specifies whether to do the checkpoint in immediate or lazy mode. This is to address complaints that pg_start_backup() takes a long time even when there's no need to minimize its I/O consumption.
  [Tom Lane, 2009-04-07]
* Code review for dtrace probes added (so far) to 8.4. Adjust placement of some bufmgr probes, take out redundant and memory-leak-inducing path arguments to smgr__md__read__done and smgr__md__write__done, fix bogus attempt to recalculate space used in sort__done, clean up formatting in places where I'm not sure pgindent will do a nice job by itself.
  [Tom Lane, 2009-03-11]
* Reload config file in startup process on SIGHUP.

  Fujii Masao

  [Heikki Linnakangas, 2009-03-04]
* Change the signaling of end-of-recovery. Startup process now indicates end of recovery by exiting with exit code 0, like in previous releases. Per Tom's suggestion.
  [Heikki Linnakangas, 2009-02-23]
* Start background writer during archive recovery. Background writer now performs its usual buffer cleaning duties during archive recovery, and it's responsible for performing restartpoints.

  This requires some changes in postmaster. When the startup process has done all the initialization and is ready to start WAL redo, it signals the postmaster to launch the background writer. The postmaster is signaled again when the point in recovery is reached where we know that the database is in consistent state. Postmaster isn't interested in that at the moment, but that's the point where we could let other backends in to perform read-only queries. The postmaster is signaled a third time when the recovery has ended, so that postmaster knows that it's safe to start accepting connections.

  The startup process now traps SIGTERM, and performs a "clean" shutdown. If you do a fast shutdown during recovery, a shutdown restartpoint is performed, like a shutdown checkpoint, and postmaster kills the processes cleanly. You still have to continue the recovery at next startup, though.

  Currently, the background writer is only launched during archive recovery. We could launch it during crash recovery as well, but it seems better to keep that codepath as simple as possible, for the sake of robustness. And it couldn't do any restartpoints during crash recovery anyway, so it wouldn't be that useful.

  log_restartpoints is gone. Use log_checkpoints instead. This is yet to be documented.

  This whole operation is a pre-requisite for Hot Standby, but has some value of its own whether the hot standby patch makes 8.4 or not.

  Simon Riggs, with lots of modifications by me.

  [Heikki Linnakangas, 2009-02-18]
* Fix obsolete comment. Zdenek Kotala
  [Heikki Linnakangas, 2009-02-07]
* Put back fast-path for the case that there are no backup blocks in RestoreBkpBlocks. Went missing in my recent refactoring patch, as pointed out by Simon's hot standby patch.
  [Heikki Linnakangas, 2009-01-23]
* Add a new option to RestoreBkpBlocks() to indicate if a cleanup lock should be used instead of the normal exclusive lock, and make WAL redo functions responsible for calling RestoreBkpBlocks(). They know better what kind of a lock they need. At the moment, this just moves things around with no functional change, but makes the hot standby patch that's under review cleaner.
  [Heikki Linnakangas, 2009-01-20]
* Re-enable the old code in xlog.c that tried to use posix_fadvise(), so that we can get some buildfarm feedback about whether that function is still problematic. (Note that the planned async-preread patch will not really prove anything one way or the other in buildfarm testing, since it will be inactive with default GUC settings.)
  [Tom Lane, 2009-01-11]
* Update copyright for 2009.
  [Bruce Momjian, 2009-01-01]
* Change the name of dtrace wal tracepoints:

  TRACE_POSTGRESQL_WAL_BUFFER_WRITE_DIRTY

  Robert Lor

  [Bruce Momjian, 2008-12-24]
* The attached patch contains a couple of fixes in the existing probes and includes a few new ones.

  - Fixed compilation errors on OS X for probes that use typedefs
  - Fixed a number of probes to pass ForkNumber per the relation forks patch
  - The new probes are those that were taken out from the previous submitted patch and required simple fixes.

  Will submit the other probes that may require more discussion in a separate patch.

  Robert Lor

  [Bruce Momjian, 2008-12-17]
* If pg_stop_backup() is called just after switching to a new xlog file, wait for the previous instead of the new file to be archived.

  Based on patch by Simon Riggs.

  [Heikki Linnakangas, 2008-12-03]
* Add a startup check that pg_xlog and pg_xlog/archive_status exist. If the latter doesn't exist, automatically recreate it. (We don't do this for pg_xlog, though, per discussion.)

  Jonah Harris

  [Tom Lane, 2008-11-09]
* Unite ReadBufferWithFork, ReadBufferWithStrategy, and ZeroOrReadBuffer functions into one ReadBufferExtended function, that takes the strategy and mode as argument. There are three modes: RBM_NORMAL, which is the default used by plain ReadBuffer(); RBM_ZERO, which replaces ZeroOrReadBuffer; and a new mode RBM_ZERO_ON_ERROR, which allows callers to read corrupt pages without throwing an error. The FSM needs the new mode to recover from corrupt pages, which could happen if we crash after extending an FSM file, and the new page is "torn".

  Add fork number to some error messages in bufmgr.c, that still lacked it.

  [Heikki Linnakangas, 2008-10-31]
* Fix recoveryLastXTime logic so that it actually does what one would expect. Per gripe from Kevin Grittner. Backpatch to 8.3, where the bug was introduced.
  [Tom Lane, 2008-10-30]
* Make LC_COLLATE and LC_CTYPE database-level settings. Collation and ctype are now more like encoding, stored in new datcollate and datctype columns in pg_database.

  This is a stripped-down version of Radek Strnad's patch, with further changes by me.

  [Heikki Linnakangas, 2008-09-23]
* Fix a couple of problems pointed out by Fujii Masao in the 2008-Apr-05 patch for pg_stop_backup. First, it is possible that the history file name is not alphabetically later than the last WAL file name, so we should explicitly check that both have been archived. Second, the previous coding would wait forever if a checkpoint had managed to remove the WAL file before we look for it.

  Simon Riggs, plus some code cleanup by me.

  [Tom Lane, 2008-09-08]
* Introduce the concept of relation forks. An smgr relation can now consist of multiple forks, and each fork can be created and grown separately.

  The bulk of this patch is about changing the smgr API to include an extra ForkNumber argument in every smgr function. Also, smgrscheduleunlink and smgrdounlink no longer implicitly call smgrclose, because other forks might still exist after unlinking one. The callers of those functions have been modified to call smgrclose instead.

  This patch in itself doesn't have any user-visible effect, but provides the infrastructure needed for upcoming patches. The additional forks envisioned are a rewritten FSM implementation that doesn't rely on a fixed-size shared memory block, and a visibility map to allow skipping portions of a table in VACUUM that have no dead tuples.

  [Heikki Linnakangas, 2008-08-11]
* Clean up the use of some page-header-access macros: principally, use SizeOfPageHeaderData instead of sizeof(PageHeaderData) in places where that makes the code clearer, and avoid casting between Page and PageHeader where possible. Zdenek Kotala, with some additional cleanup by Heikki Linnakangas.

  I did not apply the parts of the proposed patch that would have resulted in slightly changing the on-disk format of hash indexes; it seems to me that's not a win as long as there's any chance of having in-place upgrade for 8.4.

  [Tom Lane, 2008-07-13]
* Fix recovery.conf boolean variables to take the same range of string values as postgresql.conf.
  [Bruce Momjian, 2008-06-30]