Allow read only connections during recovery, known as Hot Standby.

Enabled by recovery_connections = on (default) and forcing archive recovery using a recovery.conf. Recovery processing now emulates the original transactions as they are replayed, providing full locking and MVCC behaviour for read only queries. Recovery must enter consistent state before connections are allowed, so there is a delay, typically short, before connections succeed. Replay of recovering transactions can conflict and in some cases deadlock with queries during recovery; these result in query cancellation after max_standby_delay seconds have expired. Infrastructure changes have minor effects on normal running, though introduce four new types of WAL record. New test mode "make standbycheck" allows regression tests of static command behaviour on a standby server while in recovery. Typical and extreme dynamic behaviours have been checked via code inspection and manual testing. Few port specific behaviours have been utilised, though primary testing has been on Linux only so far. This commit is the basic patch. Additional changes will follow in this release to enhance some aspects of behaviour, notably improved handling of conflicts, deadlock detection and query cancellation. Changes to VACUUM FULL are also required. Simon Riggs, with significant and lengthy review by Heikki Linnakangas, including streamlined redesign of snapshot creation and two-phase commit. Important contributions from Florian Pflug, Mark Kirkwood, Merlin Moncure, Greg Stark, Gianni Ciolli, Gabriele Bartolini, Hannu Krosing, Robert Haas, Tatsuo Ishii, Hiroyuki Yamada plus support and feedback from many other community members.
author: Simon Riggs <simon@2ndQuadrant.com> 2009-12-19 01:32:45 +0000
committer: Simon Riggs <simon@2ndQuadrant.com> 2009-12-19 01:32:45 +0000
commit: efc16ea520679d713d98a2c7bf1453c4ff7b91ec (patch)
tree: 6a39d2af0704a36281dc7df3ec10823eb3e6de75 /doc/src
parent: 78a09145e0f8322e625bbc7d69fcb865ce4f3034 (diff)
download: postgresql-efc16ea520679d713d98a2c7bf1453c4ff7b91ec.tar.gz
postgresql-efc16ea520679d713d98a2c7bf1453c4ff7b91ec.zip
4 files changed, 925 insertions, 6 deletions
diff --git a/doc/src/sgml/backup.sgml b/doc/src/sgml/backup.sgml
index f9e45ec3d8c..77ecc4f04b2 100644
--- a/doc/src/sgml/backup.sgml
+++ b/doc/src/sgml/backup.sgml
@@ -1,4 +1,4 @@
-<!-- $PostgreSQL: pgsql/doc/src/sgml/backup.sgml,v 2.130 2009/08/07 20:54:31 alvherre Exp $ -->
+<!-- $PostgreSQL: pgsql/doc/src/sgml/backup.sgml,v 2.131 2009/12/19 01:32:30 sriggs Exp $ -->
 
 <chapter id="backup">
  <title>Backup and Restore</title>
@@ -1429,8 +1429,12 @@ archive_command = 'local_backup_script.sh'
    <listitem>
     <para>
      Operations on hash indexes are not presently WAL-logged, so
-     replay will not update these indexes.  The recommended workaround
-     is to manually <xref linkend="sql-reindex" endterm="sql-reindex-title">
+     replay will not update these indexes.  This will mean that any new inserts
+	 will be ignored by the index, updated rows will apparently disappear and
+	 deleted rows will still retain pointers. In other words, if you modify a
+	 table with a hash index on it then you will get incorrect query results
+	 on a standby server.  When recovery completes it is recommended that you
+     manually <xref linkend="sql-reindex" endterm="sql-reindex-title">
      each such index after completing a recovery operation.
     </para>
    </listitem>
@@ -1883,6 +1887,772 @@ if (!triggered)
   </sect2>
  </sect1>
 
+ <sect1 id="hot-standby">
+  <title>Hot Standby</title>
+
+  <indexterm zone="backup">
+   <primary>Hot Standby</primary>
+  </indexterm>
+
+   <para>
+	Hot Standby is the term used to describe the ability to connect to
+	the server and run queries while the server is in archive recovery. This
+	is useful for both log shipping replication and for restoring a backup
+	to an exact state with great precision.
+	The term Hot Standby also refers to the ability of the server to move
+	from recovery through to normal running while users continue running
+	queries and/or continue their connections.
+   </para>
+
+   <para>
+	Running queries in recovery is in many ways the same as normal running
+	though there are a large number of usage and administrative points
+	to note.
+   </para>
+
+  <sect2 id="hot-standby-users">
+   <title>User's Overview</title>
+
+   <para>
+	Users can connect to the database while the server is in recovery
+	and perform read-only queries. Read-only access to catalogs and views
+	will also occur as normal.
+   </para>
+
+   <para>
+	The data on the standby takes some time to arrive from the primary server
+	so there will be a measurable delay between primary and standby. Running the
+	same query nearly simultaneously on both primary and standby might therefore
+	return differing results. We say that data on the standby is eventually
+	consistent with the primary.
+	Queries executed on the standby will be correct with regard to the transactions
+	that had been recovered at the start of the query, or start of first statement,
+	in the case of serializable transactions. In comparison with the primary,
+	the standby returns query results that could have been obtained on the primary
+	at some exact moment in the past.
+   </para>
+
+   <para>
+	When a transaction is started in recovery, the parameter
+	<varname>transaction_read_only</> will be forced to be true, regardless of the
+	<varname>default_transaction_read_only</> setting in <filename>postgresql.conf</>.
+	It can't be manually set to false either. As a result, all transactions
+	started during recovery will be limited to read-only actions only. In all
+	other ways, connected sessions will appear identical to sessions
+	initiated during normal processing mode. There are no special commands
+	required to initiate a connection at this time, so all interfaces
+	work normally without change. After recovery finishes, the session
+	will allow normal read-write transactions at the start of the next
+	transaction, if these are requested.
+   </para>
+
+   <para>
+	Read-only here means "no writes to the permanent database tables".
+	There are no problems with queries that make use of transient sort and
+	work files.
+   </para>
+
+   <para>
+	The following actions are allowed
+
+	<itemizedlist>
+	 <listitem>
+	  <para>
+       Query access - SELECT, COPY TO including views and SELECT RULEs
+      </para>
+     </listitem>
+	 <listitem>
+	  <para>
+       Cursor commands - DECLARE, FETCH, CLOSE,
+      </para>
+     </listitem>
+	 <listitem>
+	  <para>
+       Parameters - SHOW, SET, RESET
+      </para>
+     </listitem>
+	 <listitem>
+	  <para>
+       Transaction management commands
+		<itemizedlist>
+		 <listitem>
+		  <para>
+		   BEGIN, END, ABORT, START TRANSACTION
+	      </para>
+	     </listitem>
+		 <listitem>
+		  <para>
+	       SAVEPOINT, RELEASE, ROLLBACK TO SAVEPOINT
+	      </para>
+	     </listitem>
+		 <listitem>
+		  <para>
+	       EXCEPTION blocks and other internal subtransactions
+	      </para>
+	     </listitem>
+		</itemizedlist>
+      </para>
+     </listitem>
+	 <listitem>
+	  <para>
+       LOCK TABLE, though only when explicitly in one of these modes:
+	   ACCESS SHARE, ROW SHARE or ROW EXCLUSIVE.
+      </para>
+     </listitem>
+	 <listitem>
+	  <para>
+       Plans and resources - PREPARE, EXECUTE, DEALLOCATE, DISCARD
+      </para>
+     </listitem>
+	 <listitem>
+	  <para>
+       Plugins and extensions - LOAD
+      </para>
+     </listitem>
+    </itemizedlist>
+   </para>
+
+   <para>
+	These actions produce error messages
+
+	<itemizedlist>
+	 <listitem>
+	  <para>
+	   Data Definition Language (DML) - INSERT, UPDATE, DELETE, COPY FROM, TRUNCATE.
+	   Note that there are no allowed actions that result in a trigger
+	   being executed during recovery.
+      </para>
+     </listitem>
+	 <listitem>
+	  <para>
+	   Data Definition Language (DDL) - CREATE, DROP, ALTER, COMMENT.
+	   This also applies to temporary tables currently because currently their
+	   definition causes writes to catalog tables.
+      </para>
+     </listitem>
+	 <listitem>
+	  <para>
+       SELECT ... FOR SHARE | UPDATE which cause row locks to be written
+      </para>
+     </listitem>
+	 <listitem>
+	  <para>
+       RULEs on SELECT statements that generate DML commands.
+      </para>
+     </listitem>
+	 <listitem>
+	  <para>
+       LOCK TABLE, in short default form, since it requests ACCESS EXCLUSIVE MODE.
+       LOCK TABLE that explicitly requests a mode higher than ROW EXCLUSIVE MODE.
+      </para>
+     </listitem>
+	 <listitem>
+	  <para>
+       Transaction management commands that explicitly set non-read only state
+		<itemizedlist>
+		 <listitem>
+		  <para>
+			BEGIN READ WRITE,
+			START TRANSACTION READ WRITE
+	      </para>
+	     </listitem>
+		 <listitem>
+		  <para>
+			SET TRANSACTION READ WRITE,
+			SET SESSION CHARACTERISTICS AS TRANSACTION READ WRITE
+	      </para>
+	     </listitem>
+		 <listitem>
+		  <para>
+	       SET transaction_read_only = off
+	      </para>
+	     </listitem>
+		</itemizedlist>
+      </para>
+     </listitem>
+	 <listitem>
+	  <para>
+       Two-phase commit commands - PREPARE TRANSACTION, COMMIT PREPARED,
+	   ROLLBACK PREPARED because even read-only transactions need to write
+	   WAL in the prepare phase (the first phase of two phase commit).
+      </para>
+     </listitem>
+	 <listitem>
+	  <para>
+       sequence update - nextval()
+      </para>
+     </listitem>
+	 <listitem>
+	  <para>
+	   LISTEN, UNLISTEN, NOTIFY since they currently write to system tables
+      </para>
+     </listitem>
+    </itemizedlist>
+   </para>
+
+   <para>
+	Note that current behaviour of read only transactions when not in
+	recovery is to allow the last two actions, so there are small and
+	subtle differences in behaviour between read-only transactions
+	run on standby and during normal running.
+	It is possible that the restrictions on LISTEN, UNLISTEN, NOTIFY and
+	temporary tables may be lifted in a future release, if their internal
+	implementation is altered to make this possible.
+   </para>
+
+   <para>
+	If failover or switchover occurs the database will switch to normal
+	processing mode. Sessions will remain connected while the server
+	changes mode. Current transactions will continue, though will remain
+	read-only. After recovery is complete, it will be possible to initiate
+	read-write transactions.
+   </para>
+
+   <para>
+	Users will be able to tell whether their session is read-only by
+	issuing SHOW transaction_read_only.  In addition a set of
+	functions <xref linkend="functions-recovery-info-table"> allow users to
+	access information about Hot Standby. These allow you to write
+	functions that are aware of the current state of the database. These
+	can be used to monitor the progress of recovery, or to allow you to
+	write complex programs that restore the database to particular states.
+   </para>
+
+   <para>
+	In recovery, transactions will not be permitted to take any table lock
+	higher than RowExclusiveLock. In addition, transactions may never assign
+	a TransactionId and may never write WAL.
+	Any <command>LOCK TABLE</> command that runs on the standby and requests
+	a specific lock mode higher than ROW EXCLUSIVE MODE will be rejected.
+   </para>
+
+   <para>
+	In general queries will not experience lock conflicts with the database
+	changes made by recovery. This is becase recovery follows normal
+	concurrency control mechanisms, known as <acronym>MVCC</>. There are
+	some types of change that will cause conflicts, covered in the following
+	section.
+   </para>
+  </sect2>
+
+  <sect2 id="hot-standby-conflict">
+   <title>Handling query conflicts</title>
+
+   <para>
+	The primary and standby nodes are in many ways loosely connected. Actions
+	on the primary will have an effect on the standby. As a result, there is
+	potential for negative interactions or conflicts between them. The easiest
+	conflict to understand is performance: if a huge data load is taking place
+	on the primary then this will generate a similar stream of WAL records on the
+	standby, so standby queries may contend for system resources, such as I/O.
+   </para>
+
+   <para>
+	There are also additional types of conflict that can occur with Hot Standby.
+	These conflicts are <emphasis>hard conflicts</> in the sense that we may
+	need to cancel queries and in some cases disconnect sessions to resolve them.
+	The user is provided with a number of optional ways to handle these
+	conflicts, though we must first understand the possible reasons behind a conflict.
+
+	  <itemizedlist>
+	   <listitem>
+	    <para>
+		 Access Exclusive Locks from primary node, including both explicit
+		 LOCK commands and various kinds of DDL action
+	    </para>
+	   </listitem>
+	   <listitem>
+	    <para>
+		 Dropping tablespaces on the primary while standby queries are using
+		 those tablespace for temporary work files (work_mem overflow)
+	    </para>
+	   </listitem>
+	   <listitem>
+	    <para>
+		 Dropping databases on the primary while that role is connected on standby.
+	    </para>
+	   </listitem>
+	   <listitem>
+	    <para>
+		 Waiting to acquire buffer cleanup locks (for which there is no time out)
+	    </para>
+	   </listitem>
+	   <listitem>
+	    <para>
+		 Early cleanup of data still visible to the current query's snapshot
+	    </para>
+	   </listitem>
+	  </itemizedlist>
+   </para>
+
+   <para>
+	Some WAL redo actions will be for DDL actions. These DDL actions are
+	repeating actions that have already committed on the primary node, so
+	they must not fail on the standby node. These DDL locks take priority
+	and will automatically *cancel* any read-only transactions that get in
+	their way, after a grace period. This is similar to the possibility of
+	being canceled by the deadlock detector, but in this case the standby
+	process always wins, since the replayed actions must not fail. This
+	also ensures that replication doesn't fall behind while we wait for a
+	query to complete. Again, we assume that the standby is there for high
+	availability purposes primarily.
+   </para>
+
+   <para>
+	An example of the above would be an Administrator on Primary server
+	runs a <command>DROP TABLE</> on a table that's currently being queried
+	in the standby server.
+	Clearly the query cannot continue if we let the <command>DROP TABLE</>
+	proceed. If this situation occurred on the primary, the <command>DROP TABLE</>
+	would wait until the query has finished. When the query is on the standby
+	and the <command>DROP TABLE</> is on the primary, the primary doesn't have
+	information about which queries are running on the standby and so the query
+	does not wait on the primary. The WAL change records come through to the
+	standby while the standby query is still running, causing a conflict.
+   </para>
+
+   <para>
+	The most common reason for conflict between standby queries and WAL redo is
+	"early cleanup". Normally, <productname>PostgreSQL</> allows cleanup of old
+	row versions when there are no users who may need to see them to ensure correct
+	visibility of data (the heart of MVCC). If there is a standby query that has
+	been running for longer than any query on the primary then it is possible
+	for old row versions to be removed by either a vacuum or HOT. This will
+	then generate WAL records that, if applied, would remove data on the
+	standby that might *potentially* be required by the standby query.
+	In more technical language, the primary's xmin horizon is later than
+	the standby's xmin horizon, allowing dead rows to be removed.
+   </para>
+
+   <para>
+	Experienced users should note that both row version cleanup and row version
+	freezing will potentially conflict with recovery queries. Running a
+	manual <command>VACUUM FREEZE</> is likely to cause conflicts even on tables
+	with no updated or deleted rows.
+   </para>
+
+   <para>
+	We have a number of choices for resolving query conflicts.  The default
+	is that we wait and hope the query completes. The server will wait
+	automatically until the lag between primary and standby is at most
+	<varname>max_standby_delay</> seconds. Once that grace period expires,
+	we take one of the following actions:
+
+	  <itemizedlist>
+	   <listitem>
+	    <para>
+		 If the conflict is caused by a lock, we cancel the conflicting standby
+		 transaction immediately. If the transaction is idle-in-transaction
+		 then currently we abort the session instead, though this may change
+		 in the future.
+	    </para>
+	   </listitem>
+
+	   <listitem>
+	    <para>
+		 If the conflict is caused by cleanup records we tell the standby query
+		 that a conflict has occurred and that it must cancel itself to avoid the
+		 risk that it silently fails to read relevant data because
+		 that data has been removed. (This is regrettably very similar to the
+		 much feared and iconic error message "snapshot too old"). Some cleanup
+		 records only cause conflict with older queries, though some types of
+		 cleanup record affect all queries.
+	    </para>
+
+	    <para>
+		 If cancellation does occur, the query and/or transaction can always
+		 be re-executed. The error is dynamic and will not necessarily occur
+		 the same way if the query is executed again.
+	    </para>
+	   </listitem>
+	  </itemizedlist>
+   </para>
+
+   <para>
+	<varname>max_standby_delay</> is set in <filename>postgresql.conf</>.
+	The parameter applies to the server as a whole so if the delay is all used
+	up by a single query then there may be little or no waiting for queries that
+	follow immediately, though they will have benefited equally from the initial
+	waiting period. The server may take time to catch up again before the grace
+	period is available again, though if there is a heavy and constant stream
+	of conflicts it may seldom catch up fully.
+   </para>
+
+   <para>
+	Users should be clear that tables that are regularly and heavily updated on
+	primary server will quickly cause cancellation of longer running queries on
+	the standby. In those cases <varname>max_standby_delay</> can be
+	considered somewhat but not exactly the same as setting
+	<varname>statement_timeout</>.
+    </para>
+
+   <para>
+	Other remedial actions exist if the number of cancellations is unacceptable.
+	The first option is to connect to primary server and keep a query active
+	for as long as we need to run queries on the standby. This guarantees that
+	a WAL cleanup record is never generated and we don't ever get query
+	conflicts as described above. This could be done using contrib/dblink
+	and pg_sleep(), or via other mechanisms. If you do this, you should note
+	that this will delay cleanup of dead rows by vacuum or HOT and many
+	people may find this undesirable. However, we should remember that
+	primary and standby nodes are linked via the WAL, so this situation is no
+	different to the case where we ran the query on the primary node itself
+	except we have the benefit of off-loading the execution onto the standby.
+   </para>
+
+   <para>
+	It is also possible to set <varname>vacuum_defer_cleanup_age</> on the primary
+	to defer the cleanup of records by autovacuum, vacuum and HOT. This may allow
+	more time for queries to execute before they are cancelled on the standby,
+	without the need for setting a high <varname>max_standby_delay</>.
+   </para>
+
+   <para>
+	Three-way deadlocks are possible between AccessExclusiveLocks arriving from
+	the primary, cleanup WAL records that require buffer cleanup locks and
+	user requests that are waiting behind replayed AccessExclusiveLocks. Deadlocks
+	are currently resolved by the cancellation of user processes that would
+	need to wait on a lock. This is heavy-handed and generates more query
+	cancellations than we need to, though does remove the possibility of deadlock.
+	This behaviour is expected to improve substantially for the main release
+	version of 8.5.
+   </para>
+
+   <para>
+	Dropping tablespaces or databases is discussed in the administrator's
+	section since they are not typical user situations.
+   </para>
+  </sect2>
+
+  <sect2 id="hot-standby-admin">
+   <title>Administrator's Overview</title>
+
+   <para>
+	If there is a <filename>recovery.conf</> file present the server will start
+	in Hot Standby mode by default, though <varname>recovery_connections</> can
+	be disabled via <filename>postgresql.conf</>, if required. The server may take
+	some time to enable recovery connections since the server must first complete
+	sufficient recovery to provide a consistent state against which queries
+	can run before enabling read only connections. Look for these messages
+	in the server logs
+
+<programlisting>
+LOG:  initializing recovery connections
+
+... then some time later ...
+
+LOG:  consistent recovery state reached
+LOG:  database system is ready to accept read only connections
+</programlisting>
+
+	Consistency information is recorded once per checkpoint on the primary, as long
+	as <varname>recovery_connections</> is enabled (on the primary). If this parameter
+	is disabled, it will not be possible to enable recovery connections on the standby.
+	The consistent state can also be delayed in the presence of both of these conditions
+
+	  <itemizedlist>
+	   <listitem>
+	    <para>
+		 a write transaction has more than 64 subtransactions
+	    </para>
+	   </listitem>
+	   <listitem>
+	    <para>
+		 very long-lived write transactions
+	    </para>
+	   </listitem>
+	  </itemizedlist>
+
+	If you are running file-based log shipping ("warm standby"), you may need
+	to wait until the next WAL file arrives, which could be as long as the
+	<varname>archive_timeout</> setting on the primary.
+   </para>
+
+   <para>
+	The setting of some parameters on the standby will need reconfiguration
+	if they have been changed on the primary. The value on the standby must
+	be equal to or greater than the value on the primary. If these parameters
+	are not set high enough then the standby will not be able to track work
+	correctly from recovering transactions. If these values are set too low the
+	the server will halt. Higher values can then be supplied and the server
+	restarted to begin recovery again.
+
+	  <itemizedlist>
+	   <listitem>
+	    <para>
+		 <varname>max_connections</>
+	    </para>
+	   </listitem>
+	   <listitem>
+	    <para>
+		 <varname>max_prepared_transactions</>
+	    </para>
+	   </listitem>
+	   <listitem>
+	    <para>
+		 <varname>max_locks_per_transaction</>
+	    </para>
+	   </listitem>
+	  </itemizedlist>
+   </para>
+
+   <para>
+	It is important that the administrator consider the appropriate setting
+	of <varname>max_standby_delay</>, set in <filename>postgresql.conf</>.
+	There is no optimal setting and should be set according to business
+	priorities. For example if the server is primarily tasked as a High
+	Availability server, then you may wish to lower
+	<varname>max_standby_delay</> or even set it to zero, though that is a
+	very aggressive setting. If the standby server is tasked as an additional
+	server for decision support queries then it may be acceptable to set this
+	to a value of many hours (in seconds).  It is also possible to set
+	<varname>max_standby_delay</> to -1 which means wait forever for queries
+	to complete, if there are conflicts; this will be useful when performing
+	an archive recovery from a backup.
+   </para>
+
+   <para>
+	Transaction status "hint bits" written on primary are not WAL-logged,
+	so data on standby will likely re-write the hints again on the standby.
+	Thus the main database blocks will produce write I/Os even though
+	all users are read-only; no changes have occurred to the data values
+	themselves.  Users will be able to write large sort temp files and
+	re-generate relcache info files, so there is no part of the database
+	that is truly read-only during hot standby mode. There is no restriction
+	on the use of set returning functions, or other users of tuplestore/tuplesort
+	code. Note also that writes to remote databases will still be possible,
+	even though the transaction is read-only locally.
+   </para>
+
+   <para>
+	The following types of administrator command are not accepted
+	during recovery mode
+
+	  <itemizedlist>
+	   <listitem>
+	    <para>
+	     Data Definition Language (DDL) - e.g. CREATE INDEX
+	    </para>
+	   </listitem>
+	   <listitem>
+	    <para>
+	     Privilege and Ownership - GRANT, REVOKE, REASSIGN
+	    </para>
+	   </listitem>
+	   <listitem>
+	    <para>
+	     Maintenance commands - ANALYZE, VACUUM, CLUSTER, REINDEX
+	    </para>
+	   </listitem>
+	  </itemizedlist>
+   </para>
+
+   <para>
+	Note again that some of these commands are actually allowed during
+	"read only" mode transactions on the primary.
+   </para>
+
+   <para>
+	As a result, you cannot create additional indexes that exist solely
+	on the standby, nor can statistics that exist solely on the standby.
+	If these administrator commands are needed they should be executed
+	on the primary so that the changes will propagate through to the
+	standby.
+   </para>
+
+   <para>
+	<function>pg_cancel_backend()</> will work on user backends, but not the
+	Startup process, which performs recovery. pg_stat_activity does not
+	show an entry for the Startup process, nor do recovering transactions
+	show as active. As a result, pg_prepared_xacts is always empty during
+	recovery. If you wish to resolve in-doubt prepared transactions
+	then look at pg_prepared_xacts on the primary and issue commands to
+	resolve those transactions there.
+   </para>
+
+   <para>
+	pg_locks will show locks held by backends as normal. pg_locks also shows
+	a virtual transaction managed by the Startup process that owns all
+	AccessExclusiveLocks held by transactions being replayed by recovery.
+	Note that Startup process does not acquire locks to
+	make database changes and thus locks other than AccessExclusiveLocks
+	do not show in pg_locks for the Startup process, they are just presumed
+	to exist.
+   </para>
+
+   <para>
+	<productname>check_pgsql</> will work, but it is very simple.
+	<productname>check_postgres</> will also work, though many some actions
+	could give different or confusing results.
+	e.g. last vacuum time will not be maintained for example, since no
+	vacuum occurs on the standby (though vacuums running on the primary do
+	send their changes to the standby).
+   </para>
+
+   <para>
+	WAL file control commands will not work during recovery
+	e.g. <function>pg_start_backup</>, <function>pg_switch_xlog</> etc..
+   </para>
+
+   <para>
+	Dynamically loadable modules work, including pg_stat_statements.
+   </para>
+
+   <para>
+	Advisory locks work normally in recovery, including deadlock detection.
+	Note that advisory locks are never WAL logged, so it is not possible for
+	an advisory lock on either the primary or the standby to conflict with WAL
+	replay. Nor is it possible to acquire an advisory lock on the primary
+	and have it initiate a similar advisory lock on the standby. Advisory
+	locks relate only to a single server on which they are acquired.
+   </para>
+
+   <para>
+	Trigger-based replication systems such as <productname>Slony</>,
+	<productname>Londiste</> and <productname>Bucardo</> won't run on the
+	standby at all, though they will run happily on the primary server as
+	long as the changes are not sent to standby servers to be applied.
+	WAL replay is not trigger-based so you cannot relay from the
+	standby to any system that requires additional database writes or
+	relies on the use of triggers.
+   </para>
+
+   <para>
+	New oids cannot be assigned, though some <acronym>UUID</> generators may still
+	work as long as they do not rely on writing new status to the database.
+   </para>
+
+   <para>
+	Currently, temp table creation is not allowed during read only
+	transactions, so in some cases existing scripts will not run correctly.
+	It is possible we may relax that restriction in a later release. This is
+	both a SQL Standard compliance issue and a technical issue.
+   </para>
+
+   <para>
+	<command>DROP TABLESPACE</> can only succeed if the tablespace is empty.
+	Some standby users may be actively using the tablespace via their
+	<varname>temp_tablespaces</> parameter. If there are temp files in the
+	tablespace we currently cancel all active queries to ensure that temp
+	files are removed, so that we can remove the tablespace and continue with
+	WAL replay.
+   </para>
+
+   <para>
+	Running <command>DROP DATABASE</>, <command>ALTER DATABASE ... SET TABLESPACE</>,
+	or <command>ALTER DATABASE ... RENAME</> on primary will generate a log message
+	that will cause all users connected to that database on the standby to be
+	forcibly disconnected, once <varname>max_standby_delay</> has been reached.
+   </para>
+
+   <para>
+	In normal running, if you issue <command>DROP USER</> or <command>DROP ROLE</>
+	for a role with login capability while that user is still connected then
+	nothing happens to the connected user - they remain connected. The user cannot
+	reconnect however. This behaviour applies in recovery also, so a
+	<command>DROP USER</> on the primary does not disconnect that user on the standby.
+   </para>
+
+   <para>
+	Stats collector is active during recovery. All scans, reads, blocks,
+	index usage etc will all be recorded normally on the standby. Replayed
+	actions will not duplicate their effects on primary, so replaying an
+	insert will not increment the Inserts column of pg_stat_user_tables.
+	The stats file is deleted at start of recovery, so stats from primary
+	and standby will differ; this is considered a feature not a bug.
+   </para>
+
+   <para>
+	Autovacuum is not active during recovery, though will start normally
+	at the end of recovery.
+   </para>
+
+   <para>
+	Background writer is active during recovery and will perform
+	restartpoints (similar to checkpoints on primary) and normal block
+	cleaning activities. The <command>CHECKPOINT</> command is accepted during recovery,
+	though performs a restartpoint rather than a new checkpoint.
+   </para>
+  </sect2>
+
+  <sect2 id="hot-standby-parameters">
+   <title>Hot Standby Parameter Reference</title>
+
+   <para>
+	Various parameters have been mentioned above in the <xref linkend="hot-standby-admin">
+	and <xref linkend="hot-standby-conflict"> sections.
+   </para>
+
+   <para>
+	On the primary, parameters <varname>recovery_connections</> and
+	<varname>vacuum_defer_cleanup_age</> can be used to enable and control the
+	primary server to assist the successful configuration of Hot Standby servers.
+	<varname>max_standby_delay</> has no effect if set on the primary.
+   </para>
+
+   <para>
+	On the standby, parameters <varname>recovery_connections</> and
+	<varname>max_standby_delay</> can be used to enable and control Hot Standby.
+	standby server to assist the successful configuration of Hot Standby servers.
+	<varname>vacuum_defer_cleanup_age</> has no effect during recovery.
+   </para>
+  </sect2>
+
+  <sect2 id="hot-standby-caveats">
+   <title>Caveats</title>
+
+   <para>
+    At this writing, there are several limitations of Hot Standby.
+    These can and probably will be fixed in future releases:
+
+  <itemizedlist>
+   <listitem>
+    <para>
+     Operations on hash indexes are not presently WAL-logged, so
+     replay will not update these indexes.  Hash indexes will not be
+	 used for query plans during recovery.
+    </para>
+   </listitem>
+   <listitem>
+    <para>
+     Full knowledge of running transactions is required before snapshots
+	 may be taken. Transactions that take use large numbers of subtransactions
+	 (currently greater than 64) will delay the start of read only
+	 connections until the completion of the longest running write transaction.
+	 If this situation occurs explanatory messages will be sent to server log.
+    </para>
+   </listitem>
+   <listitem>
+    <para>
+     Valid starting points for recovery connections are generated at each
+	 checkpoint on the master. If the standby is shutdown while the master
+	 is in a shutdown state it may not be possible to re-enter Hot Standby
+	 until the primary is started up so that it generates further starting
+	 points in the WAL logs. This is not considered a serious issue
+	 because the standby is usually switched into the primary role while
+	 the first node is taken down.
+    </para>
+   </listitem>
+   <listitem>
+    <para>
+     At the end of recovery, AccessExclusiveLocks held by prepared transactions
+	 will require twice the normal number of lock table entries. If you plan
+	 on running either a large number of concurrent prepared transactions
+	 that normally take AccessExclusiveLocks, or you plan on having one
+	 large transaction that takes many AccessExclusiveLocks then you are
+	 advised to select a larger value of <varname>max_locks_per_transaction</>,
+	 up to, but never more than twice the value of the parameter setting on
+	 the primary server in rare extremes. You need not consider this at all if
+	 your setting of <varname>max_prepared_transactions</> is <literal>0</>.
+    </para>
+   </listitem>
+  </itemizedlist>
+
+   </para>
+  </sect2>
+
+ </sect1>
+
  <sect1 id="migration">
   <title>Migration Between Releases</title>
 
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index d13e6d151f5..4554cb614a4 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -1,4 +1,4 @@
-<!-- $PostgreSQL: pgsql/doc/src/sgml/config.sgml,v 1.238 2009/12/17 14:36:16 rhaas Exp $ -->
+<!-- $PostgreSQL: pgsql/doc/src/sgml/config.sgml,v 1.239 2009/12/19 01:32:31 sriggs Exp $ -->
 
 <chapter Id="runtime-config">
   <title>Server Configuration</title>
@@ -376,6 +376,12 @@ SET ENABLE_SEQSCAN TO OFF;
         allows. See <xref linkend="sysvipc"> for information on how to
         adjust those parameters, if necessary.
        </para>
+
+       <para>
+        When running a standby server, you must set this parameter to the
+        same or higher value than on the master server. Otherwise, queries
+        will not be allowed in the standby server.
+       </para>
       </listitem>
      </varlistentry>
 
@@ -826,6 +832,12 @@ SET ENABLE_SEQSCAN TO OFF;
         allows. See <xref linkend="sysvipc"> for information on how to
         adjust those parameters, if necessary.
        </para>
+
+       <para>
+        When running a standby server, you must set this parameter to the
+        same or higher value than on the master server. Otherwise, queries
+        will not be allowed in the standby server.
+       </para>
       </listitem>
      </varlistentry>
 
@@ -1733,6 +1745,51 @@ archive_command = 'copy "%p" "C:\\server\\archivedir\\%f"'  # Windows
      
      </variablelist>
     </sect2>
+
+    <sect2 id="runtime-config-standby">
+    <title>Standby Servers</title>
+
+    <variablelist>
+
+     <varlistentry id="recovery-connections" xreflabel="recovery_connections">
+      <term><varname>recovery_connections</varname> (<type>boolean</type>)</term>
+      <listitem>
+       <para>
+		Parameter has two roles. During recovery, specifies whether or not
+		you can connect and run queries to enable <xref linkend="hot-standby">.
+		During normal running, specifies whether additional information is written
+		to WAL to allow recovery connections on a standby server that reads
+		WAL data generated by this server. The default value is
+        <literal>on</literal>.  It is thought that there is little
+		measurable difference in performance from using this feature, so
+		feedback is welcome if any production impacts are noticeable.
+		It is likely that this parameter will be removed in later releases.
+        This parameter can only be set at server start.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry id="max-standby-delay" xreflabel="max_standby_delay">
+      <term><varname>max_standby_delay</varname> (<type>string</type>)</term>
+      <listitem>
+       <para>
+		When server acts as a standby, this parameter specifies a wait policy
+		for queries that conflict with incoming data changes. Valid settings
+		are -1, meaning wait forever, or a wait time of 0 or more seconds.
+		If a conflict should occur the server will delay up to this
+		amount before it begins trying to resolve things less amicably, as
+		described in <xref linkend="hot-standby-conflict">. Typically,
+		this parameter makes sense only during replication, so when
+		performing an archive recovery to recover from data loss a
+		parameter setting of 0 is recommended.  The default is 30 seconds.
+        This parameter can only be set in the <filename>postgresql.conf</>
+        file or on the server command line.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     </variablelist>
+    </sect2>
    </sect1>
 
    <sect1 id="runtime-config-query">
@@ -4161,6 +4218,29 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
       </listitem>
      </varlistentry>
 
+     <varlistentry id="guc-vacuum-defer-cleanup-age" xreflabel="vacuum_defer_cleanup_age">
+      <term><varname>vacuum_defer_cleanup_age</varname> (<type>integer</type>)</term>
+      <indexterm>
+       <primary><varname>vacuum_defer_cleanup_age</> configuration parameter</primary>
+      </indexterm>
+      <listitem>
+       <para>
+        Specifies the number of transactions by which <command>VACUUM</> and
+		<acronym>HOT</> updates will defer cleanup of dead row versions. The
+		default is 0 transactions, meaning that dead row versions will be
+		removed as soon as possible. You may wish to set this to a non-zero
+		value when planning or maintaining a <xref linkend="hot-standby">
+		configuration. The recommended value is <literal>0</> unless you have
+		clear reason to increase it. The purpose of the parameter is to
+		allow the user to specify an approximate time delay before cleanup
+		occurs. However, it should be noted that there is no direct link with
+		any specific time delay and so the results will be application and
+		installation specific, as well as variable over time, depending upon
+		the transaction rate (of writes only).
+       </para>
+      </listitem>
+     </varlistentry>
+
      <varlistentry id="guc-bytea-output" xreflabel="bytea_output">
       <term><varname>bytea_output</varname> (<type>enum</type>)</term>
       <indexterm>
@@ -4689,6 +4769,12 @@ dynamic_library_path = 'C:\tools\postgresql;H:\my_project\lib;$libdir'
         allows. See <xref linkend="sysvipc"> for information on how to
         adjust those parameters, if necessary.
        </para>
+
+       <para>
+        When running a standby server, you must set this parameter to the
+        same or higher value than on the master server. Otherwise, queries
+        will not be allowed in the standby server.
+       </para>
       </listitem>
      </varlistentry>
 
@@ -5546,6 +5632,32 @@ plruby.use_strict = true        # generates error: unknown class name
       </listitem>
      </varlistentry>
 
+     <varlistentry id="guc-trace-recovery-messages" xreflabel="trace_recovery_messages">
+      <term><varname>trace_recovery_messages</varname> (<type>string</type>)</term>
+      <indexterm>
+       <primary><varname>trace_recovery_messages</> configuration parameter</primary>
+      </indexterm>
+      <listitem>
+       <para>
+        Controls which message levels are written to the server log
+        for system modules needed for recovery processing. This allows
+        the user to override the normal setting of log_min_messages,
+        but only for specific messages. This is intended for use in
+        debugging Hot Standby.
+        Valid values are <literal>DEBUG5</>, <literal>DEBUG4</>,
+        <literal>DEBUG3</>, <literal>DEBUG2</>, <literal>DEBUG1</>,
+        <literal>INFO</>, <literal>NOTICE</>, <literal>WARNING</>,
+        <literal>ERROR</>, <literal>LOG</>, <literal>FATAL</>, and
+        <literal>PANIC</>.  Each level includes all the levels that
+        follow it.  The later the level, the fewer messages are sent
+        to the log.  The default is <literal>WARNING</>.  Note that
+        <literal>LOG</> has a different rank here than in
+        <varname>client_min_messages</>.
+        Parameter should be set in the postgresql.conf only.
+       </para>
+      </listitem>
+     </varlistentry>
+
     <varlistentry id="guc-zero-damaged-pages" xreflabel="zero_damaged_pages">
       <term><varname>zero_damaged_pages</varname> (<type>boolean</type>)</term>
       <indexterm>
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 7d6125c97e5..50947274039 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -1,4 +1,4 @@
-<!-- $PostgreSQL: pgsql/doc/src/sgml/func.sgml,v 1.493 2009/12/15 17:57:46 tgl Exp $ -->
+<!-- $PostgreSQL: pgsql/doc/src/sgml/func.sgml,v 1.494 2009/12/19 01:32:31 sriggs Exp $ -->
 
  <chapter id="functions">
   <title>Functions and Operators</title>
@@ -13132,6 +13132,38 @@ postgres=# SELECT * FROM pg_xlogfile_name_offset(pg_stop_backup());
     <xref linkend="continuous-archiving">.
    </para>
 
+   <indexterm>
+    <primary>pg_is_in_recovery</primary>
+   </indexterm>
+
+   <para>
+    The functions shown in <xref
+    linkend="functions-recovery-info-table"> provide information
+	about the current status of Hot Standby.
+    These functions may be executed during both recovery and in normal running.
+   </para>
+
+   <table id="functions-recovery-info-table">
+    <title>Recovery Information Functions</title>
+    <tgroup cols="3">
+     <thead>
+      <row><entry>Name</entry> <entry>Return Type</entry> <entry>Description</entry>
+      </row>
+     </thead>
+
+     <tbody>
+      <row>
+       <entry>
+        <literal><function>pg_is_in_recovery</function>()</literal>
+        </entry>
+       <entry><type>bool</type></entry>
+       <entry>True if recovery is still in progress.
+	   </entry>
+      </row>
+     </tbody>
+    </tgroup>
+   </table>
+
    <para>
     The functions shown in <xref linkend="functions-admin-dbsize"> calculate
     the disk space usage of database objects.
diff --git a/doc/src/sgml/ref/checkpoint.sgml b/doc/src/sgml/ref/checkpoint.sgml
index 76eb273dea2..31f1b0fe192 100644
--- a/doc/src/sgml/ref/checkpoint.sgml
+++ b/doc/src/sgml/ref/checkpoint.sgml
@@ -1,4 +1,4 @@
-<!-- $PostgreSQL: pgsql/doc/src/sgml/ref/checkpoint.sgml,v 1.16 2008/11/14 10:22:45 petere Exp $ -->
+<!-- $PostgreSQL: pgsql/doc/src/sgml/ref/checkpoint.sgml,v 1.17 2009/12/19 01:32:31 sriggs Exp $ -->
 
 <refentry id="sql-checkpoint">
  <refmeta>
@@ -43,6 +43,11 @@ CHECKPOINT
   </para>
 
   <para>
+   If executed during recovery, the <command>CHECKPOINT</command> command
+   will force a restartpoint rather than writing a new checkpoint.
+  </para>
+
+  <para>
    Only superusers can call <command>CHECKPOINT</command>.  The command is
    not intended for use during normal operation.
   </para>
author	Simon Riggs <simon@2ndQuadrant.com>	2009-12-19 01:32:45 +0000
committer	Simon Riggs <simon@2ndQuadrant.com>	2009-12-19 01:32:45 +0000
commit	efc16ea520679d713d98a2c7bf1453c4ff7b91ec (patch)
tree	6a39d2af0704a36281dc7df3ec10823eb3e6de75 /doc/src
parent	78a09145e0f8322e625bbc7d69fcb865ce4f3034 (diff)
download	postgresql-efc16ea520679d713d98a2c7bf1453c4ff7b91ec.tar.gz postgresql-efc16ea520679d713d98a2c7bf1453c4ff7b91ec.zip