Diffstat (limited to 'doc/src')
-rw-r--r--  doc/src/sgml/backup.sgml            776
-rw-r--r--  doc/src/sgml/config.sgml            114
-rw-r--r--  doc/src/sgml/func.sgml               34
-rw-r--r--  doc/src/sgml/ref/checkpoint.sgml      7
4 files changed, 925 insertions, 6 deletions
diff --git a/doc/src/sgml/backup.sgml b/doc/src/sgml/backup.sgml
index f9e45ec3d8c..77ecc4f04b2 100644
--- a/doc/src/sgml/backup.sgml
+++ b/doc/src/sgml/backup.sgml
@@ -1,4 +1,4 @@
-<!-- $PostgreSQL: pgsql/doc/src/sgml/backup.sgml,v 2.130 2009/08/07 20:54:31 alvherre Exp $ -->
+<!-- $PostgreSQL: pgsql/doc/src/sgml/backup.sgml,v 2.131 2009/12/19 01:32:30 sriggs Exp $ -->
<chapter id="backup">
<title>Backup and Restore</title>
@@ -1429,8 +1429,12 @@ archive_command = 'local_backup_script.sh'
<listitem>
<para>
Operations on hash indexes are not presently WAL-logged, so
- replay will not update these indexes. The recommended workaround
- is to manually <xref linkend="sql-reindex" endterm="sql-reindex-title">
+   replay will not update these indexes. This means that newly inserted
+   rows will be invisible to the index, updated rows will apparently disappear
+   and deleted rows will still retain index pointers. In other words, if you modify a
+   table with a hash index on it then you will get incorrect query results
+   on a standby server. It is recommended that you
+   manually <xref linkend="sql-reindex" endterm="sql-reindex-title">
each such index after completing a recovery operation.
</para>
</listitem>
@@ -1883,6 +1887,772 @@ if (!triggered)
</sect2>
</sect1>
+ <sect1 id="hot-standby">
+ <title>Hot Standby</title>
+
+ <indexterm zone="backup">
+ <primary>Hot Standby</primary>
+ </indexterm>
+
+ <para>
+ Hot Standby is the term used to describe the ability to connect to
+ the server and run queries while the server is in archive recovery. This
+ is useful for both log shipping replication and for restoring a backup
+ to an exact state with great precision.
+    The term Hot Standby also refers to the ability of the server to move
+    from recovery through to normal running while users continue running
+    queries and/or keep their connections open.
+ </para>
+
+ <para>
+ Running queries in recovery is in many ways the same as normal running
+ though there are a large number of usage and administrative points
+ to note.
+ </para>
+
+ <sect2 id="hot-standby-users">
+ <title>User's Overview</title>
+
+ <para>
+ Users can connect to the database while the server is in recovery
+ and perform read-only queries. Read-only access to catalogs and views
+ will also occur as normal.
+ </para>
+
+ <para>
+ The data on the standby takes some time to arrive from the primary server
+ so there will be a measurable delay between primary and standby. Running the
+ same query nearly simultaneously on both primary and standby might therefore
+ return differing results. We say that data on the standby is eventually
+ consistent with the primary.
+    Queries executed on the standby will be correct with regard to the transactions
+    that had been recovered at the start of the query, or at the start of the
+    transaction's first statement in the case of serializable transactions. In comparison with the primary,
+ the standby returns query results that could have been obtained on the primary
+ at some exact moment in the past.
+ </para>
+
+ <para>
+ When a transaction is started in recovery, the parameter
+ <varname>transaction_read_only</> will be forced to be true, regardless of the
+ <varname>default_transaction_read_only</> setting in <filename>postgresql.conf</>.
+    It cannot be manually set to false either. As a result, all transactions
+    started during recovery will be limited to read-only actions. In all
+ other ways, connected sessions will appear identical to sessions
+ initiated during normal processing mode. There are no special commands
+ required to initiate a connection at this time, so all interfaces
+    work normally without change. After recovery finishes, the session
+    will allow normal read-write transactions starting with the next
+    transaction, if these are requested.
+ </para>
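+
+   <para>
+    For illustration only, a session on a standby might behave like this
+    (the output layout and exact error text shown here are approximations,
+    not guaranteed wording):
+<programlisting>
+postgres=# SHOW transaction_read_only;
+ transaction_read_only
+-----------------------
+ on
+(1 row)
+
+postgres=# SET transaction_read_only = off;
+ERROR:  cannot set transaction read-write mode during recovery
+</programlisting>
+   </para>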
+
+ <para>
+ Read-only here means "no writes to the permanent database tables".
+ There are no problems with queries that make use of transient sort and
+ work files.
+ </para>
+
+ <para>
+    The following actions are allowed (a brief example session follows the list):
+
+ <itemizedlist>
+ <listitem>
+ <para>
+ Query access - SELECT, COPY TO including views and SELECT RULEs
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+       Cursor commands - DECLARE, FETCH, CLOSE
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ Parameters - SHOW, SET, RESET
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ Transaction management commands
+ <itemizedlist>
+ <listitem>
+ <para>
+ BEGIN, END, ABORT, START TRANSACTION
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ SAVEPOINT, RELEASE, ROLLBACK TO SAVEPOINT
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ EXCEPTION blocks and other internal subtransactions
+ </para>
+ </listitem>
+ </itemizedlist>
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ LOCK TABLE, though only when explicitly in one of these modes:
+ ACCESS SHARE, ROW SHARE or ROW EXCLUSIVE.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ Plans and resources - PREPARE, EXECUTE, DEALLOCATE, DISCARD
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ Plugins and extensions - LOAD
+ </para>
+ </listitem>
+ </itemizedlist>
+ </para>
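+
+   <para>
+    As a brief, illustrative example (not an exhaustive demonstration), all of
+    the following succeed in a read-only session on the standby because they
+    touch only system catalogs and session-local state:
+<programlisting>
+BEGIN;
+DECLARE c CURSOR FOR SELECT relname FROM pg_class;  -- cursor commands are allowed
+FETCH 5 FROM c;
+CLOSE c;
+PREPARE q(int) AS SELECT $1 + 1;                    -- prepared plans are session-local
+EXECUTE q(41);
+DEALLOCATE q;
+COMMIT;
+</programlisting>
+   </para>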
+
+ <para>
+    These actions produce error messages (an example appears after the list):
+
+ <itemizedlist>
+ <listitem>
+ <para>
+       Data Manipulation Language (DML) - INSERT, UPDATE, DELETE, COPY FROM, TRUNCATE.
+ Note that there are no allowed actions that result in a trigger
+ being executed during recovery.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ Data Definition Language (DDL) - CREATE, DROP, ALTER, COMMENT.
+       Data Definition Language (DDL) - CREATE, DROP, ALTER, COMMENT.
+       This currently applies to temporary tables as well, because their
+       definition causes writes to catalog tables.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ SELECT ... FOR SHARE | UPDATE which cause row locks to be written
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ RULEs on SELECT statements that generate DML commands.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+       LOCK TABLE, in its short default form, since it requests ACCESS EXCLUSIVE MODE;
+       also LOCK TABLE commands that explicitly request a mode higher than ROW EXCLUSIVE MODE.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ Transaction management commands that explicitly set non-read only state
+ <itemizedlist>
+ <listitem>
+ <para>
+ BEGIN READ WRITE,
+ START TRANSACTION READ WRITE
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ SET TRANSACTION READ WRITE,
+ SET SESSION CHARACTERISTICS AS TRANSACTION READ WRITE
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ SET transaction_read_only = off
+ </para>
+ </listitem>
+ </itemizedlist>
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+       Two-phase commit commands - PREPARE TRANSACTION, COMMIT PREPARED,
+       ROLLBACK PREPARED, because even read-only transactions need to write
+       WAL in the prepare phase (the first phase of two-phase commit).
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ sequence update - nextval()
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ LISTEN, UNLISTEN, NOTIFY since they currently write to system tables
+ </para>
+ </listitem>
+ </itemizedlist>
+ </para>
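+
+   <para>
+    For example, assuming a table <literal>my_table</> exists (the table name
+    is hypothetical, and the message text shown is the standard read-only
+    transaction error, given here purely as an illustration):
+<programlisting>
+postgres=# INSERT INTO my_table VALUES (1);
+ERROR:  cannot execute INSERT in a read-only transaction
+</programlisting>
+   </para>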
+
+ <para>
+    Note that the current behaviour of read-only transactions when not in
+    recovery is to allow the last two actions, so there are small and
+    subtle differences in behaviour between read-only transactions
+    run on a standby and those run during normal running.
+ It is possible that the restrictions on LISTEN, UNLISTEN, NOTIFY and
+ temporary tables may be lifted in a future release, if their internal
+ implementation is altered to make this possible.
+ </para>
+
+ <para>
+ If failover or switchover occurs the database will switch to normal
+ processing mode. Sessions will remain connected while the server
+ changes mode. Current transactions will continue, though will remain
+ read-only. After recovery is complete, it will be possible to initiate
+ read-write transactions.
+ </para>
+
+ <para>
+ Users will be able to tell whether their session is read-only by
+    issuing SHOW transaction_read_only. In addition, a set of
+    functions (<xref linkend="functions-recovery-info-table">) allows users to
+    access information about Hot Standby. These can be used to write
+    programs that are aware of the current state of the database, for
+    example to monitor the progress of recovery or to restore the database
+    to a particular state.
+ </para>
+
+ <para>
+ In recovery, transactions will not be permitted to take any table lock
+ higher than RowExclusiveLock. In addition, transactions may never assign
+ a TransactionId and may never write WAL.
+ Any <command>LOCK TABLE</> command that runs on the standby and requests
+ a specific lock mode higher than ROW EXCLUSIVE MODE will be rejected.
+ </para>
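+
+   <para>
+    A minimal sketch, assuming a table <literal>my_table</> exists; the
+    commented-out form illustrates a mode that would be rejected on the standby:
+<programlisting>
+BEGIN;
+LOCK TABLE my_table IN ACCESS SHARE MODE;         -- allowed during recovery
+-- LOCK TABLE my_table IN ACCESS EXCLUSIVE MODE;  -- higher than ROW EXCLUSIVE: rejected
+COMMIT;
+</programlisting>
+   </para>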
+
+ <para>
+ In general queries will not experience lock conflicts with the database
+    changes made by recovery. This is because recovery follows normal
+ concurrency control mechanisms, known as <acronym>MVCC</>. There are
+ some types of change that will cause conflicts, covered in the following
+ section.
+ </para>
+ </sect2>
+
+ <sect2 id="hot-standby-conflict">
+ <title>Handling query conflicts</title>
+
+ <para>
+ The primary and standby nodes are in many ways loosely connected. Actions
+ on the primary will have an effect on the standby. As a result, there is
+ potential for negative interactions or conflicts between them. The easiest
+ conflict to understand is performance: if a huge data load is taking place
+ on the primary then this will generate a similar stream of WAL records on the
+ standby, so standby queries may contend for system resources, such as I/O.
+ </para>
+
+ <para>
+ There are also additional types of conflict that can occur with Hot Standby.
+ These conflicts are <emphasis>hard conflicts</> in the sense that we may
+ need to cancel queries and in some cases disconnect sessions to resolve them.
+ The user is provided with a number of optional ways to handle these
+ conflicts, though we must first understand the possible reasons behind a conflict.
+
+ <itemizedlist>
+ <listitem>
+ <para>
+       Access Exclusive Locks from the primary node, including both explicit
+ LOCK commands and various kinds of DDL action
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+       Dropping tablespaces on the primary while standby queries are using
+       those tablespaces for temporary work files (work_mem overflow)
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+       Dropping databases on the primary while users are connected to that database on the standby.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ Waiting to acquire buffer cleanup locks (for which there is no time out)
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ Early cleanup of data still visible to the current query's snapshot
+ </para>
+ </listitem>
+ </itemizedlist>
+ </para>
+
+ <para>
+    Some WAL redo actions will be for DDL actions. These DDL actions are
+    replaying changes that have already committed on the primary node, so
+    they must not fail on the standby node. These DDL locks take priority
+    and will automatically <emphasis>cancel</> any read-only transactions that get in
+    their way, after a grace period. This is similar to the possibility of
+    being canceled by the deadlock detector, but in this case the standby
+    process always wins, since the replayed actions must not fail. This
+    also ensures that replication doesn't fall behind while we wait for a
+    query to complete. Again, we assume that the standby is there primarily
+    for high availability purposes.
+ </para>
+
+ <para>
+    An example of the above would be an administrator on the primary server
+    running <command>DROP TABLE</> on a table that is currently being queried
+    on the standby server.
+    Clearly the query cannot continue if we let the <command>DROP TABLE</>
+    proceed. If this situation occurred on the primary, the <command>DROP TABLE</>
+    would wait until the query had finished. When the query is on the standby
+ and the <command>DROP TABLE</> is on the primary, the primary doesn't have
+ information about which queries are running on the standby and so the query
+ does not wait on the primary. The WAL change records come through to the
+ standby while the standby query is still running, causing a conflict.
+ </para>
+
+ <para>
+ The most common reason for conflict between standby queries and WAL redo is
+ "early cleanup". Normally, <productname>PostgreSQL</> allows cleanup of old
+ row versions when there are no users who may need to see them to ensure correct
+ visibility of data (the heart of MVCC). If there is a standby query that has
+ been running for longer than any query on the primary then it is possible
+    for old row versions to be removed by either a vacuum or HOT. This will
+    then generate WAL records that, if applied, would remove data on the
+    standby that might <emphasis>potentially</> be required by the standby query.
+ In more technical language, the primary's xmin horizon is later than
+ the standby's xmin horizon, allowing dead rows to be removed.
+ </para>
+
+ <para>
+ Experienced users should note that both row version cleanup and row version
+ freezing will potentially conflict with recovery queries. Running a
+ manual <command>VACUUM FREEZE</> is likely to cause conflicts even on tables
+ with no updated or deleted rows.
+ </para>
+
+ <para>
+ We have a number of choices for resolving query conflicts. The default
+ is that we wait and hope the query completes. The server will wait
+ automatically until the lag between primary and standby is at most
+ <varname>max_standby_delay</> seconds. Once that grace period expires,
+ we take one of the following actions:
+
+ <itemizedlist>
+ <listitem>
+ <para>
+ If the conflict is caused by a lock, we cancel the conflicting standby
+ transaction immediately. If the transaction is idle-in-transaction
+ then currently we abort the session instead, though this may change
+ in the future.
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ If the conflict is caused by cleanup records we tell the standby query
+ that a conflict has occurred and that it must cancel itself to avoid the
+ risk that it silently fails to read relevant data because
+ that data has been removed. (This is regrettably very similar to the
+ much feared and iconic error message "snapshot too old"). Some cleanup
+ records only cause conflict with older queries, though some types of
+ cleanup record affect all queries.
+ </para>
+
+ <para>
+ If cancellation does occur, the query and/or transaction can always
+ be re-executed. The error is dynamic and will not necessarily occur
+ the same way if the query is executed again.
+ </para>
+ </listitem>
+ </itemizedlist>
+ </para>
+
+ <para>
+ <varname>max_standby_delay</> is set in <filename>postgresql.conf</>.
+ The parameter applies to the server as a whole so if the delay is all used
+ up by a single query then there may be little or no waiting for queries that
+ follow immediately, though they will have benefited equally from the initial
+ waiting period. The server may take time to catch up again before the grace
+ period is available again, though if there is a heavy and constant stream
+ of conflicts it may seldom catch up fully.
+ </para>
+
+ <para>
+    Users should be clear that tables that are regularly and heavily updated on
+    the primary server will quickly cause cancellation of longer-running queries on
+ the standby. In those cases <varname>max_standby_delay</> can be
+ considered somewhat but not exactly the same as setting
+ <varname>statement_timeout</>.
+ </para>
+
+ <para>
+ Other remedial actions exist if the number of cancellations is unacceptable.
+    The first option is to connect to the primary server and keep a query active
+    for as long as we need to run queries on the standby. This guarantees that
+    a WAL cleanup record is never generated and we don't ever get query
+    conflicts as described above. This could be done using contrib/dblink
+    and pg_sleep(), or via other mechanisms. If you do this, you should note
+    that this will delay cleanup of dead rows by vacuum or HOT, which many
+    people may find undesirable. However, we should remember that
+    primary and standby nodes are linked via the WAL, so this situation is no
+    different from the case where the query ran on the primary node itself,
+    except that we have the benefit of off-loading the execution onto the standby.
+ </para>
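+
+   <para>
+    A possible sketch of that workaround, assuming contrib/dblink is installed
+    and the primary is reachable with the (hypothetical) connection string shown:
+<programlisting>
+-- Hold a long-running query open on the primary so that its snapshot
+-- delays cleanup there; connection name, string and duration are examples only.
+SELECT dblink_connect('keep_snapshot', 'host=primary dbname=postgres');
+SELECT dblink_send_query('keep_snapshot', 'SELECT pg_sleep(3600)');
+
+-- ... run the long-running queries on the standby here ...
+
+SELECT dblink_disconnect('keep_snapshot');
+</programlisting>
+   </para>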
+
+ <para>
+ It is also possible to set <varname>vacuum_defer_cleanup_age</> on the primary
+ to defer the cleanup of records by autovacuum, vacuum and HOT. This may allow
+ more time for queries to execute before they are cancelled on the standby,
+ without the need for setting a high <varname>max_standby_delay</>.
+ </para>
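+
+   <para>
+    For example, in <filename>postgresql.conf</> on the primary (the value is
+    purely illustrative; the unit is transactions, not time):
+<programlisting>
+vacuum_defer_cleanup_age = 10000   # defer cleanup by roughly this many transactions
+</programlisting>
+   </para>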
+
+ <para>
+ Three-way deadlocks are possible between AccessExclusiveLocks arriving from
+ the primary, cleanup WAL records that require buffer cleanup locks and
+ user requests that are waiting behind replayed AccessExclusiveLocks. Deadlocks
+ are currently resolved by the cancellation of user processes that would
+    need to wait on a lock. This is heavy-handed and generates more query
+    cancellations than strictly necessary, though it does remove the possibility of deadlock.
+ This behaviour is expected to improve substantially for the main release
+ version of 8.5.
+ </para>
+
+ <para>
+ Dropping tablespaces or databases is discussed in the administrator's
+ section since they are not typical user situations.
+ </para>
+ </sect2>
+
+ <sect2 id="hot-standby-admin">
+ <title>Administrator's Overview</title>
+
+ <para>
+ If there is a <filename>recovery.conf</> file present the server will start
+ in Hot Standby mode by default, though <varname>recovery_connections</> can
+    be disabled via <filename>postgresql.conf</>, if required. The server may take
+    some time to enable recovery connections since it must first complete
+    sufficient recovery to provide a consistent state against which queries
+    can run. Look for these messages
+    in the server logs
+
+<programlisting>
+LOG: initializing recovery connections
+
+... then some time later ...
+
+LOG: consistent recovery state reached
+LOG: database system is ready to accept read only connections
+</programlisting>
+
+ Consistency information is recorded once per checkpoint on the primary, as long
+ as <varname>recovery_connections</> is enabled (on the primary). If this parameter
+ is disabled, it will not be possible to enable recovery connections on the standby.
+ The consistent state can also be delayed in the presence of both of these conditions
+
+ <itemizedlist>
+ <listitem>
+ <para>
+ a write transaction has more than 64 subtransactions
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ very long-lived write transactions
+ </para>
+ </listitem>
+ </itemizedlist>
+
+ If you are running file-based log shipping ("warm standby"), you may need
+ to wait until the next WAL file arrives, which could be as long as the
+ <varname>archive_timeout</> setting on the primary.
+ </para>
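+
+   <para>
+    As an illustration only, the relevant settings for a file-based warm standby
+    might look like this; the archive path is an example and the full set of
+    <filename>recovery.conf</> options is described in <xref linkend="continuous-archiving">:
+<programlisting>
+# postgresql.conf on the standby
+recovery_connections = on    # default; allows connections during recovery
+
+# recovery.conf on the standby
+restore_command = 'cp /mnt/server/archivedir/%f %p'
+</programlisting>
+   </para>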
+
+ <para>
+    Some parameters on the standby will need reconfiguration
+    if they have been changed on the primary. The value on the standby must
+    be equal to or greater than the value on the primary. If these parameters
+    are not set high enough then the standby will not be able to track work
+    correctly from recovering transactions. If these values are set too low
+    the server will halt. Higher values can then be supplied and the server
+    restarted to begin recovery again. The query sketch after this list can
+    be used to compare the current values on the two servers.
+
+ <itemizedlist>
+ <listitem>
+ <para>
+ <varname>max_connections</>
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <varname>max_prepared_transactions</>
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ <varname>max_locks_per_transaction</>
+ </para>
+ </listitem>
+ </itemizedlist>
+ </para>
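+
+   <para>
+    A simple way to compare these values is to run the same query on both the
+    primary and the standby (a sketch; any additional parameters of interest
+    can be added to the list):
+<programlisting>
+SELECT name, setting
+  FROM pg_settings
+ WHERE name IN ('max_connections',
+                'max_prepared_transactions',
+                'max_locks_per_transaction');
+</programlisting>
+   </para>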
+
+ <para>
+ It is important that the administrator consider the appropriate setting
+ of <varname>max_standby_delay</>, set in <filename>postgresql.conf</>.
+    There is no optimal setting; it should be chosen according to business
+    priorities. For example, if the server is primarily tasked as a High
+ Availability server, then you may wish to lower
+ <varname>max_standby_delay</> or even set it to zero, though that is a
+ very aggressive setting. If the standby server is tasked as an additional
+ server for decision support queries then it may be acceptable to set this
+ to a value of many hours (in seconds). It is also possible to set
+    <varname>max_standby_delay</> to -1, which means wait forever for queries
+    to complete if there are conflicts; this can be useful when performing
+    an archive recovery from a backup.
+ </para>
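+
+   <para>
+    For example, a <filename>postgresql.conf</> fragment on the standby might
+    express these trade-offs as follows (pick one; the numeric values are
+    illustrative only):
+<programlisting>
+max_standby_delay = 0        # favour high availability: cancel conflicting queries at once
+#max_standby_delay = 14400   # favour long reporting queries: wait up to 4 hours
+#max_standby_delay = -1      # wait forever; useful for archive recovery from a backup
+</programlisting>
+   </para>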
+
+ <para>
+    Transaction status "hint bits" written on the primary are not WAL-logged,
+    so queries on the standby will likely re-write those hints.
+    Thus the main database blocks will produce write I/O even though
+    all users are read-only; no changes have occurred to the data values
+ themselves. Users will be able to write large sort temp files and
+ re-generate relcache info files, so there is no part of the database
+ that is truly read-only during hot standby mode. There is no restriction
+ on the use of set returning functions, or other users of tuplestore/tuplesort
+ code. Note also that writes to remote databases will still be possible,
+ even though the transaction is read-only locally.
+ </para>
+
+ <para>
+ The following types of administrator command are not accepted
+ during recovery mode
+
+ <itemizedlist>
+ <listitem>
+ <para>
+ Data Definition Language (DDL) - e.g. CREATE INDEX
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ Privilege and Ownership - GRANT, REVOKE, REASSIGN
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ Maintenance commands - ANALYZE, VACUUM, CLUSTER, REINDEX
+ </para>
+ </listitem>
+ </itemizedlist>
+ </para>
+
+ <para>
+ Note again that some of these commands are actually allowed during
+ "read only" mode transactions on the primary.
+ </para>
+
+ <para>
+    on the standby, nor can you generate statistics that exist solely on the standby.
+ on the standby, nor can statistics that exist solely on the standby.
+ If these administrator commands are needed they should be executed
+ on the primary so that the changes will propagate through to the
+ standby.
+ </para>
+
+ <para>
+ <function>pg_cancel_backend()</> will work on user backends, but not the
+ Startup process, which performs recovery. pg_stat_activity does not
+ show an entry for the Startup process, nor do recovering transactions
+ show as active. As a result, pg_prepared_xacts is always empty during
+ recovery. If you wish to resolve in-doubt prepared transactions
+ then look at pg_prepared_xacts on the primary and issue commands to
+ resolve those transactions there.
+ </para>
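+
+   <para>
+    A sketch of cancelling a standby query, assuming the pg_stat_activity
+    column names of this release and a hypothetical backend PID:
+<programlisting>
+-- find candidate backends
+SELECT procpid, current_query FROM pg_stat_activity;
+
+-- cancel one of them (12345 is an example PID)
+SELECT pg_cancel_backend(12345);
+</programlisting>
+   </para>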
+
+ <para>
+ pg_locks will show locks held by backends as normal. pg_locks also shows
+ a virtual transaction managed by the Startup process that owns all
+ AccessExclusiveLocks held by transactions being replayed by recovery.
+    Note that the Startup process does not acquire locks to
+    make database changes and thus locks other than AccessExclusiveLocks
+    do not show in pg_locks for the Startup process; they are just presumed
+    to exist.
+ </para>
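+
+   <para>
+    For example, the replayed locks can be inspected with a query such as
+    this (a sketch; the output will vary):
+<programlisting>
+SELECT locktype, database, relation, virtualtransaction, mode, granted
+  FROM pg_locks
+ WHERE mode = 'AccessExclusiveLock';
+</programlisting>
+   </para>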
+
+ <para>
+ <productname>check_pgsql</> will work, but it is very simple.
+    <productname>check_postgres</> will also work, though some actions
+    could give different or confusing results.
+    For example, the last vacuum time will not be maintained, since no
+ vacuum occurs on the standby (though vacuums running on the primary do
+ send their changes to the standby).
+ </para>
+
+ <para>
+    WAL file control commands will not work during recovery,
+    e.g. <function>pg_start_backup</>, <function>pg_switch_xlog</>, etc.
+ </para>
+
+ <para>
+ Dynamically loadable modules work, including pg_stat_statements.
+ </para>
+
+ <para>
+ Advisory locks work normally in recovery, including deadlock detection.
+ Note that advisory locks are never WAL logged, so it is not possible for
+ an advisory lock on either the primary or the standby to conflict with WAL
+ replay. Nor is it possible to acquire an advisory lock on the primary
+ and have it initiate a similar advisory lock on the standby. Advisory
+ locks relate only to a single server on which they are acquired.
+ </para>
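+
+   <para>
+    For example, the following works on a standby exactly as it does on the
+    primary, but the lock is only visible on the server where it is taken
+    (the key value 42 is arbitrary):
+<programlisting>
+SELECT pg_advisory_lock(42);
+-- ... critical section local to this server ...
+SELECT pg_advisory_unlock(42);
+</programlisting>
+   </para>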
+
+ <para>
+ Trigger-based replication systems such as <productname>Slony</>,
+ <productname>Londiste</> and <productname>Bucardo</> won't run on the
+ standby at all, though they will run happily on the primary server as
+ long as the changes are not sent to standby servers to be applied.
+ WAL replay is not trigger-based so you cannot relay from the
+ standby to any system that requires additional database writes or
+ relies on the use of triggers.
+ </para>
+
+ <para>
+    New OIDs cannot be assigned, though some <acronym>UUID</> generators may still
+ work as long as they do not rely on writing new status to the database.
+ </para>
+
+ <para>
+ Currently, temp table creation is not allowed during read only
+ transactions, so in some cases existing scripts will not run correctly.
+ It is possible we may relax that restriction in a later release. This is
+ both a SQL Standard compliance issue and a technical issue.
+ </para>
+
+ <para>
+ <command>DROP TABLESPACE</> can only succeed if the tablespace is empty.
+ Some standby users may be actively using the tablespace via their
+ <varname>temp_tablespaces</> parameter. If there are temp files in the
+ tablespace we currently cancel all active queries to ensure that temp
+ files are removed, so that we can remove the tablespace and continue with
+ WAL replay.
+ </para>
+
+ <para>
+ Running <command>DROP DATABASE</>, <command>ALTER DATABASE ... SET TABLESPACE</>,
+    or <command>ALTER DATABASE ... RENAME</> on the primary will generate a log message
+ that will cause all users connected to that database on the standby to be
+ forcibly disconnected, once <varname>max_standby_delay</> has been reached.
+ </para>
+
+ <para>
+ In normal running, if you issue <command>DROP USER</> or <command>DROP ROLE</>
+ for a role with login capability while that user is still connected then
+ nothing happens to the connected user - they remain connected. The user cannot
+ reconnect however. This behaviour applies in recovery also, so a
+ <command>DROP USER</> on the primary does not disconnect that user on the standby.
+ </para>
+
+ <para>
+    The stats collector is active during recovery. All scans, reads, blocks,
+    index usage, etc., will be recorded normally on the standby. Replayed
+    actions will not duplicate their effects on the primary, so replaying an
+    insert will not increment the Inserts column of pg_stat_user_tables.
+    The stats file is deleted at the start of recovery, so stats from primary
+    and standby will differ; this is considered a feature, not a bug.
+ </para>
+
+ <para>
+    Autovacuum is not active during recovery, though it will start normally
+ at the end of recovery.
+ </para>
+
+ <para>
+    The background writer is active during recovery and will perform
+    restartpoints (similar to checkpoints on the primary) and normal block
+    cleaning activities. The <command>CHECKPOINT</> command is accepted during recovery,
+    though it performs a restartpoint rather than a new checkpoint.
+ </para>
+ </sect2>
+
+ <sect2 id="hot-standby-parameters">
+ <title>Hot Standby Parameter Reference</title>
+
+ <para>
+ Various parameters have been mentioned above in the <xref linkend="hot-standby-admin">
+ and <xref linkend="hot-standby-conflict"> sections.
+ </para>
+
+ <para>
+    On the primary, the parameters <varname>recovery_connections</> and
+    <varname>vacuum_defer_cleanup_age</> can be used to control how the
+    primary server assists the configuration of Hot Standby servers.
+    <varname>max_standby_delay</> has no effect if set on the primary.
+ </para>
+
+ <para>
+    On the standby, the parameters <varname>recovery_connections</> and
+    <varname>max_standby_delay</> can be used to enable and control Hot Standby.
+    <varname>vacuum_defer_cleanup_age</> has no effect during recovery.
+ </para>
+ </sect2>
+
+ <sect2 id="hot-standby-caveats">
+ <title>Caveats</title>
+
+ <para>
+ At this writing, there are several limitations of Hot Standby.
+ These can and probably will be fixed in future releases:
+
+ <itemizedlist>
+ <listitem>
+ <para>
+ Operations on hash indexes are not presently WAL-logged, so
+ replay will not update these indexes. Hash indexes will not be
+ used for query plans during recovery.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ Full knowledge of running transactions is required before snapshots
+      may be taken. Transactions that use large numbers of subtransactions
+      (currently more than 64) will delay the start of read-only
+      connections until the completion of the longest-running write transaction.
+      If this situation occurs, explanatory messages will be sent to the server log.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ Valid starting points for recovery connections are generated at each
+      checkpoint on the master. If the standby is shut down while the master
+      is in a shutdown state, it may not be possible to re-enter Hot Standby
+ until the primary is started up so that it generates further starting
+ points in the WAL logs. This is not considered a serious issue
+ because the standby is usually switched into the primary role while
+ the first node is taken down.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ At the end of recovery, AccessExclusiveLocks held by prepared transactions
+ will require twice the normal number of lock table entries. If you plan
+      on running either a large number of concurrent prepared transactions
+      that normally take AccessExclusiveLocks, or one
+      large transaction that takes many AccessExclusiveLocks, then you are
+      advised to select a larger value of <varname>max_locks_per_transaction</>,
+      up to, but never more than, twice the value of the parameter setting on
+      the primary server. You need not consider this at all if
+ your setting of <varname>max_prepared_transactions</> is <literal>0</>.
+ </para>
+ </listitem>
+ </itemizedlist>
+
+ </para>
+ </sect2>
+
+ </sect1>
+
<sect1 id="migration">
<title>Migration Between Releases</title>
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index d13e6d151f5..4554cb614a4 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -1,4 +1,4 @@
-<!-- $PostgreSQL: pgsql/doc/src/sgml/config.sgml,v 1.238 2009/12/17 14:36:16 rhaas Exp $ -->
+<!-- $PostgreSQL: pgsql/doc/src/sgml/config.sgml,v 1.239 2009/12/19 01:32:31 sriggs Exp $ -->
<chapter Id="runtime-config">
<title>Server Configuration</title>
@@ -376,6 +376,12 @@ SET ENABLE_SEQSCAN TO OFF;
allows. See <xref linkend="sysvipc"> for information on how to
adjust those parameters, if necessary.
</para>
+
+ <para>
+        When running a standby server, you must set this parameter to the
+        same value as on the master server, or higher. Otherwise, queries
+        will not be allowed on the standby server.
+ </para>
</listitem>
</varlistentry>
@@ -826,6 +832,12 @@ SET ENABLE_SEQSCAN TO OFF;
allows. See <xref linkend="sysvipc"> for information on how to
adjust those parameters, if necessary.
</para>
+
+ <para>
+        When running a standby server, you must set this parameter to the
+        same value as on the master server, or higher. Otherwise, queries
+        will not be allowed on the standby server.
+ </para>
</listitem>
</varlistentry>
@@ -1733,6 +1745,51 @@ archive_command = 'copy "%p" "C:\\server\\archivedir\\%f"' # Windows
</variablelist>
</sect2>
+
+ <sect2 id="runtime-config-standby">
+ <title>Standby Servers</title>
+
+ <variablelist>
+
+ <varlistentry id="recovery-connections" xreflabel="recovery_connections">
+ <term><varname>recovery_connections</varname> (<type>boolean</type>)</term>
+ <listitem>
+ <para>
+        This parameter has two roles. During recovery, it specifies whether or not
+        you can connect and run queries, enabling <xref linkend="hot-standby">.
+        During normal running, it specifies whether additional information is written
+ to WAL to allow recovery connections on a standby server that reads
+ WAL data generated by this server. The default value is
+ <literal>on</literal>. It is thought that there is little
+ measurable difference in performance from using this feature, so
+ feedback is welcome if any production impacts are noticeable.
+ It is likely that this parameter will be removed in later releases.
+ This parameter can only be set at server start.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry id="max-standby-delay" xreflabel="max_standby_delay">
+ <term><varname>max_standby_delay</varname> (<type>string</type>)</term>
+ <listitem>
+ <para>
+        When the server acts as a standby, this parameter specifies a wait policy
+ for queries that conflict with incoming data changes. Valid settings
+ are -1, meaning wait forever, or a wait time of 0 or more seconds.
+ If a conflict should occur the server will delay up to this
+ amount before it begins trying to resolve things less amicably, as
+ described in <xref linkend="hot-standby-conflict">. Typically,
+ this parameter makes sense only during replication, so when
+ performing an archive recovery to recover from data loss a
+ parameter setting of 0 is recommended. The default is 30 seconds.
+ This parameter can only be set in the <filename>postgresql.conf</>
+ file or on the server command line.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ </variablelist>
+ </sect2>
</sect1>
<sect1 id="runtime-config-query">
@@ -4161,6 +4218,29 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
</listitem>
</varlistentry>
+ <varlistentry id="guc-vacuum-defer-cleanup-age" xreflabel="vacuum_defer_cleanup_age">
+ <term><varname>vacuum_defer_cleanup_age</varname> (<type>integer</type>)</term>
+ <indexterm>
+ <primary><varname>vacuum_defer_cleanup_age</> configuration parameter</primary>
+ </indexterm>
+ <listitem>
+ <para>
+ Specifies the number of transactions by which <command>VACUUM</> and
+ <acronym>HOT</> updates will defer cleanup of dead row versions. The
+ default is 0 transactions, meaning that dead row versions will be
+ removed as soon as possible. You may wish to set this to a non-zero
+ value when planning or maintaining a <xref linkend="hot-standby">
+ configuration. The recommended value is <literal>0</> unless you have
+ clear reason to increase it. The purpose of the parameter is to
+ allow the user to specify an approximate time delay before cleanup
+ occurs. However, it should be noted that there is no direct link with
+ any specific time delay and so the results will be application and
+ installation specific, as well as variable over time, depending upon
+ the transaction rate (of writes only).
+ </para>
+ </listitem>
+ </varlistentry>
+
<varlistentry id="guc-bytea-output" xreflabel="bytea_output">
<term><varname>bytea_output</varname> (<type>enum</type>)</term>
<indexterm>
@@ -4689,6 +4769,12 @@ dynamic_library_path = 'C:\tools\postgresql;H:\my_project\lib;$libdir'
allows. See <xref linkend="sysvipc"> for information on how to
adjust those parameters, if necessary.
</para>
+
+ <para>
+        When running a standby server, you must set this parameter to the
+        same value as on the master server, or higher. Otherwise, queries
+        will not be allowed on the standby server.
+ </para>
</listitem>
</varlistentry>
@@ -5546,6 +5632,32 @@ plruby.use_strict = true # generates error: unknown class name
</listitem>
</varlistentry>
+ <varlistentry id="guc-trace-recovery-messages" xreflabel="trace_recovery_messages">
+ <term><varname>trace_recovery_messages</varname> (<type>string</type>)</term>
+ <indexterm>
+ <primary><varname>trace_recovery_messages</> configuration parameter</primary>
+ </indexterm>
+ <listitem>
+ <para>
+ Controls which message levels are written to the server log
+ for system modules needed for recovery processing. This allows
+ the user to override the normal setting of log_min_messages,
+ but only for specific messages. This is intended for use in
+ debugging Hot Standby.
+ Valid values are <literal>DEBUG5</>, <literal>DEBUG4</>,
+ <literal>DEBUG3</>, <literal>DEBUG2</>, <literal>DEBUG1</>,
+ <literal>INFO</>, <literal>NOTICE</>, <literal>WARNING</>,
+ <literal>ERROR</>, <literal>LOG</>, <literal>FATAL</>, and
+ <literal>PANIC</>. Each level includes all the levels that
+ follow it. The later the level, the fewer messages are sent
+ to the log. The default is <literal>WARNING</>. Note that
+ <literal>LOG</> has a different rank here than in
+ <varname>client_min_messages</>.
+        This parameter should be set in <filename>postgresql.conf</> only.
+ </para>
+ </listitem>
+ </varlistentry>
+
<varlistentry id="guc-zero-damaged-pages" xreflabel="zero_damaged_pages">
<term><varname>zero_damaged_pages</varname> (<type>boolean</type>)</term>
<indexterm>
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 7d6125c97e5..50947274039 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -1,4 +1,4 @@
-<!-- $PostgreSQL: pgsql/doc/src/sgml/func.sgml,v 1.493 2009/12/15 17:57:46 tgl Exp $ -->
+<!-- $PostgreSQL: pgsql/doc/src/sgml/func.sgml,v 1.494 2009/12/19 01:32:31 sriggs Exp $ -->
<chapter id="functions">
<title>Functions and Operators</title>
@@ -13132,6 +13132,38 @@ postgres=# SELECT * FROM pg_xlogfile_name_offset(pg_stop_backup());
<xref linkend="continuous-archiving">.
</para>
+ <indexterm>
+ <primary>pg_is_in_recovery</primary>
+ </indexterm>
+
+ <para>
+ The functions shown in <xref
+ linkend="functions-recovery-info-table"> provide information
+ about the current status of Hot Standby.
+    These functions may be executed both during recovery and during normal running.
+ </para>
+
+ <table id="functions-recovery-info-table">
+ <title>Recovery Information Functions</title>
+ <tgroup cols="3">
+ <thead>
+ <row><entry>Name</entry> <entry>Return Type</entry> <entry>Description</entry>
+ </row>
+ </thead>
+
+ <tbody>
+ <row>
+ <entry>
+ <literal><function>pg_is_in_recovery</function>()</literal>
+ </entry>
+ <entry><type>bool</type></entry>
+ <entry>True if recovery is still in progress.
+ </entry>
+ </row>
+ </tbody>
+ </tgroup>
+ </table>
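+
+  <para>
+   For example, on a standby server the function might return the following
+   (illustrative output):
+<programlisting>
+postgres=# SELECT pg_is_in_recovery();
+ pg_is_in_recovery
+-------------------
+ t
+(1 row)
+</programlisting>
+  </para>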
+
<para>
The functions shown in <xref linkend="functions-admin-dbsize"> calculate
the disk space usage of database objects.
diff --git a/doc/src/sgml/ref/checkpoint.sgml b/doc/src/sgml/ref/checkpoint.sgml
index 76eb273dea2..31f1b0fe192 100644
--- a/doc/src/sgml/ref/checkpoint.sgml
+++ b/doc/src/sgml/ref/checkpoint.sgml
@@ -1,4 +1,4 @@
-<!-- $PostgreSQL: pgsql/doc/src/sgml/ref/checkpoint.sgml,v 1.16 2008/11/14 10:22:45 petere Exp $ -->
+<!-- $PostgreSQL: pgsql/doc/src/sgml/ref/checkpoint.sgml,v 1.17 2009/12/19 01:32:31 sriggs Exp $ -->
<refentry id="sql-checkpoint">
<refmeta>
@@ -43,6 +43,11 @@ CHECKPOINT
</para>
<para>
+ If executed during recovery, the <command>CHECKPOINT</command> command
+ will force a restartpoint rather than writing a new checkpoint.
+ </para>
+
+ <para>
Only superusers can call <command>CHECKPOINT</command>. The command is
not intended for use during normal operation.
</para>