Allow read only connections during recovery, known as Hot Standby.

Enabled by recovery_connections = on (default) and forcing archive recovery using a recovery.conf. Recovery processing now emulates the original transactions as they are replayed, providing full locking and MVCC behaviour for read only queries. Recovery must enter consistent state before connections are allowed, so there is a delay, typically short, before connections succeed. Replay of recovering transactions can conflict and in some cases deadlock with queries during recovery; these result in query cancellation after max_standby_delay seconds have expired. Infrastructure changes have minor effects on normal running, though introduce four new types of WAL record.

New test mode "make standbycheck" allows regression tests of static command behaviour on a standby server while in recovery. Typical and extreme dynamic behaviours have been checked via code inspection and manual testing. Few port specific behaviours have been utilised, though primary testing has been on Linux only so far.

This commit is the basic patch. Additional changes will follow in this release to enhance some aspects of behaviour, notably improved handling of conflicts, deadlock detection and query cancellation. Changes to VACUUM FULL are also required.

Simon Riggs, with significant and lengthy review by Heikki Linnakangas, including streamlined redesign of snapshot creation and two-phase commit.

Important contributions from Florian Pflug, Mark Kirkwood, Merlin Moncure, Greg Stark, Gianni Ciolli, Gabriele Bartolini, Hannu Krosing, Robert Haas, Tatsuo Ishii, Hiroyuki Yamada plus support and feedback from many other community members.
author: Simon Riggs <simon@2ndQuadrant.com> 2009-12-19 01:32:45 +0000
committer: Simon Riggs <simon@2ndQuadrant.com> 2009-12-19 01:32:45 +0000
commit: efc16ea520679d713d98a2c7bf1453c4ff7b91ec (patch)
tree: 6a39d2af0704a36281dc7df3ec10823eb3e6de75 /src/backend/utils/time/tqual.c
parent: 78a09145e0f8322e625bbc7d69fcb865ce4f3034 (diff)
1 files changed, 66 insertions, 24 deletions
diff --git a/src/backend/utils/time/tqual.c b/src/backend/utils/time/tqual.c
index 6d8f86acc96..32eeabb9994 100644
--- a/src/backend/utils/time/tqual.c
+++ b/src/backend/utils/time/tqual.c
@@ -50,7 +50,7 @@
  * Portions Copyright (c) 1994, Regents of the University of California
  *
  * IDENTIFICATION
- *	  $PostgreSQL: pgsql/src/backend/utils/time/tqual.c,v 1.113 2009/06/11 14:49:06 momjian Exp $
+ *	  $PostgreSQL: pgsql/src/backend/utils/time/tqual.c,v 1.114 2009/12/19 01:32:37 sriggs Exp $
  *
  *-------------------------------------------------------------------------
  */
@@ -1257,42 +1257,84 @@ XidInMVCCSnapshot(TransactionId xid, Snapshot snapshot)
 		return true;
 
 	/*
-	 * If the snapshot contains full subxact data, the fastest way to check
-	 * things is just to compare the given XID against both subxact XIDs and
-	 * top-level XIDs.	If the snapshot overflowed, we have to use pg_subtrans
-	 * to convert a subxact XID to its parent XID, but then we need only look
-	 * at top-level XIDs not subxacts.
+	 * Snapshot information is stored slightly differently in snapshots
+	 * taken during recovery.
 	 */
-	if (snapshot->subxcnt >= 0)
+	if (!snapshot->takenDuringRecovery)
 	{
-		/* full data, so search subxip */
-		int32		j;
+		/*
+		 * If the snapshot contains full subxact data, the fastest way to check
+		 * things is just to compare the given XID against both subxact XIDs and
+		 * top-level XIDs.	If the snapshot overflowed, we have to use pg_subtrans
+		 * to convert a subxact XID to its parent XID, but then we need only look
+		 * at top-level XIDs not subxacts.
+		 */
+		if (!snapshot->suboverflowed)
+		{
+			/* full data, so search subxip */
+			int32		j;
 
-		for (j = 0; j < snapshot->subxcnt; j++)
+			for (j = 0; j < snapshot->subxcnt; j++)
+			{
+				if (TransactionIdEquals(xid, snapshot->subxip[j]))
+					return true;
+			}
+
+			/* not there, fall through to search xip[] */
+		}
+		else
 		{
-			if (TransactionIdEquals(xid, snapshot->subxip[j]))
-				return true;
+			/* overflowed, so convert xid to top-level */
+			xid = SubTransGetTopmostTransaction(xid);
+
+			/*
+			 * If xid was indeed a subxact, we might now have an xid < xmin, so
+			 * recheck to avoid an array scan.	No point in rechecking xmax.
+			 */
+			if (TransactionIdPrecedes(xid, snapshot->xmin))
+				return false;
 		}
 
-		/* not there, fall through to search xip[] */
+		for (i = 0; i < snapshot->xcnt; i++)
+		{
+			if (TransactionIdEquals(xid, snapshot->xip[i]))
+				return true;
+		}
 	}
 	else
 	{
-		/* overflowed, so convert xid to top-level */
-		xid = SubTransGetTopmostTransaction(xid);
+		int32		j;
 
 		/*
-		 * If xid was indeed a subxact, we might now have an xid < xmin, so
-		 * recheck to avoid an array scan.	No point in rechecking xmax.
+		 * In recovery we store all xids in the subxact array because it
+		 * is by far the bigger array, and we mostly don't know which xids
+		 * are top-level and which are subxacts. The xip array is empty.
+		 *
+		 * We start by searching subtrans, if we overflowed.
 		 */
-		if (TransactionIdPrecedes(xid, snapshot->xmin))
-			return false;
-	}
+		if (snapshot->suboverflowed)
+		{
+			/* overflowed, so convert xid to top-level */
+			xid = SubTransGetTopmostTransaction(xid);
 
-	for (i = 0; i < snapshot->xcnt; i++)
-	{
-		if (TransactionIdEquals(xid, snapshot->xip[i]))
-			return true;
+			/*
+			 * If xid was indeed a subxact, we might now have an xid < xmin, so
+			 * recheck to avoid an array scan.	No point in rechecking xmax.
+			 */
+			if (TransactionIdPrecedes(xid, snapshot->xmin))
+				return false;
+		}
+
+		/*
+		 * We now have either a top-level xid higher than xmin or an
+		 * indeterminate xid. We don't know whether it's top level or subxact
+		 * but it doesn't matter. If it's present, the xid is visible.
+		 */
+		for (j = 0; j < snapshot->subxcnt; j++)
+		{
+			if (TransactionIdEquals(xid, snapshot->subxip[j]))
+				return true;
+		}
 	}
 
 	return false;
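
The array-search logic in the hunk above can be modelled outside the server. The following is a minimal sketch, not the tqual.c implementation: SketchSnapshot, sketch_topmost(), sketch_precedes() and the parent table are hypothetical stand-ins for the real Snapshot struct, SubTransGetTopmostTransaction() and TransactionIdPrecedes(), and the xmin/xmax range checks that precede this code in XidInMVCCSnapshot() are left out. It only illustrates how a snapshot taken during recovery is searched through subxip[] alone, while a normal snapshot uses both arrays.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t TransactionId;

/* Hypothetical, reduced snapshot: only the fields the lookup reads. */
typedef struct SketchSnapshot
{
	TransactionId xmin;				/* oldest xid still considered running */
	const TransactionId *xip;		/* top-level xids (empty in recovery) */
	uint32_t	xcnt;
	const TransactionId *subxip;	/* subxact xids, or all xids in recovery */
	int32_t		subxcnt;
	bool		suboverflowed;
	bool		takenDuringRecovery;
} SketchSnapshot;

/* Hypothetical stand-in for pg_subtrans: subxact xid -> top-level xid. */
static const TransactionId sketch_parents[][2] = {
	{110, 103},
	{111, 90},
};

static TransactionId
sketch_topmost(TransactionId xid)
{
	size_t		n = sizeof(sketch_parents) / sizeof(sketch_parents[0]);

	for (size_t k = 0; k < n; k++)
	{
		if (sketch_parents[k][0] == xid)
			return sketch_parents[k][1];
	}
	return xid;					/* no entry: treat xid as top-level */
}

/* Simplified modulo-2^32 comparison; special xids are ignored here. */
static bool
sketch_precedes(TransactionId a, TransactionId b)
{
	return (int32_t) (a - b) < 0;
}

static bool
sketch_contains(const TransactionId *xids, int32_t n, TransactionId xid)
{
	for (int32_t k = 0; k < n; k++)
	{
		if (xids[k] == xid)
			return true;
	}
	return false;
}

/* Returns true if xid is listed as running in the snapshot. */
static bool
sketch_xid_in_snapshot(TransactionId xid, const SketchSnapshot *snap)
{
	assert(snap != NULL);

	if (!snap->takenDuringRecovery)
	{
		if (!snap->suboverflowed)
		{
			/* full subxact data: search subxip[], then fall to xip[] */
			if (sketch_contains(snap->subxip, snap->subxcnt, xid))
				return true;
		}
		else
		{
			/* overflowed: map to top-level, recheck against xmin */
			xid = sketch_topmost(xid);
			if (sketch_precedes(xid, snap->xmin))
				return false;
		}
		return sketch_contains(snap->xip, (int32_t) snap->xcnt, xid);
	}

	/* recovery: every known xid lives in subxip[], xip[] is empty */
	if (snap->suboverflowed)
	{
		xid = sketch_topmost(xid);
		if (sketch_precedes(xid, snap->xmin))
			return false;
	}
	return sketch_contains(snap->subxip, snap->subxcnt, xid);
}
```

With a recovery snapshot whose subxip[] holds {100, 103, 107} and xmin = 100, the sketch reports 103 as running and 104 as not running. If the same snapshot is marked suboverflowed, xid 110 is first mapped to its hypothetical parent 103 and found, while xid 111 maps to 90, which precedes xmin, so the array scan is skipped.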