Truncate pg_multixact/'s contents during crash recovery

Commit 9dc842f08 of 8.2 era prevented MultiXact truncation during crash recovery, because there was no guarantee that enough state had been setup, and because it wasn't deemed to be a good idea to remove data during crash recovery anyway. Since then, due to Hot-Standby, streaming replication and PITR, the amount of time a cluster can spend doing crash recovery has increased significantly, to the point that a cluster may even never come out of it. This has made not truncating the content of pg_multixact/ not defensible anymore. To fix, take care to setup enough state for multixact truncation before crash recovery starts (easy since checkpoints contain the required information), and move the current end-of-recovery actions to a new TrimMultiXact() function, analogous to TrimCLOG(). At some later point, this should probably done similarly to the way clog.c is doing it, which is to just WAL log truncations, but we can't do that for the back branches. Back-patch to 9.0. 8.4 also has the problem, but since there's no hot standby there, it's much less pressing. In 9.2 and earlier, this patch is simpler than in newer branches, because multixact access during recovery isn't required. Add appropriate checks to make sure that's not happening. Andres Freund
author: Alvaro Herrera <alvherre@alvh.no-ip.org> 2013-11-29 11:26:41 -0300
committer: Alvaro Herrera <alvherre@alvh.no-ip.org> 2013-11-29 21:47:15 -0300
commit: 1df0122daa6510eed4146033379a5055f66f5a8e (patch)
tree: 8784e3d301ca6cb27cf3dfc51e8285a4b1dde924 /src/backend/access/transam/multixact.c
parent: f54106f77e6d71cbb3fa0924095e5142341fde2b (diff)
download: postgresql-1df0122daa6510eed4146033379a5055f66f5a8e.tar.gz
postgresql-1df0122daa6510eed4146033379a5055f66f5a8e.zip
1 files changed, 42 insertions, 11 deletions
diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c
index 90fa030caf2..ca702eec817 100644
--- a/src/backend/access/transam/multixact.c
+++ b/src/backend/access/transam/multixact.c
@@ -1768,11 +1768,8 @@ MaybeExtendOffsetSlru(void)
  *
  * StartupXLOG has already established nextMXact/nextOffset by calling
  * MultiXactSetNextMXact and/or MultiXactAdvanceNextMXact, and the oldestMulti
- * info from pg_control and/or MultiXactAdvanceOldest.	Note that we may
- * already have replayed WAL data into the SLRU files.
- *
- * We don't need any locks here, really; the SLRU locks are taken
- * only because slru.c expects to be called with locks held.
+ * info from pg_control and/or MultiXactAdvanceOldest, but we haven't yet
+ * replayed WAL.
  */
 void
 StartupMultiXact(void)
@@ -1780,12 +1777,40 @@ StartupMultiXact(void)
 	MultiXactId multi = MultiXactState->nextMXact;
 	MultiXactOffset offset = MultiXactState->nextOffset;
 	int			pageno;
+
+	/*
+	 * Initialize offset's idea of the latest page number.
+	 */
+	pageno = MultiXactIdToOffsetPage(multi);
+	MultiXactOffsetCtl->shared->latest_page_number = pageno;
+
+	/*
+	 * Initialize member's idea of the latest page number.
+	 */
+	pageno = MXOffsetToMemberPage(offset);
+	MultiXactMemberCtl->shared->latest_page_number = pageno;
+}
+
+/*
+ * This must be called ONCE at the end of startup/recovery.
+ *
+ * We don't need any locks here, really; the SLRU locks are taken only because
+ * slru.c expects to be called with locks held.
+ */
+void
+TrimMultiXact(void)
+{
+	MultiXactId multi = MultiXactState->nextMXact;
+	MultiXactOffset offset = MultiXactState->nextOffset;
+	int			pageno;
 	int			entryno;
 	int			flagsoff;
 
 	/*
 	 * During a binary upgrade, make sure that the offsets SLRU is large
-	 * enough to contain the next value that would be created.
+	 * enough to contain the next value that would be created. It's fine to do
+	 * this here and not in StartupMultiXact() since binary upgrades should
+	 * never need crash recovery.
 	 */
 	if (IsBinaryUpgrade)
 		MaybeExtendOffsetSlru();
@@ -1794,7 +1819,7 @@ StartupMultiXact(void)
 	LWLockAcquire(MultiXactOffsetControlLock, LW_EXCLUSIVE);
 
 	/*
-	 * Initialize our idea of the latest page number.
+	 * (Re-)Initialize our idea of the latest page number.
 	 */
 	pageno = MultiXactIdToOffsetPage(multi);
 	MultiXactOffsetCtl->shared->latest_page_number = pageno;
@@ -1824,7 +1849,7 @@ StartupMultiXact(void)
 	LWLockAcquire(MultiXactMemberControlLock, LW_EXCLUSIVE);
 
 	/*
-	 * Initialize our idea of the latest page number.
+	 * (Re-)Initialize our idea of the latest page number.
 	 */
 	pageno = MXOffsetToMemberPage(offset);
 	MultiXactMemberCtl->shared->latest_page_number = pageno;
@@ -2258,9 +2283,15 @@ SlruScanDirCbFindEarliest(SlruCtl ctl, char *filename, int segpage, void *data)
  * Remove all MultiXactOffset and MultiXactMember segments before the oldest
  * ones still of interest.
  *
- * This is called by vacuum after it has successfully advanced a database's
- * datminmxid value; the cutoff value we're passed is the minimum of all
- * databases' datminmxid values.
+ * On a primary, this is called by vacuum after it has successfully advanced a
+ * database's datminmxid value; the cutoff value we're passed is the minimum of
+ * all databases' datminmxid values.
+ *
+ * During crash recovery, it's called from CreateRestartPoint() instead.  We
+ * rely on the fact that xlog_redo() will already have called
+ * MultiXactAdvanceOldest().  Our latest_page_number will already have been
+ * initialized by StartupMultiXact() and kept up to date as new pages are
+ * zeroed.
  */
 void
 TruncateMultiXact(MultiXactId oldestMXact)
author	Alvaro Herrera <alvherre@alvh.no-ip.org>	2013-11-29 11:26:41 -0300
committer	Alvaro Herrera <alvherre@alvh.no-ip.org>	2013-11-29 21:47:15 -0300
commit	1df0122daa6510eed4146033379a5055f66f5a8e (patch)
tree	8784e3d301ca6cb27cf3dfc51e8285a4b1dde924 /src/backend/access/transam/multixact.c
parent	f54106f77e6d71cbb3fa0924095e5142341fde2b (diff)
download	postgresql-1df0122daa6510eed4146033379a5055f66f5a8e.tar.gz postgresql-1df0122daa6510eed4146033379a5055f66f5a8e.zip