author	Tom Lane <tgl@sss.pgh.pa.us>	2011-08-20 14:51:02 -0400
committer	Tom Lane <tgl@sss.pgh.pa.us>	2011-08-20 14:51:02 -0400
commit	08e1eedf247e04a9652d997a74ceb46d889124ba (patch)
tree	7a09ecadfc7c18d035a0b13c5a02bad22167bc7c /src
parent	ee639d277787a75183d3763728f02da0d0a6ae52 (diff)
Fix performance problem when building a lossy tidbitmap.
As pointed out by Sergey Koposov, repeated invocations of tbm_lossify can make building a large tidbitmap into an O(N^2) operation. To fix, make sure we remove more than the minimum amount of information per call, and add a fallback path to behave sanely if we're unable to fit the bitmap within the requested amount of memory.

This has been wrong since the tidbitmap code was written, so back-patch to all supported branches.
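To make the shape of the fix concrete, here is a minimal, compilable C sketch of the same pattern (an illustration only, not PostgreSQL's actual tidbitmap code: the Bitmap struct and the lossify_one helper are hypothetical stand-ins for the page table and tbm_mark_page_lossy):

#include <stdio.h>
#include <limits.h>

/* Hypothetical bitmap with a soft cap on entries (illustration only). */
typedef struct
{
	int		nentries;		/* current number of exact entries */
	int		maxentries;		/* soft cap derived from the memory budget */
} Bitmap;

/*
 * Stand-in for tbm_mark_page_lossy(): pretend one call folds one exact
 * entry into lossy form.  Returns 0 when nothing more can be compressed.
 */
static int
lossify_one(Bitmap *bm)
{
	if (bm->nentries <= 1)
		return 0;
	bm->nentries--;
	return 1;
}

static void
lossify(Bitmap *bm)
{
	/*
	 * Aim for maxentries / 2, not merely <= maxentries: stopping right
	 * at the boundary means the very next insertion re-triggers this
	 * whole pass, making bitmap construction O(N^2) overall.
	 */
	while (bm->nentries > bm->maxentries / 2)
	{
		if (lossify_one(bm) == 0)
			break;				/* as compressed as it gets */
	}

	/*
	 * Fallback: if even maximal compression cannot reach the target,
	 * raise the cap (conceding the memory budget) so callers stop
	 * invoking us uselessly.  Clamp before doubling so the
	 * multiplication cannot overflow int.
	 */
	if (bm->nentries > bm->maxentries / 2)
	{
		int		clamped = bm->nentries < (INT_MAX - 1) / 2 ?
						  bm->nentries : (INT_MAX - 1) / 2;

		bm->maxentries = clamped * 2;
	}
}

int
main(void)
{
	Bitmap		bm = {100, 64};

	lossify(&bm);
	printf("nentries=%d maxentries=%d\n", bm.nentries, bm.maxentries);
	return 0;
}

With nentries = 100 and maxentries = 64, the loop compresses down to the target of 32 and the cap stays put; had lossify_one run dry first, the fallback would have doubled maxentries instead, conceding that the bitmap cannot fit the budget.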
Diffstat (limited to 'src')
-rw-r--r-- src/backend/nodes/tidbitmap.c | 22
1 file changed, 19 insertions(+), 3 deletions(-)
diff --git a/src/backend/nodes/tidbitmap.c b/src/backend/nodes/tidbitmap.c
index 33f94c03b2a..6f806fda851 100644
--- a/src/backend/nodes/tidbitmap.c
+++ b/src/backend/nodes/tidbitmap.c
@@ -953,8 +953,11 @@ tbm_lossify(TIDBitmap *tbm)
 	/*
 	 * XXX Really stupid implementation: this just lossifies pages in
 	 * essentially random order.  We should be paying some attention to the
-	 * number of bits set in each page, instead.  Also it might be a good idea
-	 * to lossify more than the minimum number of pages during each call.
+	 * number of bits set in each page, instead.
+	 *
+	 * Since we are called as soon as nentries exceeds maxentries, we should
+	 * push nentries down to significantly less than maxentries, or else we'll
+	 * just end up doing this again very soon.  We shoot for maxentries/2.
 	 */
 	Assert(!tbm->iterating);
 	Assert(tbm->status == TBM_HASH);
@@ -975,7 +978,7 @@ tbm_lossify(TIDBitmap *tbm)
 		/* This does the dirty work ... */
 		tbm_mark_page_lossy(tbm, page->blockno);
 
-		if (tbm->nentries <= tbm->maxentries)
+		if (tbm->nentries <= tbm->maxentries / 2)
 		{
 			/* we have done enough */
 			hash_seq_term(&status);
@@ -988,6 +991,19 @@ tbm_lossify(TIDBitmap *tbm)
 		 * not care whether we visit lossy chunks or not.
 		 */
 	}
+
+	/*
+	 * With a big bitmap and small work_mem, it's possible that we cannot
+	 * get under maxentries.  Again, if that happens, we'd end up uselessly
+	 * calling tbm_lossify over and over.  To prevent this from becoming a
+	 * performance sink, force maxentries up to at least double the current
+	 * number of entries.  (In essence, we're admitting inability to fit
+	 * within work_mem when we do this.)  Note that this test will not fire
+	 * if we broke out of the loop early; and if we didn't, the current
+	 * number of entries is simply not reducible any further.
+	 */
+	if (tbm->nentries > tbm->maxentries / 2)
+		tbm->maxentries = Min(tbm->nentries, (INT_MAX - 1) / 2) * 2;
 }
 
 /*
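A quick arithmetic check of the overflow guard in the added code: with a 32-bit int, INT_MAX is 2147483647, so (INT_MAX - 1) / 2 is 1073741823. Clamping nentries to that value before the multiplication means the new maxentries can never exceed 2147483646, so the doubling cannot overflow, while any nentries below the clamp simply yields maxentries = 2 * nentries.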