Buffering GiST index build algorithm.

When building a GiST index that doesn't fit in cache, buffers are attached to some internal nodes in the index. This speeds up the build by avoiding random I/O that would otherwise be needed to traverse all the way down the tree to the find right leaf page for tuple. Alexander Korotkov
author: Heikki Linnakangas <heikki.linnakangas@iki.fi> 2011-09-08 17:51:23 +0300
committer: Heikki Linnakangas <heikki.linnakangas@iki.fi> 2011-09-08 17:51:23 +0300
commit: 5edb24a8983e4a103e26153853d91141f818227c (patch)
tree: 9e3102de6e2149b0d3678b403c91955e97f3bdc8 /doc/src
parent: 09b68c70af855a0a69cede14da70968ddd97ba05 (diff)
download: postgresql-5edb24a8983e4a103e26153853d91141f818227c.tar.gz
postgresql-5edb24a8983e4a103e26153853d91141f818227c.zip
2 files changed, 54 insertions, 0 deletions
diff --git a/doc/src/sgml/gist.sgml b/doc/src/sgml/gist.sgml
index 78171cfa3fd..1b6fa1a8817 100644
--- a/doc/src/sgml/gist.sgml
+++ b/doc/src/sgml/gist.sgml
@@ -642,6 +642,40 @@ my_distance(PG_FUNCTION_ARGS)
 
   </variablelist>
 
+ <sect2 id="gist-buffering-build">
+  <title>GiST buffering build</title>
+  <para>
+   Building large GiST indexes by simply inserting all the tuples tends to be
+   slow, because if the index tuples are scattered across the index and the
+   index is large enough to not fit in cache, the insertions need to perform
+   a lot of random I/O. PostgreSQL from version 9.2 supports a more efficient
+   method to build GiST indexes based on buffering, which can dramatically
+   reduce number of random I/O needed for non-ordered data sets. For
+   well-ordered datasets the benefit is smaller or non-existent, because
+   only a small number of pages receive new tuples at a time, and those pages
+   fit in cache even if the index as whole does not.
+  </para>
+
+  <para>
+   However, buffering index build needs to call the <function>penalty</>
+   function more often, which consumes some extra CPU resources. Also, the
+   buffers used in the buffering build need temporary disk space, up to
+   the size of the resulting index. Buffering can also infuence the quality
+   of the produced index, in both positive and negative directions. That
+   influence depends on various factors, like the distribution of the input
+   data and operator class implementation.
+  </para>
+
+  <para>
+   By default, the index build switches to the buffering method when the
+   index size reaches <xref linkend="guc-effective-cache-size">. It can
+   be manually turned on or off by the <literal>BUFFERING</literal> parameter
+   to the CREATE INDEX clause. The default behavior is good for most cases,
+   but turning buffering off might speed up the build somewhat if the input
+   data is ordered.
+  </para>
+
+ </sect2>
 </sect1>
 
 <sect1 id="gist-examples">
diff --git a/doc/src/sgml/ref/create_index.sgml b/doc/src/sgml/ref/create_index.sgml
index 1a1e8d60d75..2cfc9f30f16 100644
--- a/doc/src/sgml/ref/create_index.sgml
+++ b/doc/src/sgml/ref/create_index.sgml
@@ -341,6 +341,26 @@ CREATE [ UNIQUE ] INDEX [ CONCURRENTLY ] [ <replaceable class="parameter">name</
    </varlistentry>
 
    </variablelist>
+   <para>
+    GiST indexes additionaly accepts parameters:
+   </para>
+
+   <variablelist>
+
+   <varlistentry>
+    <term><literal>BUFFERING</></term>
+    <listitem>
+    <para>
+     Determines whether the buffering build technique described in
+     <xref linkend="gist-buffering-build"> is used to build the index. With
+     <literal>OFF</> it is disabled, with <literal>ON</> it is enabled, and
+     with <literal>AUTO</> it is initially disabled, but turned on
+     on-the-fly once the index size reaches <xref linkend="guc-effective-cache-size">. The default is <literal>AUTO</>.
+    </para>
+    </listitem>
+   </varlistentry>
+
+   </variablelist>
   </refsect2>
 
   <refsect2 id="SQL-CREATEINDEX-CONCURRENTLY">
author	Heikki Linnakangas <heikki.linnakangas@iki.fi>	2011-09-08 17:51:23 +0300
committer	Heikki Linnakangas <heikki.linnakangas@iki.fi>	2011-09-08 17:51:23 +0300
commit	5edb24a8983e4a103e26153853d91141f818227c (patch)
tree	9e3102de6e2149b0d3678b403c91955e97f3bdc8 /doc/src
parent	09b68c70af855a0a69cede14da70968ddd97ba05 (diff)
download	postgresql-5edb24a8983e4a103e26153853d91141f818227c.tar.gz postgresql-5edb24a8983e4a103e26153853d91141f818227c.zip