diff options
author | Heikki Linnakangas <heikki.linnakangas@iki.fi> | 2011-09-08 17:51:23 +0300 |
---|---|---|
committer | Heikki Linnakangas <heikki.linnakangas@iki.fi> | 2011-09-08 17:51:23 +0300 |
commit | 5edb24a8983e4a103e26153853d91141f818227c (patch) | |
tree | 9e3102de6e2149b0d3678b403c91955e97f3bdc8 /doc/src | |
parent | 09b68c70af855a0a69cede14da70968ddd97ba05 (diff) | |
download | postgresql-5edb24a8983e4a103e26153853d91141f818227c.tar.gz postgresql-5edb24a8983e4a103e26153853d91141f818227c.zip |
Buffering GiST index build algorithm.
When building a GiST index that doesn't fit in cache, buffers are attached
to some internal nodes in the index. This speeds up the build by avoiding
random I/O that would otherwise be needed to traverse all the way down the
tree to the find right leaf page for tuple.
Alexander Korotkov
Diffstat (limited to 'doc/src')
-rw-r--r-- | doc/src/sgml/gist.sgml | 34 | ||||
-rw-r--r-- | doc/src/sgml/ref/create_index.sgml | 20 |
2 files changed, 54 insertions, 0 deletions
diff --git a/doc/src/sgml/gist.sgml b/doc/src/sgml/gist.sgml index 78171cfa3fd..1b6fa1a8817 100644 --- a/doc/src/sgml/gist.sgml +++ b/doc/src/sgml/gist.sgml @@ -642,6 +642,40 @@ my_distance(PG_FUNCTION_ARGS) </variablelist> + <sect2 id="gist-buffering-build"> + <title>GiST buffering build</title> + <para> + Building large GiST indexes by simply inserting all the tuples tends to be + slow, because if the index tuples are scattered across the index and the + index is large enough to not fit in cache, the insertions need to perform + a lot of random I/O. PostgreSQL from version 9.2 supports a more efficient + method to build GiST indexes based on buffering, which can dramatically + reduce number of random I/O needed for non-ordered data sets. For + well-ordered datasets the benefit is smaller or non-existent, because + only a small number of pages receive new tuples at a time, and those pages + fit in cache even if the index as whole does not. + </para> + + <para> + However, buffering index build needs to call the <function>penalty</> + function more often, which consumes some extra CPU resources. Also, the + buffers used in the buffering build need temporary disk space, up to + the size of the resulting index. Buffering can also infuence the quality + of the produced index, in both positive and negative directions. That + influence depends on various factors, like the distribution of the input + data and operator class implementation. + </para> + + <para> + By default, the index build switches to the buffering method when the + index size reaches <xref linkend="guc-effective-cache-size">. It can + be manually turned on or off by the <literal>BUFFERING</literal> parameter + to the CREATE INDEX clause. The default behavior is good for most cases, + but turning buffering off might speed up the build somewhat if the input + data is ordered. + </para> + + </sect2> </sect1> <sect1 id="gist-examples"> diff --git a/doc/src/sgml/ref/create_index.sgml b/doc/src/sgml/ref/create_index.sgml index 1a1e8d60d75..2cfc9f30f16 100644 --- a/doc/src/sgml/ref/create_index.sgml +++ b/doc/src/sgml/ref/create_index.sgml @@ -341,6 +341,26 @@ CREATE [ UNIQUE ] INDEX [ CONCURRENTLY ] [ <replaceable class="parameter">name</ </varlistentry> </variablelist> + <para> + GiST indexes additionaly accepts parameters: + </para> + + <variablelist> + + <varlistentry> + <term><literal>BUFFERING</></term> + <listitem> + <para> + Determines whether the buffering build technique described in + <xref linkend="gist-buffering-build"> is used to build the index. With + <literal>OFF</> it is disabled, with <literal>ON</> it is enabled, and + with <literal>AUTO</> it is initially disabled, but turned on + on-the-fly once the index size reaches <xref linkend="guc-effective-cache-size">. The default is <literal>AUTO</>. + </para> + </listitem> + </varlistentry> + + </variablelist> </refsect2> <refsect2 id="SQL-CREATEINDEX-CONCURRENTLY"> |