aboutsummaryrefslogtreecommitdiff
path: root/doc/src
diff options
context:
space:
mode:
Diffstat (limited to 'doc/src')
-rw-r--r--doc/src/sgml/ref/cluster.sgml67
1 files changed, 30 insertions, 37 deletions
diff --git a/doc/src/sgml/ref/cluster.sgml b/doc/src/sgml/ref/cluster.sgml
index 4b641954efa..adba2678632 100644
--- a/doc/src/sgml/ref/cluster.sgml
+++ b/doc/src/sgml/ref/cluster.sgml
@@ -128,18 +128,33 @@ CLUSTER [VERBOSE]
</para>
<para>
- During the cluster operation, a temporary copy of the table is created
- that contains the table data in the index order. Temporary copies of
- each index on the table are created as well. Therefore, you need free
- space on disk at least equal to the sum of the table size and the index
- sizes.
+ <command>CLUSTER</> can re-sort the table using either an indexscan
+ on the specified index, or (if the index is a b-tree) a sequential
+ scan followed by sorting. It will attempt to choose the method that
+ will be faster, based on planner cost parameters and available statistical
+ information.
</para>
<para>
- Because <command>CLUSTER</command> remembers the clustering information,
- one can cluster the tables one wants clustered manually the first time, and
- setup a timed event similar to <command>VACUUM</command> so that the tables
- are periodically reclustered.
+ When an indexscan is used, a temporary copy of the table is created that
+ contains the table data in the index order. Temporary copies of each
+ index on the table are created as well. Therefore, you need free space on
+ disk at least equal to the sum of the table size and the index sizes.
+ </para>
+
+ <para>
+ When a sequential scan and sort is used, a temporary sort file is
+ also created, so that the peak temporary space requirement is as much
+ as double the table size, plus the index sizes. This method is often
+ faster than the indexscan method, but if the disk space requirement is
+ intolerable, you can disable this choice by temporarily setting <xref
+ linkend="guc-enable-sort"> to <literal>off</>.
+ </para>
+
+ <para>
+ It is advisable to set <xref linkend="guc-maintenance-work-mem"> to
+ a reasonably large value (but not more than the amount of RAM you can
+ dedicate to the <command>CLUSTER</> operation) before clustering.
</para>
<para>
@@ -150,35 +165,13 @@ CLUSTER [VERBOSE]
</para>
<para>
- There is another way to cluster data. The
- <command>CLUSTER</command> command reorders the original table by
- scanning it using the index you specify. This can be slow
- on large tables because the rows are fetched from the table
- in index order, and if the table is disordered, the
- entries are on random pages, so there is one disk page
- retrieved for every row moved. (<productname>PostgreSQL</productname> has
- a cache, but the majority of a big table will not fit in the cache.)
- The other way to cluster a table is to use:
-
-<programlisting>
-CREATE TABLE <replaceable class="parameter">newtable</replaceable> AS
- SELECT * FROM <replaceable class="parameter">table</replaceable> ORDER BY <replaceable class="parameter">columnlist</replaceable>;
-</programlisting>
-
- which uses the <productname>PostgreSQL</productname> sorting code
- to produce the desired order;
- this is usually much faster than an index scan for disordered data.
- Then you drop the old table, use
- <command>ALTER TABLE ... RENAME</command>
- to rename <replaceable class="parameter">newtable</replaceable> to the
- old name, and recreate the table's indexes.
- The big disadvantage of this approach is that it does not preserve
- OIDs, constraints, foreign key relationships, granted privileges, and
- other ancillary properties of the table &mdash; all such items must be
- manually recreated. Another disadvantage is that this way requires a sort
- temporary file about the same size as the table itself, so peak disk usage
- is about three times the table size instead of twice the table size.
+ Because <command>CLUSTER</command> remembers which indexes are clustered,
+ one can cluster the tables one wants clustered manually the first time,
+ then set up a periodic maintenance script that executes
+ <command>CLUSTER</> without any parameters, so that the desired tables
+ are periodically reclustered.
</para>
+
</refsect1>
<refsect1>