New cost model for planning, incorporating a penalty for random page

accesses versus sequential accesses, a (very crude) estimate of the effects of caching on random page accesses, and cost to evaluate WHERE- clause expressions. Export critical parameters for this model as SET variables. Also, create SET variables for the planner's enable flags (enable_seqscan, enable_indexscan, etc) so that these can be controlled more conveniently than via PGOPTIONS. Planner now estimates both startup cost (cost before retrieving first tuple) and total cost of each path, so it can optimize queries with LIMIT on a reasonable basis by interpolating between these costs. Same facility is a win for EXISTS(...) subqueries and some other cases. Redesign pathkey representation to achieve a major speedup in planning (I saw as much as 5X on a 10-way join); also minor changes in planner to reduce memory consumption by recycling discarded Path nodes and not constructing unnecessary lists. Minor cleanups to display more-plausible costs in some cases in EXPLAIN output. Initdb forced by change in interface to index cost estimation functions.
author: Tom Lane <tgl@sss.pgh.pa.us> 2000-02-15 20:49:31 +0000
committer: Tom Lane <tgl@sss.pgh.pa.us> 2000-02-15 20:49:31 +0000
commit: b1577a7c78d2d8880b3c0f94689fb75bd074c897 (patch)
tree: c8d8f0500eb2eaa085d921a46a7d0987ba594a4a /doc/src
parent: 553b5da6a1147881dc1df101ecf9bab75f767ccf (diff)
download: postgresql-b1577a7c78d2d8880b3c0f94689fb75bd074c897.tar.gz
postgresql-b1577a7c78d2d8880b3c0f94689fb75bd074c897.zip
4 files changed, 325 insertions, 122 deletions
diff --git a/doc/src/sgml/libpq++.sgml b/doc/src/sgml/libpq++.sgml
index d259206c882..65cc873e280 100644
--- a/doc/src/sgml/libpq++.sgml
+++ b/doc/src/sgml/libpq++.sgml
@@ -164,24 +164,6 @@
 	sets the default mode for the genetic optimizer.
        </para>
       </listitem>
-      <listitem>
-       <para>
-	<envar>PGRPLANS</envar>
-	sets the default mode to allow or disable right-sided plans in the optimizer.
-       </para>
-      </listitem>
-      <listitem>
-       <para>
-	<envar>PGCOSTHEAP</envar>
-	sets the default cost for heap searches for the optimizer.
-       </para>
-      </listitem>
-      <listitem>
-       <para>
-	<envar>PGCOSTINDEX</envar>
-	sets the default cost for indexed searches for the optimizer.
-       </para>
-      </listitem>
      </itemizedlist>
     </para>
 	
diff --git a/doc/src/sgml/libpq.sgml b/doc/src/sgml/libpq.sgml
index 2e02618c62e..506d98002c5 100644
--- a/doc/src/sgml/libpq.sgml
+++ b/doc/src/sgml/libpq.sgml
@@ -1900,24 +1900,6 @@ behavior for every Postgres session:
 sets the default mode for the genetic optimizer.
 </para>
 </listitem>
-<listitem>
-<para>
-<envar>PGRPLANS</envar>
-sets the default mode to allow or disable right-sided plans in the optimizer.
-</para>
-</listitem>
-<listitem>
-<para>
-<envar>PGCOSTHEAP</envar>
-sets the default cost for heap searches for the optimizer.
-</para>
-</listitem>
-<listitem>
-<para>
-<envar>PGCOSTINDEX</envar>
-sets the default cost for indexed searches for the optimizer.
-</para>
-</listitem>
 </itemizedlist>
 </para>
 
diff --git a/doc/src/sgml/ref/set.sgml b/doc/src/sgml/ref/set.sgml
index 51177570649..2c32c76ff03 100644
--- a/doc/src/sgml/ref/set.sgml
+++ b/doc/src/sgml/ref/set.sgml
@@ -1,5 +1,5 @@
 <!--
-$Header: /cvsroot/pgsql/doc/src/sgml/ref/set.sgml,v 1.28 1999/07/22 15:09:15 thomas Exp $
+$Header: /cvsroot/pgsql/doc/src/sgml/ref/set.sgml,v 1.29 2000/02/15 20:49:07 tgl Exp $
 Postgres documentation
 -->
 
@@ -50,7 +50,8 @@ SET TRANSACTION ISOLATION LEVEL { READ COMMITTED | SERIALIZABLE }
       <term><replaceable class="PARAMETER">value</replaceable></term>
       <listitem>
        <para>
-	New value of parameter.
+	New value of parameter.  The word <term>DEFAULT</term> can be
+	written to specify resetting the parameter to its default value.
        </para>
       </listitem>
      </varlistentry>
@@ -78,20 +79,12 @@ SET TRANSACTION ISOLATION LEVEL { READ COMMITTED | SERIALIZABLE }
 	   </para>
 	  </listitem>
 	 </varlistentry>
-	 
-	 <varlistentry>
-	  <term>DEFAULT</term>
-	  <listitem>
-	   <para>
-	    Sets the multi-byte client encoding to the default value.
-	   </para>
-	  </listitem>
-	 </varlistentry>
 	</variablelist>
        </para>
 
        <para>
-	This is only enabled if multi-byte was specified to configure.
+	This option is only available if MULTIBYTE support was enabled
+	during the configure step of building Postgres.
        </para>
       </listitem>
      </varlistentry>
@@ -176,6 +169,9 @@ SET TRANSACTION ISOLATION LEVEL { READ COMMITTED | SERIALIZABLE }
 	<simplelist>
 	 <member>
 	  Setting the <envar>PGDATESTYLE</envar> environment variable.
+	  If PGDATESTYLE is set in the frontend environment of a client
+	  based on libpq, libpq will automatically set DATESTYLE to the
+	  value of PGDATESTYLE during connection startup.
 	 </member>
 	 <member>
 	  Running postmaster using the option <option>-o -e</option> to set
@@ -218,19 +214,12 @@ SET TRANSACTION ISOLATION LEVEL { READ COMMITTED | SERIALIZABLE }
 	   </para>
 	  </listitem>
 	 </varlistentry>
-	 
-	 <varlistentry>
-	  <term>DEFAULT</term>
-	  <listitem>
-	   <para>
-	    Sets the multi-byte server encoding.
-	   </para>
-	  </listitem>
-	 </varlistentry>
 	</variablelist>
        </para>
+
        <para>
-	This is only enabled if multi-byte was specified to configure.
+	This option is only available if MULTIBYTE support was enabled
+	during the configure step of building Postgres.
        </para>
       </listitem>
      </varlistentry>
@@ -287,15 +276,16 @@ SET TRANSACTION ISOLATION LEVEL { READ COMMITTED | SERIALIZABLE }
 	becomes GMT (on most systems anyway).
        </para>
        <para>
-	A frontend which uses libpq may be initialized by setting the PGTZ
-	environment variable.
-       </para>
-       <para>
 	The second syntax shown above, allows one to set the timezone
 	with a syntax similar to SQL92 <command>SET TIME ZONE</command>.
 	The LOCAL keyword is just an alternate form
 	of DEFAULT for SQL92 compatibility.
        </para>
+       <para>
+        If the PGTZ environment variable is set in the frontend
+	environment of a client based on libpq, libpq will automatically
+	set TIMEZONE to the value of PGTZ during connection startup.
+       </para>
       </listitem>
      </varlistentry>
 
@@ -349,133 +339,381 @@ SET TRANSACTION ISOLATION LEVEL { READ COMMITTED | SERIALIZABLE }
     
     <variablelist>
      <varlistentry>
-      <term>COST_HEAP</term>
+      <term>RANDOM_PAGE_COST</term>
       <listitem>
        <para>
-	Sets the default cost of a heap scan for use by the optimizer.
+        Sets the optimizer's estimate of the cost of a nonsequentially
+	fetched disk page.  This is measured as a multiple of the cost
+	of a sequential page fetch.
 	
 	<variablelist>
 	 <varlistentry>
-	  <term><replaceable class="parameter">float4</replaceable></term>
+	  <term><replaceable class="parameter">float8</replaceable></term>
 	  <listitem>
 	   <para>
-	    Set the cost of a heap scan to the specified floating point value.
+	    Set the cost of a random page access
+	    to the specified floating-point value.
 	   </para>
 	  </listitem>
 	 </varlistentry>
-	 
+	</variablelist>
+       </para>
+      </listitem>
+     </varlistentry>
+    
+     <varlistentry>
+      <term>CPU_TUPLE_COST</term>
+      <listitem>
+       <para>
+        Sets the optimizer's estimate of the cost of processing each
+	tuple during a query.  This is measured as a fraction of the cost
+	of a sequential page fetch.
+	
+	<variablelist>
 	 <varlistentry>
-	  <term>DEFAULT</term>
+	  <term><replaceable class="parameter">float8</replaceable></term>
 	  <listitem>
 	   <para>
-	    Sets the cost of a heap scan to the default value.
+	    Set the cost of per-tuple CPU processing
+	    to the specified floating-point value.
 	   </para>
 	  </listitem>
 	 </varlistentry>
 	</variablelist>
        </para>
+      </listitem>
+     </varlistentry>
+    
+     <varlistentry>
+      <term>CPU_INDEX_TUPLE_COST</term>
+      <listitem>
        <para>
-	The frontend may be initialized by setting the PGCOSTHEAP
-	environment variable.
+        Sets the optimizer's estimate of the cost of processing each
+	index tuple during an index scan.  This is measured as a fraction
+	of the cost of a sequential page fetch.
+	
+	<variablelist>
+	 <varlistentry>
+	  <term><replaceable class="parameter">float8</replaceable></term>
+	  <listitem>
+	   <para>
+	    Set the cost of per-index-tuple CPU processing
+	    to the specified floating-point value.
+	   </para>
+	  </listitem>
+	 </varlistentry>
+	</variablelist>
+       </para>
+      </listitem>
+     </varlistentry>
+    
+     <varlistentry>
+      <term>CPU_OPERATOR_COST</term>
+      <listitem>
+       <para>
+        Sets the optimizer's estimate of the cost of processing each
+	operator in a WHERE clause.  This is measured as a fraction
+	of the cost of a sequential page fetch.
+	
+	<variablelist>
+	 <varlistentry>
+	  <term><replaceable class="parameter">float8</replaceable></term>
+	  <listitem>
+	   <para>
+	    Set the cost of per-operator CPU processing
+	    to the specified floating-point value.
+	   </para>
+	  </listitem>
+	 </varlistentry>
+	</variablelist>
+       </para>
+      </listitem>
+     </varlistentry>
+    
+     <varlistentry>
+      <term>EFFECTIVE_CACHE_SIZE</term>
+      <listitem>
+       <para>
+        Sets the optimizer's assumption about the effective size of the
+	disk cache (that is, the portion of the kernel's disk cache that
+	will be used for Postgres data files).  This is measured in disk
+	pages, which are normally 8Kb apiece.
+	
+	<variablelist>
+	 <varlistentry>
+	  <term><replaceable class="parameter">float8</replaceable></term>
+	  <listitem>
+	   <para>
+	    Set the assumed cache size
+	    to the specified floating-point value.
+	   </para>
+	  </listitem>
+	 </varlistentry>
+	</variablelist>
        </para>
       </listitem>
      </varlistentry>
 
      <varlistentry>
-      <term>COST_INDEX</term>
+      <term>ENABLE_SEQSCAN</term>
       <listitem>
        <para>
-	Sets the default cost of an index scan for use by the optimizer.
+        Enables or disables the planner's use of sequential scan plan types.
+	(It's not possible to suppress sequential scans entirely, but turning
+	this variable OFF discourages the planner from using one if there is
+	any other method available.)
 
-	<variablelist>     
+	<variablelist>
 	 <varlistentry>
-	  <term><replaceable class="parameter">float4</replaceable></term>
+	  <term>ON</term>
 	  <listitem>
 	   <para>
-	    Set the cost of an index scan to the specified floating point value.
+	    enables use of sequential scans (default setting).
 	   </para>
 	  </listitem>
 	 </varlistentry>
-	 
+	
 	 <varlistentry>
-	  <term>DEFAULT</term>
+	  <term>OFF</term>
 	  <listitem>
 	   <para>
-	    Sets the cost of an index scan to the default value.
+	    disables use of sequential scans.
 	   </para>
 	  </listitem>
 	 </varlistentry>
 	</variablelist>
        </para>
+      </listitem>
+     </varlistentry>
 
+     <varlistentry>
+      <term>ENABLE_INDEXSCAN</term>
+      <listitem>
        <para>
-	The frontend may be initialized by setting the PGCOSTINDEX
-	environment variable.
+        Enables or disables the planner's use of index scan plan types.
+
+	<variablelist>
+	 <varlistentry>
+	  <term>ON</term>
+	  <listitem>
+	   <para>
+	    enables use of index scans (default setting).
+	   </para>
+	  </listitem>
+	 </varlistentry>
+	
+	 <varlistentry>
+	  <term>OFF</term>
+	  <listitem>
+	   <para>
+	    disables use of index scans.
+	   </para>
+	  </listitem>
+	 </varlistentry>
+	</variablelist>
        </para>
       </listitem>
      </varlistentry>
 
      <varlistentry>
-      <term>GEQO</term>
+      <term>ENABLE_TIDSCAN</term>
       <listitem>
        <para>
-	Sets the threshold for using the genetic optimizer algorithm.
+        Enables or disables the planner's use of TID scan plan types.
 
 	<variablelist>
 	 <varlistentry>
 	  <term>ON</term>
 	  <listitem>
 	   <para>
-	    enables the genetic optimizer algorithm
-	    for statements with 6 or more tables.
+	    enables use of TID scans (default setting).
 	   </para>
 	  </listitem>
 	 </varlistentry>
+	
+	 <varlistentry>
+	  <term>OFF</term>
+	  <listitem>
+	   <para>
+	    disables use of TID scans.
+	   </para>
+	  </listitem>
+	 </varlistentry>
+	</variablelist>
+       </para>
+      </listitem>
+     </varlistentry>
 
+     <varlistentry>
+      <term>ENABLE_SORT</term>
+      <listitem>
+       <para>
+        Enables or disables the planner's use of explicit sort steps.
+	(It's not possible to suppress explicit sorts entirely, but turning
+	this variable OFF discourages the planner from using one if there is
+	any other method available.)
+
+	<variablelist>
 	 <varlistentry>
-	  <term>ON=<replaceable class="parameter">#</replaceable></term>
+	  <term>ON</term>
 	  <listitem>
 	   <para>
-	    Takes an integer argument to enable the genetic optimizer algorithm
-	    for statements with <replaceable class="parameter">#</replaceable>
-	    or more tables in the query.
+	    enables use of sorts (default setting).
 	   </para>
 	  </listitem>
 	 </varlistentry>
+	
+	 <varlistentry>
+	  <term>OFF</term>
+	  <listitem>
+	   <para>
+	    disables use of sorts.
+	   </para>
+	  </listitem>
+	 </varlistentry>
+	</variablelist>
+       </para>
+      </listitem>
+     </varlistentry>
 
+     <varlistentry>
+      <term>ENABLE_NESTLOOP</term>
+      <listitem>
+       <para>
+        Enables or disables the planner's use of nested-loop join plans.
+	(It's not possible to suppress nested-loop joins entirely, but turning
+	this variable OFF discourages the planner from using one if there is
+	any other method available.)
+
+	<variablelist>
+	 <varlistentry>
+	  <term>ON</term>
+	  <listitem>
+	   <para>
+	    enables use of nested-loop joins (default setting).
+	   </para>
+	  </listitem>
+	 </varlistentry>
+	
 	 <varlistentry>
 	  <term>OFF</term>
 	  <listitem>
 	   <para>
-	    disables the genetic optimizer algorithm.
+	    disables use of nested-loop joins.
+	   </para>
+	  </listitem>
+	 </varlistentry>
+	</variablelist>
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term>ENABLE_MERGEJOIN</term>
+      <listitem>
+       <para>
+        Enables or disables the planner's use of mergejoin plans.
+
+	<variablelist>
+	 <varlistentry>
+	  <term>ON</term>
+	  <listitem>
+	   <para>
+	    enables use of merge joins (default setting).
 	   </para>
 	  </listitem>
 	 </varlistentry>
+	
 	 <varlistentry>
-	  <term>DEFAULT</term>
+	  <term>OFF</term>
+	  <listitem>
+	   <para>
+	    disables use of merge joins.
+	   </para>
+	  </listitem>
+	 </varlistentry>
+	</variablelist>
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term>ENABLE_HASHJOIN</term>
+      <listitem>
+       <para>
+        Enables or disables the planner's use of hashjoin plans.
+
+	<variablelist>
+	 <varlistentry>
+	  <term>ON</term>
 	  <listitem>
 	   <para>
-	    Equivalent to specifying <command>SET GEQO='ON'</command>
+	    enables use of hash joins (default setting).
+	   </para>
+	  </listitem>
+	 </varlistentry>
+	
+	 <varlistentry>
+	  <term>OFF</term>
+	  <listitem>
+	   <para>
+	    disables use of hash joins.
 	   </para>
 	  </listitem>
 	 </varlistentry>
 	</variablelist>
        </para>
+      </listitem>
+     </varlistentry>
 
+     <varlistentry>
+      <term>GEQO</term>
+      <listitem>
        <para>
-	This algorithm is on by default, which used GEQO for
-	statements of eleven or more tables.
-	(See the chapter on GEQO in the Programmer's Guide
-	for more information).
+	Sets the threshold for using the genetic optimizer algorithm.
+
+	<variablelist>
+	 <varlistentry>
+	  <term>ON</term>
+	  <listitem>
+	   <para>
+	    enables the genetic optimizer algorithm
+	    for statements with 11 or more tables.
+	    (This is also the DEFAULT setting.)
+	   </para>
+	  </listitem>
+	 </varlistentry>
+
+	 <varlistentry>
+	  <term>ON=<replaceable class="parameter">#</replaceable></term>
+	  <listitem>
+	   <para>
+	    Takes an integer argument to enable the genetic optimizer algorithm
+	    for statements with <replaceable class="parameter">#</replaceable>
+	    or more tables in the query.
+	   </para>
+	  </listitem>
+	 </varlistentry>
+
+	 <varlistentry>
+	  <term>OFF</term>
+	  <listitem>
+	   <para>
+	    disables the genetic optimizer algorithm.
+	   </para>
+	  </listitem>
+	 </varlistentry>
+	</variablelist>
        </para>
+
        <para>
-	The frontend may be initialized by setting PGGEQO
-	environment variable.
+	See the chapter on GEQO in the Programmer's Guide
+	for more information about query optimization.
        </para>
        <para>
-	It may be useful when joining big relations with
-	small ones. This algorithm is off by default.
-	It's not used by GEQO anyway.
+        If the PGGEQO environment variable is set in the frontend
+	environment of a client based on libpq, libpq will automatically
+	set GEQO to the value of PGGEQO during connection startup.
        </para>
       </listitem>
      </varlistentry>
@@ -484,10 +722,16 @@ SET TRANSACTION ISOLATION LEVEL { READ COMMITTED | SERIALIZABLE }
       <term>KSQO</term>
       <listitem>
        <para>
-	<firstterm>Key Set Query Optimizer</firstterm> forces the query optimizer
-	to optimize repetative OR clauses such as generated by
-	<productname>MicroSoft Access</productname>:
-       
+	<firstterm>Key Set Query Optimizer</firstterm> causes the query
+	planner to convert queries whose WHERE clause contains many
+	OR'ed AND clauses (such as "WHERE (a=1 AND b=2) OR (a=2 AND b=3) ...")
+	into a UNION query.  This method can be faster than the default
+	implementation, but it doesn't necessarily give exactly the same
+	results, since UNION implicitly adds a SELECT DISTINCT clause to
+	eliminate identical output rows.  KSQO is commonly used when
+	working with products like <productname>MicroSoft
+	Access</productname>, which tend to generate queries of this form.
+
 	<variablelist>
 	 <varlistentry>
 	  <term>ON</term>
@@ -502,7 +746,7 @@ SET TRANSACTION ISOLATION LEVEL { READ COMMITTED | SERIALIZABLE }
 	  <term>OFF</term>
 	  <listitem>
 	   <para>
-	    disables this optimization.
+	    disables this optimization (default setting).
 	   </para>
 	  </listitem>
 	 </varlistentry>
@@ -519,13 +763,9 @@ SET TRANSACTION ISOLATION LEVEL { READ COMMITTED | SERIALIZABLE }
        </para>
 
        <para>
-	It may be useful when joining big relations with
-	small ones. This algorithm is off by default.
-	It's not used by GEQO anyway.
-       </para>
-       <para>
-	The frontend may be initialized by setting the PGKSQO
-	environment variable.
+        The KSQO algorithm used to be absolutely essential for queries
+	with many OR'ed AND clauses, but in Postgres 7.0 and later
+	the standard planner handles these queries fairly successfully.
        </para>
       </listitem>
      </varlistentry>
diff --git a/doc/src/sgml/ref/show.sgml b/doc/src/sgml/ref/show.sgml
index 14b43d82344..39fbde16ae0 100644
--- a/doc/src/sgml/ref/show.sgml
+++ b/doc/src/sgml/ref/show.sgml
@@ -1,5 +1,5 @@
 <!--
-$Header: /cvsroot/pgsql/doc/src/sgml/ref/show.sgml,v 1.7 1999/07/22 15:09:15 thomas Exp $
+$Header: /cvsroot/pgsql/doc/src/sgml/ref/show.sgml,v 1.8 2000/02/15 20:49:07 tgl Exp $
 Postgres documentation
 -->
 
@@ -106,14 +106,14 @@ SHOW VARIABLE
    Description
   </title>
   <para>
-   <command>SHOW</command> will display the current
-   configuration parameters for
-   variable during a session.
+   <command>SHOW</command> will display the current setting of a
+   run-time parameter during a session.
   </para>
   <para>
-   The session can be configured using <command>SET</command> statement,
-   and values
-   can be restored to the defaults using <command>RESET</command> statement.
+   These variables can be set using the <command>SET</command> statement,
+   and
+   can be restored to the default values using the <command>RESET</command>
+   statement.
    Parameters and values are case-insensitive.
   </para>
 
@@ -125,13 +125,12 @@ SHOW VARIABLE
     Notes
    </title>
    <para>
-    The <command>SHOW</command> is a <productname>Postgres</productname>
+    <command>SHOW</command> is a <productname>Postgres</productname>
     language extension.
    </para>
    <para>
     Refer to  <command>SET</command>/<command>RESET</command> 
     to set/reset variable values.
-    See also  <command>SET TIME ZONE</command>.
    </para>
   </refsect2>
  </refsect1>
author	Tom Lane <tgl@sss.pgh.pa.us>	2000-02-15 20:49:31 +0000
committer	Tom Lane <tgl@sss.pgh.pa.us>	2000-02-15 20:49:31 +0000
commit	b1577a7c78d2d8880b3c0f94689fb75bd074c897 (patch)
tree	c8d8f0500eb2eaa085d921a46a7d0987ba594a4a /doc/src
parent	553b5da6a1147881dc1df101ecf9bab75f767ccf (diff)
download	postgresql-b1577a7c78d2d8880b3c0f94689fb75bd074c897.tar.gz postgresql-b1577a7c78d2d8880b3c0f94689fb75bd074c897.zip