Change planner to use the current true disk file size as its estimate of

a relation's number of blocks, rather than the possibly-obsolete value in pg_class.relpages. Scale the value in pg_class.reltuples correspondingly to arrive at a hopefully more accurate number of rows. When pg_class contains 0/0, estimate a tuple width from the column datatypes and divide that into current file size to estimate number of rows. This improved methodology allows us to jettison the ancient hacks that put bogus default values into pg_class when a table is first created. Also, per a suggestion from Simon, make VACUUM (but not VACUUM FULL or ANALYZE) adjust the value it puts into pg_class.reltuples to try to represent the mean tuple density instead of the minimal density that actually prevails just after VACUUM. These changes alter the plans selected for certain regression tests, so update the expected files accordingly. (I removed join_1.out because it's not clear if it still applies; we can add back any variant versions as they are shown to be needed.)
author: Tom Lane <tgl@sss.pgh.pa.us> 2004-12-01 19:00:56 +0000
committer: Tom Lane <tgl@sss.pgh.pa.us> 2004-12-01 19:00:56 +0000
commit: 5374d097de4d7321f03df7899816e61502438b32 (patch)
tree: e7dae5fe8181c528568b4a8bb9ca1aa4c7261831 /doc/src
parent: b7bcea64ecca27e67dbc81525a71fcc0da396b36 (diff)
download: postgresql-5374d097de4d7321f03df7899816e61502438b32.tar.gz
postgresql-5374d097de4d7321f03df7899816e61502438b32.zip
4 files changed, 58 insertions, 37 deletions
diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index f7335323fb4..92382c620a5 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -1,6 +1,6 @@
 <!--
  Documentation of the system catalogs, directed toward PostgreSQL developers
- $PostgreSQL: pgsql/doc/src/sgml/catalogs.sgml,v 2.92 2004/11/15 06:32:13 neilc Exp $
+ $PostgreSQL: pgsql/doc/src/sgml/catalogs.sgml,v 2.93 2004/12/01 19:00:27 tgl Exp $
  -->
 
 <chapter id="catalogs">
@@ -1079,11 +1079,12 @@
       <entry><type>int4</type></entry>
       <entry></entry>
       <entry>
-       Size of the on-disk representation of this table in pages (size
+       Size of the on-disk representation of this table in pages (of size
        <symbol>BLCKSZ</symbol>).
        This is only an estimate used by the planner.
        It is updated by <command>VACUUM</command>,
-       <command>ANALYZE</command>, and <command>CREATE INDEX</command>.
+       <command>ANALYZE</command>, and a few DDL commands
+       such as <command>CREATE INDEX</command>.
       </entry>
      </row>
 
@@ -1095,7 +1096,8 @@
        Number of rows in the table.
        This is only an estimate used by the planner.
        It is updated by <command>VACUUM</command>,
-       <command>ANALYZE</command>, and <command>CREATE INDEX</command>.
+       <command>ANALYZE</command>, and a few DDL commands
+       such as <command>CREATE INDEX</command>.
       </entry>
      </row>
 
diff --git a/doc/src/sgml/diskusage.sgml b/doc/src/sgml/diskusage.sgml
index ce09f7bc978..215519bc270 100644
--- a/doc/src/sgml/diskusage.sgml
+++ b/doc/src/sgml/diskusage.sgml
@@ -1,5 +1,5 @@
 <!--
-$PostgreSQL: pgsql/doc/src/sgml/diskusage.sgml,v 1.11 2004/06/21 04:06:03 tgl Exp $
+$PostgreSQL: pgsql/doc/src/sgml/diskusage.sgml,v 1.12 2004/12/01 19:00:27 tgl Exp $
 -->
 
 <chapter id="diskusage">
@@ -7,11 +7,7 @@ $PostgreSQL: pgsql/doc/src/sgml/diskusage.sgml,v 1.11 2004/06/21 04:06:03 tgl Ex
 
  <para>
   This chapter discusses how to monitor the disk usage of a
-  <productname>PostgreSQL</> database system.  In the current
-  release, the database administrator does not have much control over
-  the on-disk storage layout, so this chapter is mostly informative
-  and can give you some ideas how to manage the disk usage with
-  operating system tools.
+  <productname>PostgreSQL</> database system.
  </para>
 
  <sect1 id="disk-usage">
@@ -23,11 +19,12 @@ $PostgreSQL: pgsql/doc/src/sgml/diskusage.sgml,v 1.11 2004/06/21 04:06:03 tgl Ex
 
   <para>
    Each table has a primary heap disk file where most of the data is
-   stored. To store long column values, there is also a
-   <acronym>TOAST</> file associated with the table, named based on the
-   table's OID (actually <literal>pg_class.relfilenode</>), and an index on the
-   <acronym>TOAST</> table. There also may be indexes associated with
-   the base table.
+   stored. If the table has any columns with potentially-wide values,
+   there is also a <acronym>TOAST</> file associated with the table,
+   which is used to store values too wide to fit comfortably in the main
+   table.  There will be one index on the
+   <acronym>TOAST</> table, if present. There may also be indexes associated
+   with the base table.
   </para>
 
   <para>
@@ -45,18 +42,24 @@ SELECT relfilenode, relpages FROM pg_class WHERE relname = 'customer';
        16806 |       60
 (1 row)
 </programlisting>
-   Each page is typically 8 kilobytes. (Remember, <literal>relpages</>
-   is only updated by <command>VACUUM</> and <command>ANALYZE</>.)
+   Each page is typically 8 kilobytes. (Remember, <structfield>relpages</>
+   is only updated by <command>VACUUM</>, <command>ANALYZE</>, and
+   a few DDL commands such as <command>CREATE INDEX</>.)  The
+   <structfield>relfilenode</> value is of interest if you want to examine
+   the table's disk file directly.
   </para>
 
   <para>
    To show the space used by <acronym>TOAST</> tables, use a query
-   like the following, substituting the <literal>relfilenode</literal>
-   number of the heap (determined by the query above):
+   like the following:
 <programlisting>
 SELECT relname, relpages
-    FROM pg_class
-    WHERE relname = 'pg_toast_16806' OR relname = 'pg_toast_16806_index'
+    FROM pg_class,
+         (SELECT reltoastrelid FROM pg_class
+          WHERE relname = 'customer') ss
+    WHERE oid = ss.reltoastrelid
+       OR oid = (SELECT reltoastidxid FROM pg_class
+                 WHERE oid = ss.reltoastrelid)
     ORDER BY relname;
 
        relname        | relpages 
@@ -74,7 +77,7 @@ SELECT c2.relname, c2.relpages
     WHERE c.relname = 'customer'
         AND c.oid = i.indrelid
         AND c2.oid = i.indexrelid
-        ORDER BY c2.relname;
+    ORDER BY c2.relname;
 
        relname        | relpages 
 ----------------------+----------
@@ -113,13 +116,10 @@ SELECT relname, relpages FROM pg_class ORDER BY relpages DESC;
 
   <para>
    The most important disk monitoring task of a database administrator
-   is to make sure the disk doesn't grow full.  A filled data disk may
-   result in subsequent corruption of database indexes, but not of the
-   tables themselves. If the WAL files are on the same disk (as
-   is the case for a default configuration) then a filled disk during
-   database initialization may result in corrupted or incomplete WAL
-   files. This failure condition is detected and the database server
-   will refuse to start up.
+   is to make sure the disk doesn't grow full.  A filled data disk will
+   not result in data corruption, but it may well prevent useful activity
+   from occurring. If the disk holding the WAL files grows full, database
+   server panic and consequent shutdown may occur.
   </para>
 
   <para>
diff --git a/doc/src/sgml/perform.sgml b/doc/src/sgml/perform.sgml
index 6c23adb81b7..687d3228120 100644
--- a/doc/src/sgml/perform.sgml
+++ b/doc/src/sgml/perform.sgml
@@ -1,5 +1,5 @@
 <!--
-$PostgreSQL: pgsql/doc/src/sgml/perform.sgml,v 1.47 2004/11/15 06:32:14 neilc Exp $
+$PostgreSQL: pgsql/doc/src/sgml/perform.sgml,v 1.48 2004/12/01 19:00:27 tgl Exp $
 -->
 
  <chapter id="performance-tips">
@@ -389,14 +389,15 @@ SELECT relname, relkind, reltuples, relpages FROM pg_class WHERE relname LIKE 't
   <para>
    For efficiency reasons, <structfield>reltuples</structfield> 
    and <structfield>relpages</structfield> are not updated on-the-fly,
-   and so they usually contain only approximate values (which is good
-   enough for the planner's purposes).  They are initialized with dummy
-   values (presently 1000 and 10 respectively) when a table is created.
-   They are updated by certain commands, presently <command>VACUUM</>,
-   <command>ANALYZE</>, and <command>CREATE INDEX</>.  A stand-alone
+   and so they usually contain somewhat out-of-date values.
+   They are updated by <command>VACUUM</>, <command>ANALYZE</>, and a
+   few DDL commands such as <command>CREATE INDEX</>.  A stand-alone
    <command>ANALYZE</>, that is one not part of <command>VACUUM</>,
    generates an approximate <structfield>reltuples</structfield> value
-   since it does not read every row of the table.
+   since it does not read every row of the table.  The planner
+   will scale the values it finds in <structname>pg_class</structname>
+   to match the current physical table size, thus obtaining a closer
+   approximation.
   </para>
 
   <indexterm>
diff --git a/doc/src/sgml/release.sgml b/doc/src/sgml/release.sgml
index 391f9d06c59..3a597810398 100644
--- a/doc/src/sgml/release.sgml
+++ b/doc/src/sgml/release.sgml
@@ -1,5 +1,5 @@
 <!--
-$PostgreSQL: pgsql/doc/src/sgml/release.sgml,v 1.314 2004/11/27 21:56:04 petere Exp $
+$PostgreSQL: pgsql/doc/src/sgml/release.sgml,v 1.315 2004/12/01 19:00:27 tgl Exp $
 -->
 
 <appendix id="release">
@@ -529,6 +529,24 @@ $PostgreSQL: pgsql/doc/src/sgml/release.sgml,v 1.314 2004/11/27 21:56:04 petere
 
     <listitem>
      <para>
+      Use dynamically-generated table size estimates while planning (Tom)
+     </para>
+     <para>
+      The optimizer now uses a table's current actual size on disk as its
+      estimate of the number of blocks in the table, and it makes an estimate
+      of the number of rows in the table based on the current size on disk.
+      Formerly, the
+      <structname>pg_class</structname>.<structfield>relpages</structfield>
+      and
+      <structname>pg_class</structname>.<structfield>reltuples</structfield>
+      fields were used as-is, but these values might be quite out-of-date,
+      leading to poor choices of plans.  They are now treated only as an
+      indication of the table's density (rows per page).
+     </para>
+    </listitem>
+
+    <listitem>
+     <para>
       Improved index usage with <literal>OR</> clauses (Tom)
      </para>
      <para>
author	Tom Lane <tgl@sss.pgh.pa.us>	2004-12-01 19:00:56 +0000
committer	Tom Lane <tgl@sss.pgh.pa.us>	2004-12-01 19:00:56 +0000
commit	5374d097de4d7321f03df7899816e61502438b32 (patch)
tree	e7dae5fe8181c528568b4a8bb9ca1aa4c7261831 /doc/src
parent	b7bcea64ecca27e67dbc81525a71fcc0da396b36 (diff)
download	postgresql-5374d097de4d7321f03df7899816e61502438b32.tar.gz postgresql-5374d097de4d7321f03df7899816e61502438b32.zip