Database File Layout
A description of the database physical storage layout.
This section provides an overview of the physical format used by
PostgreSQL databases.
All the data needed for a database cluster is stored within the cluster's data
directory, commonly referred to as PGDATA> (after the name of the
environment variable that can be used to define it). A common location for
PGDATA> is /var/lib/pgsql/data>. Multiple clusters,
managed by different postmasters, can exist on the same machine.
The PGDATA> directory contains several subdirectories and control
files, as shown in . In addition to
these required items, the cluster configuration files
postgresql.conf, pg_hba.conf, and
pg_ident.conf are traditionally stored in
PGDATA> (although beginning in
PostgreSQL 8.0 it is possible to keep them
elsewhere).
Contents of PGDATA>
Item
Description
PG_VERSION>
A file containing the major version number of PostgreSQL
base>
Subdirectory containing per-database subdirectories
global>
Subdirectory containing cluster-wide tables, such as
pg_database>
pg_clog>
Subdirectory containing transaction commit status data
pg_subtrans>
Subdirectory containing subtransaction status data
pg_tblspc>
Subdirectory containing symbolic links to tablespaces
pg_xlog>
Subdirectory containing WAL (Write Ahead Log) files
postmaster.opts>
A file recording the command-line options the postmaster was
last started with
postmaster.pid>
A lock file recording the current postmaster PID and shared memory
segment ID (not present after postmaster shutdown)
For each database in the cluster there is a subdirectory within
PGDATA>/base>, named after the database's OID in
pg_database>. This subdirectory is the default location
for the database's files; in particular, its system catalogs are stored
there.
Each table and index is stored in a separate file, named after the table
or index's filenode> number, which can be found in
pg_class>.relfilenode>.
Note that while a table's filenode often matches its OID, this is
not> necessarily the case; some operations, like
TRUNCATE>, REINDEX>, CLUSTER> and some forms
of ALTER TABLE>, can change the filenode while preserving the OID.
Avoid assuming that filenode and table OID are the same.
When a table or index exceeds 1Gb, it is divided into gigabyte-sized
segments>. The first segment's file name is the same as the
filenode; subsequent segments are named filenode.1, filenode.2, etc.
This arrangement avoids problems on platforms that have file size limitations.
The contents of tables and indexes are discussed further in
.
A table that has columns with potentially large entries will have an
associated TOAST> table, which is used for out-of-line storage of
field values that are too large to keep in the table rows proper.
pg_class>.reltoastrelid> links from a table to
its TOAST table, if any.
Tablespaces make the scenario more complicated. Each user-defined tablespace
has a symbolic link inside the PGDATA>/pg_tblspc>
directory, which points to the physical tablespace directory (as specified in
its CREATE TABLESPACE> command). The symbolic link is named after
the tablespace's OID. Inside the physical tablespace directory there is
a subdirectory for each database that has elements in the tablespace, named
after the database's OID. Tables within that directory follow the filenode
naming scheme. The pg_default> tablespace is not accessed through
pg_tblspc>, but corresponds to
PGDATA>/base>. Similarly, the pg_global>
tablespace is not accessed through pg_tblspc>, but corresponds to
PGDATA>/global>.