Diffstat (limited to 'doc/src/sgml/page.sgml')
-rw-r--r-- doc/src/sgml/page.sgml 159
1 files changed, 94 insertions, 65 deletions
diff --git a/doc/src/sgml/page.sgml b/doc/src/sgml/page.sgml
index 7551085dc94..d7096a4bbe1 100644
--- a/doc/src/sgml/page.sgml
+++ b/doc/src/sgml/page.sgml
@@ -4,13 +4,17 @@
<abstract>
<para>
-A description of the database file default page format.
+A description of the database file page format.
</para>
</abstract>
<para>
-This section provides an overview of the page format used by <productname>PostgreSQL</productname>
-tables. User-defined access methods need not use this page format.
+This section provides an overview of the page format used by
+<productname>PostgreSQL</productname> tables and indexes. (Index
+access methods need not use this page format. At present, all index
+methods do use this basic format, but the data kept on index metapages
+usually doesn't follow the item layout rules exactly.) TOAST tables
+and sequences are formatted just like a regular table.
</para>
<para>
@@ -18,15 +22,13 @@ In the following explanation, a
<firstterm>byte</firstterm>
is assumed to contain 8 bits. In addition, the term
<firstterm>item</firstterm>
-refers to data that is stored in <productname>PostgreSQL</productname> tables.
+refers to an individual data value that is stored on a page. In a table,
+an item is a tuple (row); in an index, an item is an index entry.
</para>
<para>
-<xref linkend="page-table"> shows how pages in both normal
- <productname>PostgreSQL</productname> tables and
- <productname>PostgreSQL</productname> indexes (e.g., a B-tree index)
-are structured. This structure is also used for toast tables and sequences.
+<xref linkend="page-table"> shows the basic layout of a page.
There are five parts to each page.
</para>
@@ -48,12 +50,13 @@ Item
<row>
<entry>PageHeaderData</entry>
- <entry>20 bytes long. Contains general information about the page to allow to access it.</entry>
+ <entry>20 bytes long. Contains general information about the page, including
+free space pointers.</entry>
</row>
<row>
-<entry>itemPointerData</entry>
-<entry>List of (offset,length) pairs pointing to the actual item.</entry>
+<entry>ItemPointerData</entry>
+<entry>Array of (offset,length) pairs pointing to the actual items.</entry>
</row>
<row>
@@ -62,13 +65,14 @@ Item
</row>
<row>
-<entry>items</entry>
-<entry>The actual items themselves. Different access method have different data here.</entry>
+<entry>Items</entry>
+<entry>The actual items themselves.</entry>
</row>
<row>
<entry>Special Space</entry>
-<entry>Access method specific data. Different method store different data. Unused by normal tables.</entry>
+<entry>Index access method specific data. Different methods store different
+data. Empty in ordinary tables.</entry>
</row>
</tbody>
@@ -78,11 +82,12 @@ Item
<para>
The first 20 bytes of each page consists of a page header
- (PageHeaderData). It's format is detailed in <xref
+ (PageHeaderData). Its format is detailed in <xref
linkend="pageheaderdata-table">. The first two fields deal with WAL
related stuff. This is followed by three 2-byte integer fields
- (<firstterm>lower</firstterm>, <firstterm>upper</firstterm>, and
- <firstterm>special</firstterm>). These represent byte offsets to the start
+ (<structfield>pd_lower</structfield>, <structfield>pd_upper</structfield>,
+ and <structfield>pd_special</structfield>). These represent byte offsets to
+ the start
of unallocated space, to the end of unallocated space, and to the start of
the special space.
@@ -104,7 +109,7 @@ Item
<row>
<entry>pd_lsn</entry>
<entry>XLogRecPtr</entry>
- <entry>6 bytes</entry>
+ <entry>8 bytes</entry>
<entry>LSN: next byte after last byte of xlog</entry>
</row>
<row>
@@ -132,38 +137,51 @@ Item
<entry>Offset to start of special space.</entry>
</row>
<row>
- <entry>pd_opaque</entry>
- <entry>OpaqueData</entry>
+ <entry>pd_pagesize_version</entry>
+ <entry>uint16</entry>
<entry>2 bytes</entry>
- <entry>AM-generic information. Currently just stores the page size.</entry>
+ <entry>Page size and layout version number information.</entry>
</row>
</tbody>
</tgroup>
</table>
+ <para>
+ All the details may be found in src/include/storage/bufpage.h.
+ </para>
+
<para>
Special space is a region at the end of the page that is allocated at page
initialization time and contains information specific to an access method.
- The last 2 bytes of the page header, <firstterm>opaque</firstterm>,
- currently only stores the page size. Page size is stored in each page
- because frames in the buffer pool may be subdivided into equal sized pages
- on a frame by frame basis within a table (is this true? - mvo).
-
+ The last 2 bytes of the page header,
+ <structfield>pd_pagesize_version</structfield>, store both the page size
+ and a version indicator. Beginning with
+ <productname>PostgreSQL</productname> 7.3 the version number is 1; prior
+ releases used version number 0. (The basic page layout and header format
+ has not changed, but the layout of heap tuple headers has.) The page size
+ is basically only present as a cross-check; there is no support for having
+ more than one page size in an installation.
</para>
<para>
Following the page header are item identifiers
- (<firstterm>ItemIdData</firstterm>). New item identifiers are allocated
- from the first four bytes of unallocated space. Because an item
- identifier is never moved until it is freed, its index may be used to
- indicate the location of an item on a page. In fact, every pointer to an
- item (<firstterm>ItemPointer</firstterm>, also know as
- <firstterm>CTID</firstterm>) created by
- <productname>PostgreSQL</productname> consists of a frame number and an
- index of an item identifier. An item identifier contains a byte-offset to
+ (<type>ItemIdData</type>), each requiring four bytes.
+ An item identifier contains a byte-offset to
the start of an item, its length in bytes, and a set of attribute bits
which affect its interpretation.
+ New item identifiers are allocated
+ as needed from the beginning of the unallocated space.
+ The number of item identifiers present can be determined by looking at
+ <structfield>pd_lower</>, which is increased to allocate a new identifier.
+ Because an item
+ identifier is never moved until it is freed, its index may be used on a
+ long-term basis to reference an item, even when the item itself is moved
+ around on the page to compact free space. In fact, every pointer to an
+ item (<type>ItemPointer</type>, also known as
+ <type>CTID</type>) created by
+ <productname>PostgreSQL</productname> consists of a page number and the
+ index of an item identifier.
</para>
@@ -171,8 +189,8 @@ Item
The items themselves are stored in space allocated backwards from the end
of unallocated space. The exact structure varies depending on what the
- table is to contain. Sequences and tables both use a structure named
- <firstterm>HeapTupleHeaderData</firstterm>, describe below.
+ table is to contain. Tables and sequences both use a structure named
+ <type>HeapTupleHeaderData</type>, described below.
</para>
@@ -180,20 +198,33 @@ Item
The final section is the "special section" which may contain anything the
access method wishes to store. Ordinary tables do not use this at all
- (indicated by setting the offset to the pagesize).
+ (indicated by setting <structfield>pd_special</> to equal the pagesize).
</para>
<para>
- All tuples are structured the same way. A header of around 31 bytes
- followed by an optional null bitmask and the data. The header is detailed
- below in <xref linkend="heaptupleheaderdata-table">. The null bitmask is
- only present if the <firstterm>HEAP_HASNULL</firstterm> bit is set in the
- <firstterm>t_infomask</firstterm>. If it is present it takes up the space
- between the end of the header and the beginning of the data, as indicated
- by the <firstterm>t_hoff</firstterm> field. In this list of bits, a 1 bit
- indicates not-null, a 0 bit is a null.
+ All table tuples are structured the same way. There is a fixed-size
+ header (occupying 23 bytes on most machines), followed by an optional null
+ bitmap, an optional object ID field, and the user data. The header is
+ detailed
+ in <xref linkend="heaptupleheaderdata-table">. The actual user data
+ (fields of the tuple) begins at the offset indicated by
+ <structfield>t_hoff</>, which must always be a multiple of the MAXALIGN
+ distance for the platform.
+ The null bitmap is
+ only present if the <firstterm>HEAP_HASNULL</firstterm> bit is set in
+ <structfield>t_infomask</structfield>. If it is present it begins just after
+ the fixed header and occupies enough bytes to have one bit per data column
+ (that is, <structfield>t_natts</> bits altogether). In this list of bits, a
+ 1 bit indicates not-null, a 0 bit is a null. When the bitmap is not
+ present, all columns are assumed not-null.
+ The object ID is only present if the <firstterm>HEAP_HASOID</firstterm> bit
+ is set in <structfield>t_infomask</structfield>. If present, it appears just
+ before the <structfield>t_hoff</> boundary. Any padding needed to make
+ <structfield>t_hoff</> a MAXALIGN multiple will appear between the null
+ bitmap and the object ID. (This in turn ensures that the object ID is
+ suitably aligned.)
</para>
@@ -211,34 +242,34 @@ Item
</thead>
<tbody>
<row>
- <entry>t_oid</entry>
- <entry>Oid</entry>
+ <entry>t_xmin</entry>
+ <entry>TransactionId</entry>
<entry>4 bytes</entry>
- <entry>OID of this tuple</entry>
+ <entry>insert XID stamp</entry>
</row>
<row>
<entry>t_cmin</entry>
<entry>CommandId</entry>
<entry>4 bytes</entry>
- <entry>insert CID stamp</entry>
+ <entry>insert CID stamp (overlays with t_xmax)</entry>
</row>
<row>
- <entry>t_cmax</entry>
- <entry>CommandId</entry>
+ <entry>t_xmax</entry>
+ <entry>TransactionId</entry>
<entry>4 bytes</entry>
- <entry>delete CID stamp</entry>
+ <entry>delete XID stamp</entry>
</row>
<row>
- <entry>t_xmin</entry>
- <entry>TransactionId</entry>
+ <entry>t_cmax</entry>
+ <entry>CommandId</entry>
<entry>4 bytes</entry>
- <entry>insert XID stamp</entry>
+ <entry>delete CID stamp (overlays with t_xvac)</entry>
</row>
<row>
- <entry>t_xmax</entry>
+ <entry>t_xvac</entry>
<entry>TransactionId</entry>
<entry>4 bytes</entry>
- <entry>delete XID stamp</entry>
+ <entry>XID for VACUUM operation moving tuple</entry>
</row>
<row>
<entry>t_ctid</entry>
@@ -256,30 +287,28 @@ Item
<entry>t_infomask</entry>
<entry>uint16</entry>
<entry>2 bytes</entry>
- <entry>Various flags</entry>
+ <entry>various flags</entry>
</row>
<row>
<entry>t_hoff</entry>
<entry>uint8</entry>
<entry>1 byte</entry>
- <entry>length of tuple header. Also offset of data.</entry>
+ <entry>offset to user data</entry>
</row>
</tbody>
</tgroup>
</table>
<para>
-
- All the details may be found in src/include/storage/bufpage.h.
-
+ All the details may be found in src/include/access/htup.h.
</para>
<para>
Interpreting the actual data can only be done with information obtained
from other tables, mostly <firstterm>pg_attribute</firstterm>. The
- particular fields are <firstterm>attlen</firstterm> and
- <firstterm>attalign</firstterm>. There is no way to directly get a
+ particular fields are <structfield>attlen</structfield> and
+ <structfield>attalign</structfield>. There is no way to directly get a
particular attribute, except when there are only fixed width fields and no
NULLs. All this trickery is wrapped up in the functions
<firstterm>heap_getattr</firstterm>, <firstterm>fastgetattr</firstterm>
@@ -293,7 +322,7 @@ Item
the next. Then make sure you have the right alignment. If the field is a
fixed width field, then all the bytes are simply placed. If it's a
variable length field (attlen == -1) then it's a bit more complicated,
- using the variable length structure <firstterm>varattrib</firstterm>.
+ using the variable length structure <type>varattrib</type>.
Depending on the flags, the data may be either inline, compressed or in
another table (TOAST).