Diffstat (limited to 'doc/src/sgml/page.sgml')
-rw-r--r-- | doc/src/sgml/page.sgml | 159 |
1 files changed, 94 insertions, 65 deletions
diff --git a/doc/src/sgml/page.sgml b/doc/src/sgml/page.sgml index 7551085dc94..d7096a4bbe1 100644 --- a/doc/src/sgml/page.sgml +++ b/doc/src/sgml/page.sgml @@ -4,13 +4,17 @@ <abstract> <para> -A description of the database file default page format. +A description of the database file page format. </para> </abstract> <para> -This section provides an overview of the page format used by <productname>PostgreSQL</productname> -tables. User-defined access methods need not use this page format. +This section provides an overview of the page format used by +<productname>PostgreSQL</productname> tables and indexes. (Index +access methods need not use this page format. At present, all index +methods do use this basic format, but the data kept on index metapages +usually doesn't follow the item layout rules exactly.) TOAST tables +and sequences are formatted just like a regular table. </para> <para> @@ -18,15 +22,13 @@ In the following explanation, a <firstterm>byte</firstterm> is assumed to contain 8 bits. In addition, the term <firstterm>item</firstterm> -refers to data that is stored in <productname>PostgreSQL</productname> tables. +refers to an individual data value that is stored on a page. In a table, +an item is a tuple (row); in an index, an item is an index entry. </para> <para> -<xref linkend="page-table"> shows how pages in both normal - <productname>PostgreSQL</productname> tables and - <productname>PostgreSQL</productname> indexes (e.g., a B-tree index) -are structured. This structure is also used for toast tables and sequences. +<xref linkend="page-table"> shows the basic layout of a page. There are five parts to each page. </para> @@ -48,12 +50,13 @@ Item <row> <entry>PageHeaderData</entry> - <entry>20 bytes long. Contains general information about the page to allow to access it.</entry> + <entry>20 bytes long. 
Contains general information about the page, including +free space pointers.</entry> </row> <row> -<entry>itemPointerData</entry> -<entry>List of (offset,length) pairs pointing to the actual item.</entry> +<entry>ItemPointerData</entry> +<entry>Array of (offset,length) pairs pointing to the actual items.</entry> </row> <row> @@ -62,13 +65,14 @@ Item </row> <row> -<entry>items</entry> -<entry>The actual items themselves. Different access method have different data here.</entry> +<entry>Items</entry> +<entry>The actual items themselves.</entry> </row> <row> <entry>Special Space</entry> -<entry>Access method specific data. Different method store different data. Unused by normal tables.</entry> +<entry>Index access method specific data. Different methods store different +data. Empty in ordinary tables.</entry> </row> </tbody> @@ -78,11 +82,12 @@ Item <para> The first 20 bytes of each page consists of a page header - (PageHeaderData). It's format is detailed in <xref + (PageHeaderData). Its format is detailed in <xref linkend="pageheaderdata-table">. The first two fields deal with WAL related stuff. This is followed by three 2-byte integer fields - (<firstterm>lower</firstterm>, <firstterm>upper</firstterm>, and - <firstterm>special</firstterm>). These represent byte offsets to the start + (<structfield>pd_lower</structfield>, <structfield>pd_upper</structfield>, + and <structfield>pd_special</structfield>). These represent byte offsets to + the start of unallocated space, to the end of unallocated space, and to the start of the special space. 
@@ -104,7 +109,7 @@ Item <row> <entry>pd_lsn</entry> <entry>XLogRecPtr</entry> - <entry>6 bytes</entry> + <entry>8 bytes</entry> <entry>LSN: next byte after last byte of xlog</entry> </row> <row> @@ -132,38 +137,51 @@ Item <entry>Offset to start of special space.</entry> </row> <row> - <entry>pd_opaque</entry> - <entry>OpaqueData</entry> + <entry>pd_pagesize_version</entry> + <entry>uint16</entry> <entry>2 bytes</entry> - <entry>AM-generic information. Currently just stores the page size.</entry> + <entry>Page size and layout version number information.</entry> </row> </tbody> </tgroup> </table> + <para> + All the details may be found in src/include/storage/bufpage.h. + </para> + <para> Special space is a region at the end of the page that is allocated at page initialization time and contains information specific to an access method. - The last 2 bytes of the page header, <firstterm>opaque</firstterm>, - currently only stores the page size. Page size is stored in each page - because frames in the buffer pool may be subdivided into equal sized pages - on a frame by frame basis within a table (is this true? - mvo). - + The last 2 bytes of the page header, + <structfield>pd_pagesize_version</structfield>, store both the page size + and a version indicator. Beginning with + <productname>PostgreSQL</productname> 7.3 the version number is 1; prior + releases used version number 0. (The basic page layout and header format + has not changed, but the layout of heap tuple headers has.) The page size + is basically only present as a cross-check; there is no support for having + more than one page size in an installation. </para> <para> Following the page header are item identifiers - (<firstterm>ItemIdData</firstterm>). New item identifiers are allocated - from the first four bytes of unallocated space. Because an item - identifier is never moved until it is freed, its index may be used to - indicate the location of an item on a page. 
In fact, every pointer to an - item (<firstterm>ItemPointer</firstterm>, also know as - <firstterm>CTID</firstterm>) created by - <productname>PostgreSQL</productname> consists of a frame number and an - index of an item identifier. An item identifier contains a byte-offset to + (<type>ItemIdData</type>), each requiring four bytes. + An item identifier contains a byte-offset to the start of an item, its length in bytes, and a set of attribute bits which affect its interpretation. + New item identifiers are allocated + as needed from the beginning of the unallocated space. + The number of item identifiers present can be determined by looking at + <structfield>pd_lower</>, which is increased to allocate a new identifier. + Because an item + identifier is never moved until it is freed, its index may be used on a + long-term basis to reference an item, even when the item itself is moved + around on the page to compact free space. In fact, every pointer to an + item (<type>ItemPointer</type>, also known as + <type>CTID</type>) created by + <productname>PostgreSQL</productname> consists of a page number and the + index of an item identifier. </para> @@ -171,8 +189,8 @@ Item The items themselves are stored in space allocated backwards from the end of unallocated space. The exact structure varies depending on what the - table is to contain. Sequences and tables both use a structure named - <firstterm>HeapTupleHeaderData</firstterm>, describe below. + table is to contain. Tables and sequences both use a structure named + <type>HeapTupleHeaderData</type>, described below. </para> @@ -180,20 +198,33 @@ Item The final section is the "special section" which may contain anything the access method wishes to store. Ordinary tables do not use this at all - (indicated by setting the offset to the pagesize). + (indicated by setting <structfield>pd_special</> to equal the pagesize). </para> <para> - All tuples are structured the same way. 
A header of around 31 bytes - followed by an optional null bitmask and the data. The header is detailed - below in <xref linkend="heaptupleheaderdata-table">. The null bitmask is - only present if the <firstterm>HEAP_HASNULL</firstterm> bit is set in the - <firstterm>t_infomask</firstterm>. If it is present it takes up the space - between the end of the header and the beginning of the data, as indicated - by the <firstterm>t_hoff</firstterm> field. In this list of bits, a 1 bit - indicates not-null, a 0 bit is a null. + All table tuples are structured the same way. There is a fixed-size + header (occupying 23 bytes on most machines), followed by an optional null + bitmap, an optional object ID field, and the user data. The header is + detailed + in <xref linkend="heaptupleheaderdata-table">. The actual user data + (fields of the tuple) begins at the offset indicated by + <structfield>t_hoff</>, which must always be a multiple of the MAXALIGN + distance for the platform. + The null bitmap is + only present if the <firstterm>HEAP_HASNULL</firstterm> bit is set in + <structfield>t_infomask</structfield>. If it is present it begins just after + the fixed header and occupies enough bytes to have one bit per data column + (that is, <structfield>t_natts</> bits altogether). In this list of bits, a + 1 bit indicates not-null, a 0 bit is a null. When the bitmap is not + present, all columns are assumed not-null. + The object ID is only present if the <firstterm>HEAP_HASOID</firstterm> bit + is set in <structfield>t_infomask</structfield>. If present, it appears just + before the <structfield>t_hoff</> boundary. Any padding needed to make + <structfield>t_hoff</> a MAXALIGN multiple will appear between the null + bitmap and the object ID. (This in turn ensures that the object ID is + suitably aligned.) 
</para> @@ -211,34 +242,34 @@ Item </thead> <tbody> <row> - <entry>t_oid</entry> - <entry>Oid</entry> + <entry>t_xmin</entry> + <entry>TransactionId</entry> <entry>4 bytes</entry> - <entry>OID of this tuple</entry> + <entry>insert XID stamp</entry> </row> <row> <entry>t_cmin</entry> <entry>CommandId</entry> <entry>4 bytes</entry> - <entry>insert CID stamp</entry> + <entry>insert CID stamp (overlays with t_xmax)</entry> </row> <row> - <entry>t_cmax</entry> - <entry>CommandId</entry> + <entry>t_xmax</entry> + <entry>TransactionId</entry> <entry>4 bytes</entry> - <entry>delete CID stamp</entry> + <entry>delete XID stamp</entry> </row> <row> - <entry>t_xmin</entry> - <entry>TransactionId</entry> + <entry>t_cmax</entry> + <entry>CommandId</entry> <entry>4 bytes</entry> - <entry>insert XID stamp</entry> + <entry>delete CID stamp (overlays with t_xvac)</entry> </row> <row> - <entry>t_xmax</entry> + <entry>t_xvac</entry> <entry>TransactionId</entry> <entry>4 bytes</entry> - <entry>delete XID stamp</entry> + <entry>XID for VACUUM operation moving tuple</entry> </row> <row> <entry>t_ctid</entry> @@ -256,30 +287,28 @@ Item <entry>t_infomask</entry> <entry>uint16</entry> <entry>2 bytes</entry> - <entry>Various flags</entry> + <entry>various flags</entry> </row> <row> <entry>t_hoff</entry> <entry>uint8</entry> <entry>1 byte</entry> - <entry>length of tuple header. Also offset of data.</entry> + <entry>offset to user data</entry> </row> </tbody> </tgroup> </table> <para> - - All the details may be found in src/include/storage/bufpage.h. - + All the details may be found in src/include/access/htup.h. </para> <para> Interpreting the actual data can only be done with information obtained from other tables, mostly <firstterm>pg_attribute</firstterm>. The - particular fields are <firstterm>attlen</firstterm> and - <firstterm>attalign</firstterm>. There is no way to directly get a + particular fields are <structfield>attlen</structfield> and + <structfield>attalign</structfield>. 
There is no way to directly get a particular attribute, except when there are only fixed width fields and no NULLs. All this trickery is wrapped up in the functions <firstterm>heap_getattr</firstterm>, <firstterm>fastgetattr</firstterm> @@ -293,7 +322,7 @@ Item the next. Then make sure you have the right alignment. If the field is a fixed width field, then all the bytes are simply placed. If it's a variable length field (attlen == -1) then it's a bit more complicated, - using the variable length structure <firstterm>varattrib</firstterm>. + using the variable length structure <type>varattrib</type>. Depending on the flags, the data may be either inline, compressed or in another table (TOAST).
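Reviewer's sketch (not part of the patch): the arithmetic that the revised text describes can be written out in a few lines of C. This is a minimal illustration that assumes the sizes stated in the new documentation (a 20-byte page header, 4-byte item identifiers, a 23-byte fixed tuple header, a 4-byte object ID). The function and constant names below are hypothetical and are not taken from bufpage.h or htup.h; the platform's MAXALIGN distance is passed in as a parameter and assumed to be a power of two.

```c
#include <assert.h>
#include <stdint.h>

/* Sizes quoted from the documentation text; the names are illustrative only. */
enum {
    PAGE_HEADER_SIZE  = 20, /* PageHeaderData */
    ITEM_ID_SIZE      = 4,  /* one ItemIdData entry */
    TUPLE_HEADER_SIZE = 23, /* fixed part of the heap tuple header */
    OID_SIZE          = 4   /* optional object ID field */
};

/* Number of item identifiers on a page, derived from pd_lower. */
static unsigned item_id_count(uint16_t pd_lower)
{
    assert(pd_lower >= PAGE_HEADER_SIZE);
    return (unsigned) (pd_lower - PAGE_HEADER_SIZE) / ITEM_ID_SIZE;
}

/* Bytes of unallocated space between pd_lower and pd_upper. */
static unsigned page_free_space(uint16_t pd_lower, uint16_t pd_upper)
{
    assert(pd_upper >= pd_lower);
    return (unsigned) (pd_upper - pd_lower);
}

/* Bytes needed for a null bitmap holding one bit per column. */
static unsigned null_bitmap_bytes(unsigned natts)
{
    return (natts + 7) / 8;
}

/*
 * Offset to user data (t_hoff): fixed header, optional null bitmap,
 * optional object ID, rounded up to a multiple of maxalign. Because the
 * object ID sits just before the t_hoff boundary, any padding ends up
 * between the bitmap and the object ID.
 */
static unsigned tuple_data_offset(unsigned natts, int has_nulls, int has_oid,
                                  unsigned maxalign)
{
    unsigned len = TUPLE_HEADER_SIZE;

    assert(maxalign != 0 && (maxalign & (maxalign - 1)) == 0);
    if (has_nulls)
        len += null_bitmap_bytes(natts);
    if (has_oid)
        len += OID_SIZE;
    return (len + maxalign - 1) & ~(maxalign - 1);
}
```

For instance, under these assumptions a page whose pd_lower is 28 holds two item identifiers, and a three-column tuple with no null bitmap and no object ID starts its user data at offset 24 when the MAXALIGN distance is 4.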