-rw-r--r-- | doc/src/sgml/ref/create_type.sgml | 25
-rw-r--r-- | doc/src/sgml/storage.sgml | 143
-rw-r--r-- | doc/src/sgml/xtypes.sgml | 52
3 files changed, 157 insertions, 63 deletions
diff --git a/doc/src/sgml/ref/create_type.sgml b/doc/src/sgml/ref/create_type.sgml index e5d7992bbf5..f9e1297d0b0 100644 --- a/doc/src/sgml/ref/create_type.sgml +++ b/doc/src/sgml/ref/create_type.sgml @@ -329,15 +329,17 @@ CREATE TYPE <replaceable class="parameter">name</replaceable> to <literal>VARIABLE</literal>. (Internally, this is represented by setting <literal>typlen</> to -1.) The internal representation of all variable-length types must start with a 4-byte integer giving the total - length of this value of the type. + length of this value of the type. (Note that the length field is often + encoded, as described in <xref linkend="storage-toast">; it's unwise + to access it directly.) </para> <para> The optional flag <literal>PASSEDBYVALUE</literal> indicates that values of this data type are passed by value, rather than by - reference. You cannot pass by value types whose internal - representation is larger than the size of the <type>Datum</> type - (4 bytes on most machines, 8 bytes on a few). + reference. Types passed by value must be fixed-length, and their internal + representation cannot be larger than the size of the <type>Datum</> type + (4 bytes on some machines, 8 bytes on others). </para> <para> @@ -368,6 +370,17 @@ CREATE TYPE <replaceable class="parameter">name</replaceable> </para> <para> + All <replaceable class="parameter">storage</replaceable> values other + than <literal>plain</literal> imply that the functions of the data type + can handle values that have been <firstterm>toasted</>, as described + in <xref linkend="storage-toast"> and <xref linkend="xtypes-toast">. + The specific other value given merely determines the default TOAST + storage strategy for columns of a toastable data type; users can pick + other strategies for individual columns using <literal>ALTER TABLE + SET STORAGE</>. 
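The two CREATE TYPE rules described here can be written as a small standalone C check. PASSEDBYVALUE needs a fixed length no larger than a Datum, and any storage value other than plain marks a variable-length type as TOAST-able. This is a sketch, not PostgreSQL code. The helper names are hypothetical, Datum is modeled as uintptr_t, and the single-letter codes are the ones recorded in pg_type.typstorage.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption for this sketch: Datum is modeled as a pointer-sized unsigned
 * integer, so it is 4 bytes on some machines and 8 bytes on others. */
typedef uintptr_t SketchDatum;

/* Single-letter storage codes, as recorded in pg_type.typstorage. */
typedef enum
{
    SKETCH_STORAGE_PLAIN = 'p',
    SKETCH_STORAGE_EXTERNAL = 'e',
    SKETCH_STORAGE_EXTENDED = 'x',
    SKETCH_STORAGE_MAIN = 'm'
} SketchStorage;

/* typlen is the INTERNALLENGTH: a positive byte count, or -1 for VARIABLE. */
static bool sketch_can_pass_by_value(int typlen)
{
    /* PASSEDBYVALUE requires a fixed length that fits inside a Datum. */
    return typlen > 0 && (size_t) typlen <= sizeof(SketchDatum);
}

static bool sketch_declares_toastable(int typlen, SketchStorage storage)
{
    /* Only variable-length types can be TOASTed; any storage other than
     * plain promises that the type's functions accept toasted inputs. */
    return typlen == -1 && storage != SKETCH_STORAGE_PLAIN;
}
```

For example, a 16-byte fixed-length type fails the pass-by-value check on every common platform, while a 4-byte type passes everywhere.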
+ </para> + + <para> The <replaceable class="parameter">like_type</replaceable> parameter provides an alternative method for specifying the basic representation properties of a data type: copy them from some existing type. The values of @@ -465,8 +478,8 @@ CREATE TYPE <replaceable class="parameter">name</replaceable> identical things, and you want to allow these things to be accessed directly by subscripting, in addition to whatever operations you plan to provide for the type as a whole. For example, type <type>point</> - is represented as just two floating-point numbers, each can be accessed using - <literal>point[0]</> and <literal>point[1]</>. + is represented as just two floating-point numbers, which can be accessed + using <literal>point[0]</> and <literal>point[1]</>. Note that this facility only works for fixed-length types whose internal form is exactly a sequence of identical fixed-length fields. A subscriptable diff --git a/doc/src/sgml/storage.sgml b/doc/src/sgml/storage.sgml index 85a8de2ece9..d8c52875d82 100644 --- a/doc/src/sgml/storage.sgml +++ b/doc/src/sgml/storage.sgml @@ -303,25 +303,33 @@ Oversized-Attribute Storage Technique). <para> <productname>PostgreSQL</productname> uses a fixed page size (commonly -8 kB), and does not allow tuples to span multiple pages. Therefore, it is +8 kB), and does not allow tuples to span multiple pages. Therefore, it is not possible to store very large field values directly. To overcome -this limitation, large field values are compressed and/or broken up into -multiple physical rows. This happens transparently to the user, with only +this limitation, large field values are compressed and/or broken up into +multiple physical rows. This happens transparently to the user, with only small impact on most of the backend code. The technique is affectionately -known as <acronym>TOAST</> (or <quote>the best thing since sliced bread</>). +known as <acronym>TOAST</> (or <quote>the best thing since sliced bread</>). 
+The <acronym>TOAST</> infrastructure is also used to improve handling of +large data values in-memory. </para> <para> Only certain data types support <acronym>TOAST</> — there is no need to impose the overhead on data types that cannot produce large field values. To support <acronym>TOAST</>, a data type must have a variable-length -(<firstterm>varlena</>) representation, in which the first 32-bit word of any -stored value contains the total length of the value in bytes (including -itself). <acronym>TOAST</> does not constrain the rest of the representation. -All the C-level functions supporting a <acronym>TOAST</>-able data type must -be careful to handle <acronym>TOAST</>ed input values. (This is normally done -by invoking <function>PG_DETOAST_DATUM</> before doing anything with an input -value, but in some cases more efficient approaches are possible.) +(<firstterm>varlena</>) representation, in which, ordinarily, the first +four-byte word of any stored value contains the total length of the value in +bytes (including itself). <acronym>TOAST</> does not constrain the rest +of the data type's representation. The special representations collectively +called <firstterm><acronym>TOAST</>ed values</firstterm> work by modifying or +reinterpreting this initial length word. Therefore, the C-level functions +supporting a <acronym>TOAST</>-able data type must be careful about how they +handle potentially <acronym>TOAST</>ed input values: an input might not +actually consist of a four-byte length word and contents until after it's +been <firstterm>detoasted</>. (This is normally done by invoking +<function>PG_DETOAST_DATUM</> before doing anything with an input value, +but in some cases more efficient approaches are possible. +See <xref linkend="xtypes-toast"> for more detail.) 
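The sketch below is a rough illustration of how an oversized row gets back under the limit. It repeatedly replaces the widest inline field with a fixed-size pointer until the row fits. It is deliberately simplified and hypothetical. It ignores compression, tuple headers and per-column storage strategies. Its constants merely echo the figures given in the storage.sgml changes: a threshold of normally 2 kB, and 18-byte on-disk TOAST pointers.

```c
#include <stddef.h>

/* Illustrative constants only: the real thresholds are build-dependent. */
#define SKETCH_TOAST_THRESHOLD 2048
#define SKETCH_TOAST_POINTER_SIZE 18

/* Hypothetical, simplified model: while the row is too wide, replace the
 * widest remaining inline field with a fixed-size TOAST pointer.
 * Modifies widths[] in place and returns the number of fields moved. */
static int sketch_shrink_row(size_t widths[], int nfields)
{
    size_t total = 0;
    int moved = 0;

    for (int i = 0; i < nfields; i++)
        total += widths[i];

    while (total > SKETCH_TOAST_THRESHOLD)
    {
        int widest = -1;

        /* Pick the widest field that would actually get smaller. */
        for (int i = 0; i < nfields; i++)
            if (widths[i] > SKETCH_TOAST_POINTER_SIZE &&
                (widest < 0 || widths[i] > widths[widest]))
                widest = i;

        if (widest < 0)
            break;              /* nothing left to move; row stays wide */

        total -= widths[widest] - SKETCH_TOAST_POINTER_SIZE;
        widths[widest] = SKETCH_TOAST_POINTER_SIZE;
        moved++;
    }
    return moved;
}
```

With fields of 100, 5000 and 3000 bytes, this model moves two fields out-of-line and the row shrinks to 136 bytes.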
</para> <para> @@ -333,58 +341,84 @@ the value is an ordinary un-<acronym>TOAST</>ed value of the data type, and the remaining bits of the length word give the total datum size (including length word) in bytes. When the highest-order or lowest-order bit is set, the value has only a single-byte header instead of the normal four-byte -header, and the remaining bits give the total datum size (including length -byte) in bytes. As a special case, if the remaining bits are all zero -(which would be impossible for a self-inclusive length), the value is a -pointer to out-of-line data stored in a separate TOAST table. (The size of -a TOAST pointer is given in the second byte of the datum.) -Values with single-byte headers aren't aligned on any particular -boundary, either. Lastly, when the highest-order or lowest-order bit is -clear but the adjacent bit is set, the content of the datum has been -compressed and must be decompressed before use. In this case the remaining -bits of the length word give the total size of the compressed datum, not the +header, and the remaining bits of that byte give the total datum size +(including length byte) in bytes. This alternative supports space-efficient +storage of values shorter than 127 bytes, while still allowing the data type +to grow to 1 GB at need. Values with single-byte headers aren't aligned on +any particular boundary, whereas values with four-byte headers are aligned on +at least a four-byte boundary; this omission of alignment padding provides +additional space savings that are significant compared to short values. +As a special case, if the remaining bits of a single-byte header are all +zero (which would be impossible for a self-inclusive length), the value is +a pointer to out-of-line data, with several possible alternatives as +described below. The type and size of such a <firstterm>TOAST pointer</> +are determined by a code stored in the second byte of the datum.
+Lastly, when the highest-order or lowest-order bit is clear but the adjacent +bit is set, the content of the datum has been compressed and must be +decompressed before use. In this case the remaining bits of the four-byte +length word give the total size of the compressed datum, not the original data. Note that compression is also possible for out-of-line data but the varlena header does not tell whether it has occurred — -the content of the TOAST pointer tells that, instead. +the content of the <acronym>TOAST</> pointer tells that, instead. </para> <para> -If any of the columns of a table are <acronym>TOAST</>-able, the table will -have an associated <acronym>TOAST</> table, whose OID is stored in the table's -<structname>pg_class</>.<structfield>reltoastrelid</> entry. Out-of-line -<acronym>TOAST</>ed values are kept in the <acronym>TOAST</> table, as -described in more detail below. +As mentioned, there are multiple types of <acronym>TOAST</> pointer datums. +The oldest and most common type is a pointer to out-of-line data stored in +a <firstterm><acronym>TOAST</> table</firstterm> that is separate from, but +associated with, the table containing the <acronym>TOAST</> pointer datum +itself. These <firstterm>on-disk</> pointer datums are created by the +<acronym>TOAST</> management code (in <filename>access/heap/tuptoaster.c</>) +when a tuple to be stored on disk is too large to be stored as-is. +Further details appear in <xref linkend="storage-toast-ondisk">. +Alternatively, a <acronym>TOAST</> pointer datum can contain a pointer to +out-of-line data that appears elsewhere in memory. Such datums are +necessarily short-lived, and will never appear on-disk, but they are very +useful for avoiding copying and redundant processing of large data values. +Further details appear in <xref linkend="storage-toast-inmemory">. 
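The header rules described above can be made concrete with a small decoder. This sketch assumes the little-endian layout, in which the flag bits are the lowest-order bits of the first byte. The enum and function names are hypothetical. The decoder only classifies a header; it does not follow TOAST pointers or decompress anything.

```c
#include <stdint.h>

typedef enum
{
    SKETCH_HDR_4B,              /* four-byte header, plain inline data */
    SKETCH_HDR_4B_COMPRESSED,   /* four-byte header, compressed inline data */
    SKETCH_HDR_1B,              /* single-byte header, short unaligned data */
    SKETCH_HDR_POINTER          /* single-byte header, zero size: TOAST pointer */
} SketchHeaderKind;

/* Classify a varlena header, assuming the little-endian bit layout.
 * *size receives the total size including the header; for a TOAST pointer
 * it is set to 0 (the pointer's type code lives in datum[1]). */
static SketchHeaderKind sketch_classify(const uint8_t *datum, uint32_t *size)
{
    uint8_t first = datum[0];

    if (first & 0x01)
    {
        if (first == 0x01)
        {
            *size = 0;
            return SKETCH_HDR_POINTER;
        }
        *size = (uint32_t) (first >> 1);    /* 7-bit length, at most 127 */
        return SKETCH_HDR_1B;
    }

    /* Four-byte header: assemble the word in little-endian byte order. */
    uint32_t word = (uint32_t) datum[0] |
        ((uint32_t) datum[1] << 8) |
        ((uint32_t) datum[2] << 16) |
        ((uint32_t) datum[3] << 24);

    *size = (word >> 2) & 0x3FFFFFFF;       /* 30-bit length, just under 1 GB */
    return (first & 0x02) ? SKETCH_HDR_4B_COMPRESSED : SKETCH_HDR_4B;
}
```

On big-endian machines the same flags live in the highest-order bits, so this decoder would not apply there.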
</para> <para> -The compression technique used is a fairly simple and very fast member +The compression technique used for either in-line or out-of-line compressed +data is a fairly simple and very fast member of the LZ family of compression techniques. See <filename>src/common/pg_lzcompress.c</> for the details. </para> +<sect2 id="storage-toast-ondisk"> + <title>Out-of-line, on-disk TOAST storage</title> + +<para> +If any of the columns of a table are <acronym>TOAST</>-able, the table will +have an associated <acronym>TOAST</> table, whose OID is stored in the table's +<structname>pg_class</>.<structfield>reltoastrelid</> entry. On-disk +<acronym>TOAST</>ed values are kept in the <acronym>TOAST</> table, as +described in more detail below. +</para> + <para> Out-of-line values are divided (after compression if used) into chunks of at most <symbol>TOAST_MAX_CHUNK_SIZE</> bytes (by default this value is chosen so that four chunk rows will fit on a page, making it about 2000 bytes). -Each chunk is stored -as a separate row in the <acronym>TOAST</> table for the owning table. Every +Each chunk is stored as a separate row in the <acronym>TOAST</> table +belonging to the owning table. Every <acronym>TOAST</> table has the columns <structfield>chunk_id</> (an OID identifying the particular <acronym>TOAST</>ed value), <structfield>chunk_seq</> (a sequence number for the chunk within its value), and <structfield>chunk_data</> (the actual data of the chunk). A unique index on <structfield>chunk_id</> and <structfield>chunk_seq</> provides fast -retrieval of the values. A pointer datum representing an out-of-line +retrieval of the values. A pointer datum representing an out-of-line on-disk <acronym>TOAST</>ed value therefore needs to store the OID of the <acronym>TOAST</> table in which to look and the OID of the specific value (its <structfield>chunk_id</>). 
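The chunking rule and the contents of an on-disk pointer lend themselves to a quick calculation. In this sketch the struct is a hypothetical mirror of the four values the text lists: logical size, stored size, value OID and TOAST table OID. OIDs are modeled as 32-bit unsigned integers. The chunk size passed in is an illustrative stand-in for TOAST_MAX_CHUNK_SIZE, not its exact value.

```c
#include <stdint.h>

/* Hypothetical mirror of an on-disk TOAST pointer's payload. */
typedef struct
{
    int32_t  rawsize;       /* logical (original, uncompressed) size */
    int32_t  extsize;       /* physical stored size; smaller if compressed */
    uint32_t valueid;       /* chunk_id of the value in the TOAST table */
    uint32_t toastrelid;    /* OID of the TOAST table to look in */
} SketchToastPointer;

/* A one-byte header and a one-byte type code precede the payload,
 * which is how the 18-byte total arises: 2 + 16. */
#define SKETCH_TOAST_POINTER_DATUM_SIZE (2 + sizeof(SketchToastPointer))

/* Number of chunk rows needed for a value whose stored size (after any
 * compression) is stored_size, given the maximum bytes per chunk. */
static uint32_t sketch_chunk_count(uint32_t stored_size, uint32_t max_chunk)
{
    return (stored_size + max_chunk - 1) / max_chunk;  /* ceiling division */
}
```

With a chunk size of about 2000 bytes, a 5000-byte stored value needs three chunk rows. The pointer left in the main table stays at 18 bytes however large the value is.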
For convenience, pointer datums also store the -logical datum size (original uncompressed data length) and actual stored size +logical datum size (original uncompressed data length) and physical stored size (different if compression was applied). Allowing for the varlena header bytes, -the total size of a <acronym>TOAST</> pointer datum is therefore 18 bytes -regardless of the actual size of the represented value. +the total size of an on-disk <acronym>TOAST</> pointer datum is therefore 18 +bytes regardless of the actual size of the represented value. </para> <para> -The <acronym>TOAST</> code is triggered only +The <acronym>TOAST</> management code is triggered only when a row value to be stored in a table is wider than <symbol>TOAST_TUPLE_THRESHOLD</> bytes (normally 2 kB). The <acronym>TOAST</> code will compress and/or move @@ -397,8 +431,8 @@ none of the out-of-line values change. </para> <para> -The <acronym>TOAST</> code recognizes four different strategies for storing -<acronym>TOAST</>-able columns: +The <acronym>TOAST</> management code recognizes four different strategies +for storing <acronym>TOAST</>-able columns on disk: <itemizedlist> <listitem> @@ -460,6 +494,41 @@ pages). There was no run time difference compared to an un-<acronym>TOAST</>ed comparison table, in which all the HTML pages were cut down to 7 kB to fit. </para> +</sect2> + +<sect2 id="storage-toast-inmemory"> + <title>Out-of-line, in-memory TOAST storage</title> + +<para> +<acronym>TOAST</> pointers can point to data that is not on disk, but is +elsewhere in the memory of the current server process. Such pointers +obviously cannot be long-lived, but they are nonetheless useful. There +is currently just one sub-case: +pointers to <firstterm>indirect</> data. +</para> + +<para> +Indirect <acronym>TOAST</> pointers simply point at a non-indirect varlena +value stored somewhere in memory. 
This case was originally created merely +as a proof of concept, but it is currently used during logical decoding to +avoid possibly having to create physical tuples exceeding 1 GB (as pulling +all out-of-line field values into the tuple might do). The case is of +limited use since the creator of the pointer datum is entirely responsible +for ensuring that the referenced data survives for as long as the pointer could exist, +and there is no infrastructure to help with this. +</para> + +<para> +For all types of in-memory <acronym>TOAST</> pointer, the <acronym>TOAST</> +management code ensures that no such pointer datum can accidentally get +stored on disk. In-memory <acronym>TOAST</> pointers are automatically +expanded to normal in-line varlena values before storage — and then +possibly converted to on-disk <acronym>TOAST</> pointers, if the containing +tuple would otherwise be too big. +</para> + +</sect2> + </sect1> <sect1 id="storage-fsm"> diff --git a/doc/src/sgml/xtypes.sgml b/doc/src/sgml/xtypes.sgml index e1340baeb73..2459616281d 100644 --- a/doc/src/sgml/xtypes.sgml +++ b/doc/src/sgml/xtypes.sgml @@ -234,35 +234,49 @@ CREATE TYPE complex ( </para> <para> + If the internal representation of the data type is variable-length, the + internal representation must follow the standard layout for variable-length + data: the first four bytes must be a <type>char[4]</type> field which is + never accessed directly (customarily named <structfield>vl_len_</>). You + must use the <function>SET_VARSIZE()</function> macro to store the total + size of the datum (including the length field itself) in this field + and <function>VARSIZE()</function> to retrieve it. (These macros exist + because the length field may be encoded depending on platform.) + </para> + + <para> + For further details see the description of the + <xref linkend="sql-createtype"> command.
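The standard variable-length layout can be sketched as follows. SketchMyText and its helper functions are hypothetical stand-ins, written so the example compiles without PostgreSQL headers. In a real extension you would declare the same char[4] field but call SET_VARSIZE() and VARSIZE(), and allocate with palloc, rather than touching the length field yourself.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* A hypothetical user-defined variable-length type with the standard layout:
 * a char[4] length field that is never read or written directly. */
typedef struct
{
    char vl_len_[4];    /* encoded total size; use the helpers below */
    char data[];        /* type-specific payload */
} SketchMyText;

/* Stand-ins for SET_VARSIZE() and VARSIZE(), mimicking the uncompressed
 * four-byte encoding (size << 2). The round trip works on any host, but the
 * bit layout matches PostgreSQL's only on little-endian machines. */
static void sketch_set_varsize(void *ptr, uint32_t len)
{
    uint32_t word = len << 2;
    memcpy(ptr, &word, sizeof(word));
}

static uint32_t sketch_varsize(const void *ptr)
{
    uint32_t word;
    memcpy(&word, ptr, sizeof(word));
    return (word >> 2) & 0x3FFFFFFF;
}

/* Build a value; the stored size counts the length field itself. */
static SketchMyText *sketch_make_mytext(const char *s)
{
    size_t n = strlen(s);
    size_t total = offsetof(SketchMyText, data) + n;
    SketchMyText *t = malloc(total);

    if (t == NULL)
        return NULL;
    sketch_set_varsize(t, (uint32_t) total);
    memcpy(t->data, s, n);
    return t;
}
```

Storing "hello" this way records a total size of 9 bytes: 4 for the length field plus 5 for the payload.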
+ </para> + + <sect2 id="xtypes-toast"> + <title>TOAST Considerations</title> <indexterm> <primary>TOAST</primary> <secondary>and user-defined types</secondary> </indexterm> - If the values of your data type vary in size (in internal form), you should - make the data type <acronym>TOAST</>-able (see <xref - linkend="storage-toast">). You should do this even if the data are always + + <para> + If the values of your data type vary in size (in internal form), it's + usually desirable to make the data type <acronym>TOAST</>-able (see <xref + linkend="storage-toast">). You should do this even if the values are always too small to be compressed or stored externally, because <acronym>TOAST</> can save space on small data too, by reducing header overhead. </para> <para> - To do this, the internal representation must follow the standard layout for - variable-length data: the first four bytes must be a <type>char[4]</type> - field which is never accessed directly (customarily named - <structfield>vl_len_</>). You - must use <function>SET_VARSIZE()</function> to store the size of the datum - in this field and <function>VARSIZE()</function> to retrieve it. The C - functions operating on the data type must always be careful to unpack any - toasted values they are handed, by using <function>PG_DETOAST_DATUM</>. - (This detail is customarily hidden by defining type-specific - <function>GETARG_DATATYPE_P</function> macros.) Then, when running the - <command>CREATE TYPE</command> command, specify the internal length as - <literal>variable</> and select the appropriate storage option. + To support <acronym>TOAST</> storage, the C functions operating on the data + type must always be careful to unpack any toasted values they are handed + by using <function>PG_DETOAST_DATUM</>. (This detail is customarily hidden + by defining type-specific <function>GETARG_DATATYPE_P</function> macros.) 
+ Then, when running the <command>CREATE TYPE</command> command, specify the + internal length as <literal>variable</> and select some appropriate storage + option other than <literal>plain</>. </para> <para> - If the alignment is unimportant (either just for a specific function or + If data alignment is unimportant (either just for a specific function or because the data type specifies byte alignment anyway) then it's possible to avoid some of the overhead of <function>PG_DETOAST_DATUM</>. You can use <function>PG_DETOAST_DATUM_PACKED</> instead (customarily hidden by @@ -286,8 +300,6 @@ CREATE TYPE complex ( </para> </note> - <para> - For further details see the description of the - <xref linkend="sql-createtype"> command. - </para> + </sect2> + </sect1> |
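The trade-off between PG_DETOAST_DATUM and PG_DETOAST_DATUM_PACKED can be illustrated without PostgreSQL headers. The first helper below reads the payload of either a single-byte or an uncompressed four-byte header in place, which is the idea behind the packed variant. The second copies a short-header value into a freshly allocated buffer with a four-byte header, roughly what full detoasting does for such values. Both helpers assume the little-endian layout, and all names are hypothetical.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* In-place payload access for a single-byte or uncompressed four-byte
 * header. Returns NULL for TOAST pointers and compressed data, which
 * would have to be fully detoasted first. */
static const uint8_t *sketch_payload_any(const uint8_t *datum, uint32_t *len)
{
    uint8_t first = datum[0];

    if (first == 0x01 || (first & 0x03) == 0x02)
        return NULL;

    if (first & 0x01)
    {
        /* Single-byte header: payload may start at any address. */
        *len = (uint32_t) (first >> 1) - 1;
        return datum + 1;
    }

    uint32_t word = (uint32_t) datum[0] |
        ((uint32_t) datum[1] << 8) |
        ((uint32_t) datum[2] << 16) |
        ((uint32_t) datum[3] << 24);

    *len = ((word >> 2) & 0x3FFFFFFF) - 4;
    return datum + 4;
}

/* The copying alternative: rebuild the value with a four-byte header in a
 * malloc'd (and therefore suitably aligned) buffer. Caller frees it. */
static uint8_t *sketch_expand_to_4b(const uint8_t *datum)
{
    uint32_t len;
    const uint8_t *payload = sketch_payload_any(datum, &len);

    if (payload == NULL)
        return NULL;

    uint32_t word = (len + 4) << 2;
    uint8_t *out = malloc(len + 4);

    if (out == NULL)
        return NULL;
    for (int i = 0; i < 4; i++)
        out[i] = (uint8_t) (word >> (8 * i));   /* little-endian length word */
    memcpy(out + 4, payload, len);
    return out;
}
```

The in-place path avoids an allocation and a copy, but its payload pointer may be unaligned. That is why it is only appropriate when alignment does not matter.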