-rw-r--r-- doc/src/sgml/ref/create_type.sgml | 25
-rw-r--r-- doc/src/sgml/storage.sgml         | 143
-rw-r--r-- doc/src/sgml/xtypes.sgml          | 52
3 files changed, 157 insertions, 63 deletions
diff --git a/doc/src/sgml/ref/create_type.sgml b/doc/src/sgml/ref/create_type.sgml
index e5d7992bbf5..f9e1297d0b0 100644
--- a/doc/src/sgml/ref/create_type.sgml
+++ b/doc/src/sgml/ref/create_type.sgml
@@ -329,15 +329,17 @@ CREATE TYPE <replaceable class="parameter">name</replaceable>
to <literal>VARIABLE</literal>. (Internally, this is represented
by setting <literal>typlen</> to -1.) The internal representation of all
variable-length types must start with a 4-byte integer giving the total
- length of this value of the type.
+ length of this value of the type. (Note that the length field is often
+ encoded, as described in <xref linkend="storage-toast">; it's unwise
+ to access it directly.)
</para>
<para>
The optional flag <literal>PASSEDBYVALUE</literal> indicates that
values of this data type are passed by value, rather than by
- reference. You cannot pass by value types whose internal
- representation is larger than the size of the <type>Datum</> type
- (4 bytes on most machines, 8 bytes on a few).
+ reference. Types passed by value must be fixed-length, and their internal
+ representation cannot be larger than the size of the <type>Datum</> type
+ (4 bytes on some machines, 8 bytes on others).
</para>
<para>
@@ -368,6 +370,17 @@ CREATE TYPE <replaceable class="parameter">name</replaceable>
</para>
<para>
+ All <replaceable class="parameter">storage</replaceable> values other
+ than <literal>plain</literal> imply that the functions of the data type
+ can handle values that have been <firstterm>toasted</>, as described
+ in <xref linkend="storage-toast"> and <xref linkend="xtypes-toast">.
+ The specific other value given merely determines the default TOAST
+ storage strategy for columns of a toastable data type; users can pick
+ other strategies for individual columns using <literal>ALTER TABLE
+ SET STORAGE</>.
+ </para>
+
+ <para>
The <replaceable class="parameter">like_type</replaceable> parameter
provides an alternative method for specifying the basic representation
properties of a data type: copy them from some existing type. The values of
@@ -465,8 +478,8 @@ CREATE TYPE <replaceable class="parameter">name</replaceable>
identical things, and you want to allow these things to be accessed
directly by subscripting, in addition to whatever operations you plan
to provide for the type as a whole. For example, type <type>point</>
- is represented as just two floating-point numbers, each can be accessed using
- <literal>point[0]</> and <literal>point[1]</>.
+ is represented as just two floating-point numbers, which can be accessed
+ using <literal>point[0]</> and <literal>point[1]</>.
Note that
this facility only works for fixed-length types whose internal form
is exactly a sequence of identical fixed-length fields. A subscriptable
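The revised PASSEDBYVALUE paragraph in create_type.sgml above states two conditions: the type must be fixed-length, and its internal representation must be no larger than a Datum. A minimal C sketch of that check follows. It assumes Datum is a pointer-sized unsigned integer (modeled here as uintptr_t) and follows the text's convention that a typlen of -1 marks a variable-length type; can_pass_by_value is a hypothetical helper, not part of PostgreSQL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: Datum modeled as a pointer-sized unsigned integer. */
typedef uintptr_t Datum;

/*
 * Hypothetical helper (not a PostgreSQL function) applying the
 * PASSEDBYVALUE rule: typlen > 0 is a fixed length in bytes, while
 * typlen == -1 marks a variable-length type, which never qualifies.
 */
static bool can_pass_by_value(int typlen)
{
    if (typlen <= 0)
        return false;                        /* not fixed-length */
    return (size_t) typlen <= sizeof(Datum); /* must fit in a Datum */
}
```

With an 8-byte Datum, a 4- or 8-byte type qualifies, while a 16-byte type (for example, two 8-byte floats) and any variable-length type do not.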
diff --git a/doc/src/sgml/storage.sgml b/doc/src/sgml/storage.sgml
index 85a8de2ece9..d8c52875d82 100644
--- a/doc/src/sgml/storage.sgml
+++ b/doc/src/sgml/storage.sgml
@@ -303,25 +303,33 @@ Oversized-Attribute Storage Technique).
<para>
<productname>PostgreSQL</productname> uses a fixed page size (commonly
-8 kB), and does not allow tuples to span multiple pages. Therefore, it is
+8 kB), and does not allow tuples to span multiple pages. Therefore, it is
not possible to store very large field values directly. To overcome
-this limitation, large field values are compressed and/or broken up into
-multiple physical rows. This happens transparently to the user, with only
+this limitation, large field values are compressed and/or broken up into
+multiple physical rows. This happens transparently to the user, with only
small impact on most of the backend code. The technique is affectionately
-known as <acronym>TOAST</> (or <quote>the best thing since sliced bread</>).
+known as <acronym>TOAST</> (or <quote>the best thing since sliced bread</>).
+The <acronym>TOAST</> infrastructure is also used to improve handling of
+large data values in-memory.
</para>
<para>
Only certain data types support <acronym>TOAST</> &mdash; there is no need to
impose the overhead on data types that cannot produce large field values.
To support <acronym>TOAST</>, a data type must have a variable-length
-(<firstterm>varlena</>) representation, in which the first 32-bit word of any
-stored value contains the total length of the value in bytes (including
-itself). <acronym>TOAST</> does not constrain the rest of the representation.
-All the C-level functions supporting a <acronym>TOAST</>-able data type must
-be careful to handle <acronym>TOAST</>ed input values. (This is normally done
-by invoking <function>PG_DETOAST_DATUM</> before doing anything with an input
-value, but in some cases more efficient approaches are possible.)
+(<firstterm>varlena</>) representation, in which, ordinarily, the first
+four-byte word of any stored value contains the total length of the value in
+bytes (including itself). <acronym>TOAST</> does not constrain the rest
+of the data type's representation. The special representations collectively
+called <firstterm><acronym>TOAST</>ed values</firstterm> work by modifying or
+reinterpreting this initial length word. Therefore, the C-level functions
+supporting a <acronym>TOAST</>-able data type must be careful about how they
+handle potentially <acronym>TOAST</>ed input values: an input might not
+actually consist of a four-byte length word and contents until after it's
+been <firstterm>detoasted</>. (This is normally done by invoking
+<function>PG_DETOAST_DATUM</> before doing anything with an input value,
+but in some cases more efficient approaches are possible.
+See <xref linkend="xtypes-toast"> for more detail.)
</para>
<para>
@@ -333,58 +341,84 @@ the value is an ordinary un-<acronym>TOAST</>ed value of the data type, and
the remaining bits of the length word give the total datum size (including
length word) in bytes. When the highest-order or lowest-order bit is set,
the value has only a single-byte header instead of the normal four-byte
-header, and the remaining bits give the total datum size (including length
-byte) in bytes. As a special case, if the remaining bits are all zero
-(which would be impossible for a self-inclusive length), the value is a
-pointer to out-of-line data stored in a separate TOAST table. (The size of
-a TOAST pointer is given in the second byte of the datum.)
-Values with single-byte headers aren't aligned on any particular
-boundary, either. Lastly, when the highest-order or lowest-order bit is
-clear but the adjacent bit is set, the content of the datum has been
-compressed and must be decompressed before use. In this case the remaining
-bits of the length word give the total size of the compressed datum, not the
+header, and the remaining bits of that byte give the total datum size
+(including length byte) in bytes. This alternative supports space-efficient
+storage of values shorter than 127 bytes, while still allowing the data type
+to grow to 1 GB at need. Values with single-byte headers aren't aligned on
+any particular boundary, whereas values with four-byte headers are aligned on
+at least a four-byte boundary; this omission of alignment padding provides
+additional space savings that are significant compared to short values.
+As a special case, if the remaining bits of a single-byte header are all
+zero (which would be impossible for a self-inclusive length), the value is
+a pointer to out-of-line data, with several possible alternatives as
+described below. The type and size of such a <firstterm>TOAST pointer</>
+are determined by a code stored in the second byte of the datum.
+Lastly, when the highest-order or lowest-order bit is clear but the adjacent
+bit is set, the content of the datum has been compressed and must be
+decompressed before use. In this case the remaining bits of the four-byte
+length word give the total size of the compressed datum, not the
original data. Note that compression is also possible for out-of-line data
but the varlena header does not tell whether it has occurred &mdash;
-the content of the TOAST pointer tells that, instead.
+the content of the <acronym>TOAST</> pointer tells that, instead.
</para>
<para>
-If any of the columns of a table are <acronym>TOAST</>-able, the table will
-have an associated <acronym>TOAST</> table, whose OID is stored in the table's
-<structname>pg_class</>.<structfield>reltoastrelid</> entry. Out-of-line
-<acronym>TOAST</>ed values are kept in the <acronym>TOAST</> table, as
-described in more detail below.
+As mentioned, there are multiple types of <acronym>TOAST</> pointer datums.
+The oldest and most common type is a pointer to out-of-line data stored in
+a <firstterm><acronym>TOAST</> table</firstterm> that is separate from, but
+associated with, the table containing the <acronym>TOAST</> pointer datum
+itself. These <firstterm>on-disk</> pointer datums are created by the
+<acronym>TOAST</> management code (in <filename>access/heap/tuptoaster.c</>)
+when a tuple to be stored on disk is too large to be stored as-is.
+Further details appear in <xref linkend="storage-toast-ondisk">.
+Alternatively, a <acronym>TOAST</> pointer datum can contain a pointer to
+out-of-line data that appears elsewhere in memory. Such datums are
+necessarily short-lived, and will never appear on-disk, but they are very
+useful for avoiding copying and redundant processing of large data values.
+Further details appear in <xref linkend="storage-toast-inmemory">.
</para>
<para>
-The compression technique used is a fairly simple and very fast member
+The compression technique used for either in-line or out-of-line compressed
+data is a fairly simple and very fast member
of the LZ family of compression techniques. See
<filename>src/common/pg_lzcompress.c</> for the details.
</para>
+<sect2 id="storage-toast-ondisk">
+ <title>Out-of-line, on-disk TOAST storage</title>
+
+<para>
+If any of the columns of a table are <acronym>TOAST</>-able, the table will
+have an associated <acronym>TOAST</> table, whose OID is stored in the table's
+<structname>pg_class</>.<structfield>reltoastrelid</> entry. On-disk
+<acronym>TOAST</>ed values are kept in the <acronym>TOAST</> table, as
+described in more detail below.
+</para>
+
<para>
Out-of-line values are divided (after compression if used) into chunks of at
most <symbol>TOAST_MAX_CHUNK_SIZE</> bytes (by default this value is chosen
so that four chunk rows will fit on a page, making it about 2000 bytes).
-Each chunk is stored
-as a separate row in the <acronym>TOAST</> table for the owning table. Every
+Each chunk is stored as a separate row in the <acronym>TOAST</> table
+belonging to the owning table. Every
<acronym>TOAST</> table has the columns <structfield>chunk_id</> (an OID
identifying the particular <acronym>TOAST</>ed value),
<structfield>chunk_seq</> (a sequence number for the chunk within its value),
and <structfield>chunk_data</> (the actual data of the chunk). A unique index
on <structfield>chunk_id</> and <structfield>chunk_seq</> provides fast
-retrieval of the values. A pointer datum representing an out-of-line
+retrieval of the values. A pointer datum representing an out-of-line on-disk
<acronym>TOAST</>ed value therefore needs to store the OID of the
<acronym>TOAST</> table in which to look and the OID of the specific value
(its <structfield>chunk_id</>). For convenience, pointer datums also store the
-logical datum size (original uncompressed data length) and actual stored size
+logical datum size (original uncompressed data length) and physical stored size
(different if compression was applied). Allowing for the varlena header bytes,
-the total size of a <acronym>TOAST</> pointer datum is therefore 18 bytes
-regardless of the actual size of the represented value.
+the total size of an on-disk <acronym>TOAST</> pointer datum is therefore 18
+bytes regardless of the actual size of the represented value.
</para>
<para>
-The <acronym>TOAST</> code is triggered only
+The <acronym>TOAST</> management code is triggered only
when a row value to be stored in a table is wider than
<symbol>TOAST_TUPLE_THRESHOLD</> bytes (normally 2 kB).
The <acronym>TOAST</> code will compress and/or move
@@ -397,8 +431,8 @@ none of the out-of-line values change.
</para>
<para>
-The <acronym>TOAST</> code recognizes four different strategies for storing
-<acronym>TOAST</>-able columns:
+The <acronym>TOAST</> management code recognizes four different strategies
+for storing <acronym>TOAST</>-able columns on disk:
<itemizedlist>
<listitem>
@@ -460,6 +494,41 @@ pages). There was no run time difference compared to an un-<acronym>TOAST</>ed
comparison table, in which all the HTML pages were cut down to 7 kB to fit.
</para>
+</sect2>
+
+<sect2 id="storage-toast-inmemory">
+ <title>Out-of-line, in-memory TOAST storage</title>
+
+<para>
+<acronym>TOAST</> pointers can point to data that is not on disk, but is
+elsewhere in the memory of the current server process. Such pointers
+obviously cannot be long-lived, but they are nonetheless useful. There
+is currently just one sub-case:
+pointers to <firstterm>indirect</> data.
+</para>
+
+<para>
+Indirect <acronym>TOAST</> pointers simply point at a non-indirect varlena
+value stored somewhere in memory. This case was originally created merely
+as a proof of concept, but it is currently used during logical decoding to
+avoid possibly having to create physical tuples exceeding 1 GB (as pulling
+all out-of-line field values into the tuple might do). The case is of
+limited use since the creator of the pointer datum is entirely responsible
+for ensuring that the referenced data survives for as long as the pointer
+could exist, and there is no infrastructure to help with this.
+</para>
+
+<para>
+For all types of in-memory <acronym>TOAST</> pointer, the <acronym>TOAST</>
+management code ensures that no such pointer datum can accidentally get
+stored on disk. In-memory <acronym>TOAST</> pointers are automatically
+expanded to normal in-line varlena values before storage &mdash; and then
+possibly converted to on-disk <acronym>TOAST</> pointers, if the containing
+tuple would otherwise be too big.
+</para>
+
+</sect2>
+
</sect1>
<sect1 id="storage-fsm">
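The storage.sgml changes above describe how the first byte of a varlena datum distinguishes four cases: an ordinary four-byte header, an inline-compressed four-byte header, a single-byte header, and a TOAST pointer. The following C sketch decodes that byte for the lowest-order-bit layout; on platforms that use the highest-order bit the flag positions are mirrored, so it does not apply there. The enum, struct, and function names are hypothetical stand-ins, not PostgreSQL's own macros. OnDiskToastPayload is likewise an assumed layout that simply mirrors the four values the text says an on-disk TOAST pointer stores, used only to check the stated 18-byte total.

```c
#include <assert.h>
#include <stdint.h>

/* The four header kinds described in the text (lowest-order-bit layout). */
typedef enum {
    HDR_4B_PLAIN,       /* low two bits 00: ordinary four-byte header */
    HDR_4B_COMPRESSED,  /* low two bits 10: inline compressed datum */
    HDR_1B_SHORT,       /* low bit 1: single-byte header */
    HDR_1B_TOAST_PTR    /* low bit 1, remaining bits zero: TOAST pointer */
} HeaderKind;

typedef struct {
    HeaderKind kind;
    uint32_t   size;  /* total size incl. header (compressed size if
                         compressed); 0 for TOAST pointers */
    uint8_t    tag;   /* second-byte code, set only for TOAST pointers */
} HeaderInfo;

/* Hypothetical decoder; reads 1, 2, or 4 bytes depending on the kind. */
static HeaderInfo classify_varlena_le(const uint8_t *p)
{
    HeaderInfo h = {HDR_4B_PLAIN, 0, 0};

    if (p[0] & 0x01) {
        if (p[0] == 0x01) {          /* length bits all zero */
            h.kind = HDR_1B_TOAST_PTR;
            h.tag = p[1];            /* type/size code of the pointer */
        } else {
            h.kind = HDR_1B_SHORT;
            h.size = p[0] >> 1;      /* 7-bit self-inclusive length */
        }
        return h;
    }

    /* Assemble the four-byte word in little-endian byte order. */
    uint32_t word = (uint32_t) p[0] | ((uint32_t) p[1] << 8) |
                    ((uint32_t) p[2] << 16) | ((uint32_t) p[3] << 24);
    h.size = word >> 2;              /* remaining 30 bits: up to 1 GB */
    h.kind = (p[0] & 0x02) ? HDR_4B_COMPRESSED : HDR_4B_PLAIN;
    return h;
}

/* Assumed payload of an on-disk TOAST pointer, per the four values listed. */
typedef struct {
    int32_t  rawsize;     /* logical (uncompressed) data length */
    int32_t  extsize;     /* physical stored size */
    uint32_t valueid;     /* chunk_id of the value (an OID) */
    uint32_t toastrelid;  /* OID of the TOAST table */
} OnDiskToastPayload;

/* One header byte + one code byte + payload should give 18 bytes. */
_Static_assert(1 + 1 + sizeof(OnDiskToastPayload) == 18,
               "on-disk TOAST pointer should total 18 bytes");
```

For example, a first byte of 0x0B has its lowest bit set and remaining bits 0000101, so it describes a 5-byte datum with a single-byte header; a four-byte word whose first byte is 0x2A (low bits 10) marks inline-compressed data.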
diff --git a/doc/src/sgml/xtypes.sgml b/doc/src/sgml/xtypes.sgml
index e1340baeb73..2459616281d 100644
--- a/doc/src/sgml/xtypes.sgml
+++ b/doc/src/sgml/xtypes.sgml
@@ -234,35 +234,49 @@ CREATE TYPE complex (
</para>
<para>
+ If the internal representation of the data type is variable-length, the
+ internal representation must follow the standard layout for variable-length
+ data: the first four bytes must be a <type>char[4]</type> field which is
+ never accessed directly (customarily named <structfield>vl_len_</>). You
+ must use the <function>SET_VARSIZE()</function> macro to store the total
+ size of the datum (including the length field itself) in this field
+ and <function>VARSIZE()</function> to retrieve it. (These macros exist
+ because the length field may be encoded depending on platform.)
+ </para>
+
+ <para>
+ For further details see the description of the
+ <xref linkend="sql-createtype"> command.
+ </para>
+
+ <sect2 id="xtypes-toast">
+ <title>TOAST Considerations</title>
<indexterm>
<primary>TOAST</primary>
<secondary>and user-defined types</secondary>
</indexterm>
- If the values of your data type vary in size (in internal form), you should
- make the data type <acronym>TOAST</>-able (see <xref
- linkend="storage-toast">). You should do this even if the data are always
+
+ <para>
+ If the values of your data type vary in size (in internal form), it's
+ usually desirable to make the data type <acronym>TOAST</>-able (see <xref
+ linkend="storage-toast">). You should do this even if the values are always
too small to be compressed or stored externally, because
<acronym>TOAST</> can save space on small data too, by reducing header
overhead.
</para>
<para>
- To do this, the internal representation must follow the standard layout for
- variable-length data: the first four bytes must be a <type>char[4]</type>
- field which is never accessed directly (customarily named
- <structfield>vl_len_</>). You
- must use <function>SET_VARSIZE()</function> to store the size of the datum
- in this field and <function>VARSIZE()</function> to retrieve it. The C
- functions operating on the data type must always be careful to unpack any
- toasted values they are handed, by using <function>PG_DETOAST_DATUM</>.
- (This detail is customarily hidden by defining type-specific
- <function>GETARG_DATATYPE_P</function> macros.) Then, when running the
- <command>CREATE TYPE</command> command, specify the internal length as
- <literal>variable</> and select the appropriate storage option.
+ To support <acronym>TOAST</> storage, the C functions operating on the data
+ type must always be careful to unpack any toasted values they are handed
+ by using <function>PG_DETOAST_DATUM</>. (This detail is customarily hidden
+ by defining type-specific <function>GETARG_DATATYPE_P</function> macros.)
+ Then, when running the <command>CREATE TYPE</command> command, specify the
+ internal length as <literal>variable</> and select some appropriate storage
+ option other than <literal>plain</>.
</para>
<para>
- If the alignment is unimportant (either just for a specific function or
+ If data alignment is unimportant (either just for a specific function or
because the data type specifies byte alignment anyway) then it's possible
to avoid some of the overhead of <function>PG_DETOAST_DATUM</>. You can use
<function>PG_DETOAST_DATUM_PACKED</> instead (customarily hidden by
@@ -286,8 +300,6 @@ CREATE TYPE complex (
</para>
</note>
- <para>
- For further details see the description of the
- <xref linkend="sql-createtype"> command.
- </para>
+ </sect2>
+
</sect1>
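The xtypes.sgml text above requires a variable-length type to begin with a char[4] field (vl_len_) that is only touched through SET_VARSIZE() and VARSIZE(), and notes that TOAST can save space on small values by reducing header overhead. The C sketch below illustrates both points outside the server. Its helpers sketch_set_varsize and sketch_varsize are hypothetical stand-ins that assume the little-endian encoding (length shifted left two bits); MyVarlena, make_value, and packed_size are likewise invented for illustration. Real extension code must include postgres.h and use the actual macros.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical user-defined varlena type. */
typedef struct {
    char vl_len_[4];   /* length word: never read or written directly */
    char data[];       /* hypothetical payload bytes */
} MyVarlena;

/* Stand-in for SET_VARSIZE(): little-endian, length shifted left 2 bits. */
static void sketch_set_varsize(void *ptr, uint32_t len)
{
    uint32_t word = len << 2;
    unsigned char *b = ptr;
    for (int i = 0; i < 4; i++)
        b[i] = (unsigned char) (word >> (8 * i));
}

/* Stand-in for VARSIZE(): undo the encoding applied above. */
static uint32_t sketch_varsize(const void *ptr)
{
    const unsigned char *b = ptr;
    uint32_t word = 0;
    for (int i = 0; i < 4; i++)
        word |= (uint32_t) b[i] << (8 * i);
    return word >> 2;
}

/* Build a value with a four-byte header; the caller frees it. */
static MyVarlena *make_value(const char *payload, size_t n)
{
    size_t total = offsetof(MyVarlena, data) + n;  /* includes header */
    MyVarlena *v = malloc(total);
    if (v == NULL)
        return NULL;
    sketch_set_varsize(v, (uint32_t) total);
    memcpy(v->data, payload, n);
    return v;
}

/*
 * Size the same value would occupy with a one-byte header, which is
 * possible when the payload plus one header byte fits in 127 bytes.
 */
static size_t packed_size(size_t total)
{
    size_t payload = total - offsetof(MyVarlena, data);
    return (payload + 1 <= 127) ? payload + 1 : total;
}
```

For the 5-byte payload "hello", the value occupies 9 bytes with a four-byte header, yet the first raw byte of vl_len_ holds 36 (9 shifted left two bits), which is why direct access is unwise. With a one-byte header the same value would need only 6 bytes. Per the text, that saving applies only to TOAST-able types, i.e. those declared with a storage option other than plain.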