diff options
author | Tom Lane <tgl@sss.pgh.pa.us> | 2024-04-03 17:41:54 -0400 |
---|---|---|
committer | Tom Lane <tgl@sss.pgh.pa.us> | 2024-04-03 17:41:57 -0400 |
commit | 06286709ee0637ec7376329a5aa026b7682dcfe2 (patch) | |
tree | f1c4f4b606b28227c401be2722b1d08c924586e9 /doc/src | |
parent | 97ce821e3e171ce99fa7c398889ac08432cd0264 (diff) | |
download | postgresql-06286709ee0637ec7376329a5aa026b7682dcfe2.tar.gz postgresql-06286709ee0637ec7376329a5aa026b7682dcfe2.zip |
Invent SERIALIZE option for EXPLAIN.
EXPLAIN (ANALYZE, SERIALIZE) allows collection of statistics about
the volume of data emitted by a query, as well as the time taken
to convert the data to the on-the-wire format. Previously there
was no way to investigate this without actually sending the data
to the client, in which case network transmission costs might
swamp what you wanted to see. In particular this feature allows
investigating the costs of de-TOASTing compressed or out-of-line
data during formatting.
Stepan Rutz and Matthias van de Meent,
reviewed by Tomas Vondra and myself
Discussion: https://postgr.es/m/ca0adb0e-fa4e-c37e-1cd7-91170b18cae1@gmx.de
Diffstat (limited to 'doc/src')
-rw-r--r-- | doc/src/sgml/perform.sgml | 24 | ||||
-rw-r--r-- | doc/src/sgml/ref/explain.sgml | 29 |
2 files changed, 47 insertions, 6 deletions
diff --git a/doc/src/sgml/perform.sgml b/doc/src/sgml/perform.sgml index 4b831a62066..2d0097f121a 100644 --- a/doc/src/sgml/perform.sgml +++ b/doc/src/sgml/perform.sgml @@ -144,11 +144,11 @@ EXPLAIN SELECT * FROM tenk1; It's important to understand that the cost of an upper-level node includes the cost of all its child nodes. It's also important to realize that the cost only reflects things that the planner cares about. - In particular, the cost does not consider the time spent transmitting - result rows to the client, which could be an important - factor in the real elapsed time; but the planner ignores it because - it cannot change it by altering the plan. (Every correct plan will - output the same row set, we trust.) + In particular, the cost does not consider the time spent to convert + output values to text form or to transmit them to the client, which + could be important factors in the real elapsed time; but the planner + ignores those costs because it cannot change them by altering the + plan. (Every correct plan will output the same row set, we trust.) </para> <para> @@ -956,6 +956,17 @@ EXPLAIN UPDATE parent SET f2 = f2 + 1 WHERE f1 = 101; <command>EXPLAIN ANALYZE</command>. </para> + <para> + The time shown for the top-level node does not include any time needed + to convert the query's output data into displayable form or to send it + to the client. While <command>EXPLAIN ANALYZE</command> will never + send the data to the client, it can be told to convert the query's + output data to displayable form and measure the time needed for that, + by specifying the <literal>SERIALIZE</literal> option. That time will + be shown separately, and it's also included in the + total <literal>Execution time</literal>. + </para> + </sect2> <sect2 id="using-explain-caveats"> @@ -965,7 +976,8 @@ EXPLAIN UPDATE parent SET f2 = f2 + 1 WHERE f1 = 101; There are two significant ways in which run times measured by <command>EXPLAIN ANALYZE</command> can deviate from normal execution of the same query. First, since no output rows are delivered to the client, - network transmission costs and I/O conversion costs are not included. + network transmission costs are not included. I/O conversion costs are + not included either unless <literal>SERIALIZE</literal> is specified. Second, the measurement overhead added by <command>EXPLAIN ANALYZE</command> can be significant, especially on machines with slow <function>gettimeofday()</function> operating-system calls. You can use the diff --git a/doc/src/sgml/ref/explain.sgml b/doc/src/sgml/ref/explain.sgml index a4b6564bdb3..db9d3a8549a 100644 --- a/doc/src/sgml/ref/explain.sgml +++ b/doc/src/sgml/ref/explain.sgml @@ -41,6 +41,7 @@ EXPLAIN [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] <rep SETTINGS [ <replaceable class="parameter">boolean</replaceable> ] GENERIC_PLAN [ <replaceable class="parameter">boolean</replaceable> ] BUFFERS [ <replaceable class="parameter">boolean</replaceable> ] + SERIALIZE [ { NONE | TEXT | BINARY } ] WAL [ <replaceable class="parameter">boolean</replaceable> ] TIMING [ <replaceable class="parameter">boolean</replaceable> ] SUMMARY [ <replaceable class="parameter">boolean</replaceable> ] @@ -207,6 +208,34 @@ ROLLBACK; </varlistentry> <varlistentry> + <term><literal>SERIALIZE</literal></term> + <listitem> + <para> + Include information on the cost + of <firstterm>serializing</firstterm> the query's output data, that + is converting it to text or binary format to send to the client. + This can be a significant part of the time required for regular + execution of the query, if the datatype output functions are + expensive or if <acronym>TOAST</acronym>ed values must be fetched + from out-of-line storage. <command>EXPLAIN</command>'s default + behavior, <literal>SERIALIZE NONE</literal>, does not perform these + conversions. If <literal>SERIALIZE TEXT</literal> + or <literal>SERIALIZE BINARY</literal> is specified, the appropriate + conversions are performed, and the time spent doing so is measured + (unless <literal>TIMING OFF</literal> is specified). If + the <literal>BUFFERS</literal> option is also specified, then any + buffer accesses involved in the conversions are counted too. + In no case, however, will <command>EXPLAIN</command> actually send + the resulting data to the client; hence network transmission costs + cannot be investigated this way. + Serialization may only be enabled when <literal>ANALYZE</literal> is + also enabled. If <literal>SERIALIZE</literal> is written without an + argument, <literal>TEXT</literal> is assumed. + </para> + </listitem> + </varlistentry> + + <varlistentry> <term><literal>WAL</literal></term> <listitem> <para> |