author | Thomas Munro <tmunro@postgresql.org> | 2022-04-07 19:28:40 +1200 |
---|---|---|
committer | Thomas Munro <tmunro@postgresql.org> | 2022-04-07 19:42:14 +1200 |
commit | 5dc0418fab281d017a61a5756240467af982bdfd | |
tree | cdcfda92621a9a7cd458999ede8f3974ef2a0bc1 /doc/src | |
parent | 9553b4115f1879f66935f42fff0b798ef91866d0 | |
Prefetch data referenced by the WAL, take II.

Introduce a new GUC recovery_prefetch. When enabled, look ahead in the
WAL and try to initiate asynchronous reading of referenced data blocks
that are not yet cached in our buffer pool. For now, this is done with
posix_fadvise(), which has several caveats. Since not all OSes have
that system call, "try" is provided so that it can be enabled where
available. Better mechanisms for asynchronous I/O are possible in later
work.
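
To illustrate the "try" setting described above, here is a minimal Python sketch, not the server's C implementation. The helper names (`fadvise_available`, `prefetch_enabled`, `hint_block`), the 8192-byte block size, the use of `POSIX_FADV_WILLNEED` as the advice value, and the choice to raise an error for "on" when the call is missing are all assumptions made for this example.

```python
import os
import tempfile

BLOCK_SIZE = 8192  # assumed block size for this sketch


def fadvise_available() -> bool:
    """True if this platform's Python exposes posix_fadvise and WILLNEED."""
    return hasattr(os, "posix_fadvise") and hasattr(os, "POSIX_FADV_WILLNEED")


def prefetch_enabled(setting: str) -> bool:
    """Map a recovery_prefetch-style value to 'should we issue hints?'.

    Hypothetical helper: 'try' quietly degrades to disabled when the
    system call is unavailable; rejecting 'on' in that case is an assumption.
    """
    if setting == "off":
        return False
    if setting == "try":
        return fadvise_available()
    if setting == "on":
        if not fadvise_available():
            raise RuntimeError("posix_fadvise is not available on this platform")
        return True
    raise ValueError(f"invalid setting: {setting!r}")


def hint_block(fd: int, block_no: int) -> None:
    """Advise the kernel that one block of the file will be needed soon."""
    os.posix_fadvise(fd, block_no * BLOCK_SIZE, BLOCK_SIZE, os.POSIX_FADV_WILLNEED)


if __name__ == "__main__":
    with tempfile.TemporaryFile() as f:
        f.write(b"\0" * (4 * BLOCK_SIZE))
        f.flush()
        if prefetch_enabled("try"):
            hint_block(f.fileno(), 2)
            print("hint issued for block 2")
        else:
            print("posix_fadvise unavailable; prefetching skipped")
```

The hint is advisory only: the call returns before any read completes, and on some systems it may do nothing at all, which is one of the caveats the message alludes to.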

Set to "try" for now for test coverage. Default setting to be finalized
before release.

The GUC wal_decode_buffer_size limits the distance we can look ahead in
bytes of decoded data.

The existing GUC maintenance_io_concurrency is used to limit the number
of concurrent I/Os allowed, based on pessimistic heuristics used to
infer that I/Os have begun and completed. We'll also not look more than
maintenance_io_concurrency * 4 block references ahead.

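
The three limits above can be sketched as simple arithmetic. This is a simplified Python model, not the server's logic: the `PrefetchLimits` class and its methods are hypothetical, the field names `io_depth`, `block_distance` and `wal_distance` are borrowed from the pg_stat_recovery_prefetch columns in the documentation diff, and the example value of 10 for maintenance_io_concurrency is an assumption for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PrefetchLimits:
    """Hypothetical model of the look-ahead limits named in the commit message."""

    maintenance_io_concurrency: int  # cap on concurrent prefetch I/Os
    wal_decode_buffer_size: int      # cap on look-ahead, in bytes of decoded WAL

    @property
    def max_block_distance(self) -> int:
        # "not look more than maintenance_io_concurrency * 4 block references ahead"
        return self.maintenance_io_concurrency * 4

    def can_start_io(self, io_depth: int) -> bool:
        """True while fewer I/Os are believed in flight than the concurrency cap."""
        return io_depth < self.maintenance_io_concurrency

    def can_look_further(self, block_distance: int, wal_distance: int) -> bool:
        """True while both the block-reference and byte look-ahead caps hold."""
        return (block_distance < self.max_block_distance
                and wal_distance < self.wal_decode_buffer_size)


if __name__ == "__main__":
    # 512kB is the documented wal_decode_buffer_size default; 10 is an assumed value.
    limits = PrefetchLimits(maintenance_io_concurrency=10,
                            wal_decode_buffer_size=512 * 1024)
    print(limits.max_block_distance)
    print(limits.can_start_io(io_depth=10))
    print(limits.can_look_further(block_distance=39, wal_distance=100_000))
```

With these example values the block-reference cap is 40, so raising maintenance_io_concurrency widens both the number of I/Os in flight and the look-ahead window.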
Reviewed-by: Julien Rouhaud <rjuju123@gmail.com>
Reviewed-by: Tomas Vondra <tomas.vondra@2ndquadrant.com>
Reviewed-by: Alvaro Herrera <alvherre@2ndquadrant.com> (earlier version)
Reviewed-by: Andres Freund <andres@anarazel.de> (earlier version)
Reviewed-by: Justin Pryzby <pryzby@telsasoft.com> (earlier version)
Tested-by: Tomas Vondra <tomas.vondra@2ndquadrant.com> (earlier version)
Tested-by: Jakub Wartak <Jakub.Wartak@tomtom.com> (earlier version)
Tested-by: Dmitry Dolgov <9erthalion6@gmail.com> (earlier version)
Tested-by: Sait Talha Nisanci <Sait.Nisanci@microsoft.com> (earlier version)
Discussion: https://postgr.es/m/CA%2BhUKGJ4VJN8ttxScUFM8dOKX0BrBiboo5uz1cq%3DAovOddfHpA%40mail.gmail.com
Diffstat (limited to 'doc/src')
-rw-r--r-- | doc/src/sgml/config.sgml | 64 | ||||
-rw-r--r-- | doc/src/sgml/monitoring.sgml | 86 | ||||
-rw-r--r-- | doc/src/sgml/wal.sgml | 12 |
3 files changed, 160 insertions, 2 deletions
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml index 6901e71f9d3..ac533968a0c 100644 --- a/doc/src/sgml/config.sgml +++ b/doc/src/sgml/config.sgml @@ -3657,6 +3657,70 @@ include_dir 'conf.d' </variablelist> </sect2> + <sect2 id="runtime-config-wal-recovery"> + + <title>Recovery</title> + + <indexterm> + <primary>configuration</primary> + <secondary>of recovery</secondary> + <tertiary>general settings</tertiary> + </indexterm> + + <para> + This section describes the settings that apply to recovery in general, + affecting crash recovery, streaming replication and archive-based + replication. + </para> + + + <variablelist> + <varlistentry id="guc-recovery-prefetch" xreflabel="recovery_prefetch"> + <term><varname>recovery_prefetch</varname> (<type>enum</type>) + <indexterm> + <primary><varname>recovery_prefetch</varname> configuration parameter</primary> + </indexterm> + </term> + <listitem> + <para> + Whether to try to prefetch blocks that are referenced in the WAL that + are not yet in the buffer pool, during recovery. Valid values are + <literal>off</literal> (the default), <literal>on</literal> and + <literal>try</literal>. The setting <literal>try</literal> enables + prefetching only if the operating system provides the + <function>posix_fadvise</function> function, which is currently used + to implement prefetching. Note that some operating systems provide the + function, but it doesn't do anything. + </para> + <para> + Prefetching blocks that will soon be needed can reduce I/O wait times + during recovery with some workloads. + See also the <xref linkend="guc-wal-decode-buffer-size"/> and + <xref linkend="guc-maintenance-io-concurrency"/> settings, which limit + prefetching activity. 
+ </para> + </listitem> + </varlistentry> + + <varlistentry id="guc-wal-decode-buffer-size" xreflabel="wal_decode_buffer_size"> + <term><varname>wal_decode_buffer_size</varname> (<type>integer</type>) + <indexterm> + <primary><varname>wal_decode_buffer_size</varname> configuration parameter</primary> + </indexterm> + </term> + <listitem> + <para> + A limit on how far ahead the server can look in the WAL, to find + blocks to prefetch. If this value is specified without units, it is + taken as bytes. + The default is 512kB. + </para> + </listitem> + </varlistentry> + + </variablelist> + </sect2> + <sect2 id="runtime-config-wal-archive-recovery"> <title>Archive Recovery</title> diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml index 24924647b5f..76766d28dd4 100644 --- a/doc/src/sgml/monitoring.sgml +++ b/doc/src/sgml/monitoring.sgml @@ -329,6 +329,13 @@ postgres 27093 0.0 0.0 30096 2752 ? Ss 11:34 0:00 postgres: ser </row> <row> + <entry><structname>pg_stat_recovery_prefetch</structname><indexterm><primary>pg_stat_recovery_prefetch</primary></indexterm></entry> + <entry>Only one row, showing statistics about blocks prefetched during recovery. + See <xref linkend="pg-stat-recovery-prefetch-view"/> for details. + </entry> + </row> + + <row> <entry><structname>pg_stat_subscription</structname><indexterm><primary>pg_stat_subscription</primary></indexterm></entry> <entry>At least one row per subscription, showing information about the subscription workers. @@ -2979,6 +2986,78 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i copy of the subscribed tables. 
</para> + <table id="pg-stat-recovery-prefetch-view" xreflabel="pg_stat_recovery_prefetch"> + <title><structname>pg_stat_recovery_prefetch</structname> View</title> + <tgroup cols="3"> + <thead> + <row> + <entry>Column</entry> + <entry>Type</entry> + <entry>Description</entry> + </row> + </thead> + + <tbody> + <row> + <entry><structfield>prefetch</structfield></entry> + <entry><type>bigint</type></entry> + <entry>Number of blocks prefetched because they were not in the buffer pool</entry> + </row> + <row> + <entry><structfield>hit</structfield></entry> + <entry><type>bigint</type></entry> + <entry>Number of blocks not prefetched because they were already in the buffer pool</entry> + </row> + <row> + <entry><structfield>skip_init</structfield></entry> + <entry><type>bigint</type></entry> + <entry>Number of blocks not prefetched because they would be zero-initialized</entry> + </row> + <row> + <entry><structfield>skip_new</structfield></entry> + <entry><type>bigint</type></entry> + <entry>Number of blocks not prefetched because they didn't exist yet</entry> + </row> + <row> + <entry><structfield>skip_fpw</structfield></entry> + <entry><type>bigint</type></entry> + <entry>Number of blocks not prefetched because a full page image was included in the WAL</entry> + </row> + <row> + <entry><structfield>skip_rep</structfield></entry> + <entry><type>bigint</type></entry> + <entry>Number of blocks not prefetched because they were already recently prefetched</entry> + </row> + <row> + <entry><structfield>wal_distance</structfield></entry> + <entry><type>integer</type></entry> + <entry>How many bytes ahead the prefetcher is looking</entry> + </row> + <row> + <entry><structfield>block_distance</structfield></entry> + <entry><type>integer</type></entry> + <entry>How many blocks ahead the prefetcher is looking</entry> + </row> + <row> + <entry><structfield>io_depth</structfield></entry> + <entry><type>integer</type></entry> + <entry>How many prefetches have been initiated but are 
not yet known to have completed</entry> + </row> + </tbody> + </tgroup> + </table> + + <para> + The <structname>pg_stat_recovery_prefetch</structname> view will contain + only one row. It is filled with nulls if recovery has not run or + <xref linkend="guc-recovery-prefetch"/> is not enabled. The + columns <structfield>wal_distance</structfield>, + <structfield>block_distance</structfield> + and <structfield>io_depth</structfield> show current values, and the + other columns show cumulative counters that can be reset + with the <function>pg_stat_reset_shared</function> function. + </para> + <table id="pg-stat-subscription" xreflabel="pg_stat_subscription"> <title><structname>pg_stat_subscription</structname> View</title> <tgroup cols="1"> @@ -5199,8 +5278,11 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i all the counters shown in the <structname>pg_stat_bgwriter</structname> view, <literal>archiver</literal> to reset all the counters shown in - the <structname>pg_stat_archiver</structname> view or <literal>wal</literal> - to reset all the counters shown in the <structname>pg_stat_wal</structname> view. + the <structname>pg_stat_archiver</structname> view, + <literal>wal</literal> to reset all the counters shown in the + <structname>pg_stat_wal</structname> view or + <literal>recovery_prefetch</literal> to reset all the counters shown + in the <structname>pg_stat_recovery_prefetch</structname> view. </para> <para> This function is restricted to superusers by default, but other users diff --git a/doc/src/sgml/wal.sgml b/doc/src/sgml/wal.sgml index 2bb27a84682..6b3406b7de6 100644 --- a/doc/src/sgml/wal.sgml +++ b/doc/src/sgml/wal.sgml @@ -803,6 +803,18 @@ counted as <literal>wal_write</literal> and <literal>wal_sync</literal> in <structname>pg_stat_wal</structname>, respectively. 
</para> + + <para> + The <xref linkend="guc-recovery-prefetch"/> parameter can be used to reduce + I/O wait times during recovery by instructing the kernel to initiate reads + of disk blocks that will soon be needed but are not currently in + <productname>PostgreSQL</productname>'s buffer pool. + The <xref linkend="guc-maintenance-io-concurrency"/> and + <xref linkend="guc-wal-decode-buffer-size"/> settings limit prefetching + concurrency and distance, respectively. By default, it is set to + <literal>try</literal>, which enabled the feature on systems where + <function>posix_fadvise</function> is available. + </para> </sect1> <sect1 id="wal-internals"> |