author | Alexander Korotkov <akorotkov@postgresql.org> | 2023-12-25 00:52:42 +0200 |
---|---|---|
committer | Alexander Korotkov <akorotkov@postgresql.org> | 2023-12-25 01:12:36 +0200 |
commit | 12915a58eec962f407a6c38ce2bf08a48dde57b5 (patch) | |
tree | 157866d53bed7766c4bb41ee7be37b0cc4579fb7 /doc/src | |
parent | 64e77b496af61ee31189ba69b40e785e11e9967f (diff) | |
Enhance checkpointer restartpoint statistics
This commit introduces enhancements to the pg_stat_checkpointer view by adding
three new columns: restartpoints_timed, restartpoints_req, and
restartpoints_done. These additions aim to improve the visibility and
monitoring of restartpoint processes on replicas.
Previously, it was challenging to differentiate between successful and failed
restartpoint requests. This limitation arises because restartpoints on replicas
are dependent on checkpoint records from the primary, and cannot occur more
frequently than these checkpoints.
The new columns allow for clear distinction and tracking of restartpoint
requests, their triggers, and successful completions. This enhancement aids
database administrators and developers in better understanding and diagnosing
issues related to restartpoint behavior, particularly in scenarios where
restartpoint requests may fail.
System catalog is changed. Catversion is bumped.
Discussion: https://postgr.es/m/99b2ccd1-a77a-962a-0837-191cdf56c2b9%40inbox.ru
Author: Anton A. Melnikov
Reviewed-by: Kyotaro Horiguchi, Alexander Korotkov
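To illustrate how the three counters described above relate to each other, here is a minimal Python sketch that compares two snapshots of the restartpoint columns read from pg_stat_checkpointer on a standby. The `RestartpointCounters` class, the `summarize` function, and the sample numbers are hypothetical and not part of the patch. The sketch assumes statistics were not reset between the snapshots, and it treats every timed or requested restartpoint that does not show up in restartpoints_done as not performed, which is a simplification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RestartpointCounters:
    """One snapshot of the restartpoint columns of pg_stat_checkpointer."""
    restartpoints_timed: int
    restartpoints_req: int
    restartpoints_done: int


def summarize(before: RestartpointCounters, after: RestartpointCounters) -> dict:
    """Return per-interval deltas between two snapshots (hypothetical helper)."""
    timed = after.restartpoints_timed - before.restartpoints_timed
    requested = after.restartpoints_req - before.restartpoints_req
    done = after.restartpoints_done - before.restartpoints_done
    return {
        "timed": timed,
        "requested": requested,
        "done": done,
        # Attempts (scheduled or requested) that were not performed.
        "not_performed": timed + requested - done,
    }


# Made-up sample values: many requests during a WAL burst, few completions.
earlier = RestartpointCounters(restartpoints_timed=10, restartpoints_req=5, restartpoints_done=12)
later = RestartpointCounters(restartpoints_timed=13, restartpoints_req=45, restartpoints_done=14)
summary = summarize(earlier, later)
```

A large `requested` delta next to a small `done` delta matches the situation the commit message describes, where requests cannot be satisfied until a suitable checkpoint record has been replayed.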
Diffstat (limited to 'doc/src')
-rw-r--r-- | doc/src/sgml/monitoring.sgml | 27 |
-rw-r--r-- | doc/src/sgml/wal.sgml | 39 |
2 files changed, 60 insertions, 6 deletions
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 4f8058d8b1b..b804eb8b5ef 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -2984,6 +2984,33 @@ description | Waiting for a newly initialized WAL file to reach durable storage
 <row>
 <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>restartpoints_timed</structfield> <type>bigint</type>
+ </para>
+ <para>
+ Number of scheduled restartpoints due to timeout or after a failed attempt to perform it
+ </para></entry>
+ </row>
+
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>restartpoints_req</structfield> <type>bigint</type>
+ </para>
+ <para>
+ Number of requested restartpoints
+ </para></entry>
+ </row>
+
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
+ <structfield>restartpoints_done</structfield> <type>bigint</type>
+ </para>
+ <para>
+ Number of restartpoints that have been performed
+ </para></entry>
+ </row>
+
+ <row>
+ <entry role="catalog_table_entry"><para role="column_definition">
 <structfield>write_time</structfield> <type>double precision</type>
 </para>
 <para>
diff --git a/doc/src/sgml/wal.sgml b/doc/src/sgml/wal.sgml
index 2ed4eb659db..05e2a8f8be9 100644
--- a/doc/src/sgml/wal.sgml
+++ b/doc/src/sgml/wal.sgml
@@ -655,14 +655,41 @@
 directory.
 Restartpoints can't be performed more frequently than checkpoints on the
 primary because restartpoints can only be performed at checkpoint records.
- A restartpoint is triggered when a checkpoint record is reached if at
- least <varname>checkpoint_timeout</varname> seconds have passed since the last
- restartpoint, or if WAL size is about to exceed
- <varname>max_wal_size</varname>. However, because of limitations on when a
- restartpoint can be performed, <varname>max_wal_size</varname> is often exceeded
- during recovery, by up to one checkpoint cycle's worth of WAL.
+ A restartpoint can be demanded by a schedule or by an external request.
+ The <structfield>restartpoints_timed</structfield> counter in the
+ <link linkend="monitoring-pg-stat-checkpointer-view"><structname>pg_stat_checkpointer</structname></link>
+ view counts the first ones while the <structfield>restartpoints_req</structfield>
+ the second.
+ A restartpoint is triggered by schedule when a checkpoint record is reached
+ if at least <xref linkend="guc-checkpoint-timeout"/> seconds have passed since
+ the last performed restartpoint or when the previous attempt to perform
+ the restartpoint has failed. In the last case, the next restartpoint
+ will be scheduled in 15 seconds.
+ A restartpoint is triggered by request due to similar reasons like checkpoint
+ but mostly if WAL size is about to exceed <xref linkend="guc-max-wal-size"/>
+ However, because of limitations on when a restartpoint can be performed,
+ <varname>max_wal_size</varname> is often exceeded during recovery,
+ by up to one checkpoint cycle's worth of WAL.
 (<varname>max_wal_size</varname> is never a hard limit anyway, so you should
 always leave plenty of headroom to avoid running out of disk space.)
+ The <structfield>restartpoints_done</structfield> counter in the
+ <link linkend="monitoring-pg-stat-checkpointer-view"><structname>pg_stat_checkpointer</structname></link>
+ view counts the restartpoints that have really been performed.
+ </para>
+
+ <para>
+ In some cases, when the WAL size on the primary increases quickly,
+ for instance during massive INSERT,
+ the <structfield>restartpoints_req</structfield> counter on the standby
+ may demonstrate a peak growth.
+ This occurs because requests to create a new restartpoint due to increased
+ XLOG consumption cannot be performed because the safe checkpoint record
+ since the last restartpoint has not yet been replayed on the standby.
+ This behavior is normal and does not lead to an increase in system resource
+ consumption.
+ Only the <structfield>restartpoints_done</structfield>
+ counter among the restartpoint-related ones indicates that noticeable system
+ resources have been spent.
 </para>

 <para>
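The scheduling rule stated in the added wal.sgml text can be modelled roughly as follows. This is a simplified Python sketch, not the checkpointer's actual code: the function name and parameters are hypothetical, the 15-second retry delay is taken from the documentation text above, and the model ignores the additional requirement that a checkpoint record must have been reached.

```python
# Retry delay after a failed attempt, as stated in the wal.sgml text.
RETRY_DELAY_SECONDS = 15


def seconds_until_next_scheduled_restartpoint(
    elapsed_since_last_attempt: float,
    checkpoint_timeout: float,
    previous_attempt_failed: bool,
) -> float:
    """Estimate the wait before the next scheduled restartpoint attempt.

    Simplified model: after a successful restartpoint the next scheduled
    attempt is due once checkpoint_timeout seconds have passed; after a
    failed attempt it is due after RETRY_DELAY_SECONDS instead.
    """
    wait = RETRY_DELAY_SECONDS if previous_attempt_failed else checkpoint_timeout
    # Never return a negative wait: an overdue attempt is due immediately.
    return max(0.0, wait - elapsed_since_last_attempt)


# Example values are made up; 300 seconds stands in for checkpoint_timeout.
after_success = seconds_until_next_scheduled_restartpoint(100, 300, False)
after_failure = seconds_until_next_scheduled_restartpoint(5, 300, True)
overdue = seconds_until_next_scheduled_restartpoint(400, 300, False)
```

Under this model each retry after a failed attempt would be counted in restartpoints_timed, which is consistent with the column description "due to timeout or after a failed attempt to perform it".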