author: Amit Kapila <akapila@postgresql.org> 2021-02-12 07:41:51 +0530
committer: Amit Kapila <akapila@postgresql.org> 2021-02-12 07:41:51 +0530
commit: ce0fdbfe9722867b7fad4d3ede9b6a6bfc51fb4e
tree: be540b24d4cc30cbbd52e92ac164239b6773a699 /doc/src
parent: 3063eb17593c3ad498ce4e89db3358862ea2dbb6
Allow multiple xacts during table sync in logical replication.

For the initial table data synchronization in logical replication, we use a single transaction to copy the entire table and then synchronize the position in the stream with the main apply worker. There are multiple downsides to this approach: (a) We have to perform the entire copy operation again if there is any error (network breakdown, error in the database operation, etc.) while we synchronize the WAL position between the tablesync worker and the apply worker; this is especially onerous for large copies. (b) Using a single transaction in the synchronization phase (where we can receive WAL from multiple transactions) risks exceeding the CID limit. (c) The slot will hold the WAL until the entire sync is complete because we never commit until the end.

This patch solves all the above downsides by allowing multiple transactions during the tablesync phase. The initial copy is done in a single transaction and, after that, we commit each transaction as we receive it. To allow recovery after any error or crash, we use a permanent slot and origin to track the progress. The slot and origin will be removed once we finish the synchronization of the table. We also remove the slot and origin of tablesync workers if the user performs DROP SUBSCRIPTION ... or ALTER SUBSCRIPTION ... REFRESH while some of the table syncs are still not finished.

The commands ALTER SUBSCRIPTION ... REFRESH PUBLICATION and ALTER SUBSCRIPTION ... SET PUBLICATION ... with the refresh option set to true cannot be executed inside a transaction block because they can now drop the slots, for which we have no provision to roll back.

This will also open up the path for logical replication of 2PC transactions on the subscriber side. Previously, we couldn't do that because of the requirement of maintaining a single transaction in tablesync workers.

Bump catalog version due to change of state in the catalog (pg_subscription_rel).

Author: Peter Smith, Amit Kapila, and Takamichi Osumi
Reviewed-by: Ajin Cherian, Petr Jelinek, Hou Zhijie and Amit Kapila
Discussion: https://postgr.es/m/CAA4eK1KHJxaZS-fod-0fey=0tq3=Gkn4ho=8N4-5HWiCfu0H1A@mail.gmail.com
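
The catalog state change mentioned in the commit message can be illustrated with a small lookup table. This is only a reader's sketch in Python, not code from the patch: the state codes and descriptions are taken from the catalogs.sgml hunk in this commit, while the names `SUBREL_STATES`, `describe_state`, and `is_ready` are hypothetical helpers invented for illustration.

```python
from typing import Dict

# State codes for a subscribed relation, in the order the catalogs.sgml
# hunk of this commit lists them. "f" is the code added by this commit.
# (The dictionary name is hypothetical; it is not a PostgreSQL identifier.)
SUBREL_STATES: Dict[str, str] = {
    "i": "initialize",
    "d": "data is being copied",
    "f": "finished table copy",
    "s": "synchronized",
    "r": "ready (normal replication)",
}


def describe_state(code: str) -> str:
    """Return the documented description for a one-letter state code."""
    try:
        return SUBREL_STATES[code]
    except KeyError:
        raise ValueError(f"unknown state code: {code!r}") from None


def is_ready(code: str) -> bool:
    """True only for 'r', the state in which normal replication runs."""
    describe_state(code)  # reject codes that are not documented
    return code == "r"
```

A monitoring script could, for example, call `is_ready` on each state value it reads from pg_subscription_rel to decide whether any table is still in its initial synchronization.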
Diffstat (limited to 'doc/src')
-rw-r--r-- doc/src/sgml/catalogs.sgml | 1
-rw-r--r-- doc/src/sgml/logical-replication.sgml | 59
-rw-r--r-- doc/src/sgml/ref/alter_subscription.sgml | 18
-rw-r--r-- doc/src/sgml/ref/drop_subscription.sgml | 6
4 files changed, 60 insertions, 24 deletions
diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index ea222c04640..692ad65de2d 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -7673,6 +7673,7 @@ SCRAM-SHA-256$<replaceable>&lt;iteration count&gt;</replaceable>:<replaceable>&l
State code:
<literal>i</literal> = initialize,
<literal>d</literal> = data is being copied,
+ <literal>f</literal> = finished table copy,
<literal>s</literal> = synchronized,
<literal>r</literal> = ready (normal replication)
</para></entry>
diff --git a/doc/src/sgml/logical-replication.sgml b/doc/src/sgml/logical-replication.sgml
index a560ad69b44..d0742f2c526 100644
--- a/doc/src/sgml/logical-replication.sgml
+++ b/doc/src/sgml/logical-replication.sgml
@@ -186,9 +186,10 @@
<para>
Each subscription will receive changes via one replication slot (see
- <xref linkend="streaming-replication-slots"/>). Additional temporary
- replication slots may be required for the initial data synchronization
- of pre-existing table data.
+ <xref linkend="streaming-replication-slots"/>). Additional replication
+ slots may be required for the initial data synchronization of
+ pre-existing table data and those will be dropped at the end of data
+ synchronization.
</para>
<para>
@@ -248,13 +249,23 @@
<para>
As mentioned earlier, each (active) subscription receives changes from a
- replication slot on the remote (publishing) side. Normally, the remote
- replication slot is created automatically when the subscription is created
- using <command>CREATE SUBSCRIPTION</command> and it is dropped
- automatically when the subscription is dropped using <command>DROP
- SUBSCRIPTION</command>. In some situations, however, it can be useful or
- necessary to manipulate the subscription and the underlying replication
- slot separately. Here are some scenarios:
+ replication slot on the remote (publishing) side.
+ </para>
+ <para>
+ Additional table synchronization slots are normally transient, created
+ internally to perform initial table synchronization and dropped
+ automatically when they are no longer needed. These table synchronization
+ slots have generated names: <quote><literal>pg_%u_sync_%u_%llu</literal></quote>
+ (parameters: Subscription <parameter>oid</parameter>,
+ Table <parameter>relid</parameter>, system identifier <parameter>sysid</parameter>)
+ </para>
+ <para>
+ Normally, the remote replication slot is created automatically when the
+ subscription is created using <command>CREATE SUBSCRIPTION</command> and it
+ is dropped automatically when the subscription is dropped using
+ <command>DROP SUBSCRIPTION</command>. In some situations, however, it can
+ be useful or necessary to manipulate the subscription and the underlying
+ replication slot separately. Here are some scenarios:
<itemizedlist>
<listitem>
@@ -294,8 +305,9 @@
using <command>ALTER SUBSCRIPTION</command> before attempting to drop
the subscription. If the remote database instance no longer exists, no
further action is then necessary. If, however, the remote database
- instance is just unreachable, the replication slot should then be
- dropped manually; otherwise it would continue to reserve WAL and might
+ instance is just unreachable, the replication slot (and any still
+ remaining table synchronization slots) should then be
+ dropped manually; otherwise it/they would continue to reserve WAL and might
eventually cause the disk to fill up. Such cases should be carefully
investigated.
</para>
@@ -468,16 +480,19 @@
<sect2 id="logical-replication-snapshot">
<title>Initial Snapshot</title>
<para>
- The initial data in existing subscribed tables are snapshotted and
- copied in a parallel instance of a special kind of apply process.
- This process will create its own temporary replication slot and
- copy the existing data. Once existing data is copied, the worker
- enters synchronization mode, which ensures that the table is brought
- up to a synchronized state with the main apply process by streaming
- any changes that happened during the initial data copy using standard
- logical replication. Once the synchronization is done, the control
- of the replication of the table is given back to the main apply
- process where the replication continues as normal.
+ The initial data in existing subscribed tables are snapshotted and
+ copied in a parallel instance of a special kind of apply process.
+ This process will create its own replication slot and copy the existing
+ data. As soon as the copy is finished the table contents will become
+ visible to other backends. Once existing data is copied, the worker
+ enters synchronization mode, which ensures that the table is brought
+ up to a synchronized state with the main apply process by streaming
+ any changes that happened during the initial data copy using standard
+ logical replication. During this synchronization phase, the changes
+ are applied and committed in the same order as they happened on the
+ publisher. Once the synchronization is done, the control of the
+ replication of the table is given back to the main apply process where
+ the replication continues as normal.
</para>
</sect2>
</sect1>
diff --git a/doc/src/sgml/ref/alter_subscription.sgml b/doc/src/sgml/ref/alter_subscription.sgml
index db5e59f707c..bcb0acf28d8 100644
--- a/doc/src/sgml/ref/alter_subscription.sgml
+++ b/doc/src/sgml/ref/alter_subscription.sgml
@@ -48,6 +48,24 @@ ALTER SUBSCRIPTION <replaceable class="parameter">name</replaceable> RENAME TO <
(Currently, all subscription owners must be superusers, so the owner checks
will be bypassed in practice. But this might change in the future.)
</para>
+
+ <para>
+ When refreshing a publication we remove the relations that are no longer
+ part of the publication and we also remove the tablesync slots if there are
+ any. It is necessary to remove tablesync slots so that the resources
+ allocated for the subscription on the remote host are released. If due to
+ network breakdown or some other error, <productname>PostgreSQL</productname>
+ is unable to remove the slots, an ERROR will be reported. To proceed in this
+ situation, the user either needs to retry the operation or disassociate the
+ slot from the subscription and drop the subscription as explained in
+ <xref linkend="sql-dropsubscription"/>.
+ </para>
+
+ <para>
+ Commands <command>ALTER SUBSCRIPTION ... REFRESH PUBLICATION</command> and
+ <command>ALTER SUBSCRIPTION ... SET PUBLICATION ...</command> with refresh
+ option as true cannot be executed inside a transaction block.
+ </para>
</refsect1>
<refsect1>
diff --git a/doc/src/sgml/ref/drop_subscription.sgml b/doc/src/sgml/ref/drop_subscription.sgml
index adbdeafb4e1..aee96155463 100644
--- a/doc/src/sgml/ref/drop_subscription.sgml
+++ b/doc/src/sgml/ref/drop_subscription.sgml
@@ -79,7 +79,8 @@ DROP SUBSCRIPTION [ IF EXISTS ] <replaceable class="parameter">name</replaceable
<para>
When dropping a subscription that is associated with a replication slot on
the remote host (the normal state), <command>DROP SUBSCRIPTION</command>
- will connect to the remote host and try to drop the replication slot as
+ will connect to the remote host and try to drop the replication slot (and
+ any remaining table synchronization slots) as
part of its operation. This is necessary so that the resources allocated
for the subscription on the remote host are released. If this fails,
either because the remote host is not reachable or because the remote
@@ -89,7 +90,8 @@ DROP SUBSCRIPTION [ IF EXISTS ] <replaceable class="parameter">name</replaceable
executing <literal>ALTER SUBSCRIPTION ... SET (slot_name = NONE)</literal>.
After that, <command>DROP SUBSCRIPTION</command> will no longer attempt any
actions on a remote host. Note that if the remote replication slot still
- exists, it should then be dropped manually; otherwise it will continue to
+ exists, it (and any related table synchronization slots) should then be
+ dropped manually; otherwise it/they will continue to
reserve WAL and might eventually cause the disk to fill up. See
also <xref linkend="logical-replication-subscription-slot"/>.
</para>
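
When leftover table synchronization slots have to be dropped manually, they first need to be told apart from other slots. The following Python sketch builds and recognizes names in the `pg_%u_sync_%u_%llu` format documented in the logical-replication.sgml hunk above (subscription oid, table relid, system identifier). The function and class names are hypothetical, and the sketch assumes the name format is exactly as documented in this commit.

```python
import re
from typing import NamedTuple, Optional


class TablesyncSlot(NamedTuple):
    """Components of a table synchronization slot name (hypothetical type)."""
    suboid: int  # subscription oid
    relid: int   # table relid
    sysid: int   # system identifier


# Mirrors the documented format "pg_%u_sync_%u_%llu": three unsigned
# decimal integers separated by fixed text.
_TABLESYNC_NAME = re.compile(r"pg_([0-9]+)_sync_([0-9]+)_([0-9]+)")


def tablesync_slot_name(suboid: int, relid: int, sysid: int) -> str:
    """Build a slot name in the documented format."""
    return f"pg_{suboid}_sync_{relid}_{sysid}"


def parse_tablesync_slot_name(name: str) -> Optional[TablesyncSlot]:
    """Return the name's components, or None if it does not match the format."""
    match = _TABLESYNC_NAME.fullmatch(name)
    if match is None:
        return None
    return TablesyncSlot(*(int(group) for group in match.groups()))
```

Slot names that `parse_tablesync_slot_name` recognizes, and whose `suboid` belongs to the subscription being removed, are the candidates for manual cleanup on the publisher, for instance with `pg_drop_replication_slot`.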