diff options
author | Tom Lane <tgl@sss.pgh.pa.us> | 2022-01-16 13:29:02 -0500 |
---|---|---|
committer | Tom Lane <tgl@sss.pgh.pa.us> | 2022-01-16 13:29:02 -0500 |
commit | ed48e3582e8447cb9969c366e46c82a56b7608cd (patch) | |
tree | 8fda7d1a039095c3d93691d44efeba1f2c15a34e /src/test/perl/PostgreSQL/Test | |
parent | 269b532aef55a579ae02a3e8e8df14101570dfd9 (diff) | |
download | postgresql-ed48e3582e8447cb9969c366e46c82a56b7608cd.tar.gz postgresql-ed48e3582e8447cb9969c366e46c82a56b7608cd.zip |
Clean up TAP tests' usage of wait_for_catchup().
By default, wait_for_catchup() waits for the replication connection
to reach the primary's write LSN. That's fine, but in an apparent
attempt to save one query round-trip, it was coded so that we
executed pg_current_wal_lsn() again during each probe query.
Thus, we presented the standby with a moving target to be reached.
(While the test script itself couldn't be causing the write LSN
to advance while it's blocked in wait_for_catchup(), it's plenty
plausible that background activity such as autovacuum is emitting
more WAL.) That could make the test take longer than necessary,
and potentially it could mask bugs by allowing the standby to process
more WAL than a strict interpretation of the test scenario allows.
So, change wait_for_catchup() to do it "by the book", explicitly
collecting the write LSN to wait for at the outset.
Also, various call sites were instructing wait_for_catchup() to
wait for the standby to reach the primary's insert LSN rather than
its write LSN. This also seems like a bad idea. While in most
test scenarios those are the same, if they are different then the
inserted-but-not-yet-written WAL is not presently available to the
standby. The test isn't doing anything to make it become so, so
again we have the potential for unwanted test delay, perhaps even
a test timeout. (Again, background activity would be needed to
make this more than a hypothetical problem.) Hence, change the
callers where necessary so that the wait target is always the
primary's write LSN.
While at it, simplify callers by making use of wait_for_catchup's
default arguments wherever possible (the preceding change makes
this possible in more places than it was before). And rewrite
wait_for_catchup's documentation a bit.
Patch by me; thanks to Julien Rouhaud for review.
Discussion: https://postgr.es/m/2368336.1641843098@sss.pgh.pa.us
Diffstat (limited to 'src/test/perl/PostgreSQL/Test')
-rw-r--r-- | src/test/perl/PostgreSQL/Test/Cluster.pm | 38 |
1 files changed, 19 insertions, 19 deletions
diff --git a/src/test/perl/PostgreSQL/Test/Cluster.pm b/src/test/perl/PostgreSQL/Test/Cluster.pm index e18f27276cd..7af0f8db139 100644 --- a/src/test/perl/PostgreSQL/Test/Cluster.pm +++ b/src/test/perl/PostgreSQL/Test/Cluster.pm @@ -2496,22 +2496,27 @@ sub lsn =item $node->wait_for_catchup(standby_name, mode, target_lsn) -Wait for the node with application_name standby_name (usually from node->name, -also works for logical subscriptions) -until its replication location in pg_stat_replication equals or passes the -upstream's WAL insert point at the time this function is called. By default -the replay_lsn is waited for, but 'mode' may be specified to wait for any of -sent|write|flush|replay. The connection catching up must be in a streaming -state. +Wait for the replication connection with application_name standby_name until +its 'mode' replication column in pg_stat_replication equals or passes the +specified or default target_lsn. By default the replay_lsn is waited for, +but 'mode' may be specified to wait for any of sent|write|flush|replay. +The replication connection must be in a streaming state. + +When doing physical replication, the standby is usually identified by +passing its PostgreSQL::Test::Cluster instance. When doing logical +replication, standby_name identifies a subscription. + +The default value of target_lsn is $node->lsn('write'), which ensures +that the standby has caught up to what has been committed on the primary. +If you pass an explicit value of target_lsn, it should almost always be +the primary's write LSN; so this parameter is seldom needed except when +querying some intermediate replication node rather than the primary. If there is no active replication connection from this peer, waits until poll_query_until timeout. Requires that the 'postgres' db exists and is accessible. -target_lsn may be any arbitrary lsn, but is typically $primary_node->lsn('insert'). -If omitted, pg_current_wal_lsn() is used. - This is not a test. It die()s on failure. =cut @@ -2531,23 +2536,18 @@ sub wait_for_catchup { $standby_name = $standby_name->name; } - my $lsn_expr; - if (defined($target_lsn)) - { - $lsn_expr = "'$target_lsn'"; - } - else + if (!defined($target_lsn)) { - $lsn_expr = 'pg_current_wal_lsn()'; + $target_lsn = $self->lsn('write'); } print "Waiting for replication conn " . $standby_name . "'s " . $mode . "_lsn to pass " - . $lsn_expr . " on " + . $target_lsn . " on " . $self->name . "\n"; my $query = - qq[SELECT $lsn_expr <= ${mode}_lsn AND state = 'streaming' FROM pg_catalog.pg_stat_replication WHERE application_name = '$standby_name';]; + qq[SELECT '$target_lsn' <= ${mode}_lsn AND state = 'streaming' FROM pg_catalog.pg_stat_replication WHERE application_name = '$standby_name';]; $self->poll_query_until('postgres', $query) or croak "timed out waiting for catchup"; print "done\n"; |