| Commit message (Collapse) | Author | Age |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This commit allows logical replication to publish and replicate generated
columns when explicitly listed in the column list. We also ensured that
the generated columns were copied during the initial tablesync when they
were published.
We will allow to replicate generated columns even when they are not
specified in the column list (via a new publication option) in a separate
commit.
The motivation of this work is to allow replication for cases where the
client doesn't have generated columns. For example, the case where one is
trying to replicate data from Postgres to the non-Postgres database.
Author: Shubham Khanna, Vignesh C, Hou Zhijie
Reviewed-by: Peter Smith, Hayato Kuroda, Shlok Kyal, Amit Kapila
Discussion: https://postgr.es/m/B80D17B2-2C8E-4C7D-87F2-E5B4BE3C069E@gmail.com
|
|
|
|
|
|
|
|
| |
as determined by IWYU
These are mostly issues that are new since commit dbbca2cf299.
Discussion: https://www.postgresql.org/message-id/flat/0df1d5b1-8ca8-4f84-93be-121081bde049%40eisentraut.org
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The inplace update survives ROLLBACK. The inval didn't, so another
backend's DDL could then update the row without incorporating the
inplace update. In the test this fixes, a mix of CREATE INDEX and ALTER
TABLE resulted in a table with an index, yet relhasindex=f. That is a
source of index corruption. Back-patch to v12 (all supported versions).
The back branch versions don't change WAL, because those branches just
added end-of-recovery SIResetAll(). All branches change the ABI of
extern function PrepareToInvalidateCacheTuple(). No PGXN extension
calls that, and there's no apparent use case in extensions.
Reviewed by Nitin Motiani and (in earlier versions) Andres Freund.
Discussion: https://postgr.es/m/20240523000548.58.nmisch@google.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The existing get_publications_str() is renamed to GetPublicationsStr()
and is moved to pg_subscription.c, so as it is possible to reuse it at
two locations of the tablesync code where the same logic was duplicated.
fetch_remote_table_info() was doing two List->StringInfo conversions
when dealing with a server of version 15 or newer. The conversion
happens only once now.
This refactoring leads to less code overall.
Author: Peter Smith
Reviewed-by: Michael Paquier, Masahiko Sawada
Discussion: https://postgr.es/m/CAHut+PtJMk4bKXqtpvqVy9ckknCgK9P6=FeG8zHF=6+Em_Snpw@mail.gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Commit a4ccc1cef introduced the Generation Context and modified the
logical decoding process to use a Generation Context with a fixed
block size of 8MB for storing tuple data decoded during logical
decoding (i.e., rb->tup_context). Several reports have indicated that
the logical decoding process can be terminated due to
out-of-memory (OOM) situations caused by excessive memory usage in
rb->tup_context.
This issue can occur when decoding a workload involving several
concurrent transactions, including a long-running transaction that
modifies tuples. By design, the Generation Context does not free a
memory block until all chunks within that block are
released. Consequently, if tuples modified by the long-running
transaction are stored across multiple memory blocks, these blocks
remain allocated until the long-running transaction completes, leading
to substantial memory fragmentation. The memory usage during logical
decoding, tracked by rb->size, does not account for memory
fragmentation, resulting in potentially much higher memory consumption
than the value of the logical_decoding_work_mem parameter.
Various improvement strategies were discussed in the relevant
thread. This change reduces the block size of the Generation Context
used in rb->tup_context from 8MB to 8kB. This modification
significantly decreases the likelihood of substantial memory
fragmentation occurring and is relatively straightforward to
backport. Performance testing across multiple platforms has confirmed
that this change will not introduce any performance degradation that
would impact actual operation.
Backport to all supported branches.
Reported-by: Alex Richman, Michael Guissine, Avi Weinberg
Reviewed-by: Amit Kapila, Fujii Masao, David Rowley
Tested-by: Hayato Kuroda, Shlok Kyal
Discussion: https://postgr.es/m/CAD21AoBTY1LATZUmvSXEssvq07qDZufV4AF-OHh9VD2pC0VY2A%40mail.gmail.com
Backpatch-through: 12
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This module provides SQL functions that allow to inspect logical
decoding components.
It currently allows to inspect the contents of serialized logical
snapshots of a running database cluster, which is useful for debugging
or educational purposes.
Author: Bertrand Drouvot
Reviewed-by: Amit Kapila, Shveta Malik, Peter Smith, Peter Eisentraut
Reviewed-by: David G. Johnston
Discussion: https://postgr.es/m/ZscuZ92uGh3wm4tW%40ip-10-97-1-34.eu-west-3.compute.internal
|
|
|
|
|
|
|
|
|
|
|
|
| |
This commit moves the definitions of the SnapBuild and SnapBuildOnDisk
structs, related to logical snapshots, to the snapshot_internal.h
file. This change allows external tools, such as
pg_logicalinspect (with an upcoming patch), to access and utilize the
contents of logical snapshots.
Author: Bertrand Drouvot
Reviewed-by: Amit Kapila, Shveta Malik, Peter Smith
Discussion: https://postgr.es/m/ZscuZ92uGh3wm4tW%40ip-10-97-1-34.eu-west-3.compute.internal
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The first part of the assertion verifying that the passed index must be PK
or RI was incorrectly passing index relation instead of heap relation in
GetRelationIdentityOrPK(). The assertion was not failing because the
second part of the assertion which needs to be performed only when remote
relation has REPLICA_IDENTITY_FULL set was also incorrect.
The change is not backpatched because the current coding doesn't lead to
any failure.
Reported-by: Dilip Kumar
Author: Amit Kapila
Reviewed-by: Vignesh C
Discussion: https://postgr.es/m/CAFiTN-tmguaT1DXbCC+ZomZg-oZLmU6BPhr0po7akQSG6vNJrg@mail.gmail.com
|
|
|
|
|
|
|
|
|
| |
This is a continuation of 17974ec25946. More quotes are applied to
GUC names in error messages and hints, taking care of what seems to be
all the remaining holes currently in the tree for the GUCs.
Author: Peter Smith
Discussion: https://postgr.es/m/CAHut+Pv-kSN8SkxSdoHano_wPubqcg5789ejhCDZAcLFceBR-w@mail.gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This commit adds columns in view pg_stat_subscription_stats to show the
number of times a particular conflict type has occurred during the
application of logical replication changes. The following columns are
added:
confl_insert_exists:
Number of times a row insertion violated a NOT DEFERRABLE unique
constraint.
confl_update_origin_differs:
Number of times an update was performed on a row that was
previously modified by another origin.
confl_update_exists:
Number of times that the updated value of a row violates a
NOT DEFERRABLE unique constraint.
confl_update_missing:
Number of times that the tuple to be updated is missing.
confl_delete_origin_differs:
Number of times a delete was performed on a row that was
previously modified by another origin.
confl_delete_missing:
Number of times that the tuple to be deleted is missing.
The update_origin_differs and delete_origin_differs conflicts can be
detected only when track_commit_timestamp is enabled.
Author: Hou Zhijie
Reviewed-by: Shveta Malik, Peter Smith, Anit Kapila
Discussion: https://postgr.es/m/OS0PR01MB57160A07BD575773045FC214948F2@OS0PR01MB5716.jpnprd01.prod.outlook.com
|
|
|
|
|
|
|
|
| |
Add const qualifiers to XLogRegisterData() and XLogRegisterBufData().
Several unconstify() calls can be removed.
Reviewed-by: Aleksander Alekseev <aleksander@timescale.com>
Discussion: https://www.postgresql.org/message-id/dd889784-9ce7-436a-b4f1-52e4a5e577bd@eisentraut.org
|
|
|
|
|
| |
Author: Alexander Lakhin
Discussion: https://postgr.es/m/f7e514cf-2446-21f1-a5d2-8c089a6e2168@gmail.com
|
|
|
|
|
|
|
|
|
|
| |
This is similar to 2065ddf5e34c, but this time for pg_logical/ itself
and its contents, like the paths for snapshots, mappings or origin
checkpoints.
Author: Bertrand Drouvot
Reviewed-by: Ashutosh Bapat, Yugo Nagata, Michael Paquier
Discussion: https://postgr.es/m/ZryVvjqS9SnV1GPP@ip-10-97-1-34.eu-west-3.compute.internal
|
|
|
|
|
|
|
|
|
|
|
| |
This commit replaces most of the hardcoded values of "pg_replslot" by a
new PG_REPLSLOT_DIR #define. This makes the style more consistent with
the existing PG_STAT_TMP_DIR, for example. More places will follow a
similar change.
Author: Bertrand Drouvot
Reviewed-by: Ashutosh Bapat, Yugo Nagata, Michael Paquier
Discussion: https://postgr.es/m/ZryVvjqS9SnV1GPP@ip-10-97-1-34.eu-west-3.compute.internal
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
The conflict types 'update_differ' and 'delete_differ' indicate that a row
to be modified was previously altered by another origin. Rename those to
'update_origin_differs' and 'delete_origin_differs' to clarify their
meaning.
Author: Hou Zhijie
Reviewed-by: Shveta Malik, Peter Smith
Discussion: https://postgr.es/m/CAA4eK1+HEKwG_UYt4Zvwh5o_HoCKCjEGesRjJX38xAH3OxuuYA@mail.gmail.com
|
|
|
|
| |
They did not immediately precede the code they were applying to.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Commit 5bec1d6bc5e changed the memory usage updates of the
ReorderBufferTXN to zero all at once by subtracting txn->size, rather
than updating it for each change. However, if TOAST reconstruction
data remained in the transaction when freeing it, there were cases
where it further subtracted the memory counter from zero, resulting in
an assertion failure.
This change calculates the memory size for each change and updates the
memory usage to precisely the amount that has been freed.
Backpatch to v17, where this was introducd.
Reviewed-by: Amit Kapila, Shlok Kyal
Discussion: https://postgr.es/m/CAD21AoAqkNUvicgKPT_dXzNoOwpPkVTg0QPPxEcWmzT0moCJ1g%40mail.gmail.com
Backpatch-through: 17
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We advance origin progress during abort on successful streaming and
application of ROLLBACK in parallel streaming mode. But the origin
shouldn't be advanced during an error or unsuccessful apply due to
shutdown. Otherwise, it will result in a transaction loss as such a
transaction won't be sent again by the server.
Reported-by: Hou Zhijie
Author: Hayato Kuroda and Shveta Malik
Reviewed-by: Amit Kapila
Backpatch-through: 16
Discussion: https://postgr.es/m/TYAPR01MB5692FAC23BE40C69DA8ED4AFF5B92@TYAPR01MB5692.jpnprd01.prod.outlook.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch provides the additional logging information in the following
conflict scenarios while applying changes:
insert_exists: Inserting a row that violates a NOT DEFERRABLE unique constraint.
update_differ: Updating a row that was previously modified by another origin.
update_exists: The updated row value violates a NOT DEFERRABLE unique constraint.
update_missing: The tuple to be updated is missing.
delete_differ: Deleting a row that was previously modified by another origin.
delete_missing: The tuple to be deleted is missing.
For insert_exists and update_exists conflicts, the log can include the origin
and commit timestamp details of the conflicting key with track_commit_timestamp
enabled.
update_differ and delete_differ conflicts can only be detected when
track_commit_timestamp is enabled on the subscriber.
We do not offer additional logging for exclusion constraint violations because
these constraints can specify rules that are more complex than simple equality
checks. Resolving such conflicts won't be straightforward. This area can be
further enhanced if required.
Author: Hou Zhijie
Reviewed-by: Shveta Malik, Amit Kapila, Nisha Moond, Hayato Kuroda, Dilip Kumar
Discussion: https://postgr.es/m/OS0PR01MB5716352552DFADB8E9AD1D8994C92@OS0PR01MB5716.jpnprd01.prod.outlook.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The apply worker was using XactLastCommitEnd as local end_lsn for applying
prepare and rollback_prepare. The XactLastCommitEnd value is the end lsn
of the last commit applied before the prepare transaction which makes no
sense. This LSN is used to decide whether we can send the acknowledgment
of the corresponding remote LSN to the server.
It is okay not to set the local_end LSN with the actual WAL position for
the prepare because we always flush the prepare record. So, we can send
the acknowledgment of the remote_end LSN as soon as prepare is finished.
The current code is misleading but as such doesn't create any problem, so
decided not to backpatch.
Author: Hayato Kuroda
Reviewed-by: Shveta Malik, Amit Kapila
Discussion: https://postgr.es/m/TYAPR01MB5692FA4926754B91E9D7B5F0F5AA2@TYAPR01MB5692.jpnprd01.prod.outlook.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
replication.
When performing a cross-partition update in the apply worker, it
needlessly scans the old partition twice, resulting in noticeable
overhead.
This commit optimizes it by removing the redundant table scan.
Author: Hou Zhijie
Reviewed-by: Hayato Kuroda, Amit Kapila
Discussion: https://postgr.es/m/OS0PR01MB571623E39984D94CBB5341D994AB2@OS0PR01MB5716.jpnprd01.prod.outlook.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The two_phase option is controlled by both the publisher (as a slot
option) and the subscriber (as a subscription option), so the slot option
must also be modified.
Changing the 'two_phase' option for a subscription from 'true' to 'false'
is permitted only when there are no pending prepared transactions
corresponding to that subscription. Otherwise, the changes of already
prepared transactions can be replicated again along with their corresponding
commit leading to duplicate data or errors.
To avoid data loss, the 'two_phase' option for a subscription can only be
changed from 'false' to 'true' once the initial data synchronization is
completed. Therefore this is performed later by the logical replication worker.
Author: Hayato Kuroda, Ajin Cherian, Amit Kapila
Reviewed-by: Peter Smith, Hou Zhijie, Amit Kapila, Vitaly Davydov, Vignesh C
Discussion: https://postgr.es/m/8fab8-65d74c80-1-2f28e880@39088166
|
|
|
|
|
| |
Author: ChangAo Chen
Discussion: https://postgr.es/m/tencent_CDECB843B30A8B6B5152FA6458F0F00FDE09@qq.com
|
|
|
|
|
|
|
|
|
|
| |
These messages mostly said "could not connect to the publisher: %s"
which is lacking context. Add some verbiage to indicate which
subscription or worker process is failing.
Nisha Moond
Discussion: https://postgr.es/m/CABdArM7q1=zqL++cYd0hVMg3u_tc0S=0Of=Um-KvDhLony0cSg@mail.gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When creating and initializing a logical slot, the restart_lsn is set
to the latest WAL insertion point (or the latest replay point on
standbys). Subsequently, WAL records are decoded from that point to
find the start point for extracting changes in the
DecodingContextFindStartpoint() function. Since the initial
restart_lsn could be in the middle of a transaction, the start point
must be a consistent point where we won't see the data for partial
transactions.
Previously, when not building a full snapshot, serialized snapshots
were restored, and the SnapBuild jumps to the consistent state even
while finding the start point. Consequently, the slot's restart_lsn
and confirmed_flush could be set to the middle of a transaction. This
could lead to various unexpected consequences. Specifically, there
were reports of logical decoding decoding partial transactions, and
assertion failures occurred because only subtransactions were decoded
without decoding their top-level transaction until decoding the commit
record.
To resolve this issue, the changes prevent restoring the serialized
snapshot and jumping to the consistent state while finding the start
point.
On v17 and HEAD, a flag indicating whether snapshot restores should be
skipped has been added to the SnapBuild struct, and SNAPBUILD_VERSION
has been bumpded.
On backbranches, the flag is stored in the LogicalDecodingContext
instead, preserving on-disk compatibility.
Backpatch to all supported versions.
Reported-by: Drew Callahan
Reviewed-by: Amit Kapila, Hayato Kuroda
Discussion: https://postgr.es/m/2444AA15-D21B-4CCE-8052-52C7C2DAFE5C%40amazon.com
Backpatch-through: 12
|
|
|
|
|
|
|
|
|
|
|
| |
After several refactoring iterations, auxiliary processes are no
longer initialized from the bootstrapper. Using the InitProcessing
mode for initializing auxiliary processes is more appropriate. Since
the global variable Mode is initialized to InitProcessing, we can just
remove the redundant calls of SetProcessingMode(InitProcessing).
Author: Xing Guo <higuoxing@gmail.com>
Discussion: https://www.postgresql.org/message-id/CACpMh%2BDBHVT4xPGimzvex%3DwMdMLQEu9PYhT%2BkwwD2x2nu9dU_Q%40mail.gmail.com
|
|
|
|
|
|
|
|
| |
These probably should have been static all along, it was only
forgotten out of sloppiness.
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://www.postgresql.org/message-id/flat/e0a62134-83da-4ba4-8cdb-ceb0111c95ce@eisentraut.org
|
|
|
|
|
|
|
|
|
|
|
|
| |
The standby_slot_names GUC allows the specification of physical standby
slots that must be synchronized before the logical walsenders associated
with logical failover slots. However, for this purpose, the GUC name is
too generic.
Author: Hou Zhijie
Reviewed-by: Bertrand Drouvot, Masahiko Sawada
Backpatch-through: 17
Discussion: https://postgr.es/m/ZnWeUgdHong93fQN@momjian.us
|
|
|
|
|
|
|
|
| |
This makes the code more consistent with the surroundings.
Author: ChangAo Chen
Reviewed-by: Ashutosh Bapat
Discussion: https://postgr.es/m/CAExHW5tNTevUh58SKddTtcX3yU_5_PDSC8Mdp-Q2hc9PpZHRJg@mail.gmail.com
|
|
|
|
|
|
| |
Fixups for 17974ec259: Some messages were missed (and some were new
since the patch was originally proposed), and there was a typo
introduced.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
After further review, we want to move in the direction of always
quoting GUC names in error messages, rather than the previous (PG16)
wildly mixed practice or the intermittent (mid-PG17) idea of doing
this depending on how possibly confusing the GUC name is.
This commit applies appropriate quotes to (almost?) all mentions of
GUC names in error messages. It partially supersedes a243569bf65 and
8d9978a7176, which had moved things a bit in the opposite direction
but which then were abandoned in a partial state.
Author: Peter Smith <smithpb2250@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/CAHut%2BPv-kSN8SkxSdoHano_wPubqcg5789ejhCDZAcLFceBR-w%40mail.gmail.com
|
| |
|
|
|
|
|
| |
Author: Alexander Lakhin
Discussion: https://postgr.es/m/ae9f2fcb-4b24-5bb0-4240-efbbbd944ca1@gmail.com
|
|
|
|
|
|
|
| |
Also, fix a comment incorrectly referencing the "streaming read API".
This was renamed to "read stream" shortly before being committed.
Discussion: https://postgr.es/m/CAApHDvq-2Zdqytm_Hf3RmVf0qg5PS9jTFAJ5QTc9xH9pwvwDTA@mail.gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Allow pg_sync_replication_slots() to error out during promotion of standby.
This makes the behavior of the SQL function consistent with the slot sync
worker. We also ensured that pg_sync_replication_slots() cannot be
executed if sync_replication_slots is enabled and the slotsync worker is
already running to perform the synchronization of slots. Previously, it
would have succeeded in cases when the worker is idle and failed when it
is performing sync which could confuse users.
This patch fixes another issue in the slot sync worker where
SignalHandlerForShutdownRequest() needs to be registered *before* setting
SlotSyncCtx->pid, otherwise, the slotsync worker could miss handling
SIGINT sent by the startup process(ShutDownSlotSync) if it is sent before
worker could register SignalHandlerForShutdownRequest(). To be consistent,
all signal handlers' registration is moved to a prior location before we
set the worker's pid.
Ensure that we clean up synced temp slots at the end of
pg_sync_replication_slots() to avoid such slots being left over after
promotion.
Ensure that ShutDownSlotSync() captures SlotSyncCtx->pid under spinlock to
avoid accessing invalid value as it can be reset by concurrent slot sync
exit due to an error.
Author: Shveta Malik
Reviewed-by: Hou Zhijie, Bertrand Drouvot, Amit Kapila, Masahiko Sawada
Discussion: https://postgr.es/m/CAJpy0uBefXUS_TSz=oxmYKHdg-fhxUT0qfjASW3nmqnzVC3p6A@mail.gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We missed performing table sync if the invalidation happened while the
non-ready tables list was being prepared. This occurs because the sync
state was set to valid at the end of non-ready table list preparation
irrespective of the invalidations processed while the list is being
prepared.
Fix it by changing the boolean variable to a tri-state enum and by setting
table state to valid only if no invalidations have occurred while the list
is being prepared.
Reprted-by: Alexander Lakhin
Diagnosed-by: Alexander Lakhin
Author: Vignesh C
Reviewed-by: Hou Zhijie, Alexander Lakhin, Ajin Cherian, Amit Kapila
Backpatch-through: 15
Discussion: https://postgr.es/m/711a6afe-edb7-1211-cc27-1bef8239eec7@gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
| |
This fixes various typos, duplicated words, and tiny bits of whitespace
mainly in code comments but also in docs.
Author: Daniel Gustafsson <daniel@yesql.se>
Author: Heikki Linnakangas <hlinnaka@iki.fi>
Author: Alexander Lakhin <exclusion@gmail.com>
Author: David Rowley <dgrowleyml@gmail.com>
Author: Nazir Bilal Yavuz <byavuz81@gmail.com>
Discussion: https://postgr.es/m/3F577953-A29E-4722-98AD-2DA9EFF2CBB8@yesql.se
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Ensure that when updating the catalog_xmin of the synced slots, it is
first written to disk before changing the in-memory value
(effective_catalog_xmin). This is to prevent a scenario where the
in-memory value change triggers a vacuum to remove catalog tuples before
the catalog_xmin is written to disk. In the event of a crash before the
catalog_xmin is persisted, we would not know that some required catalog
tuples have been removed and the synced slot would be invalidated.
Change the sanity check to ensure that remote_slot's confirmed_flush LSN
can't precede the local/synced slot during slot sync. Note that the
restart_lsn of the synced/local slot can be ahead of remote_slot. This can
happen when slot advancing machinery finds a running xacts record after
reaching the consistent state at a later point than the primary where it
serializes the snapshot and updates the restart_lsn.
Make the check to sync slots robust by allowing to sync only when the
confirmed_lsn, restart_lsn, or catalog_xmin of the remote slot is ahead of
the synced/local slot.
Reported-by: Amit Kapila and Shveta Malik
Author: Hou Zhijie, Shveta Malik
Reviewed-by: Amit Kapila, Bertrand Drouvot
Discussion: https://postgr.es/m/OS0PR01MB57162B67D3CB01B2756FBA6D94062@OS0PR01MB5716.jpnprd01.prod.outlook.com
Discussion: https://postgr.es/m/CAJpy0uCSS5zmdyUXhvw41HSdTbRqX1hbYqkOfHNj7qQ+2zn0AQ@mail.gmail.com
|
|
|
|
|
|
|
|
|
|
|
| |
This reverts commit b840508644 and bcb14f4abc. These commits were made
for commit 5bec1d6bc5 (Improve eviction algorithm in ReorderBuffer
using max-heap for many subtransactions). However, per discussion,
commit efb8acc0d0 replaced binary heap + index with pairing heap, and
made these commits unnecessary.
Reported-by: Jeff Davis
Discussion: https://postgr.es/m/12747c15811d94efcc5cda72d6b35c80d7bf3443.camel%40j-davis.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
A pairing heap can perform the same operations as the binary heap +
index, with as good or better algorithmic complexity, and that's an
existing data structure so that we don't need to invent anything new
compared to v16. This commit makes the new binaryheap functionality
that was added in commits b840508644 and bcb14f4abc unnecessary, but
they will be reverted separately.
Remove the optimization to only build and maintain the heap when the
amount of memory used is close to the limit, becuase the bookkeeping
overhead with the pairing heap seems to be small enough that it
doesn't matter in practice.
Reported-by: Jeff Davis
Author: Heikki Linnakangas
Reviewed-by: Michael Paquier, Hayato Kuroda, Masahiko Sawada
Discussion: https://postgr.es/m/12747c15811d94efcc5cda72d6b35c80d7bf3443.camel%40j-davis.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This adjusts various appendStringInfo* function calls to use a more
appropriate and efficient function with the same behavior. For example,
use appendStringInfoChar() when appending a single character rather than
appendStringInfo() and appendStringInfoString() when no formatting is
required rather than using appendStringInfo().
All adjustments made here are in code that's new to v17, so it makes
sense to fix these now rather than wait a few years and make
backpatching harder.
Discussion: https://postgr.es/m/CAApHDvojY2UvMiO+9_55ArTj10P1LBNJyyoGB+C65BLDNT0GsQ@mail.gmail.com
Reviewed-by: Nathan Bossart, Tom Lane
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This commit does two things:
1) Maintains inactive_since for sync slots whenever the slot is released
just like any other regular slot.
2) Ensures the value is set to the current timestamp during the promotion
of standby to help correctly interpret the time after promotion. We don't
want the slots to appear inactive for a long time after promotion if they
haven't been synchronized recently. This would also avoid the invalidation
of such slots immediately after promotion if tomorrow we have a feature
that invalidates slots based on their inactivity time. Whoever acquires
the slot i.e. makes the slot active will reset it to NULL.
Author: Bharath Rupireddy
Reviewed-by: Bertrand Drouvot, Amit Kapila, Shveta Malik, Masahiko Sawada
Discussion: https://postgr.es/m/CAA4eK1KrPGwfZV9LYGidjxHeW+rxJ=E2ThjXvwRGLO=iLNuo=Q@mail.gmail.com
Discussion: https://postgr.es/m/CALj2ACW4aUe-_uFQOjdWCEN-xXoLGhmvRFnL8SNw_TZ5nJe+aw@mail.gmail.com
Discussion: https://postgr.es/m/CA+Tgmob_Ta-t2ty8QrKHBGnNLrf4ZYcwhGHGFsuUoFrAEDw4sA@mail.gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
losing data.
We were directly copying the LSN locations while syncing the slots on the
standby. Now, it is possible that at some particular restart_lsn there are
some running xacts, which means if we start reading the WAL from that
location after promotion, we won't reach a consistent snapshot state at
that point. However, on the primary, we would have already been in a
consistent snapshot state at that restart_lsn so we would have just
serialized the existing snapshot.
To avoid this problem we will use the advance_slot functionality unless
the snapshot already exists at the synced restart_lsn location. This will
help us to ensure that snapbuilder/slot statuses are updated properly
without generating any changes. Note that the synced slot will remain as
RS_TEMPORARY till the decoding from corresponding restart_lsn can reach a
consistent snapshot state after which they will be marked as
RS_PERSISTENT.
Per buildfarm
Author: Hou Zhijie
Reviewed-by: Bertrand Drouvot, Shveta Malik, Bharath Rupireddy, Amit Kapila
Discussion: https://postgr.es/m/OS0PR01MB5716B3942AE49F3F725ACA92943B2@OS0PR01MB5716.jpnprd01.prod.outlook.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
subtransactions.
Previously, when selecting the transaction to evict during logical
decoding, we check all transactions to find the largest
transaction. This could lead to a significant replication lag
especially in the case where there are many subtransactions.
This commit improves the eviction algorithm in ReorderBuffer using the
max-heap with transaction size as the key to efficiently find the
largest transaction.
The max-heap starts with empty. While the max-heap is empty, we don't
do anything for the max-heap when updating the memory
counter. Therefore, we get the largest transaction in O(N) time, where
N is the number of transactions including top-level transactions and
subtransactions.
We build the max-heap just before selecting the largest transactions
if the number of transactions being decoded is higher than the
threshold, MAX_HEAP_TXN_COUNT_THRESHOLD. After building the max-heap,
we also update the max-heap when updating the memory counter. The
intention is to efficiently find the largest transaction in O(1) time
instead of incurring the cost of memory counter updates (O(log
N)). Once the number of transactions got lower than the threshold, we
reset the max-heap.
The performance benchmark results showed significant speed up (more
than x30 speed up on my machine) in decoding a transaction with 100k
subtransactions, whereas there is no visible overhead in other cases.
Reviewed-by: Amit Kapila, Hayato Kuroda, Vignesh C, Ajin Cherian,
Tomas Vondra, Shubham Khanna, Peter Smith, Álvaro Herrera,
Euler Taveira
Discussion: https://postgr.es/m/CAD21AoAfKTgrBrLq96GcTv9d6k97zaQcDM-rxfKEt4GSe0qnaQ%40mail.gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously, binaryheap didn't support updating a key and removing a
node in an efficient way. For example, in order to remove a node from
the binaryheap, the caller had to pass the node's position within the
array that the binaryheap internally has. Removing a node from the
binaryheap is done in O(log n) but searching for the key's position is
done in O(n).
This commit adds a hash table to binaryheap in order to track the
position of each nodes in the binaryheap. That way, by using newly
added functions such as binaryheap_update_up() etc., both updating a
key and removing a node can be done in O(1) on an average and O(log n)
in worst case. This is known as the indexed binary heap. The caller
can specify to use the indexed binaryheap by passing indexed = true.
The current code does not use the new indexing logic, but it will be
used by an upcoming patch.
Reviewed-by: Vignesh C, Peter Smith, Hayato Kuroda, Ajin Cherian,
Tomas Vondra, Shubham Khanna
Discussion: https://postgr.es/m/CAD21AoDffo37RC-eUuyHJKVEr017V2YYDLyn1xF_00ofptWbkg%40mail.gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The new combined WAL record is now used for pruning, freezing and 2nd
pass of vacuum. This is in preparation for changing VACUUM to write a
combined prune+freeze record per page, instead of separate two
records. The new WAL record format now supports that, but the code
still always writes separate records for pruning and freezing.
This reserves separate XLOG_HEAP2_* info codes for when the pruning
record is emitted for on-access pruning or VACUUM, per Peter
Geoghegan's suggestion. The record format is identical, but having
separate info codes makes it easier analyze pruning and vacuuming with
pg_waldump.
The function to emit the new WAL record, log_heap_prune_and_freeze(),
is in pruneheap.c. The existing heap_log_freeze_plan() and its
subroutines are moved to pruneheap.c without changes, to keep them
together with log_heap_prune_and_freeze().
Author: Melanie Plageman <melanieplageman@gmail.com>
Discussion: https://www.postgresql.org/message-id/CAAKRu_azf-zH%3DDgVbquZ3tFWjMY1w5pO8m-TXJaMdri8z3933g@mail.gmail.com
Discussion: https://www.postgresql.org/message-id/CAAKRu_b2oE4GL%3Dq4g9mcByS9yT7wTQvEH9OLpabj28e%2BWKFi2A@mail.gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Till now, the reason for replication slot invalidation is not tracked
directly in pg_replication_slots. A recent commit 007693f2a3 added
'conflict_reason' to show the reasons for slot conflict/invalidation, but
only for logical slots.
This commit adds a new column 'invalidation_reason' to show invalidation
reasons for both physical and logical slots. And, this commit also turns
'conflict_reason' text column to 'conflicting' boolean column (effectively
reverting commit 007693f2a3). The 'conflicting' column is true for
invalidation reasons 'rows_removed' and 'wal_level_insufficient' because
those make the slot conflict with recovery. When 'conflicting' is true,
one can now look at the new 'invalidation_reason' column for the reason
for the logical slot's conflict with recovery.
The new 'invalidation_reason' column will also be useful to track other
invalidation reasons in the future commit.
Author: Bharath Rupireddy
Reviewed-by: Bertrand Drouvot, Amit Kapila, Shveta Malik
Discussion: https://www.postgresql.org/message-id/ZfR7HuzFEswakt/a%40ip-10-97-1-34.eu-west-3.compute.internal
Discussion: https://www.postgresql.org/message-id/CALj2ACW4aUe-_uFQOjdWCEN-xXoLGhmvRFnL8SNw_TZ5nJe+aw@mail.gmail.com
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Introduce new postmaster_child_launch() function that deals with the
differences in EXEC_BACKEND mode.
Refactor the mechanism of passing information from the parent to child
process. Instead of using different command-line arguments when
launching the child process in EXEC_BACKEND mode, pass a
variable-length blob of startup data along with all the global
variables. The contents of that blob depend on the kind of child
process being launched. In !EXEC_BACKEND mode, we use the same blob,
but it's simply inherited from the parent to child process.
Reviewed-by: Tristan Partin, Andres Freund
Discussion: https://www.postgresql.org/message-id/7a59b073-5b5b-151e-7ed3-8b01ff7ce9ef@iki.fi
|