aboutsummaryrefslogtreecommitdiff
path: root/src/backend/replication
Commit message (Collapse)AuthorAge
...
* Track unpruned relids to avoid processing pruned relationsAmit Langote2025-02-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This commit introduces changes to track unpruned relations explicitly, making it possible for top-level plan nodes, such as ModifyTable and LockRows, to avoid processing partitions pruned during initial pruning. Scan-level nodes, such as Append and MergeAppend, already avoid the unnecessary processing by accessing partition pruning results directly via part_prune_index. In contrast, top-level nodes cannot access pruning results directly and need to determine which partitions remain unpruned. To address this, this commit introduces a new bitmapset field, es_unpruned_relids, which the executor uses to track the set of unpruned relations. This field is referenced during plan initialization to skip initializing certain nodes for pruned partitions. It is initialized with PlannedStmt.unprunableRelids, a new field that the planner populates with RT indexes of relations that cannot be pruned during runtime pruning. These include relations not subject to partition pruning and those required for execution regardless of pruning. PlannedStmt.unprunableRelids is computed during set_plan_refs() by removing the RT indexes of runtime-prunable relations, identified from PartitionPruneInfos, from the full set of relation RT indexes. ExecDoInitialPruning() then updates es_unpruned_relids by adding partitions that survive initial pruning. To support this, PartitionedRelPruneInfo and PartitionedRelPruningData now include a leafpart_rti_map[] array that maps partition indexes to their corresponding RT indexes. The former is used in set_plan_refs() when constructing unprunableRelids, while the latter is used in ExecDoInitialPruning() to convert partition indexes returned by get_matching_partitions() into RT indexes, which are then added to es_unpruned_relids. These changes make it possible for ModifyTable and LockRows nodes to process only relations that remain unpruned after initial pruning. ExecInitModifyTable() trims lists, such as resultRelations, withCheckOptionLists, returningLists, and updateColnosLists, to consider only unpruned partitions. It also creates ResultRelInfo structs only for these partitions. Similarly, child RowMarks for pruned relations are skipped. By avoiding unnecessary initialization of structures for pruned partitions, these changes improve the performance of updates and deletes on partitioned tables during initial runtime pruning. Due to ExecInitModifyTable() changes as described above, EXPLAIN on a plan for UPDATE and DELETE that uses runtime initial pruning no longer lists partitions pruned during initial pruning. Reviewed-by: Robert Haas <robertmhaas@gmail.com> (earlier versions) Reviewed-by: Tomas Vondra <tomas@vondra.me> Discussion: https://postgr.es/m/CA+HiwqFGkMSge6TgC9KQzde0ohpAycLQuV7ooitEEpbKB0O_mg@mail.gmail.com
* Avoid updating inactive_since for invalid replication slots.Amit Kapila2025-02-05
| | | | | | | | | | | | | | | | | | | | | | | It is possible for the inactive_since value of an invalid replication slot to be updated multiple times, which is unexpected behavior like during the release of the slot or at the time of restart. This is harmless because invalid slots are not allowed to be accessed but it is not prudent to update invalid slots. We are planning to invalidate slots due to other reasons like idle time and it will look odd that the slot's inactive_since displays the recent time in this field after invalidated due to idle time. So, this patch ensures that the inactive_since field of slots is not updated for invalid slots. In the passing, ensure to use the same inactive_since time for all the slots at restart while restoring them from the disk. Author: Nisha Moond <nisha.moond412@gmail.com> Author: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> Reviewed-by: Vignesh C <vignesh21@gmail.com> Reviewed-by: Peter Smith <smithpb2250@gmail.com> Reviewed-by: Hou Zhijie <houzj.fnst@fujitsu.com> Reviewed-by: Amit Kapila <amit.kapila16@gmail.com> Discussion: https://postgr.es/m/CABdArM7QdifQ_MHmMA=Cc4v8+MeckkwKncm2Nn6tX9wSCQ-+iw@mail.gmail.com
* Integrate GistTranslateCompareType() into IndexAmTranslateCompareType()Peter Eisentraut2025-02-03
| | | | | | | | | | | | | | | | | This turns GistTranslateCompareType() into a callback function of the gist index AM instead of a standalone function. The existing callers are changed to use IndexAmTranslateCompareType(). This then makes that code not hardcoded toward gist. This means in particular that the temporal keys code is now independent of gist. Also, this generalizes commit 74edabce7a3, so other index access methods other than the previously hardcoded ones could now work as REPLICA IDENTITY in a logical replication subscriber. Author: Mark Dilger <mark.dilger@enterprisedb.com> Co-authored-by: Peter Eisentraut <peter@eisentraut.org> Discussion: https://www.postgresql.org/message-id/flat/E72EAA49-354D-4C2E-8EB9-255197F55330@enterprisedb.com
* Get rid of our dependency on type "long" for memory size calculations.Tom Lane2025-01-31
| | | | | | | | | | | | | | | | | | | | | | | | | | Consistently use "Size" (or size_t, or in some places int64 or double) as the type for variables holding memory allocation sizes. In most places variables' data types were fine already, but we had an ancient habit of computing bytes from kilobytes-units GUCs with code like "work_mem * 1024L". That risks overflow on Win64 where they did not make "long" as wide as "size_t". We worked around that by restricting such GUCs' ranges, so you couldn't set work_mem et al higher than 2GB on Win64. This patch removes that restriction, after replacing such calculations with "work_mem * (Size) 1024" or variants of that. It should be noted that this patch was constructed by searching outwards from the GUCs that have MAX_KILOBYTES as upper limit. So I can't positively guarantee there are no other places doing memory-size arithmetic in int or long variables. I do however feel pretty confident that increasing MAX_KILOBYTES on Win64 is safe now. Also, nothing in our code should be dealing in multiple-gigabyte allocations without authorization from a relevant GUC, so it seems pretty likely that this search caught everything that could be at risk of overflow. Author: Vladlen Popolitov <v.popolitov@postgrespro.ru> Co-authored-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/1a01f0-66ec2d80-3b-68487680@27595217
* Raise an error while trying to acquire an invalid slot.Amit Kapila2025-01-31
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Once a replication slot is invalidated, it cannot be altered or used to fetch changes. However, a process could still acquire an invalid slot and fail later. For example, if a process acquires a logical slot that was invalidated due to wal_removed, it will eventually fail in CreateDecodingContext() when attempting to access the removed WAL. Similarly, for physical replication slots, even if the slot is invalidated and invalidation_reason is set to wal_removed, the walsender does not currently check for invalidation when starting physical replication. Instead, replication starts, and an error is only reported later while trying to access WAL. Similarly, we prohibit modifying slot properties for invalid slots but give the error for the same after acquiring the slot. This patch improves error handling by detecting invalid slots earlier at the time of slot acquisition which is the first step. This also helped in unifying different ERROR messages at different places and gave a consistent message for invalid slots. This means that the message for invalid slots will change to a generic message. This will also be helpful for future patches where we are planning to invalidate slots due to more reasons like idle_timeout because we don't have to modify multiple places in such cases and avoid the chances of missing out on a particular place. Author: Nisha Moond <nisha.moond412@gmail.com> Author: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> Reviewed-by: Vignesh C <vignesh21@gmail.com> Reviewed-by: Peter Smith <smithpb2250@gmail.com> Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com> Reviewed-by: Amit Kapila <amit.kapila16@gmail.com> Discussion: https://postgr.es/m/CABdArM6pBL5hPnSQ+5nEVMANcF4FCH7LQmgskXyiLY75TMnKpw@mail.gmail.com
* Fix grammatical typos around possessive "its"John Naylor2025-01-29
| | | | | | | | Some places spelled it "it's", which is short for "it is". In passing, fix a couple other nearby grammatical errors. Author: Jacob Brazeal <jacob.brazeal@gmail.com> Discussion: https://postgr.es/m/CA+COZaAO8g1KJCV0T48=CkJMjAnnfTGLWOATz+2aCh40c2Nm+g@mail.gmail.com
* Return yyparse() result not via global variablePeter Eisentraut2025-01-24
| | | | | | | | | Instead of passing the parse result from yyparse() via a global variable, pass it via a function output argument. This complements earlier work to make the parsers reentrant. Discussion: Discussion: https://www.postgresql.org/message-id/flat/eb6faeac-2a8a-4b69-9189-c33c520e5b7b@eisentraut.org
* Change publication's publish_generated_columns option type to enum.Amit Kapila2025-01-23
| | | | | | | | | | | | | | | The current boolean publish_generated_columns option only supports a binary choice, which is insufficient for future enhancements where generated columns can be of different types (e.g., stored or virtual). The supported values for the publish_generated_columns option are 'none' and 'stored'. Author: Vignesh C <vignesh21@gmail.com> Reviewed-by: Peter Smith <smithpb2250@gmail.com> Reviewed-by: Peter Eisentraut <peter@eisentraut.org> Reviewed-by: Amit Kapila <amit.kapila16@gmail.com> Discussion: https://postgr.es/m/d718d219-dd47-4a33-bb97-56e8fc4da994@eisentraut.org Discussion: https://postgr.es/m/B80D17B2-2C8E-4C7D-87F2-E5B4BE3C069E@gmail.com
* postmaster: Rename some shutdown related PMState phase namesAndres Freund2025-01-10
| | | | | | | | | | The previous names weren't particularly clear. Future patches will add more shutdown phases, making it even more important to have understandable shutdown phases. Suggested-by: Heikki Linnakangas <hlinnaka@iki.fi> Reviewed-by: Nazir Bilal Yavuz <byavuz81@gmail.com> Discussion: https://postgr.es/m/d2cd8fd3-396a-4390-8f0b-74be65e72899@iki.fi
* flex code modernization: Replace YY_EXTRA_TYPE define with flex optionPeter Eisentraut2025-01-06
| | | | | | | | Replace #define YY_EXTRA_TYPE with %option extra-type. The latter is the way recommended by the flex manual (available since flex 2.5.34). Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi> Discussion: https://www.postgresql.org/message-id/flat/eb6faeac-2a8a-4b69-9189-c33c520e5b7b@eisentraut.org
* Fix an assortment of spelling mistakes and typosDavid Rowley2025-01-02
| | | | | Author: Alexander Lakhin <exclusion@gmail.com> Discussion: https://postgr.es/m/5812a0b9-b0cf-4151-9a14-d9f00e4f2858@gmail.com
* Update copyright for 2025Bruce Momjian2025-01-01
| | | | Backpatch-through: 13
* Fix memory leak in pgoutput with relation attribute mapMichael Paquier2024-12-30
| | | | | | | | | | | | | | | | | | | | pgoutput caches the attribute map of a relation, that is free()'d only when validating a RelationSyncEntry. However, this code path is not taken when calling any of the SQL functions able to do some logical decoding, like pg_logical_slot_{get,peek}_changes(), leaking some memory into CacheMemoryContext on repeated calls. To address this, a relation's attribute map is allocated in PGOutputData's cachectx, free()'d at the end of the execution of these SQL functions when logical decoding ends. This is available down to 15. v13 and v14 have a similar leak, which will be dealt with later. Reported-by: Masahiko Sawada Author: Vignesh C Reviewed-by: Hou Zhijie Discussion: https://postgr.es/m/CAD21AoDkAhQVSukOfH3_reuF-j4EU0-HxMqU3dU+bSTxsqT14Q@mail.gmail.com Discussion: https://postgr.es/m/CALDaNm1hewNAsZ_e6FF52a=9drmkRJxtEPrzCB6-9mkJyeBBqA@mail.gmail.com Backpatch-through: 15
* Partial pgindent of .l and .y filesPeter Eisentraut2024-12-25
| | | | | | | Trying to clean up the code a bit while we're working on these files for the reentrant scanner/pure parser patches. This cleanup only touches the code sections after the second '%%' in each file, via a manually-supervised and locally hacked up pgindent.
* syncrep parser: pure parser and reentrant scannerPeter Eisentraut2024-12-24
| | | | | | | | | | | | | | | | | | | | | | | | | | | Use the flex %option reentrant and the bison option %pure-parser to make the generated scanner and parser pure, reentrant, and thread-safe. Make the generated scanner use palloc() etc. instead of malloc() etc. Previously, we only used palloc() for the buffer, but flex would still use malloc() for its internal structures. Now, all the memory is under palloc() control. Simplify flex scan buffer management: Instead of constructing the buffer from pieces and then using yy_scan_buffer(), we can just use yy_scan_string(), which does the same thing internally. The previous code was necessary because we allocated the buffer with palloc() and the rest of the state was handled by malloc(). But this is no longer the case; everything is under palloc() now. Use flex yyextra to handle context information, instead of global variables. This complements the other changes to make the scanner reentrant. Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi> Reviewed-by: Andreas Karlsson <andreas@proxel.se> Discussion: https://www.postgresql.org/message-id/flat/eb6faeac-2a8a-4b69-9189-c33c520e5b7b@eisentraut.org
* replication parser: pure parser and reentrant scannerPeter Eisentraut2024-12-24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | Use the flex %option reentrant and the bison option %pure-parser to make the generated scanner and parser pure, reentrant, and thread-safe. Make the generated scanner use palloc() etc. instead of malloc() etc. Previously, we only used palloc() for the buffer, but flex would still use malloc() for its internal structures. As a result, there could be some small memory leaks in case of uncaught errors. Now, all the memory is under palloc() control, so there are no more such issues. Simplify flex scan buffer management: Instead of constructing the buffer from pieces and then using yy_scan_buffer(), we can just use yy_scan_string(), which does the same thing internally. The previous code was necessary because we allocated the buffer with palloc() and the rest of the state was handled by malloc(). But this is no longer the case; everything is under palloc() now. Use flex yyextra to handle context information, instead of global variables. This complements the other changes to make the scanner reentrant. Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi> Co-authored-by: Andreas Karlsson <andreas@proxel.se> Reviewed-by: Andreas Karlsson <andreas@proxel.se> Discussion: https://www.postgresql.org/message-id/flat/eb6faeac-2a8a-4b69-9189-c33c520e5b7b@eisentraut.org
* Remove unnecessary GetTransactionSnapshot() callsHeikki Linnakangas2024-12-23
| | | | | | | | | | | | | | | | | | | In get_database_list() and get_subscription_list(), the GetTransactionSnapshot() call is not required because the catalog table scans use the catalog snapshot, which is held until the end of the scan. See table_beginscan_catalog(), which calls RegisterSnapshot(GetCatalogSnapshot(relid)). In InitPostgres, it's a little less obvious that it's not required, but still true I believe. All the catalog lookups in InitPostgres() also use the catalog snapshot, and the looked up values are copied while still holding the snapshot. Furthermore, as the removed FIXME comments said, calling GetTransactionSnapshot() didn't really prevent MyProc->xmin from being reset anyway. Discussion: https://www.postgresql.org/message-id/7c56f180-b9e1-481e-8c1d-efa63de3ecbb@iki.fi
* Fix variable reference in commentHeikki Linnakangas2024-12-20
| | | | | | | | | This used to say "nsubxcnt isn't decreased when subtransactions abort", but there's no variable called nsubxcnt. Commit 8548ddc61b changed it to "subxcnt", among other typo fixes, but that was wrong too: the comment actually talks about txn->nsubtxns. That's the field that's incremented but never decremented and is used for the allocation earlier in the function.
* Introduce CompactAttribute array in TupleDesc, take 2David Rowley2024-12-20
| | | | | | | | | | | | | | | | | | | | | | | | The new compact_attrs array stores a few select fields from FormData_pg_attribute in a more compact way, using only 16 bytes per column instead of the 104 bytes that FormData_pg_attribute uses. Using CompactAttribute allows performance-critical operations such as tuple deformation to be performed without looking at the FormData_pg_attribute element in TupleDesc which means fewer cacheline accesses. For some workloads, tuple deformation can be the most CPU intensive part of processing the query. Some testing with 16 columns on a table where the first column is variable length showed around a 10% increase in transactions per second for an OLAP type query performing aggregation on the 16th column. However, in certain cases, the increases were much higher, up to ~25% on one AMD Zen4 machine. This also makes pg_attribute.attcacheoff redundant. A follow-on commit will remove it, thus shrinking the FormData_pg_attribute struct by 4 bytes. Author: David Rowley Reviewed-by: Andres Freund, Victor Yegorov Discussion: https://postgr.es/m/CAApHDvrBztXP3yx=NKNmo3xwFAFhEdyPnvrDg3=M0RhDs+4vYw@mail.gmail.com
* Small whitespace improvementPeter Eisentraut2024-12-19
| | | | | Author: Andreas Karlsson <andreas@proxel.se> Discussion: https://www.postgresql.org/message-id/flat/eb6faeac-2a8a-4b69-9189-c33c520e5b7b@eisentraut.org
* Rewrite maybe_reread_subscription() commentÁlvaro Herrera2024-12-13
| | | | One sentence was gramatically wrong, but also too terse. Expand on it.
* Make the conditions in IsIndexUsableForReplicaIdentityFull() more explicitPeter Eisentraut2024-12-10
| | | | | | | | | | | | | | | | IsIndexUsableForReplicaIdentityFull() described a number of conditions that a suitable index has to fulfill. But not all of these were actually checked in the code. Instead, it appeared to rely on get_equal_strategy_number() to filter out any indexes that are not btree or hash. As we look to generalize index AM capabilities, this would possibly break if we added additional support in get_equal_strategy_number(). Instead, write out code to check for the required capabilities explicitly. This shouldn't change any behaviors at the moment. Reviewed-by: Paul Jungwirth <pj@illuminatedcomputing.com> Reviewed-by: vignesh C <vignesh21@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/CA+renyUApHgSZF9-nd-a0+OPGharLQLO=mDHcY4_qQ0+noCUVg@mail.gmail.com
* Replace get_equal_strategy_number_for_am() by get_equal_strategy_number()Peter Eisentraut2024-12-10
| | | | | | | | | | | | | | | | | get_equal_strategy_number_for_am() gets the equal strategy number for an AM. This currently only supports btree and hash. In the more general case, this also depends on the operator class (see for example GistTranslateStratnum()). To support that, replace this function with get_equal_strategy_number() that takes an opclass and derives it from there. (This function already existed before as a static function, so the signature is kept for simplicity.) This patch is only a refactoring, it doesn't add support for other index AMs such as gist. This will be done separately. Reviewed-by: Paul Jungwirth <pj@illuminatedcomputing.com> Reviewed-by: vignesh C <vignesh21@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/CA+renyUApHgSZF9-nd-a0+OPGharLQLO=mDHcY4_qQ0+noCUVg@mail.gmail.com
* Fix memory leak in pgoutput with publication list cacheMichael Paquier2024-12-09
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The pgoutput module caches publication names in a list and frees it upon invalidation. However, the code forgot to free the actual publication names within the list elements, as publication names are pstrdup()'d in GetPublication(). This would cause memory to leak in CacheMemoryContext, bloating it over time as this context is not cleaned. This is a problem for WAL senders running for a long time, as an accumulation of invalidation requests would bloat its cache memory usage. A second case, where this leak is easier to see, involves a backend calling SQL functions like pg_logical_slot_{get,peek}_changes() which create a new decoding context with each execution. More publications create more bloat. To address this, this commit adds a new memory context within the logical decoding context and resets it each time the publication names cache is invalidated, based on a suggestion from Amit Kapila. This ensures that the lifespan of the publication names aligns with that of the logical decoding context. This solution changes PGOutputData, which is fine for HEAD but it could cause an ABI breakage in stable branches as the structure size would change, so these are left out for now. Analyzed-by: Michael Paquier, Jeff Davis Author: Zhijie Hou Reviewed-by: Michael Paquier, Masahiko Sawada, Euler Taveira Discussion: https://postgr.es/m/Z0khf9EVMVLOc_YY@paquier.xyz
* Simplify IsIndexUsableForReplicaIdentityFull()Peter Eisentraut2024-12-04
| | | | | | | | | | Take Relation as argument instead of IndexInfo. Building the IndexInfo is an unnecessary intermediate step here. A future patch wants to get some information that is in the relcache but not in IndexInfo, so this will also help there. Discussion: https://www.postgresql.org/message-id/333d3886-b737-45c3-93f4-594c96bb405d@eisentraut.org
* Fix synchronized_standby_slots GUC check hookÁlvaro Herrera2024-12-03
| | | | | | | | | | | | | | | | | The validate_sync_standby_slots subroutine requires an LWLock, so it cannot run in processes without PGPROC; skip it there to avoid a crash. This replaces the current test for ReplicationSlotCtl being not null, which appears to be a solution for the same problem but less general. I also rewrote a related comment that mentioned ReplicationSlotCtl in StandbySlotsHaveCaughtup. This code came in with commit bf279ddd1c28; backpatch to 17. Reported-by: Gabriele Bartolini <gabriele.bartolini@enterprisedb.com> Reviewed-by: Amit Kapila <amit.kapila16@gmail.com> Reviewed-by: Zhijie Hou <houzj.fnst@fujitsu.com> Discussion: https://postgr.es/m/202411281216.sutbxtr6idnn@alvherre.pgsql
* Remove useless casts to (void *)Peter Eisentraut2024-11-28
| | | | | | | | Many of them just seem to have been copied around for no real reason. Their presence causes (small) risks of hiding actual type mismatches or silently discarding qualifiers Discussion: https://www.postgresql.org/message-id/flat/461ea37c-8b58-43b4-9736-52884e862820@eisentraut.org
* Make GUC_check_errdetail messages full sentencesÁlvaro Herrera2024-11-27
| | | | | | | They were all missing punctuation, one was missing initial capital. Per our message style guidelines. No backpatch, to avoid breaking existing translations.
* Improve error message for replication of generated columns.Amit Kapila2024-11-27
| | | | | | | | | | | | | Currently, logical replication produces a generic error message when targeting a subscriber-side table column that is either missing or generated. The error message can be misleading for generated columns. This patch introduces a specific error message to clarify the issue when generated columns are involved. Author: Shubham Khanna Reviewed-by: Peter Smith, Vignesh C, Amit Kapila Discussion: https://postgr.es/m/CAHv8RjJBvYtqU7OAofBizOmQOK2Q8h+w9v2_cQWxT_gO7er3Aw@mail.gmail.com
* Doc: Clarify the `inactive_since` field description.Amit Kapila2024-11-25
| | | | | | | | | | | Updated to specify that it represents the exact time a slot became inactive, rather than the period of inactivity. Reported-by: Peter Smith Author: Bruce Momjian, Nisha Moond Reviewed-by: Amit Kapila, Peter Smith Backpatch-through: 17 Discussion: https://postgr.es/m/CAHut+PuvsyA5v8y7rYoY9mkDQzUhwaESM05yCByTMaDoRh30tA@mail.gmail.com
* Fix memory leak in pgoutput for the WAL senderMichael Paquier2024-11-21
| | | | | | | | | | | | | | | | | | | | | RelationSyncCache, the hash table in charge of tracking the relation schemas sent through pgoutput, was forgetting to free the TupleDesc associated to the two slots used to store the new and old tuples, causing some memory to be leaked each time a relation is invalidated when the slots of an existing relation entry are cleaned up. This is rather hard to notice as the bloat is pretty minimal, but a long-running WAL sender would be in trouble over time depending on the workload. sysbench has proved to be pretty good at showing the problem, coupled with some memory monitoring of the WAL sender. Issue introduced in 52e4f0cd472d, that has added row filters for tables logically replicated. Author: Boyu Yang Reviewed-by: Michael Paquier, Hou Zhijie Discussion: https://postgr.es/m/DM3PR84MB3442E14B340E553313B5C816E3252@DM3PR84MB3442.NAMPRD84.PROD.OUTLOOK.COM Backpatch-through: 15
* Fix a possibility of logical replication slot's restart_lsn going backwards.Masahiko Sawada2024-11-15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Previously LogicalIncreaseRestartDecodingForSlot() accidentally accepted any LSN as the candidate_lsn and candidate_valid after the restart_lsn of the replication slot was updated, so it potentially caused the restart_lsn to move backwards. A scenario where this could happen in logical replication is: after a logical replication restart, based on previous candidate_lsn and candidate_valid values in memory, the restart_lsn advances upon receiving a subscriber acknowledgment. Then, logical decoding restarts from an older point, setting candidate_lsn and candidate_valid based on an old RUNNING_XACTS record. Subsequent subscriber acknowledgments then update the restart_lsn to an LSN older than the current value. In the reported case, after WAL files were removed by a checkpoint, the retreated restart_lsn prevented logical replication from restarting due to missing WAL segments. This change essentially modifies the 'if' condition to 'else if' condition within the function. The previous code had an asymmetry in this regard compared to LogicalIncreaseXminForSlot(), which does almost the same thing for different fields. The WAL removal issue was reported by Hubert Depesz Lubaczewski. Backpatch to all supported versions, since the bug exists since 9.4 where logical decoding was introduced. Reviewed-by: Tomas Vondra, Ashutosh Bapat, Amit Kapila Discussion: https://postgr.es/m/Yz2hivgyjS1RfMKs%40depesz.com Discussion: https://postgr.es/m/85fff40e-148b-4e86-b921-b4b846289132%40vondra.me Backpatch-through: 13
* Add pg_constraint rows for not-null constraintsÁlvaro Herrera2024-11-08
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We now create contype='n' pg_constraint rows for not-null constraints on user tables. Only one such constraint is allowed for a column. We propagate these constraints to other tables during operations such as adding inheritance relationships, creating and attaching partitions and creating tables LIKE other tables. These related constraints mostly follow the well-known rules of conislocal and coninhcount that we have for CHECK constraints, with some adaptations: for example, as opposed to CHECK constraints, we don't match not-null ones by name when descending a hierarchy to alter or remove it, instead matching by the name of the column that they apply to. This means we don't require the constraint names to be identical across a hierarchy. The inheritance status of these constraints can be controlled: now we can be sure that if a parent table has one, then all children will have it as well. They can optionally be marked NO INHERIT, and then children are free not to have one. (There's currently no support for altering a NO INHERIT constraint into inheriting down the hierarchy, but that's a desirable future feature.) This also opens the door for having these constraints be marked NOT VALID, as well as allowing UNIQUE+NOT NULL to be used for functional dependency determination, as envisioned by commit e49ae8d3bc58. It's likely possible to allow DEFERRABLE constraints as followup work, as well. psql shows these constraints in \d+, though we may want to reconsider if this turns out to be too noisy. Earlier versions of this patch hid constraints that were on the same columns of the primary key, but I'm not sure that that's very useful. If clutter is a problem, we might be better off inventing a new \d++ command and not showing the constraints in \d+. For now, we omit these constraints on system catalog columns, because they're unlikely to achieve anything. The main difference to the previous attempt at this (b0e96f311985) is that we now require that such a constraint always exists when a primary key is in the column; we didn't require this previously which had a number of unpalatable consequences. With this requirement, the code is easier to reason about. For example: - We no longer have "throwaway constraints" during pg_dump. We needed those for the case where a table had a PK without a not-null underneath, to prevent a slow scan of the data during restore of the PK creation, which was particularly problematic for pg_upgrade. - We no longer have to cope with attnotnull being set spuriously in case a primary key is dropped indirectly (e.g., via DROP COLUMN). Some bits of code in this patch were authored by Jian He. Author: Álvaro Herrera <alvherre@alvh.no-ip.org> Author: Bernd Helmle <mailings@oopsware.de> Reviewed-by: 何建 (jian he) <jian.universality@gmail.com> Reviewed-by: 王刚 (Tender Wang) <tndrwang@gmail.com> Reviewed-by: Justin Pryzby <pryzby@telsasoft.com> Reviewed-by: Peter Eisentraut <peter.eisentraut@enterprisedb.com> Reviewed-by: Dean Rasheed <dean.a.rasheed@gmail.com> Discussion: https://postgr.es/m/202408310358.sdhumtyuy2ht@alvherre.pgsql
* Replicate generated columns when 'publish_generated_columns' is set.Amit Kapila2024-11-07
| | | | | | | | | | | | | | | | | | | | This patch builds on the work done in commit 745217a051 by enabling the replication of generated columns alongside regular column changes through a new publication parameter: publish_generated_columns. Example usage: CREATE PUBLICATION pub1 FOR TABLE tab_gencol WITH (publish_generated_columns = true); The column list takes precedence. If the generated columns are specified in the column list, they will be replicated even if 'publish_generated_columns' is set to false. Conversely, if generated columns are not included in the column list (assuming the user specifies a column list), they will not be replicated even if 'publish_generated_columns' is true. Author: Vignesh C, Shubham Khanna Reviewed-by: Peter Smith, Amit Kapila, Hayato Kuroda, Shlok Kyal, Ajin Cherian, Hou Zhijie, Masahiko Sawada Discussion: https://postgr.es/m/B80D17B2-2C8E-4C7D-87F2-E5B4BE3C069E@gmail.com
* Use ProcNumbers instead of direct Latch pointers to address other procsHeikki Linnakangas2024-11-01
| | | | | | | | This is in preparation for replacing Latches with a new abstraction. That's still work in progress, but this seems a little tidier anyway, so let's get this refactoring out of the way already. Discussion: https://www.postgresql.org/message-id/391abe21-413e-4d91-a650-b663af49500c%40iki.fi
* Replicate generated columns when specified in the column list.Amit Kapila2024-10-30
| | | | | | | | | | | | | | | | | | | This commit allows logical replication to publish and replicate generated columns when explicitly listed in the column list. We also ensured that the generated columns were copied during the initial tablesync when they were published. We will allow to replicate generated columns even when they are not specified in the column list (via a new publication option) in a separate commit. The motivation of this work is to allow replication for cases where the client doesn't have generated columns. For example, the case where one is trying to replicate data from Postgres to the non-Postgres database. Author: Shubham Khanna, Vignesh C, Hou Zhijie Reviewed-by: Peter Smith, Hayato Kuroda, Shlok Kyal, Amit Kapila Discussion: https://postgr.es/m/B80D17B2-2C8E-4C7D-87F2-E5B4BE3C069E@gmail.com
* Remove unused #include's from backend .c filesPeter Eisentraut2024-10-27
| | | | | | | | as determined by IWYU These are mostly issues that are new since commit dbbca2cf299. Discussion: https://www.postgresql.org/message-id/flat/0df1d5b1-8ca8-4f84-93be-121081bde049%40eisentraut.org
* For inplace update, send nontransactional invalidations.Noah Misch2024-10-25
| | | | | | | | | | | | | | | | The inplace update survives ROLLBACK. The inval didn't, so another backend's DDL could then update the row without incorporating the inplace update. In the test this fixes, a mix of CREATE INDEX and ALTER TABLE resulted in a table with an index, yet relhasindex=f. That is a source of index corruption. Back-patch to v12 (all supported versions). The back branch versions don't change WAL, because those branches just added end-of-recovery SIResetAll(). All branches change the ABI of extern function PrepareToInvalidateCacheTuple(). No PGXN extension calls that, and there's no apparent use case in extensions. Reviewed by Nitin Motiani and (in earlier versions) Andres Freund. Discussion: https://postgr.es/m/20240523000548.58.nmisch@google.com
* Refactor code converting a publication name List to a StringInfoMichael Paquier2024-10-25
| | | | | | | | | | | | | | | | The existing get_publications_str() is renamed to GetPublicationsStr() and is moved to pg_subscription.c, so as it is possible to reuse it at two locations of the tablesync code where the same logic was duplicated. fetch_remote_table_info() was doing two List->StringInfo conversions when dealing with a server of version 15 or newer. The conversion happens only once now. This refactoring leads to less code overall. Author: Peter Smith Reviewed-by: Michael Paquier, Masahiko Sawada Discussion: https://postgr.es/m/CAHut+PtJMk4bKXqtpvqVy9ckknCgK9P6=FeG8zHF=6+Em_Snpw@mail.gmail.com
* Reduce memory block size for decoded tuple storage to 8kB.Masahiko Sawada2024-10-16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit a4ccc1cef introduced the Generation Context and modified the logical decoding process to use a Generation Context with a fixed block size of 8MB for storing tuple data decoded during logical decoding (i.e., rb->tup_context). Several reports have indicated that the logical decoding process can be terminated due to out-of-memory (OOM) situations caused by excessive memory usage in rb->tup_context. This issue can occur when decoding a workload involving several concurrent transactions, including a long-running transaction that modifies tuples. By design, the Generation Context does not free a memory block until all chunks within that block are released. Consequently, if tuples modified by the long-running transaction are stored across multiple memory blocks, these blocks remain allocated until the long-running transaction completes, leading to substantial memory fragmentation. The memory usage during logical decoding, tracked by rb->size, does not account for memory fragmentation, resulting in potentially much higher memory consumption than the value of the logical_decoding_work_mem parameter. Various improvement strategies were discussed in the relevant thread. This change reduces the block size of the Generation Context used in rb->tup_context from 8MB to 8kB. This modification significantly decreases the likelihood of substantial memory fragmentation occurring and is relatively straightforward to backport. Performance testing across multiple platforms has confirmed that this change will not introduce any performance degradation that would impact actual operation. Backport to all supported branches. Reported-by: Alex Richman, Michael Guissine, Avi Weinberg Reviewed-by: Amit Kapila, Fujii Masao, David Rowley Tested-by: Hayato Kuroda, Shlok Kyal Discussion: https://postgr.es/m/CAD21AoBTY1LATZUmvSXEssvq07qDZufV4AF-OHh9VD2pC0VY2A%40mail.gmail.com Backpatch-through: 12
* Add contrib/pg_logicalinspect.Masahiko Sawada2024-10-14
| | | | | | | | | | | | | | This module provides SQL functions that allow to inspect logical decoding components. It currently allows to inspect the contents of serialized logical snapshots of a running database cluster, which is useful for debugging or educational purposes. Author: Bertrand Drouvot Reviewed-by: Amit Kapila, Shveta Malik, Peter Smith, Peter Eisentraut Reviewed-by: David G. Johnston Discussion: https://postgr.es/m/ZscuZ92uGh3wm4tW%40ip-10-97-1-34.eu-west-3.compute.internal
* Move SnapBuild and SnapBuildOnDisk structs to snapshot_internal.h.Masahiko Sawada2024-10-14
| | | | | | | | | | | | This commit moves the definitions of the SnapBuild and SnapBuildOnDisk structs, related to logical snapshots, to the snapshot_internal.h file. This change allows external tools, such as pg_logicalinspect (with an upcoming patch), to access and utilize the contents of logical snapshots. Author: Bertrand Drouvot Reviewed-by: Amit Kapila, Shveta Malik, Peter Smith Discussion: https://postgr.es/m/ZscuZ92uGh3wm4tW%40ip-10-97-1-34.eu-west-3.compute.internal
* Use aux process resource owner in walsenderAndres Freund2024-10-08
| | | | | | | | | | AIO will need a resource owner to do IO. Right now we create a resowner on-demand during basebackup, and we could do the same for AIO. But it seems easier to just always create an aux process resowner. Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi> Reviewed-by: Noah Misch <noah@leadboat.com> Discussion: https://postgr.es/m/1f6b50a7-38ef-4d87-8246-786d39f46ab9@iki.fi
* Remove unused latchHeikki Linnakangas2024-10-05
| | | | | | | It was left unused by commit bc971f4025, which replaced the latch usage with a condition variable Discussion: https://www.postgresql.org/message-id/391abe21-413e-4d91-a650-b663af49500c@iki.fi
* Prohibit altering invalidated replication slots.Amit Kapila2024-09-13
| | | | | | | | | | ALTER_REPLICATION_SLOT for invalid replication slots should not be allowed because there is no way to get back the invalidated (logical) slot to work. Author: Bharath Rupireddy Reviewed-by: Peter Smith, Shveta Malik Discussion: https://www.postgresql.org/message-id/CALj2ACW4fSOMiKjQ3=2NVBMTZRTG8Ujg6jsK9z3EvOtvA4vzKQ@mail.gmail.com
* Improve assertion in FindReplTupleInLocalRel().Amit Kapila2024-09-11
| | | | | | | | | | | | | | | | The first part of the assertion verifying that the passed index must be PK or RI was incorrectly passing index relation instead of heap relation in GetRelationIdentityOrPK(). The assertion was not failing because the second part of the assertion which needs to be performed only when remote relation has REPLICA_IDENTITY_FULL set was also incorrect. The change is not backpatched because the current coding doesn't lead to any failure. Reported-by: Dilip Kumar Author: Amit Kapila Reviewed-by: Vignesh C Discussion: https://postgr.es/m/CAFiTN-tmguaT1DXbCC+ZomZg-oZLmU6BPhr0po7akQSG6vNJrg@mail.gmail.com
* Apply more quoting to GUC names in messagesMichael Paquier2024-09-04
| | | | | | | | | This is a continuation of 17974ec25946. More quotes are applied to GUC names in error messages and hints, taking care of what seems to be all the remaining holes currently in the tree for the GUCs. Author: Peter Smith Discussion: https://postgr.es/m/CAHut+Pv-kSN8SkxSdoHano_wPubqcg5789ejhCDZAcLFceBR-w@mail.gmail.com
* Collect statistics about conflicts in logical replication.Amit Kapila2024-09-04
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This commit adds columns in view pg_stat_subscription_stats to show the number of times a particular conflict type has occurred during the application of logical replication changes. The following columns are added: confl_insert_exists: Number of times a row insertion violated a NOT DEFERRABLE unique constraint. confl_update_origin_differs: Number of times an update was performed on a row that was previously modified by another origin. confl_update_exists: Number of times that the updated value of a row violates a NOT DEFERRABLE unique constraint. confl_update_missing: Number of times that the tuple to be updated is missing. confl_delete_origin_differs: Number of times a delete was performed on a row that was previously modified by another origin. confl_delete_missing: Number of times that the tuple to be deleted is missing. The update_origin_differs and delete_origin_differs conflicts can be detected only when track_commit_timestamp is enabled. Author: Hou Zhijie Reviewed-by: Shveta Malik, Peter Smith, Anit Kapila Discussion: https://postgr.es/m/OS0PR01MB57160A07BD575773045FC214948F2@OS0PR01MB5716.jpnprd01.prod.outlook.com
* Add const qualifiers to XLogRegister*() functionsPeter Eisentraut2024-09-03
| | | | | | | | Add const qualifiers to XLogRegisterData() and XLogRegisterBufData(). Several unconstify() calls can be removed. Reviewed-by: Aleksander Alekseev <aleksander@timescale.com> Discussion: https://www.postgresql.org/message-id/dd889784-9ce7-436a-b4f1-52e4a5e577bd@eisentraut.org
* Fix typos and grammar in code comments and docsMichael Paquier2024-09-03
| | | | | Author: Alexander Lakhin Discussion: https://postgr.es/m/f7e514cf-2446-21f1-a5d2-8c089a6e2168@gmail.com