aboutsummaryrefslogtreecommitdiff
path: root/src/backend/replication/logical/reorderbuffer.c
Commit message (Collapse)AuthorAge
...
* Fix ReorderBuffer memory overflow check.Amit Kapila2020-06-10
| | | | | | | | | | | | | | | | | | | | | Commit cec2edfa78 introduced logical_decoding_work_mem to limit ReorderBuffer memory usage. We spill the changes once the memory occupied by changes exceeds logical_decoding_work_mem. There was an assumption in the code that by evicting the largest (sub)transaction we will come under the memory limit as the selected transaction will be at least as large as the most recent change (which caused us to go over the memory limit). However, that is not true because a user can reduce the logical_decoding_work_mem to a smaller value before the most recent change. We fix it by allowing to evict the transactions until we reach under the memory limit. Reported-by: Fujii Masao Author: Amit Kapila Reviewed-by: Fujii Masao Backpatch-through: 13, where it was introduced Discussion: https://postgr.es/m/2b7ba291-22e0-a187-d167-9e5309a3458d@oss.nttdata.com
* Spelling adjustmentsPeter Eisentraut2020-06-07
|
* Dial back -Wimplicit-fallthrough to level 3Alvaro Herrera2020-05-13
| | | | | | | | | The additional pain from level 4 is excessive for the gain. Also revert all the source annotation changes to their original wordings, to avoid back-patching pain. Discussion: https://postgr.es/m/31166.1589378554@sss.pgh.pa.us
* Add -Wimplicit-fallthrough to CFLAGS and CXXFLAGSAlvaro Herrera2020-05-12
| | | | | | | | | | | | | | | Use it at level 4, a bit more restrictive than the default level, and tweak our commanding comments to FALLTHROUGH. (However, leave zic.c alone, since it's external code; to avoid the warnings that would appear there, change CFLAGS for that file in the Makefile.) Author: Julien Rouhaud <rjuju123@gmail.com> Author: Álvaro Herrera <alvherre@alvh.no-ip.org> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/20200412081825.qyo5vwwco3fv4gdo@nol Discussion: https://postgr.es/m/flat/E1fDenm-0000C8-IJ@gemulon.postgresql.org
* logical decoding: Remove TODO about unnecessary optimization.Andres Freund2020-03-24
| | | | | | | | | Measurements show, and intuition agrees, that there's currently no known cases where adding a fastpath to avoid allocating / ordering a heap for a single transaction is worthwhile. Author: Dilip Kumar Discussion: https://postgr.es/m/CAFiTN-sp701wvzvnLQJGk7JDqrFM8f--97-ihbwkU8qvn=p8nw@mail.gmail.com
* Stop demanding that top xact must be seen before subxact in decoding.Amit Kapila2020-02-19
| | | | | | | | | | | | | | | | | | | | | | | | | | | Manifested as ERROR: subtransaction logged without previous top-level txn record this check forbids legit behaviours like - First xl_xact_assignment record is beyond reading, i.e. earlier restart_lsn. - After restart_lsn there is some change of a subxact. - After that, there is second xl_xact_assignment (for another subxact) revealing the relationship between top and first subxact. Such a transaction won't be streamed anyway because we hadn't seen it in full. Saying for sure whether xact of some record encountered after the snapshot was deserialized can be streamed or not requires to know whether it wrote something before deserialization point --if yes, it hasn't been seen in full and can't be decoded. Snapshot doesn't have such info, so there is no easy way to relax the check. Reported-by: Hsu, John Diagnosed-by: Arseny Sher Author: Arseny Sher, Amit Kapila Reviewed-by: Amit Kapila, Dilip Kumar Backpatch-through: 9.5 Discussion: https://postgr.es/m/AB5978B2-1772-4FEE-A245-74C91704ECB0@amazon.com
* Clean up newlines following left parenthesesAlvaro Herrera2020-01-30
| | | | | | | | | | | | We used to strategically place newlines after some function call left parentheses to make pgindent move the argument list a few chars to the left, so that the whole line would fit under 80 chars. However, pgindent no longer does that, so the newlines just made the code vertically longer for no reason. Remove those newlines, and reflow some of those lines for some extra naturality. Reviewed-by: Michael Paquier, Tom Lane Discussion: https://postgr.es/m/20200129200401.GA6303@alvherre.pgsql
* Set ReorderBufferTXN->final_lsn more eagerlyAlvaro Herrera2020-01-17
| | | | | | | | | | | | | | | | | | | ... specifically, set it incrementally as each individual change is spilled down to disk. This way, it is set correctly when the transaction disappears without trace, ie. without leaving an XACT_ABORT wal record. (This happens when the server crashes midway through a transaction.) Failing to have final_lsn prevents ReorderBufferRestoreCleanup() from working, since it needs the final_lsn in order to know the endpoint of its iteration through spilled files. Commit df9f682c7bf8 already tried to fix the problem, but it didn't set the final_lsn in all cases. Revert that, since it's no longer needed. Author: Vignesh C Reviewed-by: Amit Kapila, Dilip Kumar Discussion: https://postgr.es/m/CALDaNm2CLk+K9JDwjYST0sPbGg5AQdvhUt0jbKyX_HdAE0jk3A@mail.gmail.com
* Clean up representation of flags in struct ReorderBufferTXNAlvaro Herrera2020-01-10
| | | | | | | This simplifies addition of further flags. Author: Nikhil Sontakke Discussion: https://postgr.es/m/CAMGcDxeViP+R-OL7QhzUV9eKCVjURobuY1Zijik4Ay_Ddwo4Cg@mail.gmail.com
* Fix running out of file descriptors for spill files.Amit Kapila2020-01-02
| | | | | | | | | | | | | | | | | | | Currently while decoding changes, if the number of changes exceeds a certain threshold, we spill those to disk.  And this happens for each (sub)transaction.  Now, while reading all these files, we don't close them until we read all the files.  While reading these files, if the number of such files exceeds the maximum number of file descriptors, the operation errors out. Use PathNameOpenFile interface to open these files as that internally has the mechanism to release kernel FDs as needed to get us under the max_safe_fds limit. Reported-by: Amit Khandekar Author: Amit Khandekar Reviewed-by: Amit Kapila Backpatch-through: 9.4 Discussion: https://postgr.es/m/CAJ3gD9c-sECEn79zXw4yBnBdOttacoE-6gAyP0oy60nfs_sabQ@mail.gmail.com
* Update copyrights for 2020Bruce Momjian2020-01-01
| | | | Backpatch-through: update all files in master, backpatch legal files through 9.4
* Track statistics for spilling of changes from ReorderBuffer.Amit Kapila2019-11-21
| | | | | | | | | | This adds the statistics about transactions spilled to disk from ReorderBuffer. Users can query the pg_stat_replication view to check these stats. Author: Tomas Vondra, with bug-fixes and minor changes by Dilip Kumar Reviewed-by: Amit Kapila Discussion: https://postgr.es/m/688b0b7f-2f6c-d827-c27b-216a8e3ea700@2ndquadrant.com
* Add logical_decoding_work_mem to limit ReorderBuffer memory usage.Amit Kapila2019-11-19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Instead of deciding to serialize a transaction merely based on the number of changes in that xact (toplevel or subxact), this makes the decisions based on amount of memory consumed by the changes. The memory limit is defined by a new logical_decoding_work_mem GUC, so for example we can do this SET logical_decoding_work_mem = '128kB' to reduce the memory usage of walsenders or set the higher value to reduce disk writes. The minimum value is 64kB. When adding a change to a transaction, we account for the size in two places. Firstly, in the ReorderBuffer, which is then used to decide if we reached the total memory limit. And secondly in the transaction the change belongs to, so that we can pick the largest transaction to evict (and serialize to disk). We still use max_changes_in_memory when loading changes serialized to disk. The trouble is we can't use the memory limit directly as there might be multiple subxact serialized, we need to read all of them but we don't know how many are there (and which subxact to read first). We do not serialize the ReorderBufferTXN entries, so if there is a transaction with many subxacts, most memory may be in this type of objects. Those records are not included in the memory accounting. We also do not account for INTERNAL_TUPLECID changes, which are kept in a separate list and not evicted from memory. Transactions with many CTID changes may consume significant amounts of memory, but we can't really do much about that. The current eviction algorithm is very simple - the transaction is picked merely by size, while it might be useful to also consider age (LSN) of the changes for example. With the new Generational memory allocator, evicting the oldest changes would make it more likely the memory gets actually pfreed. The logical_decoding_work_mem can be set in postgresql.conf, in which case it serves as the default for all publishers on that instance. Author: Tomas Vondra, with changes by Dilip Kumar and Amit Kapila Reviewed-by: Dilip Kumar and Amit Kapila Tested-By: Vignesh C Discussion: https://postgr.es/m/688b0b7f-2f6c-d827-c27b-216a8e3ea700@2ndquadrant.com
* Fix most -Wundef warningsPeter Eisentraut2019-10-19
| | | | | | | | | | | | In some cases #if was used instead of #ifdef in an inconsistent style. Cleaning this up also helps when analyzing cases like 38d8dce61fff09daae0edb6bcdd42b0c7f10ebcd where this makes a difference. There are no behavior changes here, but the change in pg_bswap.h would prevent possible accidental misuse by third-party code. Discussion: https://www.postgresql.org/message-id/flat/3b615ca5-c595-3f1d-fdf7-a429e564f614%402ndquadrant.com
* Fix RelationIdGetRelation calls that weren't bothering with error checks.Tom Lane2019-09-08
| | | | | | | | | | | | Some of these are quite old, but that doesn't make them not bugs. We'd rather report a failure via elog than SIGSEGV. While at it, uniformly spell the error check as !RelationIsValid(rel) rather than a bare rel == NULL test. The machine code is the same but it seems better to be consistent. Coverity complained about this today, not sure why, because the mistake is in fact old.
* Split tuptoaster.c into three separate files.Robert Haas2019-09-05
| | | | | | | | | | | | | | | | | | | | | | | | detoast.c/h contain functions required to detoast a datum, partially or completely, plus a few other utility functions for examining the size of toasted datums. toast_internals.c/h contain functions that are used internally to the TOAST subsystem but which (mostly) do not need to be accessed from outside. heaptoast.c/h contains code that is intrinsically specific to the heap AM, either because it operates on HeapTuples or is based on the layout of a heap page. detoast.c and toast_internals.c are placed in src/backend/access/common rather than src/backend/access/heap. At present, both files still have dependencies on the heap, but that will be improved in a future commit. Patch by me, reviewed and tested by Prabhat Sabu, Thomas Munro, Andres Freund, and Álvaro Herrera. Discussion: http://postgr.es/m/CA+TgmoZv-=2iWM4jcw5ZhJeL18HF96+W1yJeYrnGMYdkFFnEpQ@mail.gmail.com
* Fix inconsistencies and typos in the tree, take 11Michael Paquier2019-08-19
| | | | | | | | This fixes various typos in docs and comments, and removes some orphaned definitions. Author: Alexander Lakhin Discussion: https://postgr.es/m/5da8e325-c665-da95-21e0-c8a99ea61fbf@gmail.com
* Fix inconsistencies and typos in the tree, take 9Michael Paquier2019-08-05
| | | | | | | | This addresses more issues with code comments, variable names and unreferenced variables. Author: Alexander Lakhin Discussion: https://postgr.es/m/7ab243e0-116d-3e44-d120-76b3df7abefd@gmail.com
* Redesign the API for list sorting (list_qsort becomes list_sort).Tom Lane2019-07-16
| | | | | | | | | | | | | | | | | | | | | | | | | In the wake of commit 1cff1b95a, the obvious way to sort a List is to apply qsort() directly to the array of ListCells. list_qsort was building an intermediate array of pointers-to-ListCells, which we no longer need, but getting rid of it forces an API change: the comparator functions need to do one less level of indirection. Since we're having to touch the callers anyway, let's do two additional changes: sort the given list in-place rather than making a copy (as none of the existing callers have any use for the copying behavior), and rename list_qsort to list_sort. It was argued that the old name exposes more about the implementation than it should, which I find pretty questionable, but a better reason to rename it is to be sure we get the attention of any external callers about the need to fix their comparator functions. While we're at it, change four existing callers of qsort() to use list_sort instead; previously, they all had local reinventions of list_qsort, ie build-an-array-from-a-List-and-qsort-it. (There are some other places where changing to list_sort perhaps would be worthwhile, but they're less obviously wins.) Discussion: https://postgr.es/m/29361.1563220190@sss.pgh.pa.us
* Use consistent style for checking return from system callsPeter Eisentraut2019-07-07
| | | | | | | | | | | | | | | | | Use if (something() != 0) error ... instead of just if (something) error ... The latter is not incorrect, but it's a bit confusing and not the common style. Discussion: https://www.postgresql.org/message-id/flat/5de61b6b-8be9-7771-0048-860328efe027%402ndquadrant.com
* Fix typos.Thomas Munro2019-05-24
| | | | | Reviewed-by: Michael Paquier Discussion: https://postgr.es/m/CA%2BhUKGJFWXmtYo6Frd77RR8YXCHz7hJ2mRy5aHV%3D7fJOqDnBHA%40mail.gmail.com
* Phase 2 pgindent run for v12.Tom Lane2019-05-22
| | | | | | | | | Switch to 2.1 version of pg_bsd_indent. This formats multiline function declarations "correctly", that is with additional lines of parameter declarations indented to match where the first line's left parenthesis is. Discussion: https://postgr.es/m/CAEepm=0P3FeTXRcU5B2W3jv3PgRVZ-kGUXLGfd42FFhUROO3ug@mail.gmail.com
* Initial pgindent run for v12.Tom Lane2019-05-22
| | | | | | | | This is still using the 2.0 version of pg_bsd_indent. I thought it would be good to commit this separately, so as to document the differences between 2.0 and 2.1 behavior. Discussion: https://postgr.es/m/16296.1558103386@sss.pgh.pa.us
* Fix duplicated words in commentsMichael Paquier2019-05-14
| | | | | Author: Stephen Amell Discussion: https://postgr.es/m/539fa271-21b3-777e-a468-d96cffe9c768@gmail.com
* Tighten use of OpenTransientFile and CloseTransientFileMichael Paquier2019-03-09
| | | | | | | | | | | | | | | | | | | | | | | | | | This fixes two sets of issues related to the use of transient files in the backend: 1) OpenTransientFile() has been used in some code paths with read-write flags while read-only is sufficient, so switch those calls to be read-only where necessary. These have been reported by Joe Conway. 2) When opening transient files, it is up to the caller to close the file descriptors opened. In error code paths, CloseTransientFile() gets called to clean up things before issuing an error. However in normal exit paths, a lot of callers of CloseTransientFile() never actually reported errors, which could leave a file descriptor open without knowing about it. This is an issue I complained about a couple of times, but never had the courage to write and submit a patch, so here we go. Note that one frontend code path is impacted by this commit so as an error is issued when fetching control file data, making backend and frontend to be treated consistently. Reported-by: Joe Conway, Michael Paquier Author: Michael Paquier Reviewed-by: Álvaro Herrera, Georgios Kokolatos, Joe Conway Discussion: https://postgr.es/m/20190301023338.GD1348@paquier.xyz Discussion: https://postgr.es/m/c49b69ec-e2f7-ff33-4f17-0eaa4f2cef27@joeconway.com
* Relax overly strict assertionAlvaro Herrera2019-02-12
| | | | | | | | | | | | | | | Ever since its birth, ReorderBufferBuildTupleCidHash() has contained an assertion that a catalog tuple cannot change Cmax after acquiring one. But that's wrong: if a subtransaction executes DDL that affects that catalog tuple, and later aborts and another DDL affects the same tuple, it will change Cmax. Relax the assertion to merely verify that the Cmax remains valid and monotonically increasing, instead. Add a test that tickles the relevant code. Diagnosed by, and initial patch submitted by: Arseny Sher Co-authored-by: Arseny Sher Discussion: https://postgr.es/m/874l9p8hyw.fsf@ars-thinkpad
* Move remaining code from tqual.[ch] to heapam.h / heapam_visibility.c.Andres Freund2019-01-21
| | | | | | | | | | | | | | Given these routines are heap specific, and that there will be more generic visibility support in via table AM, it makes sense to move the prototypes to heapam.h (routines like HeapTupleSatisfiesVacuum will not be exposed in a generic fashion, because they are too storage specific). Similarly, the code in tqual.c is specific to heap, so moving it into access/heap/ makes sense. Author: Andres Freund Discussion: https://postgr.es/m/20180703070645.wchpu5muyto5n647@alap3.anarazel.de
* Update copyright for 2019Bruce Momjian2019-01-02
| | | | Backpatch-through: certain files through 9.4
* Do not decode TOAST data for table rewritesTomas Vondra2018-11-28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | During table rewrites (VACUUM FULL and CLUSTER), the main heap is logged using XLOG / FPI records, and thus (correctly) ignored in decoding. But the associated TOAST table is WAL-logged as plain INSERT records, and so was logically decoded and passed to reorder buffer. That has severe consequences with TOAST tables of non-trivial size. Firstly, reorder buffer has to keep all those changes, possibly spilling them to a file, incurring I/O costs and disk space. Secondly, ReoderBufferCommit() was stashing all those TOAST chunks into a hash table, which got discarded only after processing the row from the main heap. But as the main heap is not decoded for rewrites, this never happened, so all the TOAST data accumulated in memory, resulting either in excessive memory consumption or OOM. The fix is simple, as commit e9edc1ba already introduced infrastructure (namely HEAP_INSERT_NO_LOGICAL flag) to skip logical decoding of TOAST tables, but it only applied it to system tables. So simply use it for all TOAST data in raw_heap_insert(). That would however solve only the memory consumption issue - the TOAST changes would still be decoded and added to the reorder buffer, and spilled to disk (although without TOAST tuple data, so much smaller). But we can solve that by tweaking DecodeInsert() to just ignore such INSERT records altogether, using XLH_INSERT_CONTAINS_NEW_TUPLE flag, instead of skipping them later in ReorderBufferCommit(). Review: Masahiko Sawada Discussion: https://www.postgresql.org/message-id/flat/1a17c643-e9af-3dba-486b-fbe31bc1823a%402ndquadrant.com Backpatch: 9.4-, where logical decoding was introduced
* Fix logical decoding error when system table w/ toast is repeatedly rewritten.Andres Freund2018-10-10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Repeatedly rewriting a mapped catalog table with VACUUM FULL or CLUSTER could cause logical decoding to fail with: ERROR, "could not map filenode \"%s\" to relation OID" To trigger the problem the rewritten catalog had to have live tuples with toasted columns. The problem was triggered as during catalog table rewrites the heap_insert() check that prevents logical decoding information to be emitted for system catalogs, failed to treat the new heap's toast table as a system catalog (because the new heap is not recognized as a catalog table via RelationIsLogicallyLogged()). The relmapper, in contrast to the normal catalog contents, does not contain historical information. After a single rewrite of a mapped table the new relation is known to the relmapper, but if the table is rewritten twice before logical decoding occurs, the relfilenode cannot be mapped to a relation anymore. Which then leads us to error out. This only happens for toast tables, because the main table contents aren't re-inserted with heap_insert(). The fix is simple, add a new heap_insert() flag that prevents logical decoding information from being emitted, and accept during decoding that there might not be tuple data for toast tables. Unfortunately that does not fix pre-existing logical decoding errors. Doing so would require not throwing an error when a filenode cannot be mapped to a relation during decoding, and that seems too likely to hide bugs. If it's crucial to fix decoding for an existing slot, temporarily changing the ERROR in ReorderBufferCommit() to a WARNING appears to be the best fix. Author: Andres Freund Discussion: https://postgr.es/m/20180914021046.oi7dm4ra3ot2g2kt@alap3.anarazel.de Backpatch: 9.4-, where logical decoding was introduced
* Fix memory leak in TRUNCATE decodingTomas Vondra2018-09-03
| | | | | | | | | | | | | | | | | | | | | | | | When decoding a TRUNCATE record, the relids array was being allocated in the main ReorderBuffer memory context, but not released with the change resulting in a memory leak. The array was also ignored when serializing/deserializing the change, assuming all the information is stored in the change itself. So when spilling the change to disk, we've only we have serialized only the pointer to the relids array. Thanks to never releasing the array, the pointer however remained valid even after loading the change back to memory, preventing an actual crash. This fixes both the memory leak and (de)serialization. The relids array is still allocated in the main ReorderBuffer memory context (none of the existing ones seems like a good match, and adding an extra context seems like an overkill). The allocation is wrapped in a new ReorderBuffer API functions, to keep the details within reorderbuffer.c, just like the other ReorderBufferGet methods do. Author: Tomas Vondra Discussion: https://www.postgresql.org/message-id/flat/66175a41-9342-2845-652f-1bd4c3ee50aa%402ndquadrant.com Backpatch: 11, where decoding of TRUNCATE was introduced
* Reconsider new file extension in commit 91f26d5f.Jeff Davis2018-08-25
| | | | | | | Andres and Tom objected to the choice of the ".tmp" extension. Changing to Andres's suggestion of ".spill". Discussion: https://postgr.es/m/88092095-3348-49D8-8746-EB574B1D30EA%40anarazel.de
* Change extension of spilled ReorderBufferChange data to ".tmp".Jeff Davis2018-08-25
| | | | | | | The previous extension, ".snap", was chosen for historical reasons and became confusing. Discussion: https://postgr.es/m/CAMp0ubd_P8vBGx8=MfDXQJZxHA5D_Zarw5cCkDxJ_63+pWRJ9w@mail.gmail.com
* Close the file descriptor in ApplyLogicalMappingFileTomas Vondra2018-08-16
| | | | | | | | | | | | | | The function was forgetting to close the file descriptor, resulting in failures like this: ERROR: 53000: exceeded maxAllocatedDescs (492) while trying to open file "pg_logical/mappings/map-4000-4eb-1_60DE1E08-5376b5-537c6b" LOCATION: OpenTransientFile, fd.c:2161 Simply close the file at the end, and backpatch to 9.4 (where logical decoding was introduced). While at it, fix a nearby typo. Discussion: https://www.postgresql.org/message-id/flat/738a590a-2ce5-9394-2bef-7b1caad89b37%402ndquadrant.com
* Reset properly errno before calling write()Michael Paquier2018-08-05
| | | | | | | | | | | | 6cb3372 enforces errno to ENOSPC when less bytes than what is expected have been written when it is unset, though it forgot to properly reset errno before doing a system call to write(), causing errno to potentially come from a previous system call. Reported-by: Tom Lane Author: Michael Paquier Reviewed-by: Tom Lane Discussion: https://postgr.es/m/31797.1533326676@sss.pgh.pa.us
* Flip argument order in XLogSegNoOffsetToRecPtrAlvaro Herrera2018-07-09
| | | | | | | | | Commit fc49e24fa69a added an input argument after the existing output argument. Flip those. Author: Álvaro Herrera <alvherre@alvh.no-ip.org> Reviewed-by: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/20180708182345.imdgovmkffgtihhk@alvherre.pgsql
* logical decoding: beware of an unset specinsert changeAlvaro Herrera2018-07-05
| | | | | | Coverity complains that there is no protection in the code (at least in non-assertion-enabled builds) against speculative insertion failing to follow the expected protocol. Add an elog(ERROR) for the case.
* pgindent run prior to branchingAndrew Dunstan2018-06-30
|
* Fix "base" snapshot handling in logical decodingAlvaro Herrera2018-06-26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Two closely related bugs are fixed. First, xmin of logical slots was advanced too early. During xl_running_xacts processing, xmin of the slot was set to the oldest running xid in the record, but that's wrong: actually, snapshots which will be used for not-yet-replayed transactions might consider older txns as running too, so we need to keep xmin back for them. The problem wasn't noticed earlier because DDL which allows to delete tuple (set xmax) while some another not-yet-committed transaction looks at it is pretty rare, if not unique: e.g. all forms of ALTER TABLE which change schema acquire ACCESS EXCLUSIVE lock conflicting with any inserts. The included test case (test_decoding's oldest_xmin) uses ALTER of a composite type, which doesn't have such interlocking. To deal with this, we must be able to quickly retrieve oldest xmin (oldest running xid among all assigned snapshots) from ReorderBuffer. To fix, add another list of ReorderBufferTXNs to the reorderbuffer, where transactions are sorted by base-snapshot-LSN. This is slightly different from the existing (sorted by first-LSN) list, because a transaction can have an earlier LSN but a later Xmin, if its first record does not obtain an xmin (eg. xl_xact_assignment). Note this new list doesn't fully replace the existing txn list: we still need that one to prevent WAL recycling. The second issue concerns SnapBuilder snapshots and subtransactions. SnapBuildDistributeNewCatalogSnapshot never assigned a snapshot to a transaction that is known to be a subtxn, which is good in the common case that the top-level transaction already has one (no point in doing so), but a bug otherwise. To fix, arrange to transfer the snapshot from the subtxn to its top-level txn as soon as the kinship gets known. test_decoding's snapshot_transfer verifies this. Also, fix a minor memory leak: refcount of toplevel's old base snapshot was not decremented when the snapshot is transferred from child. Liberally sprinkle code comments, and rewrite a few existing ones. This part is my (Álvaro's) contribution to this commit, as I had to write all those comments in order to understand the existing code and Arseny's patch. Reported-by: Arseny Sher <a.sher@postgrespro.ru> Diagnosed-by: Arseny Sher <a.sher@postgrespro.ru> Co-authored-by: Arseny Sher <a.sher@postgrespro.ru> Co-authored-by: Álvaro Herrera <alvherre@alvh.no-ip.org> Reviewed-by: Antonin Houska <ah@cybertec.at> Discussion: https://postgr.es/m/87lgdyz1wj.fsf@ars-thinkpad
* Update obsolete commentsAlvaro Herrera2018-06-25
| | | | | | | Commit 9fab40ad32ef removed some pre-allocating logic in reorderbuffer.c, but left outdated comments in place. Repair. Author: Álvaro Herrera
* Address set of issues with errno handlingMichael Paquier2018-06-25
| | | | | | | | | | | | | | | | | | | | System calls mixed up in error code paths are causing two issues which several code paths have not correctly handled: 1) For write() calls, sometimes the system may return less bytes than what has been written without errno being set. Some paths were careful enough to consider that case, and assumed that errno should be set to ENOSPC, other calls missed that. 2) errno generated by a system call is overwritten by other system calls which may succeed once an error code path is taken, causing what is reported to the user to be incorrect. This patch uses the brute-force approach of correcting all those code paths. Some refactoring could happen in the future, but this is let as future work, which is not targeted for back-branches anyway. Author: Michael Paquier Reviewed-by: Ashutosh Sharma Discussion: https://postgr.es/m/20180622061535.GD5215@paquier.xyz
* Post-feature-freeze pgindent run.Tom Lane2018-04-26
| | | | Discussion: https://postgr.es/m/15719.1523984266@sss.pgh.pa.us
* Logical decoding of TRUNCATEPeter Eisentraut2018-04-07
| | | | | | | | | | | | | | Add a new WAL record type for TRUNCATE, which is only used when wal_level >= logical. (For physical replication, TRUNCATE is already replicated via SMGR records.) Add new callback for logical decoding output plugins to receive TRUNCATE actions. Author: Simon Riggs <simon@2ndquadrant.com> Author: Marco Nenciarini <marco.nenciarini@2ndquadrant.it> Author: Peter Eisentraut <peter.eisentraut@2ndquadrant.com> Reviewed-by: Petr Jelinek <petr.jelinek@2ndquadrant.com> Reviewed-by: Andres Freund <andres@anarazel.de> Reviewed-by: Alvaro Herrera <alvherre@alvh.no-ip.org>
* Allow memory contexts to have both fixed and variable ident strings.Tom Lane2018-03-27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Originally, we treated memory context names as potentially variable in all cases, and therefore always copied them into the context header. Commit 9fa6f00b1 rethought this a little bit and invented a distinction between fixed and variable names, skipping the copy step for the former. But we can make things both simpler and more useful by instead allowing there to be two parts to a context's identification, a fixed "name" and an optional, variable "ident". The name supplied in the context create call is now required to be a compile-time-constant string in all cases, as it is never copied but just pointed to. The "ident" string, if wanted, is supplied later. This is needed because typically we want the ident to be stored inside the context so that it's cleaned up automatically on context deletion; that means it has to be copied into the context before we can set the pointer. The cost of this approach is basically just an additional pointer field in struct MemoryContextData, which isn't much overhead, and is bought back entirely in the AllocSet case by not needing a headerSize field anymore, since we no longer have to cope with variable header length. In addition, we can simplify the internal interfaces for memory context creation still further, saving a few cycles there. And it's no longer true that a custom identifier disqualifies a context from participating in aset.c's freelist scheme, so possibly there's some win on that end. All the places that were using non-compile-time-constant context names are adjusted to put the variable info into the "ident" instead. This allows more effective identification of those contexts in many cases; for example, subsidary contexts of relcache entries are now identified by both type (e.g. "index info") and relname, where before you got only one or the other. Contexts associated with PL function cache entries are now identified more fully and uniformly, too. I also arranged for plancache contexts to use the query source string as their identifier. This is basically free for CachedPlanSources, as they contained a copy of that string already. We pay an extra pstrdup to do it for CachedPlans. That could perhaps be avoided, but it would make things more fragile (since the CachedPlanSource is sometimes destroyed first). I suspect future improvements in error reporting will require CachedPlans to have a copy of that string anyway, so it's not clear that it's worth moving mountains to avoid it now. This also changes the APIs for context statistics routines so that the context-specific routines no longer assume that output goes straight to stderr, nor do they know all details of the output format. This is useful immediately to reduce code duplication, and it also allows for external code to do something with stats output that's different from printing to stderr. The reason for pushing this now rather than waiting for v12 is that it rethinks some of the API changes made by commit 9fa6f00b1. Seems better for extension authors to endure just one round of API changes not two. Discussion: https://postgr.es/m/CAB=Je-FdtmFZ9y9REHD7VsSrnCkiBhsA4mdsLKSPauwXtQBeNA@mail.gmail.com
* Handle heap rewrites even better in logical decodingPeter Eisentraut2018-03-21
| | | | | | | | | | | | | | | | | Logical decoding should not publish anything about tables created as part of a heap rewrite during DDL. Those tables don't exist externally, so consumers of logical decoding cannot do anything sensible with that information. In ab28feae2bd3d4629bd73ae3548e671c57d785f0, we worked around this for built-in logical replication, but that was hack. This is a more proper fix: We mark such transient heaps using the new field pg_class.relwrite, linking to the original relation OID. By default, we ignore them in logical decoding before they get to the output plugin. Optionally, a plugin can register their interest in getting such changes, if they handle DDL specially, in which case the new field will help them get information about the actual table. Reviewed-by: Craig Ringer <craig@2ndquadrant.com>
* Refrain from duplicating data in reorderbuffersAlvaro Herrera2018-03-06
| | | | | | | | | | | | | | If a walsender exits leaving data in reorderbuffers, the next walsender that tries to decode the same transaction would append its decoded data in the same spill files without truncating it first, which effectively duplicate the data. Avoid that by removing any leftover reorderbuffer spill files when a walsender starts. Backpatch to 9.4; this bug has been there from the very beginning of logical decoding. Author: Craig Ringer, revised by me Reviewed by: Álvaro Herrera, Petr Jelínek, Masahiko Sawada
* Fix Latin spellingPeter Eisentraut2018-01-11
| | | | "c.f." should be "cf.".
* Fix failure to delete spill files of aborted transactionsAlvaro Herrera2018-01-05
| | | | | | | | | | | | | | | | | | Logical decoding's reorderbuffer.c may spill transaction files to disk when transactions are large. These are supposed to be removed when they become "too old" by xid; but file removal requires the boundary LSNs of the transaction to be known. The final_lsn is only set when we see the commit or abort record for the transaction, but nothing sets the value for transactions that crash, so the removal code misbehaves -- in assertion-enabled builds, it crashes by a failed assertion. To fix, modify the final_lsn of transactions that don't have a value set, to the LSN of the very latest change in the transaction. This causes the spilled files to be removed appropriately. Author: Atsushi Torikoshi Reviewed-by: Kyotaro HORIGUCHI, Craig Ringer, Masahiko Sawada Discussion: https://postgr.es/m/54e4e488-186b-a056-6628-50628e4e4ebc@lab.ntt.co.jp
* Update copyright for 2018Bruce Momjian2018-01-02
| | | | Backpatch-through: certain files through 9.3
* Rethink MemoryContext creation to improve performance.Tom Lane2017-12-13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch makes a number of interrelated changes to reduce the overhead involved in creating/deleting memory contexts. The key ideas are: * Include the AllocSetContext header of an aset.c context in its first malloc request, rather than allocating it separately in TopMemoryContext. This means that we now always create an initial or "keeper" block in an aset, even if it never receives any allocation requests. * Create freelists in which we can save and recycle recently-destroyed asets (this idea is due to Robert Haas). * In the common case where the name of a context is a constant string, just store a pointer to it in the context header, rather than copying the string. The first change eliminates a palloc/pfree cycle per context, and also avoids bloat in TopMemoryContext, at the price that creating a context now involves a malloc/free cycle even if the context never receives any allocations. That would be a loser for some common usage patterns, but recycling short-lived contexts via the freelist eliminates that pain. Avoiding copying constant strings not only saves strlen() and strcpy() overhead, but is an essential part of the freelist optimization because it makes the context header size constant. Currently we make no attempt to use the freelist for contexts with non-constant names. (Perhaps someday we'll need to think harder about that, but in current usage, most contexts with custom names are long-lived anyway.) The freelist management in this initial commit is pretty simplistic, and we might want to refine it later --- but in common workloads that will never matter because the freelists will never get full anyway. To create a context with a non-constant name, one is now required to call AllocSetContextCreateExtended and specify the MEMCONTEXT_COPY_NAME option. AllocSetContextCreate becomes a wrapper macro, and it includes a test that will complain about non-string-literal context name parameters on gcc and similar compilers. An unfortunate side effect of making AllocSetContextCreate a macro is that one is now *required* to use the size parameter abstraction macros (ALLOCSET_DEFAULT_SIZES and friends) with it; the pre-9.6 habit of writing out individual size parameters no longer works unless you switch to AllocSetContextCreateExtended. Internally to the memory-context-related modules, the context creation APIs are simplified, removing the rather baroque original design whereby a context-type module called mcxt.c which then called back into the context-type module. That saved a bit of code duplication, but not much, and it prevented context-type modules from exercising control over the allocation of context headers. In passing, I converted the test-and-elog validation of aset size parameters into Asserts to save a few more cycles. The original thought was that callers might compute size parameters on the fly, but in practice nobody does that, so it's useless to expend cycles on checking those numbers in production builds. Also, mark the memory context method-pointer structs "const", just for cleanliness. Discussion: https://postgr.es/m/2264.1512870796@sss.pgh.pa.us