aboutsummaryrefslogtreecommitdiff
path: root/src/backend
Commit message (Collapse)AuthorAge
* Flexible options for CREATE_REPLICATION_SLOT.Robert Haas2021-10-05
| | | | | | | | | | | | | | | | | | | | Like BASE_BACKUP, CREATE_REPLICATION_SLOT has historically used a hard-coded syntax. To improve future extensibility, adopt a flexible options syntax here, too. In the new syntax, instead of three mutually exclusive options EXPORT_SNAPSHOT, USE_SNAPSHOT, and NOEXPORT_SNAPSHOT, there is now a single SNAPSHOT option with three possible values: 'export', 'use', and 'nothing'. This commit does not remove support for the old syntax. It just adds the new one as an additional option, makes pg_receivewal, pg_recvlogical, and walreceiver processes use it. Patch by me, reviewed by Fabien Coelho, Sergei Kornilov, and Fujii Masao. Discussion: http://postgr.es/m/CA+TgmobAczXDRO_Gr2euo_TxgzaH1JxbNxvFx=HYvBinefNH8Q@mail.gmail.com Discussion: http://postgr.es/m/CA+TgmoZGwR=ZVWFeecncubEyPdwghnvfkkdBe9BLccLSiqdf9Q@mail.gmail.com
* Flexible options for BASE_BACKUP.Robert Haas2021-10-05
| | | | | | | | | | | | | | | | | | | | | | | | Previously, BASE_BACKUP used an entirely hard-coded syntax, but that's hard to extend. Instead, adopt the same kind of syntax we've used for SQL commands such as VACUUM, ANALYZE, COPY, and EXPLAIN, where it's not necessary for all of the option names to be parser keywords. In the new syntax, most of the options now take an optional Boolean argument. To match our practice in other in places, the options which the old syntax called NOWAIT and NOVERIFY_CHECKSUMS options are in the new syntax called WAIT and VERIFY_CHECKUMS, and the default value is false. In the new syntax, the FAST option has been replaced by a CHECKSUM option whose value may be 'fast' or 'spread'. This commit does not remove support for the old syntax. It just adds the new one as an additional option, and makes pg_basebackup prefer the new syntax when the server is new enough to support it. Patch by me, reviewed and tested by Fabien Coelho, Sergei Kornilov, Fujii Masao, and Tushar Ahuja. Discussion: http://postgr.es/m/CA+TgmobAczXDRO_Gr2euo_TxgzaH1JxbNxvFx=HYvBinefNH8Q@mail.gmail.com Discussion: http://postgr.es/m/CA+TgmoZGwR=ZVWFeecncubEyPdwghnvfkkdBe9BLccLSiqdf9Q@mail.gmail.com
* Make recovery report error message when invalid page header is found.Fujii Masao2021-10-06
| | | | | | | | | | | | | | | | | | | | | | | | Commit 0668719801 changed XLogPageRead() so that it validated the page header, if invalid page header was found reset the error message and retried reading the page, to fix the scenario where streaming standby got stuck at a continuation record. This change hid the error message about invalid page header, which would make it harder for users to investigate what the actual issue was found in WAL. To fix the issue, this commit makes XLogPageRead() report the error message when invalid page header is found. When not in standby mode, an invalid page header should cause recovery to end, not retry reading the page, so XLogPageRead() doesn't need to validate the page header for the retry. Instead, ReadPageInternal() should be responsible for the validation in that case. Therefore this commit changes XLogPageRead() so that if not in standby mode it doesn't validate the page header for the retry. Reported-by: Yugo Nagata Author: Yugo Nagata, Kyotaro Horiguchi Reviewed-by: Ranier Vilela, Fujii Masao Discussion: https://postgr.es/m/20210718045505.32f463ed6c227111038d8ae4@sraoss.co.jp
* Remove obsolete comment in snapbuild.c.Amit Kapila2021-10-05
| | | | | | | | | Commits 955a684e04 and a975ff4980 removed the usage of running xacts information from serialized snapshots but forgot to remove the corresponding comment. Author: Masahiko Sawada Discussion: https://postgr.es/m/CAD21AoBifOr7RS=jRe7YCavc646y9omChv6zkWXvJeZcjS9mXA@mail.gmail.com
* Make Unicode makefile parallel-safePeter Eisentraut2021-10-04
| | | | | | | | | Fix the rules so that each rule is parallel safe, using the same trickery that we use elsewhere in the tree for rules that produce more than one output file. Refactor the whole makefile so that there is less repetition. Discussion: https://www.postgresql.org/message-id/18e34084-aab1-1b4c-edd1-c4f9fb04f714%40enterprisedb.com
* Fix duplicate words in commentsDaniel Gustafsson2021-10-04
| | | | | | | Remove accidentally duplicated words in code comments. Author: Dagfinn Ilmari Mannsåker <ilmari@ilmari.org> Discussion: https://postgr.es/m/87bl45t0co.fsf@wibble.ilmari.org
* Update Unicode map text filesPeter Eisentraut2021-10-04
| | | | | | | A couple of newer ones are available. There are no functional differences, but let's get them in anyway, so that there is no surprise diff next time someone wants to do some actual work in this area.
* Replace occurrences of InvalidXid with InvalidTransactionIdDaniel Gustafsson2021-10-04
| | | | | | | | | While Xid is a known shortening of TransactionId, InvalidXid is not defined in the code. Fix comments which mistakenly were using the shorter version. Author: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> Discussion: https://postgr.es/m/CALj2ACUQzdigML868nV4cojfELPkEzNLNOk7b91Pho4JB90fng@mail.gmail.com
* Fix snapshot builds during promotion of hot standby node with 2PCMichael Paquier2021-10-04
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Some specific logic is done at the end of recovery when involving 2PC transactions: 1) Call RecoverPreparedTransactions(), to recover the state of 2PC transactions into memory (re-acquire locks, etc.). 2) ShutdownRecoveryTransactionEnvironment(), to move back to normal operations, mainly cleaning up recovery locks and KnownAssignedXids (including any 2PC transaction tracked previously). 3) Switch XLogCtl->SharedRecoveryState to RECOVERY_STATE_DONE, which is the tipping point for any process calling RecoveryInProgress() to check if the cluster is still in recovery or not. Any snapshot taken between steps 2) and 3) would be empty, causing any transaction relying on a snapshot at this point to potentially corrupt data as there could still be some 2PC transactions to track, with RecentXmin moving backwards on successive calls to GetSnapshotData() in the same transaction. As SharedRecoveryState is the point to take into account to know if it is safe to discard KnownAssignedXids, this commit moves step 2) after step 3), so as we can never finish with empty snapshots. This exists since the introduction of hot standby, so backpatch all the way down. The window with incorrect snapshots is extremely small, but I have seen it when running 023_pitr_prepared_xact.pl, as did buildfarm member fairywren. Thomas Munro also found it independently. Special thanks to Andres Freund for taking the time to analyze this issue. Reported-by: Thomas Munro, Michael Paquier Analyzed-by: Andres Freund Discussion: https://postgr.es/m/20210422203603.fdnh3fu2mmfp2iov@alap3.anarazel.de Backpatch-through: 9.6
* Fix checking of query type in plpgsql's RETURN QUERY command.Tom Lane2021-10-03
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Prior to v14, we insisted that the query in RETURN QUERY be of a type that returns tuples. (For instance, INSERT RETURNING was allowed, but not plain INSERT.) That happened indirectly because we opened a cursor for the query, so spi.c checked SPI_is_cursor_plan(). As a consequence, the error message wasn't terribly on-point, but at least it was there. Commit 2f48ede08 lost this detail. Instead, plain RETURN QUERY insisted that the query be a SELECT (by checking for SPI_OK_SELECT) while RETURN QUERY EXECUTE failed to check the query type at all. Neither of these changes was intended. The only convenient place to check this in the EXECUTE case is inside _SPI_execute_plan, because we haven't done parse analysis until then. So we need to pass down a flag saying whether to enforce that the query returns tuples. Fortunately, we can squeeze another boolean into struct SPIExecuteOptions without an ABI break, since there's padding space there. (It's unlikely that any extensions would already be using this new struct, but preserving ABI in v14 seems like a smart idea anyway.) Within spi.c, it seemed like _SPI_execute_plan's parameter list was already ridiculously long, and I didn't want to make it longer. So I thought of passing SPIExecuteOptions down as-is, allowing that parameter list to become much shorter. This makes the patch a bit more invasive than it might otherwise be, but it's all internal to spi.c, so that seems fine. Per report from Marc Bachmann. Back-patch to v14 where the faulty code came in. Discussion: https://postgr.es/m/1F2F75F0-27DF-406F-848D-8B50C7EEF06A@gmail.com
* Enable deduplication in system catalog indexes.Peter Geoghegan2021-10-02
| | | | | | | | | | | | | | | The "equality implies image equality" opclass infrastructure disallowed deduplication in system catalog indexes and TOAST indexes before now. That seemed like the right approach back when the infrastructure was added by commit 612a1ab7, since ALTER INDEX cannot set deduplicate_items to 'off' (due to an old implementation restriction). But that decision now seems arbitrary at best. Remove special case handling implementing this policy. No catversion bump, since existing catalog indexes will still work. Author: Peter Geoghegan <pg@bowt.ie> Discussion: https://postgr.es/m/CAH2-Wz=rYQHFaJ3WYBdK=xgwxKzaiGMSSrh-ZCREa-pS-7Zjew@mail.gmail.com
* Error out if SKIP LOCKED and WITH TIES are both specifiedAlvaro Herrera2021-10-01
| | | | | | | | | | | | | | | Both bugs #16676[1] and #17141[2] illustrate that the combination of SKIP LOCKED and FETCH FIRST WITH TIES break expectations when it comes to rows returned to other sessions accessing the same row. Since this situation is detectable from the syntax and hard to fix otherwise, forbid for now, with the potential to fix in the future. [1] https://postgr.es/m/16676-fd62c3c835880da6@postgresql.org [2] https://postgr.es/m/17141-913d78b9675aac8e@postgresql.org Backpatch-through: 13, where WITH TIES was introduced Author: David Christensen <david.christensen@crunchydata.com> Discussion: https://postgr.es/m/CAOxo6XLPccCKru3xPMaYDpa+AXyPeWFs+SskrrL+HKwDjJnLhg@mail.gmail.com
* Remove unstable, unnecessary test; fix typoAlvaro Herrera2021-10-01
| | | | | | | | | | | | | Commit ff9f111bce24 added some test code that's unportable and doesn't add meaningful coverage. Remove it rather than try and get it to work everywhere. While at it, fix a typo in a log message added by the aforementioned commit. Backpatch to 14. Discussion: https://postgr.es/m/3000074.1632947632@sss.pgh.pa.us
* Avoid believing incomplete MCV-only stats in get_variable_range().Tom Lane2021-10-01
| | | | | | | | | | | | | | | | | | | | | | get_variable_range() would incautiously believe that statistics containing only an MCV list are sufficient to derive a range estimate. That's okay for an enum-like column that contains only MCVs, but otherwise the estimate could be pretty bad. Make it report that the range is indeterminate unless the MCVs plus nullfrac account for the whole table. I don't think this needs a dedicated test case, since a quick code coverage check verifies that the existing regression tests traverse all the alternatives. There is room to doubt that a future-proof test case could be built anyway, given that the submitted example accidentally doesn't fail before v11. Per bug #17207 from Simon Perepelitsa. Back-patch to v10. In principle this has been broken all along, but I'm hesitant to make such changes in 9.6, since if anyone is unhappy with 9.6.24's behavior there will be no second chance to fix it. Discussion: https://postgr.es/m/17207-5265aefa79e333b4@postgresql.org
* Fix Portal snapshot tracking to handle subtransactions properly.Tom Lane2021-10-01
| | | | | | | | | | | | | | | | | | | | | | | Commit 84f5c2908 forgot to consider the possibility that EnsurePortalSnapshotExists could run inside a subtransaction with lifespan shorter than the Portal's. In that case, the new active snapshot would be popped at the end of the subtransaction, leaving a dangling pointer in the Portal, with mayhem ensuing. To fix, make sure the ActiveSnapshot stack entry is marked with the same subtransaction nesting level as the associated Portal. It's certainly safe to do so since we won't be here at all unless the stack is empty; hence we can't create an out-of-order stack. Let's also apply this logic in the case where PortalRunUtility sets portalSnapshot, just to be sure that path can't cause similar problems. It's slightly less clear that that path can't create an out-of-order stack, so add an assertion guarding it. Report and patch by Bertrand Drouvot (with kibitzing by me). Back-patch to v11, like the previous commit. Discussion: https://postgr.es/m/ff82b8c5-77f4-3fe7-6028-fcf3303e82dd@amazon.com
* Ensure interleaved_parts field is always initializedDavid Rowley2021-10-01
| | | | | | | | | | | | | | | This field was recently added in db632fbca, however that commit missed one place where it should have initialized the new field to NULL. The missed location is where the PartitionBoundInfo is created for partition-wise join relations. Technically there could be interleaved partitions in a partition-wise join relation, but currently the only optimization we use this field for only does so for base rels and other member rels. So just document that we don't populate this field for join rels. Reported-by: Amit Langote Author: Amit Langote, David Rowley Reviewed-by: Amit Langote, David Rowley Discussion: https://postgr.es/m/CA+HiwqE76Rps24kwHsd2Cr82Ua07tJC9t9reG0c7ScX9n_xrEA@mail.gmail.com
* Treat ETIMEDOUT as indicating a non-recoverable connection failure.Tom Lane2021-09-30
| | | | | | | | | | | | | | | | | | | | | | Add ETIMEDOUT to ALL_CONNECTION_FAILURE_ERRNOS' list of "errnos that identify hard failure of a previously-established network connection". While one could imagine that this is sometimes recoverable, the same could be said of other entries such as ENETDOWN. In support of this, handle ETIMEDOUT on par with other socket errors in relevant infrastructure, such as TranslateSocketError(). (I made a couple of cosmetic adjustments in TranslateSocketError(), too.) The code now assumes that ETIMEDOUT is defined everywhere, which it should be given that POSIX has required it since SUSv2. Perhaps this should be back-patched, but I'm hesitant to do so given the lack of previous complaints, and the hazard that there's a small ABI break on Windows from redefining the symbol. Even if we decide to do that, it'd be prudent to let this bake awhile in HEAD first. Jelte Fennema Discussion: https://postgr.es/m/AM5PR83MB01782BFF2978505F6D6C559AF7AA9@AM5PR83MB0178.EURPRD83.prod.outlook.com
* Fix WAL replay in presence of an incomplete recordAlvaro Herrera2021-09-29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Physical replication always ships WAL segment files to replicas once they are complete. This is a problem if one WAL record is split across a segment boundary and the primary server crashes before writing down the segment with the next portion of the WAL record: WAL writing after crash recovery would happily resume at the point where the broken record started, overwriting that record ... but any standby or backup may have already received a copy of that segment, and they are not rewinding. This causes standbys to stop following the primary after the latter crashes: LOG: invalid contrecord length 7262 at A8/D9FFFBC8 because the standby is still trying to read the continuation record (contrecord) for the original long WAL record, but it is not there and it will never be. A workaround is to stop the replica, delete the WAL file, and restart it -- at which point a fresh copy is brought over from the primary. But that's pretty labor intensive, and I bet many users would just give up and re-clone the standby instead. A fix for this problem was already attempted in commit 515e3d84a0b5, but it only addressed the case for the scenario of WAL archiving, so streaming replication would still be a problem (as well as other things such as taking a filesystem-level backup while the server is down after having crashed), and it had performance scalability problems too; so it had to be reverted. This commit fixes the problem using an approach suggested by Andres Freund, whereby the initial portion(s) of the split-up WAL record are kept, and a special type of WAL record is written where the contrecord was lost, so that WAL replay in the replica knows to skip the broken parts. With this approach, we can continue to stream/archive segment files as soon as they are complete, and replay of the broken records will proceed across the crash point without a hitch. Because a new type of WAL record is added, users should be careful to upgrade standbys first, primaries later. Otherwise they risk the standby being unable to start if the primary happens to write such a record. A new TAP test that exercises this is added, but the portability of it is yet to be seen. This has been wrong since the introduction of physical replication, so backpatch all the way back. In stable branches, keep the new XLogReaderState members at the end of the struct, to avoid an ABI break. Author: Álvaro Herrera <alvherre@alvh.no-ip.org> Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com> Reviewed-by: Nathan Bossart <bossartn@amazon.com> Discussion: https://postgr.es/m/202108232252.dh7uxf6oxwcy@alvherre.pgsql
* Clarify use of "statistics objects" in the codeMichael Paquier2021-09-29
| | | | | | | | | | | | | | | The code inconsistently used "statistic object" or "statistics" where the correct term, as discussed, is actually "statistics object". This improves the state of the code to be more consistent. While on it, fix an incorrect error message introduced in a4d75c8. This error should never happen, as the code states, but it would be misleading. Author: Justin Pryzby Reviewed-by: Álvaro Herrera, Michael Paquier Discussion: https://postgr.es/m/20210924215827.GS831@telsasoft.com Backpatch-through: 14
* Refactor output file handling when forking syslogger under EXEC_BACKENDMichael Paquier2021-09-29
| | | | | | | | | | | | | | | | | | | | | | | | | A forked logging collector in EXEC_BACKEND builds passes down file descriptors (or HANDLEs in WIN32) through a command for files to be reopened (for stderr and csvlog). Some of its logic was duplicated, and this commit refactors the code with some wrapper routines for file reopening after forking and fd grabbing when building the command for the fork. While on it, this simplifies a use of "long" in the code, introduced by ab0ba6e to take care of a warning related to MinGW-W64 when mapping a intptr_t to a printed value. "long" is 32-bit long on Windows, and interoperability of Win32 and Win64 ensures that handles are always 32-bit significant, so we can just use "int" for the same result. This also makes the new routines more symmetric. This change makes easier the introduction of new log destinations in the logging collector, and this is not the only piece of refactoring planned. I have tested this change with EXEC_BACKEND on linux, macos, and of course MSVC (both Win32 and Win64), but not MinGW so the buildfarm may have something to say here. Author: Sehrope Sarkuni, Michael Paquier Discussion: https://postgr.es/m/CAH7T-aqswBM6JWe4pDehi1uOiufqe06DJWaU5=X7dDLyqUExHg@mail.gmail.com
* Fix typos and grammar in code commentsMichael Paquier2021-09-27
| | | | | | | | | | | Several mistakes have piled in the code comments over the time, including incorrect grammar, function names and simple typos. This commit takes care of a portion of these. No backpatch is done as this is only cosmetic. Author: Justin Pryzby Discussion: https://postgr.es/m/20210924215827.GS831@telsasoft.com
* Remove unneeded nbtree latestRemovedXid comments.Peter Geoghegan2021-09-26
| | | | | | | | | | Discussing the low level issue of nbtree VACUUM and recovery conflicts in btvacuumpage() now seems inappropriate. The same issue is discussed in nbtxlog.h, as well as in a comment block above _bt_delitems_vacuum(). The comment block made more sense when it was part of a broader discussion of nbtree VACUUM "pin scans". These were removed by commit 9f83468b.
* Track LLVM 14 API changes.Thomas Munro2021-09-27
| | | | | | | | | Only done on the master branch for now to fix build farm animal seawasp (which tests bleeeding edge PostgreSQL with bleeding edge LLVM). We can back-patch a consolidated fix closer to LLVM 14's release, once its API has stopped moving around. Discussion: https://postgr.es/m/CA%2BhUKGL%3Dyg6qqgg6W6SAuvRQejditeoDNy-X3b9H_6Fnw8j5Wg%40mail.gmail.com
* Avoid unnecessary division in interval_cmp_value().Tom Lane2021-09-26
| | | | | | | | | Splitting the time field into days and microseconds is pretty useless when we're just going to recombine those values. It's unclear if anyone will notice the speedup in real-world cases, but a cycle shaved is a cycle earned. Discussion: https://postgr.es/m/2629129.1632675713@sss.pgh.pa.us
* Update obsolete nbtree deletion comments.Peter Geoghegan2021-09-25
| | | | | | | | _bt_delitems_delete() is no longer the high-level entry point used by index tuple deletion driven by index tuples whose LP_DEAD bits are set (now called "simple index tuple deletion"). It became a lower level routine that's only called by _bt_delitems_delete_check() following commit d168b66682.
* vacuumlazy.c: Remove obsolete 'onecall' comment.Peter Geoghegan2021-09-25
| | | | | | | | | | | | Remove obsolete reference to lazy_vacuum()'s onecall argument. The function argument was removed by commit 3499df0dee. Also remove adjoining comment block that introduces the wraparound failsafe concept. Talking about the failsafe here no longer makes sense, since lazy_vacuum() (and related functions) are no longer the only place where the failsafe might be triggered. This has been the case since commit c242baa4a8 taught VACUUM to consider triggering the failsafe mechanism during its initial heap scan.
* nbtree README: Add note about latestRemovedXid.Peter Geoghegan2021-09-24
| | | | | | | | | | | Point out that index tuple deletion generally needs a latestRemovedXid value for the deletion operation's WAL record. This is bound to be the most expensive part of the whole deletion operation now that it takes place up front, during original execution. This was arguably an oversight in commit 558a9165e08, which moved the work required to generate these values from index deletion REDO routines to original execution of index deletion operations.
* Release memory allocated by dependency_degreeTomas Vondra2021-09-23
| | | | | | | | | | | | | | | | | Calculating degree of a functional dependency may allocate a lot of memory - we have released mot of the explicitly allocated memory, but e.g. detoasted varlena values were left behind. That may be an issue, because we consider a lot of dependencies (all combinations), and the detoasting may happen for each one again. Fixed by calling dependency_degree() in a dedicated context, and resetting it after each call. We only need the calculated dependency degree, so we don't need to copy anything. Backpatch to PostgreSQL 10, where extended statistics were introduced. Backpatch-through: 10 Discussion: https://www.postgresql.org/message-id/20210915200928.GP831%40telsasoft.com
* Free memory after building each statistics objectTomas Vondra2021-09-23
| | | | | | | | | | | | | | | | | | | | | | | | Until now, all extended statistics on a given relation were built in the same memory context, without resetting. Some of the memory was released explicitly, but not all of it - for example memory allocated while detoasting values is hard to free. This is how it worked since extended statistics were introduced in PostgreSQL 10, but adding support for extended stats on expressions made the issue somewhat worse as it increases the number of statistics to build. Fixed by adding a memory context which gets reset after building each statistics object (all the statistics kinds included in it). Resetting it after building each statistics kind would be even better, but it would require more invasive changes and copying of results, making it harder to backpatch. Backpatch to PostgreSQL 10, where extended statistics were introduced. Author: Justin Pryzby Reported-by: Justin Pryzby Reviewed-by: Tomas Vondra Backpatch-through: 10 Discussion: https://www.postgresql.org/message-id/20210915200928.GP831%40telsasoft.com
* Document issue with heapam line pointer truncation.Peter Geoghegan2021-09-22
| | | | | | | | | | | | Checking that an offset number isn't past the end of a heap page's line pointer array was just a defensive sanity check for HOT-chain traversal code before commit 3c3b8a4b. It's etrictly necessary now, though. Add comments that reference the issue to code in heapam that needs to get it right. Per suggestion from Alexander Lakhin. Discussion: https://postgr.es/m/f76a292c-9170-1aef-91a0-59d9443b99a3@gmail.com
* Make use of PG_INT64_MAX/PG_INT64_MINPeter Eisentraut2021-09-22
| | | | | This code was written before those symbols were introduced, but now we can simplify it.
* Invalidate all partitions for a partitioned table in publication.Amit Kapila2021-09-22
| | | | | | | | | | | | | | | | | Updates/Deletes on a partition were allowed even without replica identity after the parent table was added to a publication. This would later lead to an error on subscribers. The reason was that we were not invalidating the partition's relcache and the publication information for partitions was not getting rebuilt. Similarly, we were not invalidating the partitions' relcache after dropping a partitioned table from a publication which will prohibit Updates/Deletes on its partition without replica identity even without any publication. Reported-by: Haiying Tang Author: Hou Zhijie and Vignesh C Reviewed-by: Vignesh C and Amit Kapila Backpatch-through: 13 Discussion: https://postgr.es/m/OS0PR01MB6113D77F583C922F1CEAA1C3FBD29@OS0PR01MB6113.jpnprd01.prod.outlook.com
* Add parent table name in an error in reorderbuffer.c.Amit Kapila2021-09-22
| | | | | | | | | This can help in troubleshooting the cause of a particular error that can occur during decoding. Author: Jeremy Schneider Reviewed-by: Amit Kapila Discussion: https://postgr.es/m/808ed65b-994c-915a-361c-577f088b837f@amazon.com
* Fix "single value strategy" index deletion issue.Peter Geoghegan2021-09-21
| | | | | | | | | | | | | | | | | | | | | | | | It is not appropriate for deduplication to apply single value strategy when triggered by a bottom-up index deletion pass. This wastes cycles because later bottom-up deletion passes will overinterpret older duplicate tuples that deduplication actually just skipped over "by design". It also makes bottom-up deletion much less effective for low cardinality indexes that happen to cross a meaningless "index has single key value per leaf page" threshold. To fix, slightly narrow the conditions under which deduplication's single value strategy is considered. We already avoided the strategy for a unique index, since our high level goal must just be to buy time for VACUUM to run (not to buy space). We'll now also avoid it when we just had a bottom-up pass that reported failure. The two cases share the same high level goal, and already overlapped significantly, so this approach is quite natural. Oversight in commit d168b666, which added bottom-up index deletion. Author: Peter Geoghegan <pg@bowt.ie> Discussion: https://postgr.es/m/CAH2-WznaOvM+Gyj-JQ0X=JxoMDxctDTYjiEuETdAGbF5EUc3MA@mail.gmail.com Backpatch: 14-, where bottom-up deletion was introduced.
* Fix misevaluation of STABLE parameters in CALL within plpgsql.Tom Lane2021-09-21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Before commit 84f5c2908, a STABLE function in a plpgsql CALL statement's argument list would see an up-to-date snapshot, because exec_stmt_call would push a new snapshot. I got rid of that because the possibility of the snapshot disappearing within COMMIT made it too hard to manage a snapshot across the CALL statement. That's fine so far as the procedure itself goes, but I forgot to think about the possibility of STABLE functions within the CALL argument list. As things now stand, those'll be executed with the Portal's snapshot as ActiveSnapshot, keeping them from seeing updates more recent than Portal startup. (VOLATILE functions don't have a problem because they take their own snapshots; which indeed is also why the procedure itself doesn't have a problem. There are no STABLE procedures.) We can fix this by pushing a new snapshot transiently within ExecuteCallStmt itself. Popping the snapshot before we get into the procedure proper eliminates the management problem. The possibly-useless extra snapshot-grab is slightly annoying, but it's no worse than what happened before 84f5c2908. Per bug #17199 from Alexander Nawratil. Back-patch to v11, like the previous patch. Discussion: https://postgr.es/m/17199-1ab2561f0d94af92@postgresql.org
* Document XLOG_INCLUDE_XID a little betterAlvaro Herrera2021-09-21
| | | | | | | | | | | | I noticed that commit 0bead9af484c left this flag undocumented in XLogSetRecordFlags, which led me to discover that the flag doesn't actually do what the one comment on it said it does. Improve the situation by adding some more comments. Backpatch to 14, where the aforementioned commit appears. Author: Álvaro Herrera <alvherre@alvh.no-ip.org> Discussion: https://postgr.es/m/202109212119.c3nhfp64t2ql@alvherre.pgsql
* Introduce GUC shared_memory_size_in_huge_pagesMichael Paquier2021-09-21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This runtime-computed GUC shows the number of huge pages required for the server's main shared memory area, taking advantage of the work done in 0c39c29 and 0bd305e. This is useful for users to estimate the amount of huge pages required for a server as it becomes possible to do an estimation without having to start the server and potentially allocate a large chunk of shared memory. The number of huge pages is calculated based on the existing GUC huge_page_size if set, or by using the system's default by looking at /proc/meminfo on Linux. There is nothing new here as this commit reuses the existing calculation methods, and just exposes this information directly to the user. The routine calculating the huge page size is refactored to limit the number of files with platform-specific flags. This new GUC's name was the most popular choice based on the discussion done. This is only supported on Linux. I have taken the time to test the change on Linux, Windows and MacOS, though for the last two ones large pages are not supported. The first one calculates correctly the number of pages depending on the existing GUC huge_page_size or the system's default. Thanks to Andres Freund, Robert Haas, Kyotaro Horiguchi, Tom Lane, Justin Pryzby (and anybody forgotten here) for the discussion. Author: Nathan Bossart Discussion: https://postgr.es/m/F2772387-CE0F-46BF-B5F1-CC55516EB885@amazon.com
* Remove overzealous index deletion assertion.Peter Geoghegan2021-09-20
| | | | | | | | | | | | | | | | | | | | A broken HOT chain is not an unexpected condition, even when the offset number points past the end of the page's line pointer array. heap_prune_chain() does not (and never has) treated this condition as unexpected, so derivative code in heap_index_delete_tuples() shouldn't do so either. Oversight in commit 4228817449. The assertion can probably only fail on Postgres 14 and master. Earlier releases don't have commit 3c3b8a4b, which taught VACUUM to truncate the line pointer array of heap pages. Backpatch all the same, just to be consistent. Author: Peter Geoghegan <pg@bowt.ie> Reported-By: Alexander Lakhin <exclusion@gmail.com> Discussion: https://postgr.es/m/17197-9438f31f46705182@postgresql.org Backpatch: 12-, just like commit 4228817449.
* pgstat: Prepare to use mechanism for truncated rels also for droppped rels.Andres Freund2021-09-20
| | | | | | | | | | | | | The upcoming shared memory stats patch drops stats for dropped objects in a transactional manner, rather than removing them later as part of vacuum. This means that stats for DROP inside a transaction needs to handle aborted (sub-)transactions similar to TRUNCATE: The stats up to the DROP should be restored. Rename the existing infrastructure in preparation. Author: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/20210405092914.mmxqe7j56lsjfsej@alap3.anarazel.de
* pgstat: Split out relation stats handling from AtEO[Sub]Xact_PgStat() etc.Andres Freund2021-09-20
| | | | | | | | | An upcoming patch will add additional work to these functions. To avoid the functions getting too complicated / doing too many things at once, split out sub-tasks into their own functions. Author: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/20210405092914.mmxqe7j56lsjfsej@alap3.anarazel.de
* Disallow extended statistics on system columnsTomas Vondra2021-09-20
| | | | | | | | | | | | | | | | | | | Since introduction of extended statistics, we've disallowed references to system columns. So for example CREATE STATISTICS s ON ctid FROM t; would fail. But with extended statistics on expressions, it was possible to work around this limitation quite easily CREATE STATISTICS s ON (ctid::text) FROM t; This is an oversight in a4d75c86bf, fixed by adding a simple check. Backpatch to PostgreSQL 14, where support for extended statistics on expressions was introduced. Backpatch-through: 14 Discussion: https://postgr.es/m/20210816013255.GS10479%40telsasoft.com
* process startup: Split single user code out of PostgresMain().Andres Freund2021-09-17
| | | | | | | | | | | | | | | It was harder than necessary to understand PostgresMain() because the code for a normal backend was interspersed with single-user mode specific code. Split most of the single-user mode code into its own function PostgresSingleUserMain(), that does all the necessary setup for single-user mode, and then hands off after that to PostgresMain(). There still is some single-user mode code in InitPostgres(), and it'd likely be worth moving at least some of it out. But that's for later. Reviewed-By: Kyotaro Horiguchi <horikyota.ntt@gmail.com> Author: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/20210802164124.ufo5buo4apl6yuvs@alap3.anarazel.de
* Fix pull_varnos to cope with translated PlaceHolderVars.Tom Lane2021-09-17
| | | | | | | | | | | | | | | | | | Commit 55dc86eca changed pull_varnos to use (if possible) the associated ph_eval_at for a PlaceHolderVar. I missed a fine point though: we might be looking at a PHV in the quals or tlist of a child appendrel, in which case we need to compute a ph_eval_at value that's been translated in the same way that the PHV itself has been (cf. adjust_appendrel_attrs). Fortunately, enough info is available in the PlaceHolderInfo to make such translation possible without additional outside data, so we don't need another round of uglification of planner APIs. This is a little bit complicated, but since it's a hard-to-hit corner case, I'm not much worried about adding cycles here. Per report from Jaime Casanova. Back-patch to v12, like the previous commit. Discussion: https://postgr.es/m/20210915230959.GB17635@ahch-to
* Fix EXPLAIN to handle SEARCH BREADTH FIRST queries.Tom Lane2021-09-16
| | | | | | | | | | | | | | | | | | | The rewriter transformation for SEARCH BREADTH FIRST produces a FieldSelect on a Var of type RECORD, where the Var references the recursive union's worktable output. EXPLAIN VERBOSE failed to handle this case, because it only expected such Vars to appear in CteScans not WorkTableScans. Fix that, and add some test cases exercising EXPLAIN on SEARCH and CYCLE queries. In principle this oversight is an old bug, but it seems that the case is unreachable without SEARCH BREADTH FIRST, because the parser fails when attempting to create such a reference manually. So for today I'll just patch HEAD/v14. Someday we might find that the code portion of this patch needs to be back-patched further. Per report from Atsushi Torikoshi. Discussion: https://postgr.es/m/5bafa66ad529e11860339565c9e7c166@oss.nttdata.com
* Message style improvementsPeter Eisentraut2021-09-16
|
* process startup: Do InitProcess() at the same time regardless of EXEC_BACKEND.Andres Freund2021-09-16
| | | | | | | | | An upcoming patch splits single user mode into its own function. This makes that easier. Split out for easier review / testing. Reviewed-By: Kyotaro Horiguchi <horikyota.ntt@gmail.com> Author: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/20210802164124.ufo5buo4apl6yuvs@alap3.anarazel.de
* Fix performance regression from session statistics.Andres Freund2021-09-16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Session statistics, as introduced by 960869da08, had several shortcomings: - an additional GetCurrentTimestamp() call that also impaired the accuracy of the data collected This can be avoided by passing the current timestamp we already have in pgstat_report_stat(). - an additional statistics UDP packet sent every 500ms This is solved by adding the new statistics to PgStat_MsgTabstat. This is conceptually ugly, because session statistics are not table statistics. But the struct already contains data unrelated to tables, so there is not much damage done. Connection and disconnection are reported in separate messages, which reduces the number of additional messages to two messages per session and a slight increase in PgStat_MsgTabstat size (but the same number of table stats fit). - Session time computation could overflow on systems where long is 32 bit. Reported-By: Andres Freund <andres@anarazel.de> Author: Andres Freund <andres@anarazel.de> Author: Laurenz Albe <laurenz.albe@cybertec.at> Discussion: https://postgr.es/m/20210801205501.nyxzxoelqoo4x2qc%40alap3.anarazel.de Backpatch: 14-, where the feature was introduced.
* Fix variable shadowing in procarray.c.Fujii Masao2021-09-16
| | | | | | | | | | | | | ProcArrayGroupClearXid function has a parameter named "proc", but the same name was used for its local variables. This commit fixes this variable shadowing, to improve code readability. Back-patch to all supported versions, to make future back-patching easy though this patch is classified as refactoring only. Reported-by: Ranier Vilela Author: Ranier Vilela, Aleksander Alekseev https://postgr.es/m/CAEudQAqyoTZC670xWi6w-Oe2_Bk1bfu2JzXz6xRfiOUzm7xbyQ@mail.gmail.com
* Use int instead of size_t in procarray.c.Fujii Masao2021-09-16
| | | | | | | | | | | | | | All size_t variables declared in procarray.c are actually int ones. Let's use int instead of size_t for those variables. Which would reduce Wsign-compare compiler warnings. Back-patch to v14 where commit 941697c3c1 added size_t variables in procarray.c, to make future back-patching easy though this patch is classified as refactoring only. Reported-by: Ranier Vilela Author: Ranier Vilela, Aleksander Alekseev https://postgr.es/m/CAEudQAqyoTZC670xWi6w-Oe2_Bk1bfu2JzXz6xRfiOUzm7xbyQ@mail.gmail.com
* Support "postgres -C" with runtime-computed GUCsMichael Paquier2021-09-16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Until now, the -C option of postgres was handled before a small subset of GUCs computed at runtime are initialized, leading to incorrect results as GUC machinery would fall back to default values for such parameters. For example, data_checksums could report "off" for a cluster as the control file is not loaded yet. Or wal_segment_size would show a segment size at 16MB even if initdb --wal-segsize used something else. Worse, the command would fail to properly report the recently-introduced shared_memory, that requires to load shared_preload_libraries as these could ask for a chunk of shared memory. Support for runtime GUCs comes with a limitation, as the operation is now allowed on a running server. One notable reason for this is that _PG_init() functions of loadable libraries are called before all runtime-computed GUCs are initialized, and this is not guaranteed to be safe to do on running servers. For the case of shared_memory_size, where we want to know how much memory would be used without allocating it, this limitation is fine. Another case where this will help is for huge pages, with the introduction of a different GUC to evaluate the amount of huge pages required for a server before starting it, without having to allocate large chunks of memory. This feature is controlled with a new GUC flag, and four parameters are classified as runtime-computed as of this change: - data_checksums - shared_memory_size - data_directory_mode - wal_segment_size Some TAP tests are added to provide some coverage here, using data_checksums in the tests of pg_checksums. Per discussion with Andres Freund, Justin Pryzby, Magnus Hagander and more. Author: Nathan Bossart Discussion: https://postgr.es/m/F2772387-CE0F-46BF-B5F1-CC55516EB885@amazon.com