path: root/src
Commit message (Author, Date)
...
* Add tests for recovery deadlock conflicts. (Andres Freund, 2022-05-02)
| | | | | | | | | | | The recovery conflict tests added in 9f8a050f68d surfaced a bug in the interaction between buffer pin and deadlock recovery conflicts. To make sure that the bugfix won't break deadlock conflict detection, add a test for that scenario. 031_recovery_conflict.pl will later be backpatched, with this included. Discussion: https://postgr.es/m/20220413002626.udl7lll7f3o7nre7@alap3.anarazel.de
* Fix typo in comment. (Etsuro Fujita, 2022-05-02)
* pg_walinspect: fix case where flush LSN is in the middle of a record. (Jeff Davis, 2022-04-30)
| | | | | | | | | | | | | | | | | | | | | Instability in the test for pg_walinspect revealed that pg_get_wal_records_info_till_end_of_wal(x) would try to decode all the records with a start LSN earlier than the flush LSN, even though that might include a partial record at the end of the range. In that case, read_local_xlog_page_no_wait() would return NULL when it tried to read past the flush LSN, which would be interpreted as an error by the caller. That caused a test failure only on a BF animal that had been restarted recently, but could be expected to happen in the wild quite easily depending on the alignment of various parameters. Fix by using private data in read_local_xlog_page_no_wait() to signal end-of-wal to the caller, so that it can be properly distinguished from a real error. Discussion: https://postgr.es/m/Ymd/e5eeZMNAkrXo%40paquier.xyz Discussion: https://postgr.es/m/111657.1650910309@sss.pgh.pa.us Authors: Thomas Munro, Bharath Rupireddy.
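The end-of-WAL signalling described above can be sketched as follows. This is a minimal illustration in Python, not the pg_walinspect code: `ReadPrivate`, `read_page_no_wait` and `read_records` are hypothetical names, and fixed-size records stand in for real WAL decoding. The point is only that a flag in the reader's private data lets the caller distinguish "reached the flush LSN" from a genuine read failure, even though both return nothing.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ReadPrivate:
    """Hypothetical stand-in for the reader's private data."""
    end_of_wal: bool = False


def read_page_no_wait(wal: bytes, offset: int, size: int, flush_lsn: int,
                      private: ReadPrivate) -> Optional[bytes]:
    """Return `size` bytes at `offset`, or None without waiting.

    If the request would cross flush_lsn, set private.end_of_wal so the
    caller can tell "no more WAL yet" apart from a real error.
    """
    if offset + size > flush_lsn:
        private.end_of_wal = True
        return None
    if offset < 0 or offset + size > len(wal):
        return None  # genuine failure: the flag stays False
    return wal[offset:offset + size]


def read_records(wal: bytes, record_size: int, flush_lsn: int) -> List[bytes]:
    """Collect whole records that end at or before flush_lsn."""
    private = ReadPrivate()
    records = []
    offset = 0
    while True:
        page = read_page_no_wait(wal, offset, record_size, flush_lsn, private)
        if page is None:
            if private.end_of_wal:
                break  # a partial record at the end is not an error
            raise RuntimeError(f"could not read WAL at offset {offset}")
        records.append(page)
        offset += record_size
    return records
```

With 20 bytes flushed and 8-byte records, the third record is partial, so the loop stops cleanly after two records instead of reporting an error.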
* Tighten enforcement of variable CONSTANT markings in plpgsql. (Tom Lane, 2022-04-30)
| | | | | | | | | | | | | | | | | | | I noticed that plpgsql would allow assignment of a new value to a variable even when that variable is marked CONSTANT, if the variable is used as an output parameter in CALL or is a refcursor variable that OPEN assigns a new value to. Fix these oversights. In the CALL case, the check has to be done at runtime because we cannot know at parse time which parameters are OUT parameters. For OPEN, it seems best to likewise enforce at runtime because then we needn't throw error if the variable has a nonnull value (since OPEN will only try to overwrite a null value). Although this is surely a bug fix, no back-patch: it seems unlikely that anyone would thank us for breaking formerly-working code in minor releases. Discussion: https://postgr.es/m/214453.1651182729@sss.pgh.pa.us
* Claim SQL standard compliance for SQL/JSON features (Andrew Dunstan, 2022-04-29)
| | | | Discussion: https://postgr.es/m/d03d809c-d0fb-fd6a-1476-d6dc18ec940e@dunslane.net
* Fix JSON_OBJECTAGG uniquefying bug (Andrew Dunstan, 2022-04-28)
| | | | | | | | Commit f4fb45d15c contained a bug in removing items with null values when unique keys are required, where the leading items that are sorted contained such values. Fix that and add a test for it. Discussion: https://postgr.es/m/CAJA4AWQ_XbSmsNbW226UqNyRLJ+wb=iQkQMj77cQyoNkqtf=2Q@mail.gmail.com
* Disable asynchronous execution if using gating Result nodes. (Etsuro Fujita, 2022-04-28)
| | | | | | | | | | | | | | | | | | | | | | | | | | | | mark_async_capable_plan(), which is called from create_append_plan() to determine whether subplans are async-capable, failed to take into account that the given subplan created from a given subpath might include a gating Result node if the subpath is a SubqueryScanPath or ForeignPath, causing a segmentation fault there when the subplan created from a SubqueryScanPath includes the Result node, or causing ExecAsyncRequest() to throw an error about an unrecognized node type when the subplan created from a ForeignPath includes the Result node, because in the latter case the Result node was unintentionally considered as async-capable, but we don't currently support executing Result nodes asynchronously. Fix by modifying mark_async_capable_plan() to disable asynchronous execution in such cases. Also, adjust code in the ProjectionPath case in mark_async_capable_plan(), for consistency with other cases, and adjust/improve comments there. is_async_capable_path() added in commit 27e1f1456, which was rewritten to mark_async_capable_plan() in a later commit, has the same issue, causing the error at execution mentioned above, so back-patch to v14 where the aforesaid commit went in. Per report from Justin Pryzby. Etsuro Fujita, reviewed by Zhihong Yu and Justin Pryzby. Discussion: https://postgr.es/m/20220408124338.GK24419%40telsasoft.com
* Revert recent changes with durable_rename_excl() (Michael Paquier, 2022-04-28)
| | | | | | | | | | | | | | | This reverts commits 2c902bb and ccfbd92. Per buildfarm members kestrel, rorqual and calliphoridae, the assertions checking that a TLI history file should not exist when created by a WAL receiver have been failing, and switching to durable_rename() over durable_rename_excl() would cause the newest TLI history file to overwrite the existing one. We need to think harder about such cases, so revert the new logic for now. Note that all the failures have been reported in the test 025_stuck_on_old_timeline. Discussion: https://postgr.es/m/511362.1651116498@sss.pgh.pa.us
* Fix SQL syntax in comment in logical/worker.c (John Naylor, 2022-04-28)
Euler Taveira. Discussion: https://www.postgresql.org/message-id/25f95189-eef8-43c4-9d7b-419b651963c8%40www.fastmail.com
* Remove durable_rename_excl() (Michael Paquier, 2022-04-28)
| | | | | | | | | | ccfbd92 has replaced all existing in-core callers of this function in favor of durable_rename(). durable_rename_excl() is by nature unsafe on crashes happening at the wrong time, so just remove it. Author: Nathan Bossart Reviewed-by: Robert Haas, Kyotaro Horiguchi, Michael Paquier Discussion: https://postgr.es/m/20220407182954.GA1231544@nathanxps13
* Replace existing durable_rename_excl() calls with durable_rename() (Michael Paquier, 2022-04-28)
durable_rename_excl() attempts to avoid overwriting any existing files by using link() and unlink(), falling back to rename() on some platforms (e.g., Windows where link() followed by unlink() is not concurrent-safe, see 909b449). Most callers of durable_rename_excl() use it just in case there is an existing file, but it happens that for all of them we never expect a target file to exist (WAL segment recycling, creation of timeline history file and basic_archive). basic_archive used durable_rename_excl() to avoid overwriting an archive concurrently created by another server. Now, there is a stat() call to avoid overwriting an existing archive a couple of lines above, so note that this change opens a small TOCTOU window in this module between the stat() call and durable_rename(). Furthermore, as mentioned in the top comment of durable_rename_excl(), this routine can result in multiple hard links to the same file and data corruption, with two or more links to the same file in pg_wal/ if a crash happens before the unlink() call during WAL recycling. Specifically, this would produce links to the same file for the current WAL file and the next one because the half-recycled WAL file was re-recycled during crash recovery of a follow-up cluster restart. This change replaces all calls to durable_rename_excl() with durable_rename(). This removes the protection against accidentally overwriting an existing file, but some platforms are already living without it, and all those code paths never expect an existing file (a couple of assertions are added to check that, just in case). This is a bug fix, but given the unlikeliness of the problem, which involves one or more crashes at an exceptionally bad moment, no backpatch is done. This could be revisited in the future. Author: Nathan Bossart Reviewed-by: Robert Haas, Kyotaro Horiguchi, Michael Paquier Discussion: https://postgr.es/m/20220407182954.GA1231544@nathanxps13
* Fix incorrect format placeholders (Peter Eisentraut, 2022-04-27)
* Handle NULL fields in WRITE_INDEX_ARRAY (Peter Eisentraut, 2022-04-27)
| | | | | | | | | | | | Unlike existing WRITE_*_ARRAY macros, WRITE_INDEX_ARRAY needs to handle the case that the field is NULL. We already have the convention to print NULL fields as "<>", so we do that here as well. There is currently no corresponding read function for this, so reading this back in is not implemented, but it could be if needed. Reported-by: Richard Guo <guofenglinux@gmail.com> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://www.postgresql.org/message-id/flat/CAMbWs4-LN%3DbF8f9eU2R94dJtF54DfDvBq%2BovqHnOQqbinYDrUw%40mail.gmail.com
* Add some isolation tests for CLUSTER (Michael Paquier, 2022-04-26)
This commit adds two isolation tests for CLUSTER, using:
- A normal table, making sure that CLUSTER blocks and completes if the table is locked by a concurrent session.
- A partitioned table with a partition owned by a different user. If the partitioned table is locked by a concurrent session, CLUSTER on the partitioned table should block. If the partition owned by a different user is locked, CLUSTER on its partitioned table should complete and skip the partition. 3f19e17 has added an early check to ignore such a partition with a SQL regression test, but this was not checking that CLUSTER should not block.
Discussion: https://postgr.es/m/YlqveniXn9AI6RFZ@paquier.xyz
* Inhibit mingw CRT's auto-globbing of command line arguments (Andrew Dunstan, 2022-04-25)
| | | | | | | | | | | | | | | | For some reason by default the mingw C Runtime takes it upon itself to expand program arguments that look like shell globbing characters. That has caused much scratching of heads and mis-attribution of the causes of some TAP test failures, so stop doing that. This removes an inconsistency with Windows binaries built with MSVC, which have no such behaviour. Per suggestion from Noah Misch. Backpatch to all live branches. Discussion: https://postgr.es/m/20220423025927.GA1274057@rfd.leadboat.com
* Drop unlogged table after test is done (Alvaro Herrera, 2022-04-25)
| | | | | | | | Another test is constructed on top of regression tests, which does not work correctly with unlogged tables. For now, cope with that by making sure no unlogged table is left behind. Per buildfarm pink after 4fb5c794e586.
* Cover brin/gin/gist/spgist ambuildempty routines in regression tests (Alvaro Herrera, 2022-04-25)
Changing some TEMP or permanent tables to UNLOGGED is sufficient to invoke these ambuildempty routines, none of which were covered by any tests. These changes do not otherwise affect the test suite. Author: Amul Sul <sulamul@gmail.com> Discussion: https://postgr.es/m/CAAJ_b95nneRCLM-=qELEdgCYSk6W_++-C+Q_t+wH3SW-hF50iw@mail.gmail.com
* Always pfree strings returned by GetDatabasePath (Alvaro Herrera, 2022-04-25)
| | | | | | | | | | | | | | Several places didn't do it, and in many cases it didn't matter because it would be a small allocation in a short-lived context; but other places may accumulate a few (for example, in CreateDatabaseUsingFileCopy, one per tablespace). In most databases this is highly unlikely to be very serious either, but it seems better to make the code consistent in case there's future copy-and-paste. The only case of actual concern seems to be the aforementioned routine, which is new with commit 9c08aea6a309, so there's no need to backpatch. As pointed out by Coverity.
* Fix incautious CTE matching in rewriteSearchAndCycle(). (Tom Lane, 2022-04-23)
| | | | | | | | | | | | | | | | This function looks for a reference to the recursive WITH CTE, but it checked only the CTE name not ctelevelsup, so that it could seize on a lower CTE that happened to have the same name. This would result in planner failures later, either weird errors such as "could not find attribute 2 in subquery targetlist", or crashes or assertion failures. The code also merely Assert'ed that it found a matching entry, which is not guaranteed at all by the parser. Per bugs #17320 and #17318 from Zhiyong Wu. Thanks to Kyotaro Horiguchi for investigation. Discussion: https://postgr.es/m/17320-70e37868182512ab@postgresql.org Discussion: https://postgr.es/m/17318-2eb65a3a611d2368@postgresql.org
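The lookup problem described above can be illustrated with a small sketch. It is not the rewriteSearchAndCycle() code: `RangeTableEntry` and `find_cte_reference` are hypothetical names, and the nesting level that counts as "the recursive CTE" is passed in as a parameter because the commit message does not state it. The sketch shows the two changes the message describes: match on both name and `ctelevelsup`, and raise a real error instead of merely asserting that a match exists.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RangeTableEntry:
    """Hypothetical, simplified range table entry."""
    kind: str          # e.g. "cte" or "relation"
    ctename: str = ""
    ctelevelsup: int = 0


def find_cte_reference(rtable: List[RangeTableEntry], ctename: str,
                       expected_levelsup: int) -> int:
    """Return the index of the entry referring to the wanted CTE.

    Matching on the name alone could pick a different CTE that merely
    shares the name, so the nesting level is compared as well.
    """
    for index, rte in enumerate(rtable):
        if (rte.kind == "cte"
                and rte.ctename == ctename
                and rte.ctelevelsup == expected_levelsup):
            return index
    # The parser does not guarantee a match, so fail explicitly.
    raise LookupError(f"could not find reference to CTE {ctename!r}")
```

In the example below, a same-named CTE at a different level comes first in the range table and is correctly skipped.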
* Test ALIGNOF_DOUBLE==4 compatibility under ALIGNOF_DOUBLE==8. (Noah Misch, 2022-04-22)
| | | | | | | | | Today's test case detected alignment problems only when executing on AIX. This change lets popular platforms detect the same problems. Reviewed by Masahiko Sawada. Discussion: https://postgr.es/m/20220415072601.GG862547@rfd.leadboat.com
* Remove some recently-added pg_dump test cases. (Robert Haas, 2022-04-22)
| | | | | | | | | | | | | Commit d2d35479796c3510e249d6fc72adbd5df918efbf included a pretty extensive set of test cases, and some of them don't work on all of our Windows machines. This happens because IPC::Run expands its arguments as shell globs on a few machines, but doesn't on most of the buildfarm. It might be good to fix that problem systematically somehow, but in the meantime, there are enough test cases for this commit that it seems OK to just remove the ones that are failing. Discussion: http://postgr.es/m/3a190754-b2b0-d02b-dcfd-4ec1610ffbcb@dunslane.net Discussion: http://postgr.es/m/CA+TgmoYRGUcFBy6VgN0+Pn4f6Wv=2H0HZLuPHqSy6VC8Ba7vdg@mail.gmail.com
* Fix performance regression in tuplesort specializations (David Rowley, 2022-04-22)
697492434 added 3 new qsort specialization functions aimed at improving the performance of sorting many of the common pass-by-value data types when they're the leading or only sort key. Unfortunately, that has caused a performance regression when sorting datasets where many of the values being compared were equal. What was happening here was that we were falling back to the standard sort comparison function to handle tiebreaks. When the two given Datums compared equally we would incur both the overhead of an indirect function call to the standard comparer to perform the tiebreak and also the standard comparer function would go and compare the leading key needlessly all over again. Here we improve the situation in the 3 new comparison functions. We now return 0 directly when the two Datums compare equally and we're performing a 1-key sort. Here we don't do anything to help the multi-key sort case where the leading key uses one of the sort specialization functions. On testing this case, even when the leading key's values are all equal, there appeared to be no performance regression. Let's leave it up to future work to optimize that case so that the tiebreak function no longer re-compares the leading key. Another possible fix for this would have been to add 3 additional sort specialization functions to handle single-key sorts for these pass-by-value types. The reason we didn't do that here is that we may deem some other sort specialization to be more useful than single-key sorts. It may be impractical to have sort specialization functions for every single combination of what may be useful and it was already decided that further analysis into which ones are the most useful would be delayed until the v16 cycle. Let's not let this regression force our hand into trying to make that decision for v15. Author: David Rowley Reviewed-by: John Naylor Discussion: https://postgr.es/m/CA+hUKGJRbzaAOUtBUcjF5hLtaSHnJUqXmtiaLEoi53zeWSizeA@mail.gmail.com
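The shape of the fix can be sketched in Python as below. This is an illustration only, not the tuplesort code: `make_leading_key_comparator` and `full_compare` are hypothetical names, and each row is modelled as a tuple whose first element plays the role of the leading Datum. The specialized comparator decides on the leading key alone, returns 0 directly for equal keys in a single-key sort, and calls the generic tiebreak only when there are further keys.

```python
from functools import cmp_to_key


def make_leading_key_comparator(only_key, tiebreak):
    """Build a comparator specialized for the leading sort key.

    only_key: True when the sort has exactly one key.
    tiebreak: generic comparator, used only for multi-key sorts.
    """
    def compare(a, b):
        if a[0] < b[0]:
            return -1
        if a[0] > b[0]:
            return 1
        if only_key:
            return 0  # equal and nothing else to compare: skip the tiebreak
        return tiebreak(a, b)
    return compare


def full_compare(a, b):
    """Generic comparator over whole rows (it re-compares the leading key)."""
    return (a > b) - (a < b)


rows = [(2, "b"), (1, "a"), (2, "a")]
single_key = sorted(rows, key=cmp_to_key(
    make_leading_key_comparator(True, full_compare)))
multi_key = sorted(rows, key=cmp_to_key(
    make_leading_key_comparator(False, full_compare)))
```

In the single-key case the generic comparator is never invoked, which is the saving the commit message describes for datasets with many equal values.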
* Remove inadequate assertion check in CTE inlining. (Tom Lane, 2022-04-21)
| | | | | | | | | | | | | | | | | | | | inline_cte() expected to find exactly as many references to the target CTE as its cterefcount indicates. While that should be accurate for the tree as emitted by the parser, there are some optimizations that occur upstream of here that could falsify it, notably removal of unused subquery output expressions. Trying to make the accounting 100% accurate seems expensive and doomed to future breakage. It's not really worth it, because all this code is protecting is downstream assumptions that every referenced CTE has a plan. Let's convert those assertions to regular test-and-elog just in case there's some actual problem, and then drop the failing assertion. Per report from Tomas Vondra (thanks also to Richard Guo for analysis). Back-patch to v12 where the faulty code came in. Discussion: https://postgr.es/m/29196a1e-ed47-c7ca-9be2-b1c636816183@enterprisedb.com
* Fix missed cases in libpq's error handling. (Tom Lane, 2022-04-21)
| | | | | | | | | | | | | | | | | | | | Commit 618c16707 invented an "error_result" flag in PGconn, which intends to represent the state that we have an error condition and need to build a PGRES_FATAL_ERROR PGresult from the message text in conn->errorMessage, but have not yet done so. (Postponing construction of the error object simplifies dealing with out-of-memory conditions and with concatenation of messages for multiple errors.) For nearly all purposes, this "virtual" PGresult object should act the same as if it were already materialized. But a couple of places in fe-protocol3.c didn't get that memo, and were only testing conn->result as they used to, without also checking conn->error_result. In hopes of reducing the probability of similar mistakes in future, I invented a pgHavePendingResult() macro that includes both tests. Per report from Peter Eisentraut. Discussion: https://postgr.es/m/b52277b9-fa66-b027-4a37-fb8989c73ff8@enterprisedb.com
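The idea behind a combined "pending result" test can be sketched as follows. This is a Python illustration with hypothetical names (`Conn`, `have_pending_result`), not libpq's C definitions; it only shows why checking the materialized result alone misses the deferred-error state.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Conn:
    """Hypothetical, simplified connection state."""
    result: Optional[object] = None   # an already-built result, if any
    error_result: bool = False        # an error result is pending but not yet built
    error_message: str = ""


def have_pending_result(conn: Conn) -> bool:
    """True if a result exists, whether materialized or still 'virtual'."""
    return conn.result is not None or conn.error_result


# A connection with only a deferred error still counts as having a result.
deferred = Conn(error_result=True, error_message="out of memory")
```

Code that tested only `conn.result` would treat `deferred` as having nothing pending; routing every check through one helper avoids repeating that mistake.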
* Rethink method for assigning OIDs to the template0 and postgres DBs. (Tom Lane, 2022-04-21)
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit aa0105141 assigned fixed OIDs to template0 and postgres in a very ad-hoc way. Notably, instead of teaching Catalog.pm about these OIDs, the unused_oids script was just hacked to not show them as unused. That's problematic since, for example, duplicate_oids wouldn't report any future conflict. Hence, invent a macro DECLARE_OID_DEFINING_MACRO() that can be used to define an OID that is known to Catalog.pm and will participate in duplicate-detection as well as renumbering by renumber_oids.pl. (We don't anticipate renumbering these particular OIDs, but we might as well build out all the Catalog.pm infrastructure while we're here.) Another issue is that aa0105141 neglected to touch IsPinnedObject, with the result that it now claimed template0 and postgres are pinned. The right thing to do there seems to be to teach it that no database is pinned, since in fact DROP DATABASE doesn't check for pinned-ness (and at least for these cases, that is an intentional choice). It's not clear whether this wrong answer had any visible effect, but perhaps it could have resulted in erroneous management of dependency entries. In passing, rename the TemplateDbOid macro to Template1DbOid to reduce confusion (likely we should have done that way back when we invented template0, but we didn't), and rename the OID macros for template0 and postgres to have a similar style. There are no changes to postgres.bki here, so no need for a catversion bump. Discussion: https://postgr.es/m/2935358.1650479692@sss.pgh.pa.us
* Use DECLARE_TOAST_WITH_MACRO() to simplify toast-table declarations. (Tom Lane, 2022-04-21)
| | | | | | | | | | | | | | | This is needed so that renumber_oids.pl can handle renumbering shared catalog declarations, which need to provide C macros for the OIDs of the shared toast table and index. The previous method of writing a C macro separately was error-prone anyway. Also teach renumber_oids.pl about DECLARE_UNIQUE_INDEX_PKEY, as we missed doing when inventing that macro. There are no changes to postgres.bki here, so no need for a catversion bump. Discussion: https://postgr.es/m/2995325.1650487527@sss.pgh.pa.us
* vacuumlazy.c: MultiXactIds are MXIDs, not XMIDs. (Peter Geoghegan, 2022-04-20)
| | | | Oversights in commits 0b018fab and f3c15cbe.
* Fix CLUSTER tuplesorts on abbreviated expressions. (Peter Geoghegan, 2022-04-20)
| | | | | | | | | | | | | | | | | | | | CLUSTER sort won't use the datum1 SortTuple field when clustering against an index whose leading key is an expression. This makes it unsafe to use the abbreviated keys optimization, which was missed by the logic that sets up SortSupport state. Affected tuplesorts output tuples in a completely bogus order as a result (the wrong SortSupport based comparator was used for the leading attribute). This issue is similar to the bug fixed on the master branch by recent commit cc58eecc5d. But it's a far older issue, that dates back to the introduction of the abbreviated keys optimization by commit 4ea51cdfe8. Backpatch to all supported versions. Author: Peter Geoghegan <pg@bowt.ie> Author: Thomas Munro <thomas.munro@gmail.com> Discussion: https://postgr.es/m/CA+hUKG+bA+bmwD36_oDxAoLrCwZjVtST2fqe=b4=qZcmU7u89A@mail.gmail.com Backpatch: 10-
* Disallow infinite endpoints in generate_series() for timestamps. (Tom Lane, 2022-04-20)
| | | | | | | | | | Such cases will lead to infinite loops, so they're of no practical value. The numeric variant of generate_series() already threw error for this, so borrow its message wording. Per report from Richard Wesley. Back-patch to all supported branches. Discussion: https://postgr.es/m/91B44E7B-68D5-448F-95C8-B4B3B0F5DEAF@duckdblabs.com
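Why infinite endpoints must be rejected up front can be seen in a small sketch. This is a Python analogue using plain numbers in place of timestamps; the function and its error texts are illustrative assumptions, not the server's code or exact messages. Without the checks, a loop of the form "advance until past stop" never terminates when `stop` is infinite.

```python
import math


def generate_series(start, stop, step):
    """Return start, start+step, ... up to and including stop.

    Infinite endpoints are rejected because the loop below would
    never reach an infinite stop value.
    """
    if step == 0:
        raise ValueError("step size cannot equal zero")
    if math.isinf(start):
        raise ValueError("start value cannot be infinity")
    if math.isinf(stop):
        raise ValueError("stop value cannot be infinity")
    values = []
    current = start
    while (current <= stop) if step > 0 else (current >= stop):
        values.append(current)
        current += step
    return values
```

For example, `generate_series(0, 10, 5)` yields three values, while `generate_series(0, math.inf, 5)` fails immediately instead of looping forever.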
* Allow db.schema.table patterns, but complain about random garbage. (Robert Haas, 2022-04-20)
| | | | | | | | | | | | | | | | | | | | | | | | | | psql, pg_dump, and pg_amcheck share code to process object name patterns like 'foo*.bar*' to match all tables with names starting in 'bar' that are in schemas starting with 'foo'. Before v14, any number of extra name parts were silently ignored, so a command line '\d foo.bar.baz.bletch.quux' was interpreted as '\d bletch.quux'. In v14, as a result of commit 2c8726c4b0a496608919d1f78a5abc8c9b6e0868, we instead treated this as a request for table quux in a schema named 'foo.bar.baz.bletch'. That caused problems for people like Justin Pryzby who were accustomed to copying strings of the form db.schema.table from messages generated by PostgreSQL itself and using them as arguments to \d. Accordingly, revise things so that if an object name pattern contains more parts than we're expecting, we throw an error, unless there's exactly one extra part and it matches the current database name. That way, thisdb.myschema.mytable is accepted as meaning just myschema.mytable, but otherdb.myschema.mytable is an error, and so is some.random.garbage.myschema.mytable. Mark Dilger, per report from Justin Pryzby and discussion among various people. Discussion: https://www.postgresql.org/message-id/20211013165426.GD27491%40telsasoft.com
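The acceptance rule described above can be sketched as follows. This is a simplified Python illustration: `resolve_pattern` is a hypothetical name, the split on "." ignores quoting and wildcards, the database part is compared by plain equality, and the error texts are assumptions rather than the tools' exact messages.

```python
def resolve_pattern(pattern, current_db, max_parts=2):
    """Return the name parts to match, dropping a leading current-db part.

    max_parts is how many dotted parts the caller expects
    (2 for schema.table).
    """
    parts = pattern.split(".")  # simplification: no quoting rules
    if len(parts) <= max_parts:
        return parts
    if len(parts) == max_parts + 1:
        if parts[0] == current_db:
            return parts[1:]  # thisdb.myschema.mytable -> myschema.mytable
        raise ValueError(
            f"cross-database references are not implemented: {pattern}")
    raise ValueError(f"too many dotted names: {pattern}")
```

With the current database set to `thisdb`, `thisdb.myschema.mytable` resolves to `["myschema", "mytable"]`, whereas `otherdb.myschema.mytable` and `some.random.garbage.myschema.mytable` both raise an error, matching the three cases in the commit message.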
* Fix incorrect format placeholders (Peter Eisentraut, 2022-04-20)
* set_deparse_plan: Reuse variable to appease Coverity (Alvaro Herrera, 2022-04-20)
Coverity complains that dpns->outer_plan is dereferenced (to obtain ->targetlist) when possibly NULL. We can avoid this by using dpns->outer_tlist instead, which was already obtained a few lines up. The fact that we end up with dpns->inner_tlist = dpns->outer_tlist is a bit suspicious-looking and maybe worthy of more investigation, but I'll leave that for another day. Reviewed-by: Michaël Paquier <michael@paquier.xyz> Discussion: https://postgr.es/m/202204191345.qerjy3kxi3eb@alvherre.pgsql
* Move ModifyTableContext->lockmode to UpdateContext (Alvaro Herrera, 2022-04-20)
Should have been done this way to start with, but I failed to notice. This way we avoid some pointless initialization, and better confine the variable to the scope where it is really used. Reviewed-by: Michaël Paquier <michael@paquier.xyz> Discussion: https://postgr.es/m/202204191345.qerjy3kxi3eb@alvherre.pgsql
* ExecModifyTable: use context.planSlot instead of planSlot (Alvaro Herrera, 2022-04-20)
There's no reason to keep a separate local variable when we have a place for it elsewhere. This allows some code to be simplified. Reviewed-by: Michaël Paquier <michael@paquier.xyz> Discussion: https://postgr.es/m/202204191345.qerjy3kxi3eb@alvherre.pgsql
* Fix breakage in AlterFunction(). (Tom Lane, 2022-04-19)
| | | | | | | | | | | | | | | | An ALTER FUNCTION command that tried to update both the function's proparallel property and its proconfig list failed to do the former, because it stored the new proparallel value into a tuple that was no longer the interesting one. Carelessness in 7aea8e4f2. (I did not bother with a regression test, because the only likely future breakage would be for someone to ignore the comment I added and add some other field update after the heap_modify_tuple step. A test using existing function properties could not catch that.) Per report from Bryn Llewellyn. Back-patch to all supported branches. Discussion: https://postgr.es/m/8AC9A37F-99BD-446F-A2F7-B89AD0022774@yugabyte.com
* Remove duplicated word in comment of basebackup.c (Michael Paquier, 2022-04-20)
| | | | | | | Oversight in 39969e2. Author: Martín Marqués Discussion: https://postgr.es/m/CABeG9LviA01oHC5h=ksLUuhMyXxmZR_tftRq6q3341CMT=j=4g@mail.gmail.com
* Fix extract epoch from interval calculation (Peter Eisentraut, 2022-04-19)
| | | | | | | | | | | | | | The new numeric code for extract epoch from interval accidentally truncated the DAYS_PER_YEAR value to an integer, leading to results that mismatched the floating-point interval_part calculations. The commit a2da77cdb4661826482ebf2ddba1f953bc74afe4 that introduced this actually contains the regression test change that this reverts. I suppose this was missed at the time. Reported-by: Joseph Koshakow <koshy44@gmail.com> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://www.postgresql.org/message-id/flat/CAAvxfHd5n%3D13NYA2q_tUq%3D3%3DSuWU-CufmTf-Ozj%3DfrEgt7pXwQ%40mail.gmail.com
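A short worked example shows the size of the discrepancy. It is a sketch in Python using `Decimal`, assuming the constants `DAYS_PER_YEAR = 365.25` and `SECS_PER_DAY = 86400` (values not stated in the commit message) and modelling only the whole-years part of an interval; `years_to_epoch` is a hypothetical helper.

```python
from decimal import Decimal

SECS_PER_DAY = 86400                # assumed constant
DAYS_PER_YEAR = Decimal("365.25")   # assumed constant


def years_to_epoch(years, days_per_year):
    """Seconds contributed by a whole number of years."""
    return days_per_year * SECS_PER_DAY * years


correct = years_to_epoch(1, DAYS_PER_YEAR)         # uses 365.25 days
truncated = years_to_epoch(1, int(DAYS_PER_YEAR))  # uses 365 days
difference = correct - truncated                   # a quarter day per year
```

Truncating the constant to an integer drops a quarter of a day, that is 21600 seconds, for every year in the interval, which is why the numeric result diverged from the floating-point calculation.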
* Fix aggregate logging of pgbench. (Tatsuo Ishii, 2022-04-19)
| | | | | | | | Remove meaningless "failures" column from the aggregate logging. It was just a sum of "serialization failures" and "deadlock failures". Pointed out by Tom Lane. Patch reviewed by Fabien COELHO. Discussion: https://postgr.es/m/4183048.1649536705%40sss.pgh.pa.us
* Fix the check to limit sync workers. (Amit Kapila, 2022-04-19)
We don't allow more sync workers to be launched once the per-subscription sync worker limit has been reached. But the check that enforces this also prevented an apply worker from being launched if it gets restarted. This code was introduced by commit de43897122 but we caught the problem only with the test added by recent commit c91f71b9dc which started failing occasionally in the buildfarm. As per buildfarm. Diagnosed-by: Amit Kapila, Masahiko Sawada, Tomas Vondra Author: Amit Kapila Backpatch-through: 10 Discussion: https://postgr.es/m/CAH2L28vddB_NFdRVpuyRBJEBWjz4BSyTB=_ektNRH8NJ1jf95g@mail.gmail.com https://postgr.es/m/f90d2b03-4462-ce95-a524-d91464e797c8@enterprisedb.com
* Add missing error handling in pg_md5_hash(). (Tom Lane, 2022-04-18)
| | | | | | | | | | It failed to provide an error string as expected for the admittedly-unlikely case of OOM in pg_cryptohash_create(). Also, make it initialize *errstr to NULL for success, as pg_md5_binary() does. Also add missing comments. Readers should not have to reverse-engineer the API spec for a publicly visible routine.
* Avoid invalid array reference in transformAlterTableStmt(). (Tom Lane, 2022-04-18)
| | | | | | | | | | | | | | | | | | | | | | Don't try to look at the attidentity field of system attributes, because they're not there in the TupleDescAttr array. Sometimes this is harmless because we accidentally pick up a zero, but otherwise we'll report "no owned sequence found" from an attempt to alter a system attribute. (It seems possible that a SIGSEGV could occur, too, though I've not seen it in testing.) It's not in this function's charter to complain that you can't alter a system column, so instead just hard-wire an assumption that system attributes aren't identities. I didn't bother with a regression test because the appearance of the bug is very erratic. Per bug #17465 from Roman Zharkov. Back-patch to all supported branches. (There's not actually a live bug before v12, because before that get_attidentity() did the right thing anyway. But for consistency I changed the test in the older branches too.) Discussion: https://postgr.es/m/17465-f2a554a6cb5740d3@postgresql.org
* Fix second race condition in 002_archiving.pl with archive_cleanup_command (Michael Paquier, 2022-04-18)
| | | | | | | | | | | | | Checking the execution of archive_cleanup_command on a standby requires a valid checkpoint coming from its primary, but the logic did not check that the standby replayed up to the point of the checkpoint, causing the test checking for the execution of archive_cleanup_command to fail. This race was more visible in slow environments. Issue introduced in 46dea24, so no backpatch is needed. Author: Tom Lane Discussion: https://postgr.es/m/4015413.1649454951@sss.pgh.pa.us
* Fix race in TAP test 002_archiving.pl when restoring history file (Michael Paquier, 2022-04-18)
This test, introduced in df86e52, uses a second standby to check that it is able to correctly remove RECOVERYHISTORY and RECOVERYXLOG at the end of recovery. This standby uses the archives of the primary to restore its contents, with some of the archive's contents coming from the first standby previously promoted. In slow environments, it was possible that the test did not check what it should, as the history file generated by the promotion of the first standby may not be stored yet on the archives the second standby feeds on. So, it could be possible that the second standby selects an incorrect timeline, without restoring a history file at all. This commit adds a wait phase to make sure that the history file required by the second standby is archived before this cluster is created. This relies on poll_query_until() with pg_stat_file() and an absolute path, something not supported in REL_10_STABLE. While on it, this adds a new test to check that the history file has been restored by looking at the logs of the second standby. This ensures that a RECOVERYHISTORY, whose removal needs to be checked, is created in the first place. This should make the test more robust. This test has been introduced by df86e52, but the problem came to light as an effect of the bug fixed by acf1dd42, where the extra restore_command calls made the test much slower. Reported-by: Andres Freund Discussion: https://postgr.es/m/YlT23IvsXkGuLzFi@paquier.xyz Backpatch-through: 11
* Handle compression level in pg_receivewal for LZ4 (Michael Paquier, 2022-04-18)
The new option set of pg_receivewal introduced in 042a923 to control the compression method makes it now easy to pass down various options, including the compression level. The change needed to do so is simple, and requires one LZ4F_preferences_t fed to LZ4F_compressBegin(). Note that LZ4F_INIT_PREFERENCES could be used to initialize the contents of LZ4F_preferences_t as required by LZ4, but this is only available since v1.8.3. memset()'ing its structure to 0 is enough. Discussion: https://postgr.es/m/YlPQGNAAa04raObK@paquier.xyz
* Add a temp-install prerequisite to src/interfaces/ecpg "checktcp". (Noah Misch, 2022-04-16)
| | | | | | The target failed, tested $PATH binaries, or tested a stale temporary installation. Commit c66b438db62748000700c9b90b585e756dd54141 missed this. Back-patch to v10 (all supported versions).
* Don't retry restore_command while reading ahead. (Thomas Munro, 2022-04-17)
| | | | | | | | | | | | | | | | | | | | | | | | | | Suppress further attempts to read ahead in the WAL if we run out of data, until the records already decoded have been replayed. This restores the traditional behavior for continuous archive recovery, which is to retry the failing restore_command only every 5 seconds. With the coding in 5dc0418f, we would start retrying every time through the recovery loop when our WAL decoding window hit the end of the current segment and we tried to look ahead into a not-yet-available next file. That was very slow. Also change the no_readahead_until mechanism to use <= rather than <, which seems more useful. Otherwise we'd either get one extra unwanted retry of restore_command, or we'd need to add 1 to an LSN. No change in behavior for regular streaming. That was already limited by the flushedUpto variable, which won't be updated until we replay what we have already. Reported by Andres Freund while analyzing the failure of a TAP test on build farm animal skink (investigation ongoing but probably due to otherwise unrelated timing bugs triggered by this slowness magnified by valgrind). Discussion: https://postgr.es/m/20220409005910.alw46xqmmgny2sgr%40alap3.anarazel.de
* pgstat: Use correct lock level in pgstat_drop_all_entries(). (Andres Freund, 2022-04-16)
Previously we didn't, which led to an assertion failure when resetting partially loaded statistics. This was encountered on the buildfarm, for as-of-yet unknown reasons. Tighten up a validity check when reading the stats file, verifying 'E' signals the end of the file (rather than just stopping reading). That's then used in a test appending to the stats file that crashed before the fix in pgstat_drop_all_entries(). Reported by buildfarm animals mylodon and kestrel, via Tom Lane. Discussion: https://postgr.es/m/1656446.1650043715@sss.pgh.pa.us
* Fix incorrect logic in HaveRegisteredOrActiveSnapshot(). (Tom Lane, 2022-04-16)
| | | | | | | | | | | | | | | | | This function gave the wrong answer when there's more than one RegisteredSnapshots entry, whether or not any of them is the CatalogSnapshot. This leads to assertion failure in some scenarios involving fetching toasted data using a cursor. (As per discussion, I'm dubious that this is the right contract to be enforcing at all; but it surely doesn't help to be enforcing it incorrectly.) Fetching toasted data using a cursor is evidently under-tested, so add a test case too. Per report from Erik Rijkers. This is new code, so no need for back-patch. Discussion: https://postgr.es/m/dc9dd229-ed30-6c62-4c41-d733ffff776b@xs4all.nl
* Build libpq test programs under MSVC (Andrew Dunstan, 2022-04-16)
| | | | This allows the newly added TAP tests to run.
* Use standard timeout, in 010_pg_basebackup.pl. (Noah Misch, 2022-04-15)
| | | | Per buildfarm member mandrill. The test is new in v15, so no back-patch.