aboutsummaryrefslogtreecommitdiff
path: root/src
Commit message (Collapse)AuthorAge
...
* vacuumdb: Don't skip empty relations in --missing-stats-only mode.Nathan Bossart2025-04-30
| | | | | | | | | | | | | | | | | | | | Presently, --missing-stats-only skips relations with reltuples set to 0 because empty relations don't get optimizer statistics. However, before v14, a reltuples value of 0 was ambiguous: it could either mean the relation is empty, or it could mean that it hadn't yet been vacuumed or analyzed. (Commit 3d351d916b taught v14 and newer to use -1 for the latter case.) This ambiguity can cause --missing-stats-only to inadvertently skip relations that need optimizer statistics after upgrades to v18 and newer (since reltuples is now transferred from the old cluster). To fix, simply remove the check for reltuples != 0. This will cause --missing-stats-only to analyze some empty tables, but that doesn't seem too terrible a trade-off. Reported-by: Christoph Berg <myon@debian.org> Reviewed-by: Christoph Berg <myon@debian.org> Discussion: https://postgr.es/m/aAjyvW5_fRGNr7yF%40msg.df7cb.de
* Further adjust guidance for running vacuumdb after pg_upgrade.Nathan Bossart2025-04-30
| | | | | | | | | | | | | | | | Since pg_upgrade does not transfer the cumulative statistics used to trigger autovacuum and autoanalyze, the server may take much longer than expected to process them post-upgrade. Currently, we recommend analyzing only relations for which optimizer statistics were not transferred by using the --analyze-in-stages and --missing-stats-only options. This commit appends another recommendation to analyze all relations to update the relevant cumulative statistics by using the --analyze-only option. This is similar to the recommendation for pg_stat_reset(). Reported-by: Christoph Berg <myon@debian.org> Reviewed-by: Christoph Berg <myon@debian.org> Discussion: https://postgr.es/m/aAfxfKC82B9NvJDj%40msg.df7cb.de
* Update time zone data files to tzdata release 2025b.Tom Lane2025-04-30
| | | | | | | | | | DST law changes in Chile: there is a new time zone America/Coyhaique for Chile's Aysén Region, to account for it changing to UTC-03 year-round and thus diverging from America/Santiago. Historical corrections for Iran. Backpatch-through: 13
* Typo and doc fixups for memory context reportingDaniel Gustafsson2025-04-30
| | | | | | | | | This fixes comment and docs typos as well as a small documentation change to make it clearer. Found via post-commit review. Author: Rahila Syed <rahilasyed90@gmail.com> Reviewed-by: Daniel Gustafsson <daniel@yesql.se> Discussion: https://postgr.es/m/CAH2L28vt16C9xTuK+K7QZvtA3kCNWXOEiT=gEekUw3Xxp9LVQw@mail.gmail.com
* Add missing string terminatorDaniel Gustafsson2025-04-30
| | | | | | | | | | | When copying the string strncpy won't add nul termination since the string length is equal to the length specified. Explicitly set a nul terminator after copying to properly terminate. Found via post-commit review. Author: Rahila Syed <rahilasyed90@gmail.com> Reviewed-by: Daniel Gustafsson <daniel@yesql.se> Discussion: https://postgr.es/m/CAH2L28vt16C9xTuK+K7QZvtA3kCNWXOEiT=gEekUw3Xxp9LVQw@mail.gmail.com
* Fix broken indentationDavid Rowley2025-04-30
| | | | | | | I forgot to run pgindent in d8555e522. Reported-by: Fujii Masao <masao.fujii@oss.nttdata.com> Discussion: https://postgr.es/m/156083c9-eac0-418d-9667-92dec4d6d6cd@oss.nttdata.com
* Fix a couple of comment typosDavid Rowley2025-04-30
| | | | | Author: Junwang Zhao <zhjwpku@gmail.com> Discussion: https://postgr.es/m/CAEG8a3+MRwDKc4YSFKKPKq7Y+vMufVC5u94wM5KZPB2CbgCxnQ@mail.gmail.com
* Give up on running with NetBSD/OpenBSD's default semaphore settings.Tom Lane2025-04-29
| | | | | | | | | | | | | | | | | | | | | | | | | This reverts commit 38da053463bef32adf563ddee5277d16d2b6c5af, which attempted to preserve our ability to start with only 60 semaphores. Subsequent changes (particularly 55b454d0e) have put that idea pretty much permanently out of reach: people wishing to use Postgres v18 on OpenBSD or NetBSD will have no choice but to increase those platforms' default values of SEMMNI and SEMMNS. Hence, revert 38da05346's changes in SEMAS_PER_SET and the minimum tested value of max_connections. Adjust a comment from the subsequent patch 6d0154196, and tweak the wording in runtime.sgml to make it clear that changing SEMMNI/SEMMNS is no longer even a little bit optional on these platforms. Although 38da05346 was later back-patched into v17, leave that branch alone: it's still capable of starting with 60 semaphores, and there's no reason to break that. Author: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: Nathan Bossart <nathandbossart@gmail.com> Discussion: https://postgr.es/m/E1tuZNv-0037Gs-34@gemulon.postgresql.org Discussion: https://postgr.es/m/1052019.1745947915@sss.pgh.pa.us
* oauth: Classify oauth_client_secret as a passwordJacob Champion2025-04-29
| | | | | | | | | | | Tell UIs to hide the value of oauth_client_secret, like the other passwords. Due to the previous commit, this does not affect postgres_fdw and dblink, but add a comment to try to warn others of the hazard in the future. Reported-by: Noah Misch <noah@leadboat.com> Reviewed-by: Noah Misch <noah@leadboat.com> Discussion: https://postgr.es/m/20250415191435.55.nmisch%40google.com
* Fix whitespace typo in stringPeter Eisentraut2025-04-29
|
* initdb: Do not report default autovacuum_worker_slots.Nathan Bossart2025-04-29
| | | | | | | | | | | | | | | Commit 6d01541960 taught initdb to lower the default value of autovacuum_worker_slots for systems with very few semaphores. It also added a "fake" report for the chosen value, i.e., initdb prints a message about selecting the default, but the value was already selected in a previous test. Per discussion, this is not a precedent we want to set, and it seems unnecessary to report everything derived from max_connections, so let's remove the "fake" report. Reported-by: Peter Eisentraut <peter@eisentraut.org> Suggested-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/de722583-4ba4-4063-bc41-e20684978116%40eisentraut.org
* Fixes for ChangeVarNodes_walker()Alexander Korotkov2025-04-29
| | | | | | | | | | | | | | | | | This commit fixes two bug in ChangeVarNodes_walker() function. * When considering RestrictInfo, walk down to its clauses based on the presense of relid to be deleted not just in clause_relids but also in required_relids. * Incrementally adjust num_base_rels based on the change of clause_relids instead of recalculating it using clause_relids, which could contain outer-join relids. Reported-by: Richard Guo <guofenglinux@gmail.com> Discussion: https://postgr.es/m/CAMbWs49PE3CvnV8vrQ0Dr%3DHqgZZmX0tdNbzVNJxqc8yg-8kDQQ%40mail.gmail.com Author: Andrei Lepikhov <lepihov@gmail.com> Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>
* pg_restore: Improve --help synopsisPeter Eisentraut2025-04-29
| | | | | | | The --help synopsis should only be one line. This rephrases the first line a bit to reflect the new functionality of restoring multiple databases from pg_dumpall output. Additional explanations are better kept in the man page.
* pg_restore: Put new option in consistent order in --help outputPeter Eisentraut2025-04-29
| | | | Also make the description a bit more consistent with similar options.
* Fix assertion failure during decoding from synced slots.Amit Kapila2025-04-29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The slot synchronization skips updating the confirmed_flush LSN of the local slot if the local slot has a newer catalog_xmin or restart_lsn, but still allows updating the two_phase and two_phase_at fields of the slot. This opens up a window for the prepared transactions between old confirmed_flush LSN and two_phase_at to unexpectedly get decoded and sent to the downstream after promotion. Then, while decoding the commit prepared the assert will fail, which expects that the prepare hasn't been sent to the downstream. The fix is to skip updating the other slot fields when we are skipping to update the confirmed_flush LSN of the slot. We didn't backpatch this commit as two_phase_at was not synced in back branches, which means prepared transactions won't be unexpectedly sent to downstream. We discovered this problem while analyzing BF failure reported in the discussion link. Reliably reproducing this issue without a debugger is difficult. Given its rarity, adding specific injection point to test it doesn't seem worthwhile, so we won't be adding a dedicated test case. Author: Zhijie Hou <houzj.fnst@fujitsu.com> Reviewed-by: shveta malik <shveta.malik@gmail.com> Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-by: Amit Kapila <amit.kapila16@gmail.com> Discussion: https://postgr.es/m/OS0PR01MB5716B44052000EB91EFAE60E94BC2@OS0PR01MB5716.jpnprd01.prod.outlook.com
* pg_verifybackup: Message style improvementsPeter Eisentraut2025-04-29
|
* test_slru: Fix incorrect format placeholdersPeter Eisentraut2025-04-29
| | | | | | | Before commit a0ed19e0a9e there was a cast around these, but the cast inadvertently changed the signedness, but that made the format placeholder correct. Commit a0ed19e0a9e removed the casts, so now the format placeholders had the wrong signedness.
* Add maintenance_io_concurrency flag to some read stream usersMelanie Plageman2025-04-28
| | | | | | | | | | | Index vacuuming and [auto]prewarm AIO concurrency should be governed by maintenance_io_concurrency. As such, pass those read stream users the READ_STREAM_MAINTENANCE flag which will calculate their read stream distance with maintenance_io_concurrency instead of effective_io_concurrency. This was an oversight in the original commits making those operations use the read stream API. Discussion: https://postgr.es/m/flat/CAAKRu_aopDxTo4b41Mt_7Zc-z0_ngocrY8SFCCY6Aph1HgwuNw%40mail.gmail.com
* Fix obsolete nbtree array advancement comment.Peter Geoghegan2025-04-28
| | | | | | | | | Checking if another primitive scan is required after all once the next leaf page was moved from _bt_checkkeys to its _bt_readpage caller by commit 9a2e2a28. Update a comment that incorrectly described the recheck mechanism as something that takes place in _bt_checkkeys. Also fix an older typo in related code comments.
* Make NULL tuple values always advance skip arrays.Peter Geoghegan2025-04-28
| | | | | | | | | | | | | | | | | | | | | | | _bt_check_compare neglected to handle a case that can arise when the scan's keys are temporarily treated as nonrequired, as an optimization: whenever a NULL tuple value was encountered that had a skip array whose current element wasn't already NULL, _bt_check_compare failed to advance the array to the NULL element. This allowed _bt_check_compare to fail to return matching tuples containing a NULL value (though only with an array column that came before a skip array column with NULLs, and only during _bt_readpage calls that set pstate.forcenonrequired=true on a page where the higher-order column also had to advance). To fix, teach _bt_check_compare to handle this case just like any other case where a skip array key is unsatisfied and must be advanced directly (due to the key being considered a nonrequired key). Oversight in commit 8a510275, which optimized nbtree search scan key comparisons with skip arrays. Author: Peter Geoghegan <pg@bowt.ie> Reported-By: Mark Dilger <mark.dilger@enterprisedb.com> Discussion: https://postgr.es/m/CAHgHdKtLFWZcjr87hMH0hYDHgcifu4Tj7iHz-xh8qsJREt5cqA@mail.gmail.com
* Fix pg_dump for inherited validated not-null constraintsÁlvaro Herrera2025-04-28
| | | | | | | | | | | | When a child constraint is validated and the parent constraint it derives from isn't, pg_dump must be coerced into printing the child constraint; failing to do would result in a dump that restores the constraint as not valid, which would be incorrect. Co-authored-by: jian he <jian.universality@gmail.com> Co-authored-by: Álvaro Herrera <alvherre@kurilemu.de> Reported-by: jian he <jian.universality@gmail.com> Message-id: https://postgr.es/m/CACJufxGHNNMc0E2JphUqJMzD3=bwRSuAEVBF5ekgkG8uY0Q3hg@mail.gmail.com
* pg_combinebackup: Message style improvementsPeter Eisentraut2025-04-28
|
* Restore comments in ChangeVarNodesExtended()Alexander Korotkov2025-04-28
| | | | | | | | This commit restores comments in ChangeVarNodesExtended(), which were accidentally removed by fc069a3a6319. Reported-by: Richard Guo <guofenglinux@gmail.com> Discussion: https://postgr.es/m/CAMbWs49PE3CvnV8vrQ0Dr%3DHqgZZmX0tdNbzVNJxqc8yg-8kDQQ%40mail.gmail.com
* Fix xmin advancement during fast_forward decoding.Amit Kapila2025-04-28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | During logical decoding, we advance catalog_xmin of logical too early in fast_forward mode, resulting in required catalog data being removed by vacuum. This mode is normally used to advance the slot without processing the changes, but we still can't let the slot's xmin to advance to an incorrect value. Commit f49a80c481 fixed a similar issue where the logical slot's catalog_xmin was getting advanced prematurely during non-fast-forward mode. During xl_running_xacts processing, instead of directly advancing the slot's xmin to the oldest running xid in the record, it allowed the xmin to be held back for snapshots that can be used for not-yet-replayed transactions, as those might consider older txns as running too. However, it missed the fact that the same problem can happen during fast_forward mode decoding, as we won't build a base snapshot in that mode, and the future call to get_changes from the same slot can miss seeing the required catalog changes leading to incorrect reslts. This commit allows building the base snapshot even in fast_forward mode to prevent the early advancement of xmin. Reported-by: Amit Kapila <amit.kapila16@gmail.com> Author: Zhijie Hou <houzj.fnst@fujitsu.com> Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-by: shveta malik <shveta.malik@gmail.com> Reviewed-by: Amit Kapila <amit.kapila16@gmail.com> Backpatch-through: 13 Discussion: https://postgr.es/m/CAA4eK1LqWncUOqKijiafe+Ypt1gQAQRjctKLMY953J79xDBgAg@mail.gmail.com Discussion: https://postgr.es/m/OS0PR01MB57163087F86621D44D9A72BF94BB2@OS0PR01MB5716.jpnprd01.prod.outlook.com
* Remove circular #include's between wait_event.h and wait_event_types.hMichael Paquier2025-04-28
| | | | | | | | | | | | | | | | | | | | wait_event_types.h is generated by the code, and included wait_event.h. wait_event.h did the opposite move, including wait_event_types.h, causing a circular dependency between both. wait_event_types.h only needs to now about the wait event classes, so this information is moved into its own file, and wait_event_types.h uses this new header so as it does not depend anymore on wait_event.h. Note that such errors can be found with clang-tidy, with commands like this one: clang-tidy source_file.c --checks=misc-header-include-cycle -- \ -I/install/path/include/ -I/install/path/include/server/ Issue introduced by fa88928470b5. Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com> Discussion: https://postgr.es/m/350192.1745768770@sss.pgh.pa.us
* Disallow removing placeholders during Self-Join Elimination.Alexander Korotkov2025-04-28
| | | | | | | | | | | | | | | | | | | | | fc069a3a6319 implements Self-Join Elimination (SJE), which can remove base relations when appropriate. However, regressions tests for SJE only cover the case when placeholder variables (PHVs) are evaluated and needed only in a single base rel. If this baserel is removed due to SJE, its clauses, including PHVs, will be transferred to the keeping relation. Removing these PHVs may trigger an error on plan creation -- thanks to the b3ff6c742f6c for detecting that. This commit skips removal of PHVs during SJE. This might also happen that we skip the removal of some PHVs that could be removed. However, the overhead of extra PHVs is small compared to the complexity of analysis needed to remove them. Reported-by: Alexander Lakhin <exclusion@gmail.com> Author: Alena Rybakina <a.rybakina@postgrespro.ru> Author: Andrei Lepikhov <lepihov@gmail.com> Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com> Reviewed-by: Richard Guo <guofenglinux@gmail.com>
* Remove inappropriate inclusions of c.h and postgres_fe.h.Tom Lane2025-04-27
| | | | | | | | | Per our usual policy, Postgres header files should not include these; the decision as to which one to use is to be made in the calling .c file instead. These errors aren't particularly new, but I'm not feeling a need to back-patch these changes; it's mostly just neatnik-ism.
* Don't use double-quotes in #include's of system headers, redux.Tom Lane2025-04-27
| | | | | | | | | | | | | | | | | | | | | | | This cleans up some loose ends left by commit e8ca9ed1d. I hadn't looked closely enough at these places before, but now I have. The use of double-quoted #includes for Perl headers in plperl_system.h seems to be simply a mistake introduced in 6c944bf3c and faithfully copied forward since then. (I had thought possibly it was required by some weird Windows build setup, but there's no evidence of that in our history.) The occurrences in SectionMemoryManager.h and SectionMemoryManager.cpp evidently stem from those files' origin as LLVM code. It's understandable that LLVM would treat their own files as needing double-quoted #includes; but they're still system headers to us. I also applied the same check to *.c files, and found a few other random incorrect usages in both directions. Our ECPG headers and test files routinely use angle brackets to refer to ECPG headers. I left those usages alone, since it seems reasonable for an ECPG user to regard those headers as system headers.
* Remove circular #include's between plpython.h and plpy_util.h.Tom Lane2025-04-27
| | | | | | | | | | | | | | | | | | | | | plpython.h included plpy_util.h, simply on the grounds that "it's easier to just include it everywhere". However, plpy_util.h must include plpython.h, or it won't pass headerscheck. While the resulting circularity doesn't have any immediate bad effect, it's poor design. We have seen serious messes arise in the past from overly-broad inclusion footprints created by such circularities, so let's establish a project policy against it. To fix, just replace *.c files' inclusions of plpython.h with plpy_util.h. They'll pull in plpython.h indirectly; indeed, almost all have already done so via inclusions of other plpy_xxx.h headers. (Any extensions using plpython.h can do likewise without breaking the compatibility of their code with prior Postgres versions.) Reported-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com> Author: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com> Discussion: https://postgr.es/m/aAxQ6fcY5QQV1lo3@ip-10-97-1-34.eu-west-3.compute.internal
* Don't use double-quotes in #include's of system headers.Tom Lane2025-04-26
| | | | | | | | | | | | | While few if any C compilers will complain about this, it's inconsistent with our other #include's of the same headers. There are some other questionable usages in src/include/jit/SectionMemoryManager.h and src/pl/plperl/plperl_system.h, but perhaps those have a reason to be like that. I can't see that these do. Noticed while fooling around with a script to do analysis of our header cross-inclusions.
* Eliminate divide in new fast-path locking codeDavid Rowley2025-04-27
| | | | | | | | | | | | | | | | | | | | | | | c4d5cb71d2 adjusted the fast-path locking code to allow some configuration of the number of fast-path locking slots via the max_locks_per_transaction GUC. In that commit the FAST_PATH_REL_GROUP() macro used integer division to determine the fast-path locking group slot to use for the lock. The divisor in this case is always a power-of-two value. Here we swap out the divide by a bitwise-AND, which is a significantly faster operation to perform. In passing, adjust the code that's setting FastPathLockGroupsPerBackend so that it's more clear that the value being set is a power-of-two. Also, adjust some comments in the area which contained some magic numbers. It seems better to justify the 1024 upper limit in the location where the #define is made instead of where it is used. Author: David Rowley <drowleyml@gmail.com> Reviewed-by: Tomas Vondra <tomas@vondra.me> Discussion: https://postgr.es/m/CAApHDvodr3bcnpxcs7+k-3cFwYR0tP-BYhyd2PpDhe-bCx9i=g@mail.gmail.com
* Match parameter in new function to earlier equivalentsJohn Naylor2025-04-27
| | | | Oversight in commit 3c6e8c123.
* aio: Improve debug logging around waiting for IOsAndres Freund2025-04-25
| | | | | | | Trying to investigate a bug report by Alexander Lakhin made it apparent that the debug logging around waiting for IO completion is insufficient. Fix that. Discussion: https://postgr.es/m/h4in2db37vepagmi2oz5vvqymjasc5gyb4lpqkunj4eusu274i@37jpd3c2spd3
* Fix bug allowing io_combine_limit > io_max_combine_combine limitAndres Freund2025-04-25
| | | | | | | | | | | | | | | | | | | | | | | | | | | 10f66468475 intended to limit the value of io_combine_limit to the minimum of io_combine_limit and io_max_combine_limit. To avoid issues with interdependent GUCs, it introduced io_combine_limit_guc and set io_combine_limit in assign hooks. That plan was thwarted by guc_tables.c accidentally still referencing io_combine_limit, instead of io_combine_limit_guc. That lead to the GUC machinery overriding the work done in the assign hooks, potentially leaving io_combine_limit with a too high value. The consequence of this bug was that when running with io_combine_limit > io_combine_limit_guc the AIO machinery would not have reserved large enough iovec and IO data arrays, with one IO's arrays overlapping with another IO's, leading to total confusion. To make such a problem easier to detect in the future, add assertions to pgaio_io_set_handle_data_* checking the length is smaller than io_max_combine_limit (not just PG_IOV_MAX). It'd be nice to have a few tests for this, but it's not entirely obvious how to do so portably. As remarked upon by Tom, the GUC assignment hooks really shouldn't set the underlying variable, that's the job of the GUC machinery. Change that as well. Discussion: https://postgr.es/m/c5jyqnuwrpigd35qe7xdypxsisdjrdba5iw63mhcse4mzjogxo@qdjpv22z763f
* aio: Fix crash potential for pg_aios views due to late state updateAndres Freund2025-04-25
| | | | | | | | | | | | | | | | | | | pgaio_io_reclaim() reset the fields in PgAioHandle before updating the state to IDLE or incrementing the generation. For most things that's OK, but for pg_get_aios() it is not - if it copied the PgAioHandle while fields were being reset, we wouldn't detect that and could call pgaio_io_get_target_description() with ioh->target == PGAIO_TID_INVALID, leading to a crash. Fix this issue by incrementing the generation and state earlier, before resetting. Also add an assertion to pgaio_io_get_target_description() for the target to be valid - that'd have made this case a bit easier to debug. While at it, add/update a few related assertions. Author: Alexander Lakhin <exclusion@gmail.com> Discussion: https://postgr.es/m/062daca9-dfad-4750-9da8-b13388301ad9@gmail.com
* Fix incorrect format placeholdersPeter Eisentraut2025-04-25
| | | | | | | Before commit a0ed19e0a9e there was a cast around these, but the cast inadvertently changed the signedness, but that made the format placeholder correct. Commit a0ed19e0a9e removed the casts, so now the format placeholders had the wrong signedness.
* Fix terminology in comment and messagePeter Eisentraut2025-04-25
| | | | Should be "bracket" not "brace" for [].
* Small code consistency improvementPeter Eisentraut2025-04-25
| | | | | Adjust the way the increment operators are placed to be consistent throughout the function. Fixup for commit commit c1da7281060.
* psql: Fix assertion failures with pipeline modeMichael Paquier2025-04-24
| | | | | | | | | | | | | | | | | | | | | | | | | | A correct cocktail of COPY FROM, SELECT and/or DML queries and \syncpipeline was able to break the logic in charge of discarding results of a pipeline, done in discardAbortedPipelineResults(). Such sequence make the backend generate a FATAL, due to a protocol synchronization loss. This problem comes down to the fact that we did not consider the case of libpq returning a PGRES_FATAL_ERROR when discarding the results of an aborted pipeline. The discarding code is changed so as this result status is handled as a special case, with the caller of discardAbortedPipelineResults() being responsible for consuming the result. A couple of tests are added to cover the problems reported, bringing an interesting gain in coverage as there were no tests in the tree covering the case of protocol synchronization loss. Issue introduced by 41625ab8ea3d. Reported-by: Alexander Kozhemyakin <a.kozhemyakin@postgrespro.ru> Author: Anthonin Bonnefoy <anthonin.bonnefoy@datadoghq.com> Co-authored-by: Michael Paquier <michael@paquier.xyz> Discussion: https://postgr.es/m/ebf6ce77-b180-4d6b-8eab-71f641499ddf@postgrespro.ru
* Add sanity check for dshash entries when reading pgstats fileMichael Paquier2025-04-24
| | | | | | | | | | | | | | | | | | | | Not having this check would produce a core dump at startup when running pgstat_read_statsfile(), in the case where the information of a stats kind for an entry in the dshash could not be found. The same check already happens for fixed-numbered stats and entries that are stored with their names. This issue can be seen with custom stats kinds. Note that this problem can be reproduced what what is in the core code: - Tweak the test module injection_points to not load the fixed-numbered stats part, leaving only the variable-numbered stats. - Create an instance with injection_points defined in shared_preload_libraries. - Create a pgstats entry by attaching and running a point. - Restart the server without shared_preload_libraries. The startup process detects that something is wrong and reports a WARNING. Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com> Discussion: https://postgr.es/m/aAieZAvM+K1d89R2@ip-10-97-1-34.eu-west-3.compute.internal
* Avoid possibly-theoretical OOM crash hazard in hash_create().Tom Lane2025-04-23
| | | | | | | | | | | | | | | | | | | | | One place in hash_create() used DynaHashAlloc() as a convenient shorthand for MemoryContextAlloc(). That was fine when it was written, but it stopped being fine when 9c911ec06 changed DynaHashAlloc() to use MCXT_ALLOC_NO_OOM (mea culpa). Change the code to call plain MemoryContextAlloc() as intended. I think that this bug may be unreachable in practice, since we now always create AllocSets with some space already allocated, so that an OOM failure here for a non-shared hash table should be impossible (with a hash table name of reasonable length anyway). And there aren't enough shared hash tables to make a crash for one of those probable. Nonetheless it's clearly not operating as designed, so back-patch to v16 where 9c911ec06 came in. Reported-by: Maksim Korotkov <m.korotkov@postgrespro.ru> Author: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/219bdccd460510efaccf90b57e5e5ef2@postgrespro.ru Backpatch-through: 16
* oauth: Support Python 3.6 in testsJacob Champion2025-04-23
| | | | | | | | | | | | RHEL8 ships a patched 3.6.8 as its base Python version, and I accidentally let some newer Python-isms creep into oauth_server.py during development. Reported-by: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: Jelte Fennema-Nio <postgres@jeltef.nl> Tested-by: Renan Alves Fonseca <renanfonseca@gmail.com> Tested-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/16098.1745079444%40sss.pgh.pa.us
* Maintain RelIdToTypeIdCacheHash in TypeCacheOpcCallback()Alexander Korotkov2025-04-23
| | | | | | | | | | | | | | | | | | b85a9d046efd introduced a new RelIdToTypeIdCacheHash, whose entries should exist for typecache entries with TCFLAGS_HAVE_PG_TYPE_DATA flag set or any of TCFLAGS_OPERATOR_FLAGS set or tupDesc set. However, TypeCacheOpcCallback(), which resets TCFLAGS_OPERATOR_FLAGS, was forgotten to update RelIdToTypeIdCacheHash. This commit adds a delete_rel_type_cache_if_needed() call to the TypeCacheOpcCallback() function to maintain RelIdToTypeIdCacheHash after resetting TCFLAGS_OPERATOR_FLAGS. Also, this commit fixes the name of the delete_rel_type_cache_if_needed() function in its mentions in the comments. Reported-by: Noah Misch Discussion: https://postgr.es/m/20250411203241.e9.nmisch%40google.com
* Properly prepare varinfos in estimate_multivariate_bucketsize()Alexander Korotkov2025-04-23
| | | | | | | | | | | | | | | | | | | | | | | | | | To estimate with extended statistics, we need to clear the varnullingrels field in the expression, and duplicates are not allowed in the GroupVarInfo list. We might re-use add_unique_group_var(), but we don't do so for two reasons. 1) We must keep the origin_rinfos list ordered exactly the same way as varinfos. 2) add_unique_group_var() is designed for estimate_num_groups(), where a larger number of groups is worse. While estimating the number of hash buckets, we have the opposite: a lesser number of groups is worse. Therefore, we don't have to remove "known equal" vars: the removed var may valuably contribute to the multivariate statistics to grow the number of groups. This commit adds custom code to estimate_multivariate_bucketsize() to initialize varinfos properly. Reported-by: Robins Tharakan <tharakan@gmail.com> Discussion: https://postgr.es/m/18885-da51324078588253%40postgresql.org Author: Andrei Lepikhov <lepihov@gmail.com> Reviewed-by: David Rowley <dgrowleyml@gmail.com> Reviewed-by: Tomas Vondra <tomas@vondra.me> Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>
* Change the names generated for child foreign key constraints.Tom Lane2025-04-23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When a foreign key constraint is placed on a partitioned table, we actually make two pg_constraint entries associated with that table. (I have my doubts about the wisdom of that, but it's been like that since v12 and post-feature-freeze is no time to be messing with such entrenched decisions.) The second "child" entry always had a name generated according to the default rule, "table_column(s)_fkey[nnn]", even if the primary entry had an unrelated user-specified name. The trouble with doing that is that the default name could collide with the user-specified name of some other constraint on the same table. While we were willing to adjust the generated name to avoid collisions, that only helps if it's made second; if it's made first then creation of the other constraint would fail, potentially causing dump/reload or pg_upgrade failures. The core of the problem here is that we're infringing on user namespace, so I doubt that there's any 100% solution other than to find a way to not need the "child" entry. In the meantime, it seems like it'd be an improvement to make the child's name be the name of the parent constraint with an underscore and digit(s) appended as necessary to make it unique. This rule can in theory fail in the same way, but it seems much less probable; for one thing, this rule is guaranteed not to match primary entries having auto-generated names. (While an auto-generated primary name isn't user-specified to begin with, it acts like that during dump/reload, so collisions against such names are definitely possible.) An additional bonus, visible in some of the regression test cases that change here, arises from the fact that some error messages cite the child constraint's name not the parent's. In the previous approach the two names could be completely unrelated, leading to user confusion --- the more so since psql's \d command hides child constraints. With this approach it's hopefully much clearer which constraint-the-user-knows-about is failing. However, that does mean that there's user-visible behavior change occurring here, making it seem like not something to back-patch. I feel it's not too late for v18, though. Reported-by: Kirill Reshke <reshkekirill@gmail.com> Author: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: Alvaro Herrera <alvherre@kurilemu.de> Discussion: https://postgr.es/m/CALdSSPhGitjpTfzEMJN-Y2x+Q-5QChSxAsmSJ1-E8mQJLkHOqQ@mail.gmail.com
* Allocate JsonLexContexts on the heap to avoid warningsDaniel Gustafsson2025-04-23
| | | | | | | | | | | | | | The stack allocated JsonLexContexts, in combination with codepaths using goto, were causing warnings when compiling with LTO enabled as the optimizer is unable to figure out that is safe. Rather than contort the code with workarounds for this simply heap allocate the structs instead as these are not in any performance critical paths. Author: Daniel Gustafsson <daniel@yesql.se> Reported-by: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: Jacob Champion <jacob.champion@enterprisedb.com> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/2074634.1744839761@sss.pgh.pa.us
* psql: Rework TAP routine psql_fails_like() to define WAL sender contextMichael Paquier2025-04-23
| | | | | | | | | | | | | | | The routine was coded so as a WAL sender was always used, state required only for one failure test related to START_REPLICATION. This test is changed so as a WAL sender is used by passing a replication option to psql_fails_like(), instead of forcing the use of a WAL sender for all the tests. This has come up as useful in the context of a separate bug fix where we are looking at extending tests for some failure scenarios. These tests need to happen in the context of a normal backend, and not a WAL sender where the extended query protocol cannot be used. Discussion: https://postgr.es/m/aAXkJIOildLUA7vQ@paquier.xyz
* Fix an oversight in 3f28b2fcac.Amit Kapila2025-04-23
| | | | | | | | | | | | | | | | | | | | | | | | | Commit 3f28b2fcac tried to ensure that the replication origin shouldn't be advanced in case of an ERROR in the apply worker, so that it can request the same data again after restart. However, it is possible that an ERROR was caught and handled by a (say PL/pgSQL) function, and the apply worker continues to apply further changes, in which case, we shouldn't reset the replication origin. Ensure to reset the origin only when the apply worker exits after an ERROR. Commit 3f28b2fcac added new function geterrlevel, which we removed in HEAD as part of this commit, but kept it in backbranches to avoid breaking any applications. A separate case can be made to have such a function even for HEAD. Reported-by: Shawn McCoy <shawn.the.mccoy@gmail.com> Author: Hayato Kuroda <kuroda.hayato@fujitsu.com> Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-by: vignesh C <vignesh21@gmail.com> Reviewed-by: Amit Kapila <amit.kapila16@gmail.com> Backpatch-through: 16, where it was introduced Discussion: https://postgr.es/m/CALsgZNCGARa2mcYNVTSj9uoPcJo-tPuWUGECReKpNgTpo31_Pw@mail.gmail.com
* Remove assertion based on pending_since in pgstat_report_stat()Michael Paquier2025-04-23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This assertion, based on pending_since (timestamp used to prevent stats reports to be too frequent or should a partial flush happen), is reached when it is found that no data can be flushed but a previous call of pgstat_report_stat() determined that some stats data has been found as in need of a flush. So pending_since is set when some stats data is pending (in non-force mode) or if report attempts are too frequent, and reset to 0 once all stats have been flushed. Since 5cbbe70a9cc6, WAL senders have begun to report their stats on a periodic basis for IO stats in v16~ and backend stats on HEAD, creating some friction with the concurrent pgstat_report_stat() calls that can happen in the context of a WAL sender (shutdown callback doing a final report or backend-related code paths). This problem is the cause of spurious failures in the TAP tests. In theory, this assertion can be also reached in v15, even if that's very unlikely. For example, a process, say a background worker, could do periodic and direct stats flushes with concurrent calls of pgstat_report_stat() that could cause conflicting values of pending_since. This can be done with WAL or SLRU stats flushes using pgstat_flush_wal() or pgstat_slru_flush(). HEAD makes this situation easier to happen with custom cumulative stats. This commit removes the assertion altogether, per discussion, as it is more useful to keep the state of things as they are for the WAL sender. The assertion could use a special state based on for example am_walsender, but I doubt that this would be meaningful in the long run based on the other arguments raised while discussing this issue. Reported-by: Tom Lane <tgl@sss.pgh.pa.us> Reported-by: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/1489124.1744685908@sss.pgh.pa.us Discussion: https://postgr.es/m/dwrkeszz6czvtkxzr5mqlciy652zau5qqnm3cp5f3p2po74ppk@omg4g3cc6dgq Backpatch-through: 15
* Re-enable SSL connect_fails tests, and fix related race conditions.Tom Lane2025-04-22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Cluster.pm's connect_fails routine has long had the ability to sniff the postmaster log file for expected messages after a connection failure. However, that's always had a race condition: on some platforms it's possible for psql to exit and the test script to slurp up the postmaster log before the backend process has been able to write out its final log messages. Back in commit 55828a6b6 we disabled a bunch of tests after discovering that, and the aim of this patch is to re-enable them. (The sibling function connect_ok doesn't seem to have a similar problem, mainly because the messages we look for come out during the authentication handshake, so that if psql reports successful connection they should certainly have been emitted already.) The solution used here is borrowed from 002_connection_limits.pl's connect_fails_wait routine: set the server's log_min_messages setting to DEBUG2 so that the postmaster will log child-process exit, and then wait till we see that log entry before checking for the messages we are actually interested in. If a TAP test uses connect_fails' log_like or log_unlike options, and forgets to set log_min_messages, those connect_fails calls will now hang until timeout. Fixing up the existing callers shows that we had several other TAP tests that were in theory vulnerable to the same problem. It's unclear whether the lack of failures is just luck, or lack of buildfarm coverage, or perhaps there is some obscure timing effect that only manifests in SSL connections. In any case, this change should in principle make those other call sites more robust. I'm not inclined to back-patch though, unless sometime we observe an actual failure in one of them. Reported-by: Andrew Dunstan <andrew@dunslane.net> Author: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/984fca80-85a8-4c6f-a5cc-bb860950b435@dunslane.net