aboutsummaryrefslogtreecommitdiff
path: root/src
Commit message (Collapse)AuthorAge
...
* Include schema/table publications even with exclude options in dump.Amit Kapila2025-02-20
| | | | | | | | | | | | | | | | | | | | The current implementation inconsistently includes public schema but not information_schema when those are specified in FOR TABLES IN SCHMEA ... Apart from that, the current behavior for publications w.r.t exclude table and schema (--exclude-table, --exclude-schema) option differs from what we do at other places. We try to avoid including publications for corresponding tables or schemas when an exclude-table or exclude-schema option is given, unlike what we do for views using functions defined in a particular schema or a subscription pointing to publications with their corresponding exclude options. I decided not to backpatch this as it leads to a behavior change and we don't see any field report for current behavior. Reported-by: Tom Lane <tgl@sss.pgh.pa.us> Author: Vignesh C <vignesh21@gmail.com> Reviewed-by: Amit Kapila <amit.kapila16@gmail.com> Discussion: https://postgr.es/m/1270733.1734134272@sss.pgh.pa.us
* Fix FATAL message for invalid recovery timeline at beginning of recoveryMichael Paquier2025-02-20
| | | | | | | | | | | | | | | | | | | | If the requested recovery timeline is not reachable, the logged checkpoint and timeline should to be the values read from the backup_label when it is defined. The message generated used the values from the control file in this case, which is fine when recovering from the control file without a backup_label, but not if there is a backup_label. Issue introduced in ee994272ca50. v15 has introduced xlogrecovery.c and more simplifications in this area (4a92a1c3d1c3, a27048cbcb58), making this change a bit simpler to think about, so backpatch only down to this version. Author: David Steele <david@pgbackrest.org> Reviewed-by: Andrey M. Borodin <x4mmm@yandex-team.ru> Reviewed-by: Benoit Lobréau <benoit.lobreau@dalibo.com> Discussion: https://postgr.es/m/c3d617d4-1696-4aa7-8a4d-5a7d19cc5618@pgbackrest.org Backpatch-through: 15
* pgbench: Increase RLIMIT_NOFILE if necessaryAndres Freund2025-02-19
| | | | | | | | | | | | | | | | | | | | pgbench already had code to check if the soft rlimit is too low for the specified number of connections. If too low, it errored out, telling the user to increase the limit. However, we can do better: If the hard limit allows, increase the soft limit to be sufficiently for the number of connections. It is common for the soft limit to be considerably lower than the hard limit, due to the danger of soft limits > 1024 breaking programs that use the select(2), as explained in [1]. [1]: https://0pointer.net/blog/file-descriptor-limits.html Author: Jelte Fennema-Nio <postgres@jeltef.nl> Reviewed-by: Andres Freund <andres@anarazel.de> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/CAGECzQQh6VSy3KG4pN1d%3Dh9J%3DD1rStFCMR%2Bt7yh_Kwj-g87aLQ%40mail.gmail.com
* test_escape: Fix output of --helpMichael Paquier2025-02-20
| | | | | | | | | The short option name -f was not listed, only its long option name --force-unsupported. Author: Japin Li Discussion: https://postgr.es/m/ME0P300MB04452BD1FB1B277D4C1C20B9B6C52@ME0P300MB0445.AUSP300.PROD.OUTLOOK.COM Backpatch-through: 13
* Correct relation size estimate with low fillfactorTomas Vondra2025-02-19
| | | | | | | | | | | | | | | | | | | | | Since commit 29cf61ade3, table_block_relation_estimate_size() considers fillfactor when estimating number of rows in a relation before the first ANALYZE. The formula however did not consider tuples may be larger than available space determined by fillfactor, ending with density 0. This ultimately means the relation was estimated to contain a single row. The executor however places at least one tuple per page, even with very low fillfactor values, so the density should be at least 1. Fixed by clamping the density estimate using clamp_row_est(). Reported by Heikki Linnakangas. Fix by me, with regression test inspired by example provided by Heikki. Backpatch to 17, where the issue was introduced. Reported-by: Heikki Linnakangas Backpatch-through: 17 Discussion: https://postgr.es/m/2bf9d973-7789-4937-a7ca-0af9fb49c71e@iki.fi
* Assert that ExecOpenIndices and ExecCloseIndices are not repeated.Tom Lane2025-02-19
| | | | | | | | | | | | | | | | | | | | These functions should be called at most once per ResultRelInfo; it's wasteful to do otherwise, and certainly the pattern of opening twice and then closing twice is a bad idea. Moreover, aminsertcleanup functions might not be prepared to be called twice, as the just-hardened code in BRIN demonstrates. This amounts to an API change, since such coding patterns were safe even if wasteful before v17. Hence, apply to HEAD only. (Extension code violating this new rule faces some risk in v17, but we just fixed brininsertcleanup and there are probably few other aminsertcleanup functions as yet. So the odds of breaking usable code seem higher than the odds of doing something useful with a back-patch.) Bug: #18815 Reported-by: Sergey Belyashov <sergey.belyashov@gmail.com> Discussion: https://postgr.es/m/18815-2a0407cc7f40b327@postgresql.org
* Fix crash in brininsertcleanup during logical replication.Tom Lane2025-02-19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Logical replication crashes if the subscriber's partitioned table has a BRIN index. There are two independently blamable causes, and this patch fixes both: 1. brininsertcleanup fails if called twice for the same IndexInfo, because it half-destroys its BrinInsertState but leaves it still linked from ii_AmCache. brininsert would also fail in that state, so it's pretty hard to see any advantage to this coding. Fully remove the BrinInsertState, instead, so that a new brininsert call would create a new cache. 2. A logical replication subscriber sometimes does ExecOpenIndices twice on the same ResultRelInfo, followed by doing ExecCloseIndices twice; the second call reaches the brininsertcleanup bug. Quite aside from tickling unexpected cases in aminsertcleanup methods, this seems very wasteful, because the IndexInfos built in the first ExecOpenIndices call are just lost during the second call, and have to be rebuilt at possibly-nontrivial cost. We should establish a coding rule that you don't do that. The problematic coding is that when the target table is partitioned, apply_handle_tuple_routing calls ExecFindPartition which does ExecOpenIndices (and expects that ExecCleanupTupleRouting will close the indexes again). Using the ResultRelInfo made by ExecFindPartition, it calls apply_handle_delete_internal or apply_handle_insert_internal, both of which think they need to do ExecOpenIndices/ExecCloseIndices for themselves. They do in the main non-partitioned code paths, but not here. The simplest fix is to pull their ExecOpenIndices/ExecCloseIndices calls out and put them in the call sites for the non-partitioned cases. (We could have refactored apply_handle_update_internal similarly, but I did not do so today because there's no bug there: the partitioned code path doesn't call it.) Also, remove the always-duplicative open/close calls within apply_handle_tuple_routing itself. Since brininsertcleanup and indeed the whole aminsertcleanup mechanism are new in v17, there's no observable bug in older branches. A case could be made for trying to avoid these duplicative open/close calls in the older branches, but for now it seems not worth the trouble and risk of new bugs. Bug: #18815 Reported-by: Sergey Belyashov <sergey.belyashov@gmail.com> Discussion: https://postgr.es/m/18815-2a0407cc7f40b327@postgresql.org Backpatch-through: 17
* Consider BufFiles when adjusting hashjoin parametersTomas Vondra2025-02-19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Until now ExecChooseHashTableSize() considered only the size of the in-memory hash table, and ignored the memory needed for the batch files. Which can be a significant amount, because each batch needs two BufFiles (each with a BLCKSZ buffer). The same issue applies to increasing the number of batches during execution. It's also possible to trigger a "batch explosion", e.g. due to duplicate values or skew. We've seen reports of joins with hundreds of thousands (or even millions) of batches, consuming gigabytes of memory, triggering OOM errors. These cases may be fairly rare, but it's clearly possible to hit them. These issues can't be prevented during planning. Even if we improve that, it does not help with execution-time batch explosion. We can however reduce the impact and use as little memory as possible. This patch improves the behavior by adjusting how the memory is divided between the hash table and batch files. It may be better to use fewer batch files, even if it means the hash table will exceed the limit. The capacity of the hash node may be increased either by doubling he number of batches, or doubling the size of the in-memory hash table. The outcome is the same, but the memory usage may be very different. For low nbatch values it's better to add batches, for high nbatch values it's better to allow a larger hash table. The patch considers both options, both during the initial sizing and then during execution, to minimize how much the limit gets exceeded. It might seem this patch is relaxing the memory limit - allowing it to be exceeded. But that's not really the case. It has always been like that, except the memory used by batches was ignored. Allowing the hash table to grow may also prevent the batch explosion. If there's a large batch that can't be split (due to hash collisions or duplicate values), at some point the memory limit will increase enough for the batch to fit into the hash table. This patch was in the works for a long time. The early versions were posted in 2019, and revived every year or two when we happened to get the next report of OOM due to a hashjoin batch explosion. Each of those patch versions were reviewed by a couple people. I'm mentioning only Melanie Plageman and Robert Haas, because they reviewed the last version, and the older patches are very different. Reviewed-by: Melanie Plageman, Robert Haas Discussion: https://postgr.es/m/7bed6c08-72a0-4ab9-a79c-e01fcdd0940f@vondra.me Discussion: https://postgr.es/m/20190504003414.bulcbnge3rhwhcsh%40development Discussion: https://postgr.es/m/20190428141901.5dsbge2ka3rxmpk6%40development
* tests: BackgroundPsql: Fix potential for lost errors on windowsAndres Freund2025-02-19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This addresses various corner cases in BackgroundPsql: - On windows stdout and stderr may arrive out of order, leading to errors not being reported, or attributed to the wrong statement. To fix, emit the "query-separation banner" on both stdout and stderr and wait for both. - Very occasionally the "query-separation banner" would not get removed, because we waited until the banner arrived, but then replaced the banner plus newline. To fix, wait for banner and newline. - For interactive psql replacing $banner\n is not sufficient, interactive psql outputs \r\n. - For interactive psql, where commands are echoed to stdout, the \echo command, rather than its output, would be matched. This would sometimes lead to output from the prior query, or wait_connect(), being returned in the next command. This also affected wait_connect(), leading to sometimes sending queries to psql before the connection actually was established. While debugging these issues I also found that it's hard to know whether a query separation banner was attributed to the right query. Make that easier by counting the queries each BackgroundPsql instance has emitted and include the number in the banner. Also emit psql stdout/stderr in query() and wait_connect() as Test::More notes, without that it's rather hard to debug some issues in CI and buildfarm. As this can cause issues not just to-be-added tests, but also existing ones, backpatch the fix to all supported versions. Reviewed-by: Daniel Gustafsson <daniel@yesql.se> Reviewed-by: Noah Misch <noah@leadboat.com> Discussion: https://postgr.es/m/wmovm6xcbwh7twdtymxuboaoarbvwj2haasd3sikzlb3dkgz76@n45rzycluzft Backpatch-through: 13
* Add ATAlterConstraint struct for ALTER .. CONSTRAINTÁlvaro Herrera2025-02-19
| | | | | | | | | | | | | | | | | | | | | | Replace the use of Constraint with a new ATAlterConstraint struct, which allows us to pass additional information. No functionality is added by this commit. This is necessary for future work that allows altering constraints in other ways. I (Álvaro) took the liberty of restructuring the code for ALTER CONSTRAINT beyond what Amul did. The original coding before Amul's patch was unnecessarily baroque, and this change makes things simpler by removing one level of subroutine. Also, partly remove the assumption that only partitioned tables are relevant (by passing sensible 'recurse' arguments) and no longer ignore whether ONLY was specified. I say 'partly' because the current coding only walks down via the 'conparentid' relationship, which is only used for partitioned tables; but future patches could handle ONLY or not for other types of constraint changes for legacy inheritance trees too. Author: Amul Sul <sulamul@gmail.com> Author: Álvaro Herrera <alvherre@alvh.no-ip.org> Discussion: https://postgr.es/m/CAAJ_b94bfgPV-8Mw_HwSBeheVwaK9=5s+7+KbBj_NpwXQFgDGg@mail.gmail.com
* Improve statistics estimation for single-column GROUP BY in sub-queriesAlexander Korotkov2025-02-19
| | | | | | | | | | | This commit follows the idea of the 4767bc8ff2. If sub-query has only one GROUP BY column, we can consider its output variable as being unique. We can employ this fact in the statistics to make more precise estimations in the upper query block. Author: Andrei Lepikhov <lepihov@gmail.com> Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi> Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>
* Add a test for commit ac0e33136a using the injection point.Amit Kapila2025-02-19
| | | | | | | | | | | | | This test uses an injection point to bypass the time overhead caused by the idle_replication_slot_timeout GUC, which has a minimum value of one minute. Author: Hayato Kuroda <kuroda.hayato@fujitsu.com> Author: Nisha Moond <nisha.moond412@gmail.com> Reviewed-by: Peter Smith <smithpb2250@gmail.com> Reviewed-by: Vignesh C <vignesh21@gmail.com> Reviewed-by: Amit Kapila <amit.kapila16@gmail.com> Discussion: https://postgr.es/m/CALj2ACW4aUe-_uFQOjdWCEN-xXoLGhmvRFnL8SNw_TZ5nJe+aw@mail.gmail.com
* Add support for LIKE in CREATE FOREIGN TABLEMichael Paquier2025-02-19
| | | | | | | | | | | | | | | | | | LIKE enables the creation of foreign tables based on the column definitions, constraints and objects of the defined source relation(s). This feature mirrors the behavior of CREATE TABLE LIKE, but ignores the INCLUDING sub-options that do not make sense for foreign tables: INDEXES, COMPRESSION, IDENTITY and STORAGE. The supported sub-options are COMMENTS, CONSTRAINTS, DEFAULTS, GENERATED and STATISTICS, mapping with the clauses already supported by the command. Note that the restriction with LIKE in CREATE FOREIGN TABLE was added in a0c6dfeecfcc. Author: Zhang Mingli Reviewed-by: Álvaro Herrera, Sami Imseih, Michael Paquier Discussion: https://postgr.es/m/42d3f855-2275-4361-a42a-826172ca2dc4@Spark
* Invalidate inactive replication slots.Amit Kapila2025-02-19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This commit introduces idle_replication_slot_timeout GUC that allows inactive slots to be invalidated at the time of checkpoint. Because checkpoints happen checkpoint_timeout intervals, there can be some lag between when the idle_replication_slot_timeout was exceeded and when the slot invalidation is triggered at the next checkpoint. To avoid such lags, users can force a checkpoint to promptly invalidate inactive slots. Note that the idle timeout invalidation mechanism is not applicable for slots that do not reserve WAL or for slots on the standby server that are synced from the primary server (i.e., standby slots having 'synced' field 'true'). Synced slots are always considered to be inactive because they don't perform logical decoding to produce changes. The slots can become inactive for a long period if a subscriber is down due to a system error or inaccessible because of network issues. If such a situation persists, it might be more practical to recreate the subscriber rather than attempt to recover the node and wait for it to catch up which could be time-consuming. Then, external tools could create replication slots (e.g., for migrations or upgrades) that may fail to remove them if an error occurs, leaving behind unused slots that take up space and resources. Manually cleaning them up can be tedious and error-prone, and without intervention, these lingering slots can cause unnecessary WAL retention and system bloat. As the duration of idle_replication_slot_timeout is in minutes, any test using that would be time-consuming. We are planning to commit a follow up patch for tests by using the injection point framework. Author: Nisha Moond <nisha.moond412@gmail.com> Author: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> Reviewed-by: Peter Smith <smithpb2250@gmail.com> Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com> Reviewed-by: Vignesh C <vignesh21@gmail.com> Reviewed-by: Amit Kapila <amit.kapila16@gmail.com> Reviewed-by: Hou Zhijie <houzj.fnst@fujitsu.com> Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com> Discussion: https://postgr.es/m/CALj2ACW4aUe-_uFQOjdWCEN-xXoLGhmvRFnL8SNw_TZ5nJe+aw@mail.gmail.com Discussion: https://postgr.es/m/OS0PR01MB5716C131A7D80DAE8CB9E88794FC2@OS0PR01MB5716.jpnprd01.prod.outlook.com
* Update to latest Snowball sources.Tom Lane2025-02-18
| | | | | | | | | | | | | | | | | | | | | | | | | | | It's been some time since we did this, partly because the upstream snowball project hasn't formally tagged a new release since 2021. The main motivation for doing it now is to absorb a bug fix (their commit e322673a841d9abd69994ae8cd20e191090b6ef4), which prevents a null pointer dereference crash if SN_create_env() gets a malloc failure at just the wrong point. We'll patch the back branches with only that change, but we might as well do the full sync dance on HEAD. Aside from a bunch of mostly-minor tweaks to existing stemmers, this update adds a new stemmer for Estonian. It also removes the existing stemmer for Romanian using ISO-8859-2 encoding. Upstream apparently concluded that ISO-8859-2 doesn't provide an adequate representation of some Romanian characters, and the UTF-8 implementation should be used instead. While at it, update the README's instructions for doing a sync, which have not been adjusted during the addition of meson tooling. Thanks to Maksim Korotkov for discovering the null-pointer bug and submitting the fix to upstream snowball. Reported-by: Maksim Korotkov <m.korotkov@postgrespro.ru> Discussion: https://postgr.es/m/1d1a46-67ab1000-21-80c451@83151435
* Fix unsafe access to BufferDescriptorsRichard Guo2025-02-19
| | | | | | | | | | | | | | | | When considering a local buffer, the GetBufferDescriptor() call in BufferGetLSNAtomic() would be retrieving a shared buffer with a bad buffer ID. Since the code checks whether the buffer is shared before using the retrieved BufferDesc, this issue did not lead to any malfunction. Nonetheless this seems like trouble waiting to happen, so fix it by ensuring that GetBufferDescriptor() is only called when we know the buffer is shared. Author: Tender Wang <tndrwang@gmail.com> Reviewed-by: Xuneng Zhou <xunengzhou@gmail.com> Reviewed-by: Richard Guo <guofenglinux@gmail.com> Discussion: https://postgr.es/m/CAHewXNku-o46-9cmUgyv6LkSZ25doDrWq32p=oz9kfD8ovVJMg@mail.gmail.com Backpatch-through: 13
* Fix freeing a child join's SpecialJoinInfoRichard Guo2025-02-19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In try_partitionwise_join, we try to break down the join between two partitioned relations into joins between matching partitions. To achieve this, we iterate through each pair of partitions from the two joining relations and create child join relations for them. To reduce memory accumulation during each iteration, one step we take is freeing the SpecialJoinInfos created for the child joins. A child join's SpecialJoinInfo is a copy of the parent join's SpecialJoinInfo, with some members being translated copies of their counterparts in the parent. However, when freeing the bitmapset members in a child join's SpecialJoinInfo, we failed to check whether they were translated copies. As a result, we inadvertently freed the members that were still in use by the parent SpecialJoinInfo, leading to crashes when those freed members were accessed. To fix, check if each member of the child join's SpecialJoinInfo is a translated copy and free it only if that's the case. This requires passing the parent join's SpecialJoinInfo as a parameter to free_child_join_sjinfo. Back-patch to v17 where this bug crept in. Bug: #18806 Reported-by: 孟令彬 <m_lingbin@126.com> Diagnosed-by: Tender Wang <tndrwang@gmail.com> Author: Richard Guo <guofenglinux@gmail.com> Reviewed-by: Amit Langote <amitlangote09@gmail.com> Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> Discussion: https://postgr.es/m/18806-d70b0c9fdf63dcbf@postgresql.org Backpatch-through: 17
* test_escape: Fix handling of short options in getopt_long()Michael Paquier2025-02-19
| | | | | | | | | | | | | This addresses two errors in the module, based on the set of options supported: - '-c', for --conninfo, was not listed. - '-f', for --force-unsupported, was not listed. While on it, these are now listed in an alphabetical order. Author: Japin Li Discussion: https://postgr.es/m/ME0P300MB04451FB20CE0346A59C25CADB6FA2@ME0P300MB0445.AUSP300.PROD.OUTLOOK.COM Backpatch-through: 13
* Make the description of some GUCs more consistentMichael Paquier2025-02-19
| | | | | | | | | | | | | This commit improves the description of a couple of GUCs, to be more consistent with the style of their surroundings: * array_nulls * enable_self_join_elimination * optimize_bounded_sort * row_security * synchronize_seqscans Author: Kyotaro Horiguchi Discussion: https://postgr.es/m/20250218.103240.1422205966404509831.horikyota.ntt@gmail.com
* Update outdated comments in nodeAgg.c.Jeff Davis2025-02-18
| | | | | | Author: Zhang Mingli Reviewed-by: Richard Guo Discussion: https://postgr.es/m/198a8d1e-0792-4e7f-828e-902aa342f36e@Spark
* Reduce scope of heap vacuum per_buffer_dataMelanie Plageman2025-02-18
| | | | | | | | | | | | | | | Move lazy_scan_heap()'s per_buffer_data variable into a tighter scope. In lazy_scan_heap()'s phase I heap vacuuming, the read stream API returns a pointer to the next block number to vacuum. As long as read_stream_next_buffer() returns a valid buffer, per_buffer_data should always be valid. Move per_buffer_data into a tighter scope and make sure it is reset to NULL on each iteration so that we get a core dump instead of bogus data from a previous block if something goes wrong in the read stream API. Suggested-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/626104.1739729538%40sss.pgh.pa.us
* Add PGErrorVerbosity to typedefs.listDaniel Gustafsson2025-02-18
| | | | | | | | | | PGErrorVerbosity was missing which resulted in incorrect whitespace alignment going back all the way to e3860ffa4dd0. No backpatch for this though since we don't pgindent backbranches. Author: Jelte Fennema-Nio <postgres@jeltef.nl> Reviewed-by: Daniel Gustafsson <daniel@yesql.se> Discussion: https://postgr.es/m/CAGECzQTVi8n-HW4Q27je-b9ckQk7zf6bS_it42gNvQu+DX0NCQ@mail.gmail.com
* Fix poorly written regression testDavid Rowley2025-02-19
| | | | | | | | | | | | | | | | | | | bd10ec529 added code to allow redundant functionally dependent GROUP BY columns to be removed using unique indexes and NOT NULL constraints as proofs of functional dependency. In that commit, I (David) added a test to ensure that when there are multiple indexes available to remove columns that we pick the index that allows us to remove the most columns. This test was faulty as it assumed the t3 table's primary key index was valid to use as functional dependency proof, but that's not the case since that's defined as deferrable. Here we adjust the tests added by that commit to use the t2 table instead. That's defined with a non-deferrable primary key. Author: songjinzhou <tsinghualucky912@foxmail.com> Author: David Rowley <dgrowleyml@gmail.com> Reviewed-by: Japin Li <japinli@hotmail.com> Discussion: https://postgr.es/m/tencent_CD414C79D39668455DF80D35143B87634C08@qq.com
* Raise a WARNING for max_slot_wal_keep_size in pg_createsubscriber.Amit Kapila2025-02-18
| | | | | | | | | | | | | | | | During the pg_createsubscriber execution, it is possible that the required WAL is removed from the primary/publisher node due to 'max_slot_wal_keep_size'. This patch raises a WARNING during the '--dry-run' mode if the 'max_slot_wal_keep_size' is set to a non-default value on the primary/publisher node. Author: Shubham Khanna <khannashubham1197@gmail.com> Reviewed-by: Peter Smith <smithpb2250@gmail.com> Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com> Reviewed-by: Vignesh C <vignesh21@gmail.com> Discussion: https://postgr.es/m/CAHv8Rj+deqsQXOMa7Tck8CBQUbsua=+4AuMVQ2=MPM0f-ZHbjA@mail.gmail.com
* Fix typo in 2a8a0067.Thomas Munro2025-02-18
| | | | | | Builds configured with Valgrind but without assertions would fail due to a typo in the recent change. This should be included when back-patching 2a8a0067 into v17.
* Fix translator notes in commentsDaniel Gustafsson2025-02-17
| | | | | | | | | | | | | The translator comments detailing what a %s inclusion refers to were accidentally including too many address types. In practice this is not a problem since it's not a translated string, but to minimize any risk of confusion let's fix them anwyays. Even though this exists in backbranches there is little use for backpatch as the translation work has already happened there, so let's avoid the churn. Author: Japin Li <japinli@hotmail.com> Reviewed-by: Daniel Gustafsson <daniel@yesql.se> Discussion: https://postgr.es/m/ME0P300MB04458DE627480614ABE639D2B6FB2@ME0P300MB0445.AUSP300.PROD.OUTLOOK.COM
* Add tab completion for ALTER USER/ROLE RESETTomas Vondra2025-02-17
| | | | | | | | | | | Currently tab completion for ALTER USER RESET shows a list of all configuration parameters that may be set on a role, irrespectively of which parameters are actually set. This patch improves tab completion to offer only parameters that are set. Author: Robins Tharakan Reviewed-By: Tomas Vondra Discussion: https://postgr.es/m/CAEP4nAzqiT6VbVC5r3nq5byLTnPzjniVGzEMpYcnAHQyNzEuaw%40mail.gmail.com
* Add tab completion for ALTER DATABASE RESETTomas Vondra2025-02-17
| | | | | | | | | | | Currently tab completion for ALTER DATABASE RESET shows a list of all configuration parameters that may be set on a database, irrespectively of which parameters are actually set. This patch improves tab completion to offer only parameters that are set. Author: Robins Tharakan Reviewed-By: Tomas Vondra Discussion: https://postgr.es/m/CAEP4nAzqiT6VbVC5r3nq5byLTnPzjniVGzEMpYcnAHQyNzEuaw%40mail.gmail.com
* Implement Self-Join EliminationAlexander Korotkov2025-02-17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The Self-Join Elimination (SJE) feature removes an inner join of a plain table to itself in the query tree if it is proven that the join can be replaced with a scan without impacting the query result. Self-join and inner relation get replaced with the outer in query, equivalence classes, and planner info structures. Also, the inner restrictlist moves to the outer one with the removal of duplicated clauses. Thus, this optimization reduces the length of the range table list (this especially makes sense for partitioned relations), reduces the number of restriction clauses and, in turn, selectivity estimations, and potentially improves total planner prediction for the query. This feature is dedicated to avoiding redundancy, which can appear after pull-up transformations or the creation of an EquivalenceClass-derived clause like the below. SELECT * FROM t1 WHERE x IN (SELECT t3.x FROM t1 t3); SELECT * FROM t1 WHERE EXISTS (SELECT t3.x FROM t1 t3 WHERE t3.x = t1.x); SELECT * FROM t1,t2, t1 t3 WHERE t1.x = t2.x AND t2.x = t3.x; In the future, we could also reduce redundancy caused by subquery pull-up after unnecessary outer join removal in cases like the one below. SELECT * FROM t1 WHERE x IN (SELECT t3.x FROM t1 t3 LEFT JOIN t2 ON t2.x = t1.x); Also, it can drastically help to join partitioned tables, removing entries even before their expansion. The SJE proof is based on innerrel_is_unique() machinery. We can remove a self-join when for each outer row: 1. At most, one inner row matches the join clause; 2. Each matched inner row must be (physically) the same as the outer one; 3. Inner and outer rows have the same row mark. In this patch, we use the next approach to identify a self-join: 1. Collect all merge-joinable join quals which look like a.x = b.x; 2. Add to the list above the baseretrictinfo of the inner table; 3. Check innerrel_is_unique() for the qual list. If it returns false, skip this pair of joining tables; 4. Check uniqueness, proved by the baserestrictinfo clauses. To prove the possibility of self-join elimination, the inner and outer clauses must match exactly. The relation replacement procedure is not trivial and is partly combined with the one used to remove useless left joins. Tests covering this feature were added to join.sql. Some of the existing regression tests changed due to self-join removal logic. Discussion: https://postgr.es/m/flat/64486b0b-0404-e39e-322d-0801154901f3%40postgrespro.ru Author: Andrey Lepikhov <a.lepikhov@postgrespro.ru> Author: Alexander Kuzmenkov <a.kuzmenkov@postgrespro.ru> Co-authored-by: Alexander Korotkov <aekorotkov@gmail.com> Co-authored-by: Alena Rybakina <lena.ribackina@yandex.ru> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: Robert Haas <robertmhaas@gmail.com> Reviewed-by: Andres Freund <andres@anarazel.de> Reviewed-by: Simon Riggs <simon@2ndquadrant.com> Reviewed-by: Jonathan S. Katz <jkatz@postgresql.org> Reviewed-by: David Rowley <david.rowley@2ndquadrant.com> Reviewed-by: Thomas Munro <thomas.munro@enterprisedb.com> Reviewed-by: Konstantin Knizhnik <k.knizhnik@postgrespro.ru> Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi> Reviewed-by: Hywel Carver <hywel@skillerwhale.com> Reviewed-by: Laurenz Albe <laurenz.albe@cybertec.at> Reviewed-by: Ronan Dunklau <ronan.dunklau@aiven.io> Reviewed-by: vignesh C <vignesh21@gmail.com> Reviewed-by: Zhihong Yu <zyu@yugabyte.com> Reviewed-by: Greg Stark <stark@mit.edu> Reviewed-by: Jaime Casanova <jcasanov@systemguards.com.ec> Reviewed-by: Michał Kłeczek <michal@kleczek.org> Reviewed-by: Alena Rybakina <lena.ribackina@yandex.ru> Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>
* Revert: Get rid of WALBufMappingLockAlexander Korotkov2025-02-17
| | | | | This commit reverts 6a2275b895. Buildfarm failure on batta spots some concurrency issue, which requires further investigation.
* Fix an oversight in cbc127917 to handle MERGE correctlyAmit Langote2025-02-17
| | | | | | | | | | | | | ExecInitModifyTable() forgot to trim MERGE-related lists to exclude entries for result relations pruned during initial pruning, so fix that. While at it, make the function's use of the pruned resultRelations list, rather than ModifyTable.resultRelations, more consistent. Reported-by: Alexander Lakhin <exclusion@gmail.com> (via sqlsmith) Reviewed-by: Junwang Zhao <zhjwpku@gmail.com> Discussion: https://postgr.es/m/e72c94d9-e5f9-4753-9bc1-69d72bd54b8a@gmail.com
* Add information about WAL buffers full to VACUUM/ANALYZE (VERBOSE)Michael Paquier2025-02-17
| | | | | | | | | | | | | | | | This commit adds the information about the number of times WAL buffers have been full to the logs generated by VACUUM/ANALYZE (VERBOSE) and in the logs generated by autovacuum, complementing the existing information stored by WalUsage. This is the last part of the backend code where the value of wal_buffers_full can be reported, similarly to all the other fields of WalUsage. 320545bfcfee and ce5bcc4a9f26 have done the same for EXPLAIN and pgss. Author: Bertrand Drouvot Reviewed-by: Ilia Evdokimov Discussion: https://postgr.es/m/Z6SOha5YFFgvpwQY@ip-10-97-1-34.eu-west-3.compute.internal
* Add information about WAL buffers being full to EXPLAIN (WAL)Michael Paquier2025-02-17
| | | | | | | | | | This is similar to ce5bcc4a9f26, relying on the addition of wal_buffers_full to WalUsage. This time, the information is added to the output generated by EXPLAIN (WAL). Author: Bertrand Drouvot Reviewed-by: Ilia Evdokimov Discussion: https://postgr.es/m/Z6SOha5YFFgvpwQY@ip-10-97-1-34.eu-west-3.compute.internal
* Move wal_buffers_full from PgStat_PendingWalStats to WalUsageMichael Paquier2025-02-17
| | | | | | | | | | | | | | | | | | | | | | | | wal_buffers_full has been introduced in pg_stat_wal in 8d9a935965f, as some information providing metrics for the tuning of the GUC wal_buffers. WalUsage has been introduced before that in df3b181499. Moving this field is proving to be beneficial for several reasons: - This information can now be made available in more layers, providing more granularity than just pg_stat_wal, on a per-query basis: EXPLAIN, pgss and VACUUM/ANALYZE logs. - A patch is under discussion to provide statistics for WAL at backend level, and this move simplifies a bit the handling of pending statistics. The remaining data in PgStat_PendingWalStats now relates to write/sync counters and times, with equivalents present in pg_stat_io, that backend statistics are able to already track. So this should cut all the dependencies between PgStat_PendingWalStats and WAL stats at backend level. As of this change, wal_buffers_full only shows in pg_stat_wal. Author: Bertrand Drouvot Reviewed-by: Ilia Evdokimov Discussion: https://postgr.es/m/Z6SOha5YFFgvpwQY@ip-10-97-1-34.eu-west-3.compute.internal
* Get rid of WALBufMappingLockAlexander Korotkov2025-02-17
| | | | | | | | | | | | | | | | | | | | | | | | | Allow multiple backends to initialize WAL buffers concurrently. This way `MemSet((char *) NewPage, 0, XLOG_BLCKSZ);` can run in parallel without taking a single LWLock in exclusive mode. The new algorithm works as follows: * reserve a page for initialization using XLogCtl->InitializeReserved, * ensure the page is written out, * once the page is initialized, try to advance XLogCtl->InitializedUpTo and signal to waiters using XLogCtl->InitializedUpToCondVar condition variable, * repeat previous steps until we reserve initialization up to the target WAL position, * wait until concurrent initialization finishes using a XLogCtl->InitializedUpToCondVar. Now, multiple backends can, in parallel, concurrently reserve pages, initialize them, and advance XLogCtl->InitializedUpTo to point to the latest initialized page. Author: Yura Sokolov <y.sokolov@postgrespro.ru> Co-authored-by: Alexander Korotkov <aekorotkov@gmail.com> Reviewed-by: Pavel Borisov <pashkin.elfe@gmail.com>
* Adjust tuples estimate for appendrelsRichard Guo2025-02-17
| | | | | | | | | | | | | | | | | | | | | | | | | | | In set_append_rel_size(), we currently set rel->tuples to rel->rows for an appendrel. Generally, rel->tuples is the raw number of tuples in the relation and rel->rows is the estimated number of tuples after the relation's restriction clauses have been applied. Although an appendrel itself doesn't directly enforce any quals today, its child relations may. Therefore, setting rel->tuples equal to rel->rows for an appendrel isn't always appropriate. Doing so can lead to issues in cost estimates in some cases. For instance, when estimating the number of distinct values from an appendrel, we would not be able to adjust the estimate based on the restriction selectivity. This patch addresses this by setting an appendrel's tuples to the total number of tuples accumulated from each live child, which better aligns with reality. This is arguably a bug, but nobody has complained about that until now, so no back-patch. Author: Richard Guo <guofenglinux@gmail.com> Reviewed-by: Tender Wang <tndrwang@gmail.com> Reviewed-by: Alena Rybakina <a.rybakina@postgrespro.ru> Discussion: https://postgr.es/m/CAMbWs4_TG_+kVn6fjG-5GYzzukrNK57=g9eUo4gsrUG26OFawg@mail.gmail.com
* In fmtIdEnc(), handle failure of enlargePQExpBuffer().Tom Lane2025-02-16
| | | | | | | | | | | | | | | | | | Coverity complained that we weren't doing that, and it's right. This fix just makes fmtIdEnc() honor the general convention that OOM causes a PQExpBuffer to become marked "broken", without any immediate error. In the pretty-unlikely case that we actually did hit OOM here, the end result would be to return an empty string to the caller, probably resulting in invalid SQL syntax in an issued command (if nothing else went wrong, which is even more unlikely). It's tempting to throw an "out of memory" error if the buffer becomes broken, but there's not a lot of point in doing that only here and not in hundreds of other PQExpBuffer-using places in pg_dump and similar callers. The whole issue could do with some non-time-crunched redesign, perhaps. This is a followup to the fixes for CVE-2025-1094, and should be included if cherry-picking those fixes.
* Make escaping functions retain trailing bytes of an invalid character.Tom Lane2025-02-15
| | | | | | | | | | | | | | | | | | | | | | Instead of dropping the trailing byte(s) of an invalid or incomplete multibyte character, replace only the first byte with a known-invalid sequence, and process the rest normally. This seems less likely to confuse incautious callers than the behavior adopted in 5dc1e42b4. While we're at it, adjust PQescapeStringInternal to produce at most one bleat about invalid multibyte characters per string. This matches the behavior of PQescapeInternal, and avoids the risk of producing tons of repetitive junk if a long string is simply given in the wrong encoding. This is a followup to the fixes for CVE-2025-1094, and should be included if cherry-picking those fixes. Author: Andres Freund <andres@anarazel.de> Co-authored-by: Tom Lane <tgl@sss.pgh.pa.us> Reported-by: Jeff Davis <pgsql@j-davis.com> Discussion: https://postgr.es/m/20250215012712.45@rfd.leadboat.com Backpatch-through: 13
* Fix explicit valgrind interaction in read_stream.c.Thomas Munro2025-02-15
| | | | | | | | | | | | | | | | | | | | | By calling wipe_mem() on per-buffer data memory that has been released, we are also telling Valgrind that the memory is "noaccess". We need to set it to "undefined" before giving it to the registered callback to fill in, when a slot is reused. As discovered by build farm animal skink when the VACUUM streamification patches landed (the first users of per-buffer data). Pushing to master only for now, to clear the error on skink. It's also possible that external code might discover the per-buffer data feature in v17, and reasonable to expect Valgrind not to produce spurious memcheck reports, but the back-patch is deferred until after the imminent minor release is out of the way. Reviewed-by: Melanie Plageman <melanieplageman@gmail.com> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Tested-by: Melanie Plageman <melanieplageman@gmail.com> Discussion: https://postgr.es/m/CA%2BhUKG%2Bg6aXpi2FEHqeLOzE%2BxYw%3DOV%2B-N5jhOEnnV%2BF0USM9xA%40mail.gmail.com
* Fix PQescapeLiteral()/PQescapeIdentifier() length handlingAndres Freund2025-02-14
| | | | | | | | | | | | | | | | | | | | | | | | | | In 5dc1e42b4fa I fixed bugs in various escape functions, unfortunately as part of that I introduced a new bug in PQescapeLiteral()/PQescapeIdentifier(). The bug is that I made PQescapeInternal() just use strlen(), rather than taking the specified input length into account. That's bad, because it can lead to including input that wasn't intended to be included (in case len is shorter than null termination of the string) and because it can lead to reading invalid memory if the input string is not null terminated. Expand test_escape to this kind of bug: a) for escape functions with length support, append data that should not be escaped and check that it is not b) add valgrind requests to detect access of bytes that should not be touched Author: Tom Lane <tgl@sss.pgh.pa.us> Author: Andres Freund <andres@anarazel.de Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: Noah Misch <noah@leadboat.com> Discussion: https://postgr.es/m/Z64jD3u46gObCo1p@pryzbyj2023 Backpatch: 13
* Add delay time to VACUUM/ANALYZE (VERBOSE) and autovacuum logs.Nathan Bossart2025-02-14
| | | | | | | | | | | Commit bb8dff9995 added this information to the pg_stat_progress_vacuum and pg_stat_progress_analyze system views. This commit adds the same information to the output of VACUUM and ANALYZE with the VERBOSE option and to the autovacuum logs. Suggested-by: Masahiro Ikeda <ikedamsh@oss.nttdata.com> Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com> Discussion: https://postgr.es/m/ZmaXmWDL829fzAVX%40ip-10-97-1-34.eu-west-3.compute.internal
* Use PqMsg_Progress macro in HandleParallelMessage().Nathan Bossart2025-02-14
| | | | | | | Commit a99cc6c6b4 introduced the PqMsg_Progress macro but missed updating HandleParallelMessage() accordingly. Backpatch-through: 17
* Use streaming read I/O in VACUUM's third phaseMelanie Plageman2025-02-14
| | | | | | | | | | | | | Make vacuum's third phase (its second pass over the heap), which reaps dead items collected in the first phase and marks them as reusable, use the read stream API. This commit adds a new read stream callback, vacuum_reap_lp_read_stream_next(), that looks ahead in the TidStore and returns the next block number to read for vacuum. Author: Melanie Plageman <melanieplageman@gmail.com> Co-authored-by: Thomas Munro <thomas.munro@gmail.com> Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/CA%2BhUKGKN3oy0bN_3yv8hd78a4%2BM1tJC9z7mD8%2Bf%2ByA%2BGeoFUwQ%40mail.gmail.com
* Use streaming read I/O in VACUUM's first phaseMelanie Plageman2025-02-14
| | | | | | | | | | Make vacuum's first phase, which prunes and freezes tuples and records dead TIDs, use the read stream API by by converting heap_vac_scan_next_block() to a read stream callback. Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-by: Thomas Munro <thomas.munro@gmail.com> Discussion: https://postgr.es/m/CAAKRu_aLwANZpxHc0tC-6OT0OQT4TftDGkKAO5yigMUOv_Tcsw%40mail.gmail.com
* Convert heap_vac_scan_next_block() boolean parameters to flagsMelanie Plageman2025-02-14
| | | | | | | | | | | | | | | | The read stream API only allows one piece of extra per block state to be passed back to the API user (per_buffer_data). lazy_scan_heap() needs two pieces of per-buffer data: whether or not the block was all-visible in the visibility map and whether or not it was eagerly scanned. Convert these two pieces of information to flags so that they can be populated by heap_vac_scan_next_block() and returned to lazy_scan_heap(). A future commit will turn heap_vac_scan_next_block() into the read stream callback for heap phase I vacuuming. Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-by: Thomas Munro <thomas.munro@gmail.com> Discussion: https://postgr.es/m/CAAKRu_bmx33jTqATP5GKNFYwAg02a9dDtk4U_ciEjgBHZSVkOQ%40mail.gmail.com
* Describe special values in GUC descriptions more consistently.Nathan Bossart2025-02-14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Many GUCs accept special values like -1 or an empty string to disable the feature, use a system default, etc. While the documentation consistently lists these special values, the GUC descriptions do not. Many such descriptions fail to mention the special values, and those that do vary in phrasing and placement. This commit aims to bring some consistency to this area by applying the following rules: * Special values should be listed at the end of the long description. * Descriptions should use numerals (e.g., "0") instead of words (e.g., "zero"). * Special value mentions should be concise and direct (e.g., "0 disables the timeout.", "An empty string means use the operating system setting."). * Multiple special values should be listed in ascending order. Of course, there are exceptions, such as max_pred_locks_per_relation and search_path, whose special values are too complex to include. And there are cases like listen_addresses, where the meaning of an empty string is arguably too obvious to include. In those cases, I've refrained from adding special value information to the GUC description. Reviewed-by: Peter Smith <smithpb2250@gmail.com> Reviewed-by: "David G. Johnston" <david.g.johnston@gmail.com> Reviewed-by: Daniel Gustafsson <daniel@yesql.se> Discussion: https://postgr.es/m/Z6aIy4aywxUZHAo6%40nathan
* Fix assertion on dereferenced objectDaniel Gustafsson2025-02-14
| | | | | | | | | | | | | Commit 27cc7cd2bc8a accidentally placed the assertion ensuring that the pointer isn't NULL after it had already been accessed. Fix by moving the pointer dereferencing to after the assertion. Backpatch to all supported branches. Author: Dmitry Koval <d.koval@postgrespro.ru> Reviewed-by: Daniel Gustafsson <daniel@yesql.se> Reviewed-by: Michael Paquier <michael@paquier.xyz> Discussion: https://postgr.es/m/1618848d-cdc7-414b-9c03-08cf4bef4408@postgrespro.ru Backpatch-through: 13
* Remove obsolete comment.Thomas Munro2025-02-14
| | | | | | Commit 755a4c10d19d prevented StartReadBuffers() from crossing md.c segment boundaries in one operation, but a comment about that possibility remained.
* Remove unused parameter from execute_extension_script().Nathan Bossart2025-02-13
| | | | | | | | | This function's schemaOid parameter appears to have never been used for anything. Author: Yugo Nagata <nagata@sraoss.co.jp> Reviewed-by: Fabrízio de Royes Mello <fabriziomello@gmail.com> Discussion: https://postgr.es/m/20250214010218.550ebe4ec1a7c7811a7fa2bb%40sraoss.co.jp
* Remove unnecessary (char *) casts [xlog]Peter Eisentraut2025-02-13
| | | | | | | | Remove (char *) casts no longer needed after XLogRegisterData() and XLogRegisterBufData() argument type change. Reviewed-by: Dagfinn Ilmari Mannsåker <ilmari@ilmari.org> Discussion: https://www.postgresql.org/message-id/flat/fd1fcedb-3492-4fc8-9e3e-74b97f2db6c7%40eisentraut.org