aboutsummaryrefslogtreecommitdiff
path: root/src/backend/utils/mmgr
Commit message (Collapse)AuthorAge
* Fix AlignedAllocRealloc to cope sanely with OOM.Tom Lane2025-05-23
| | | | | | | | | | | | | | | | | | | | | | If the inner allocation call returns NULL, we should restore the previous state and return NULL. Previously this code pfree'd the old chunk anyway, which is surely wrong. Also, make it call MemoryContextAllocationFailure rather than summarily returning NULL. The fact that we got control back from the inner call proves that MCXT_ALLOC_NO_OOM was passed, so this change is just cosmetic, but someday it might be less so. This is just a latent bug at present: AFAICT no in-core callers use this function at all, let alone call it with MCXT_ALLOC_NO_OOM. Still, it's the kind of bug that might bite back-patched code pretty hard someday, so let's back-patch to v17 where the bug was introduced (by commit 743112a2e). Author: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/285483.1746756246@sss.pgh.pa.us Backpatch-through: 17
* Revert function to get memory context stats for processesDaniel Gustafsson2025-05-23
| | | | | | | | | Due to concerns raised about the approach, and memory leaks found in sensitive contexts the functionality is reverted. This reverts commits 45e7e8ca9, f8c115a6c, d2a1ed172, 55ef7abf8 and 042a66291 for v18 with an intent to revisit this patch for v19. Discussion: https://postgr.es/m/594293.1747708165@sss.pgh.pa.us
* Revert "Don't lock partitions pruned by initial pruning"Amit Langote2025-05-22
| | | | | | | | | | | | | | | | As pointed out by Tom Lane, the patch introduced fragile and invasive design around plan invalidation handling when locking of prunable partitions was deferred from plancache.c to the executor. In particular, it violated assumptions about CachedPlan immutability and altered executor APIs in ways that are difficult to justify given the added complexity and overhead. This also removes the firstResultRels field added to PlannedStmt in commit 28317de72, which was intended to support deferred locking of certain ModifyTable result relations. Reported-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/605328.1747710381@sss.pgh.pa.us
* Fix incorrect "return NULL" in BumpAllocLarge().Tom Lane2025-05-10
| | | | | | | | | | | This must be "return MemoryContextAllocationFailure(context, size, flags)" instead. The effect of this oversight is that if we got a malloc failure right here, the code would act as though MCXT_ALLOC_NO_OOM had been specified, whether it was or not. That would likely lead to a null-pointer-dereference crash at the unsuspecting call site. Noted while messing with a patch to improve our Valgrind leak detection support. Back-patch to v17 where this code came in.
* Convert strncpy to strlcpyDaniel Gustafsson2025-04-30
| | | | | | | | | We try to avoid using strncpy() due to the ease of which it can be misused. Convert this callsite to use strlcpy() instead to match similar codepaths in this file. Suggested-by: Peter Eisentraut <peter@eisentraut.org> Discussion: https://postgr.es/m/2a796830-de2d-4030-b480-d673f6cc5d94@eisentraut.org
* Typo and doc fixups for memory context reportingDaniel Gustafsson2025-04-30
| | | | | | | | | This fixes comment and docs typos as well as a small documentation change to make it clearer. Found via post-commit review. Author: Rahila Syed <rahilasyed90@gmail.com> Reviewed-by: Daniel Gustafsson <daniel@yesql.se> Discussion: https://postgr.es/m/CAH2L28vt16C9xTuK+K7QZvtA3kCNWXOEiT=gEekUw3Xxp9LVQw@mail.gmail.com
* Add missing string terminatorDaniel Gustafsson2025-04-30
| | | | | | | | | | | When copying the string strncpy won't add nul termination since the string length is equal to the length specified. Explicitly set a nul terminator after copying to properly terminate. Found via post-commit review. Author: Rahila Syed <rahilasyed90@gmail.com> Reviewed-by: Daniel Gustafsson <daniel@yesql.se> Discussion: https://postgr.es/m/CAH2L28vt16C9xTuK+K7QZvtA3kCNWXOEiT=gEekUw3Xxp9LVQw@mail.gmail.com
* MemoryContextCreate: assert parent is valid and different from node.Tom Lane2025-04-21
| | | | | | | | | | | | | The case of "node == parent" might seem impossible, since we just allocated the new node. But it's possible if parent is a dangling reference to a recently-deleted context. In fact, given aset.c's habit of recycling contexts, it's actually rather likely if that's so. If we'd had this assertion before, it would have simplified debugging a recently-identified walsender issue. Reported-by: Anthonin Bonnefoy <anthonin.bonnefoy@datadoghq.com> Author: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/CAO6_XqoJA7-_G6t7Uqe5nWF3nj+QBGn4F6Ptp=rUGDr0zo+KvA@mail.gmail.com
* Fix a few duplicate words in commentsDavid Rowley2025-04-21
| | | | | | | These are all new to v18 Author: David Rowley <dgrowleyml@gmail.com> Discussion: https://postgr.es/m/CAApHDvrMcr8XD107H3NV=WHgyBcu=sx5+7=WArr-n_cWUqdFXQ@mail.gmail.com
* Fix typos and grammar in the codeMichael Paquier2025-04-19
| | | | | | | | The large majority of these have been introduced by recent commits done in the v18 development cycle. Author: Alexander Lakhin <exclusion@gmail.com> Discussion: https://postgr.es/m/9a7763ab-5252-429d-a943-b28941e0e28b@gmail.com
* Make levels 1-based in pg_log_backend_memory_contexts()David Rowley2025-04-18
| | | | | | | | | | | | | | | | | Both pg_get_process_memory_contexts() and pg_backend_memory_contexts have 1-based levels, whereas pg_log_backend_memory_contexts() was using 0-based levels. Align these. This results in slightly saner behavior from MemoryContextStatsDetail() in regards to the max_level. Previously it would stop at 1 level before the maximum requested level rather than at that level. Reported-by: Atsushi Torikoshi <torikoshia@oss.nttdata.com> Author: Atsushi Torikoshi <torikoshia@oss.nttdata.com> Author: David Rowley <drowleyml@gmail.com Reviewed-by: Melih Mutlu <m.melihmutlu@gmail.com> Reviewed-by: Rahila Syed <rahilasyed90@gmail.com> Discussion: https://postgr.es/m/395ea5d4fe190480efa95bf533485c70@oss.nttdata.com
* Harmonize function parameter names for Postgres 18.Peter Geoghegan2025-04-12
| | | | | | | | | | Make sure that function declarations use names that exactly match the corresponding names from function definitions in a few places. These inconsistencies were all introduced during Postgres 18 development. This commit was written with help from clang-tidy, by mechanically applying the same rules as similar clean-up commits (the earliest such commit was commit 035ce1fe).
* Rename global variable backing DSA areaDaniel Gustafsson2025-04-10
| | | | | | | | | | | | The global variable backing the DSA area for Memory Context stats reporting had a too generic name, rename to be more descriptive. Independently reported by Peter and Laurenz. Author: Daniel Gustafsson <daniel@yesql.se> Reported-by: Peter Eisentraut <peter@eisentraut.org> Reported-by: Laurenz Albe <laurenz.albe@cybertec.at> Discussion: https://postgr.es/m/d51172bd4e7f4b07a18a0288ca1b1c28a71a5f6a.camel@cybertec.at Discussion: https://postgr.es/m/25095db5-b595-4b85-9100-d358907c25b5@eisentraut.org
* Add function to get memory context stats for processesDaniel Gustafsson2025-04-08
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This adds a function for retrieving memory context statistics and information from backends as well as auxiliary processes. The intended usecase is cluster debugging when under memory pressure or unanticipated memory usage characteristics. When calling the function it sends a signal to the specified process to submit statistics regarding its memory contexts into dynamic shared memory. Each memory context is returned in detail, followed by a cumulative total in case the number of contexts exceed the max allocated amount of shared memory. Each process is limited to use at most 1Mb memory for this. A summary can also be explicitly requested by the user, this will return the TopMemoryContext and a cumulative total of all lower contexts. In order to not block on busy processes the caller specifies the number of seconds during which to retry before timing out. In the case where no statistics are published within the set timeout, the last known statistics are returned, or NULL if no previously published statistics exist. This allows dash- board type queries to continually publish even if the target process is temporarily congested. Context records contain a timestamp to indicate when they were submitted. Author: Rahila Syed <rahilasyed90@gmail.com> Reviewed-by: Daniel Gustafsson <daniel@yesql.se> Reviewed-by: Andres Freund <andres@anarazel.de> Reviewed-by: Tomas Vondra <tomas@vondra.me> Reviewed-by: Atsushi Torikoshi <torikoshia@oss.nttdata.com> Reviewed-by: Fujii Masao <masao.fujii@oss.nttdata.com> Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com> Discussion: https://postgr.es/m/CAH2L28v8mc9HDt8QoSJ8TRmKau_8FM_HKS41NeO9-6ZAkuZKXw@mail.gmail.com
* Use PRI?64 instead of "ll?" in format strings (continued).Peter Eisentraut2025-03-29
| | | | | | | Continuation of work started in commit 15a79c73, after initial trial. Author: Thomas Munro <thomas.munro@gmail.com> Discussion: https://postgr.es/m/b936d2fb-590d-49c3-a615-92c3a88c6c19%40eisentraut.org
* pg_noreturn to replace pg_attribute_noreturn()Peter Eisentraut2025-03-13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We want to support a "noreturn" decoration on more compilers besides just GCC-compatible ones, but for that we need to move the decoration in front of the function declaration instead of either behind it or wherever, which is the current style afforded by GCC-style attributes. Also rename the macro to "pg_noreturn" to be similar to the C11 standard "noreturn". pg_noreturn is now supported on all compilers that support C11 (using _Noreturn), as well as GCC-compatible ones (using __attribute__, as before), as well as MSVC (using __declspec). (When PostgreSQL requires C11, the latter two variants can be dropped.) Now, all supported compilers effectively support pg_noreturn, so the extra code for !HAVE_PG_ATTRIBUTE_NORETURN can be dropped. This also fixes a possible problem if third-party code includes stdnoreturn.h, because then the current definition of #define pg_attribute_noreturn() __attribute__((noreturn)) would cause an error. Note that the C standard does not support a noreturn attribute on function pointer types. So we have to drop these here. There are only two instances at this time, so it's not a big loss. In one case, we can make up for it by adding the pg_noreturn to a wrapper function and adding a pg_unreachable(), in the other case, the latter was already done before. Reviewed-by: Dagfinn Ilmari Mannsåker <ilmari@ilmari.org> Reviewed-by: Andres Freund <andres@anarazel.de> Discussion: https://www.postgresql.org/message-id/flat/pxr5b3z7jmkpenssra5zroxi7qzzp6eswuggokw64axmdixpnk@zbwxuq7gbbcw
* Add more assertions in palloc0() and palloc_extended()Michael Paquier2025-03-04
| | | | | | | | | | | | | | palloc() includes an assertion checking that an alloc() implementation never returns NULL for all MemoryContextMethods. This commit adds a similar assertion in palloc0(). In palloc_extend(), a different assertion is added, checking that MCXT_ALLOC_NO_OOM is set when an alloc() routine returns NULL. These additions can be useful to catch errors when implementing a new set of MemoryContextMethods routines. Author: Andreas Karlsson <andreas@proxel.se> Discussion: https://postgr.es/m/507e8eba-2035-4a12-a777-98199a66beb8@proxel.se
* Don't lock partitions pruned by initial pruningAmit Langote2025-02-20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Before executing a cached generic plan, AcquireExecutorLocks() in plancache.c locks all relations in a plan's range table to ensure the plan is safe for execution. However, this locks runtime-prunable relations that will later be pruned during "initial" runtime pruning, introducing unnecessary overhead. This commit defers locking for such relations to executor startup and ensures that if the CachedPlan is invalidated due to concurrent DDL during this window, replanning is triggered. Deferring these locks avoids unnecessary locking overhead for pruned partitions, resulting in significant speedup, particularly when many partitions are pruned during initial runtime pruning. * Changes to locking when executing generic plans: AcquireExecutorLocks() now locks only unprunable relations, that is, those found in PlannedStmt.unprunableRelids (introduced in commit cbc127917e), to avoid locking runtime-prunable partitions unnecessarily. The remaining locks are taken by ExecDoInitialPruning(), which acquires them only for partitions that survive pruning. This deferral does not affect the locks required for permission checking in InitPlan(), which takes place before initial pruning. ExecCheckPermissions() now includes an Assert to verify that all relations undergoing permission checks, none of which can be in the set of runtime-prunable relations, are properly locked. * Plan invalidation handling: Deferring locks introduces a window where prunable relations may be altered by concurrent DDL, invalidating the plan. A new function, ExecutorStartCachedPlan(), wraps ExecutorStart() to detect and handle invalidation caused by deferred locking. If invalidation occurs, ExecutorStartCachedPlan() updates CachedPlan using the new UpdateCachedPlan() function and retries execution with the updated plan. To ensure all code paths that may be affected by this handle invalidation properly, all callers of ExecutorStart that may execute a PlannedStmt from a CachedPlan have been updated to use ExecutorStartCachedPlan() instead. UpdateCachedPlan() replaces stale plans in CachedPlan.stmt_list. A new CachedPlan.stmt_context, created as a child of CachedPlan.context, allows freeing old PlannedStmts while preserving the CachedPlan structure and its statement list. This ensures that loops over statements in upstream callers of ExecutorStartCachedPlan() remain intact. ExecutorStart() and ExecutorStart_hook implementations now return a boolean value indicating whether plan initialization succeeded with a valid PlanState tree in QueryDesc.planstate, or false otherwise, in which case QueryDesc.planstate is NULL. Hook implementations are required to call standard_ExecutorStart() at the beginning, and if it returns false, they should do the same without proceeding. * Testing: To verify these changes, the delay_execution module tests scenarios where cached plans become invalid due to changes in prunable relations after deferred locks. * Note to extension authors: ExecutorStart_hook implementations must verify plan validity after calling standard_ExecutorStart(), as explained earlier. For example: if (prev_ExecutorStart) plan_valid = prev_ExecutorStart(queryDesc, eflags); else plan_valid = standard_ExecutorStart(queryDesc, eflags); if (!plan_valid) return false; <extension-code> return true; Extensions accessing child relations, especially prunable partitions, via ExecGetRangeTableRelation() must now ensure their RT indexes are present in es_unpruned_relids (introduced in commit cbc127917e), or they will encounter an error. This is a strict requirement after this change, as only relations in that set are locked. The idea of deferring some locks to executor startup, allowing locks for prunable partitions to be skipped, was first proposed by Tom Lane. Reviewed-by: Robert Haas <robertmhaas@gmail.com> (earlier versions) Reviewed-by: David Rowley <dgrowleyml@gmail.com> (earlier versions) Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> (earlier versions) Reviewed-by: Tomas Vondra <tomas@vondra.me> Reviewed-by: Junwang Zhao <zhjwpku@gmail.com> Discussion: https://postgr.es/m/CA+HiwqFGkMSge6TgC9KQzde0ohpAycLQuV7ooitEEpbKB0O_mg@mail.gmail.com
* Update copyright for 2025Bruce Momjian2025-01-01
| | | | Backpatch-through: 13
* Remove useless casts to (void *)Peter Eisentraut2024-11-28
| | | | | | | | Many of them just seem to have been copied around for no real reason. Their presence causes (small) risks of hiding actual type mismatches or silently discarding qualifiers Discussion: https://www.postgresql.org/message-id/flat/461ea37c-8b58-43b4-9736-52884e862820@eisentraut.org
* Ignore not-yet-defined Portals in pg_cursors view.Tom Lane2024-10-06
| | | | | | | | | | | | | | | | | | | | | | | | | | | | pg_cursor() supposed that any Portal it finds in the hash table must have sourceText set up, but there's an edge case where that is not so. A newly-created Portal has sourceText = NULL, and that doesn't change until PortalDefineQuery is called. In SPI_cursor_open_internal, we perform GetCachedPlan between CreatePortal and PortalDefineQuery, and it's possible for user-defined code to execute during that planning and cause a fetch from the pg_cursors view, resulting in a null-pointer-dereference crash. (It looks like the same could happen in exec_bind_message, but I've not tried to provoke a failure there.) I considered trying to fix this by setting sourceText sooner, but there may be instances of this same calling pattern in extensions, and we couldn't be sure they'd get the memo promptly. It seems better to redefine pg_cursor as not showing Portals that have not yet had PortalDefineQuery called on them, which we can do by just skipping them if sourceText is still NULL. (Before a1c692358, pg_cursor would instead return a row with NULL in the statement column. We could revert to that behavior but it doesn't really seem like a better definition, especially since our documentation doesn't suggest that the column could be NULL.) Per report from PetSerAl. Back-patch to all supported branches. Discussion: https://postgr.es/m/CAKygsHTBXLXjwV43kpZa+Cs+XTiaeeJiZdL4cPBm9f4MTdw7wg@mail.gmail.com
* Fix incorrect sentinel byte logic in GenerationRealloc()David Rowley2024-07-06
| | | | | | | | | | | | | | | | | | | | | This only affects MEMORY_CONTEXT_CHECKING builds. This fixes an off-by-one issue in GenerationRealloc() where the fast-path code which tries to reuse the existing allocation if the existing chunk is >= the new requested size. The code there thought it was always ok to use the existing chunk, but when oldsize == size there isn't enough space to store the sentinel byte. If both sizes matched exactly set_sentinel() would overwrite the first byte beyond the chunk and then subsequent GenerationRealloc() calls could then fail the Assert(chunk->requested_size < oldsize) check which is trying to ensure the chunk is large enough to store the sentinel. The same issue does not exist in aset.c as the sentinel checking code only adds a sentinel byte if there's enough space in the chunk. Reported-by: Alexander Lakhin <exclusion@gmail.com> Discussion: https://postgr.es/m/49275921-7b39-41af-5eb8-97b50ce3312e@gmail.com Backpatch-through: 16, where the problem was introduced by 0e480385e
* Preserve CurrentMemoryContext across Start/CommitTransactionCommand.Tom Lane2024-07-01
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Up to now, committing a transaction has caused CurrentMemoryContext to get set to TopMemoryContext. Most callers did not pay any particular heed to this, which is problematic because TopMemoryContext is a long-lived context that never gets reset. If the caller assumes it can leak memory because it's running in a limited-lifespan context, that behavior translates into a session-lifespan memory leak. The first-reported instance of this involved ProcessIncomingNotify, which is called from the main processing loop that normally runs in MessageContext. That outer-loop code assumes that whatever it allocates will be cleaned up when we're done processing the current client message --- but if we service a notify interrupt, then whatever gets allocated before the next switch to MessageContext will be permanently leaked in TopMemoryContext. sinval catchup interrupts have a similar problem, and I strongly suspect that some places in logical replication do too. To fix this in a generic way, let's redefine the behavior as "CommitTransactionCommand restores the memory context that was current at entry to StartTransactionCommand". This clearly fixes the issue for the notify and sinval cases, and it seems to match the mental model that's in use in the logical replication code, to the extent that anybody thought about it there at all. For consistency, likewise make subtransaction exit restore the context that was current at subtransaction start (rather than always selecting the CurTransactionContext of the parent transaction level). This case has less risk of resulting in a permanent leak than the outer-level behavior has, but it would not meet the principle of least surprise for some CommitTransactionCommand calls to restore the previous context while others don't. While we're here, also change xact.c so that we reset TopTransactionContext at transaction exit and then re-use it in later transactions, rather than dropping and recreating it in each cycle. This probably doesn't save a lot given the context recycling mechanism in aset.c, but it should save a little bit. Per suggestion from David Rowley. (Parenthetically, the text in src/backend/utils/mmgr/README implies that this is how I'd planned to implement it as far back as commit 1aebc3618 --- but the code actually added in that commit just drops and recreates it each time.) This commit also cleans up a few places outside xact.c that were needlessly making TopMemoryContext current, and thus risking more leaks of the same kind. I don't think any of them represent significant leak risks today, but let's deal with them while the issue is top-of-mind. Per bug #18512 from wizardbrony. Commit to HEAD only, as this change seems to have some risk of breaking things for some callers. We'll apply a narrower fix for the known-broken cases in the back branches. Discussion: https://postgr.es/m/3478884.1718656625@sss.pgh.pa.us
* Fix typos and duplicate wordsDaniel Gustafsson2024-04-18
| | | | | | | | | | | | This fixes various typos, duplicated words, and tiny bits of whitespace mainly in code comments but also in docs. Author: Daniel Gustafsson <daniel@yesql.se> Author: Heikki Linnakangas <hlinnaka@iki.fi> Author: Alexander Lakhin <exclusion@gmail.com> Author: David Rowley <dgrowleyml@gmail.com> Author: Nazir Bilal Yavuz <byavuz81@gmail.com> Discussion: https://postgr.es/m/3F577953-A29E-4722-98AD-2DA9EFF2CBB8@yesql.se
* Update mmgr's README to mention BumpContextDavid Rowley2024-04-17
| | | | | | | | | | | | Oversight in 29f6a959c. In passing, since we now have 4 memory context types to choose from, provide a brief overview of the specialities of each memory context type. Reported-by: Amul Sul Author: Amul Sul, David Rowley Discussion: https://postgr.es/m/CAAJ_b94U2s9nHh--DEK=sPEZUQ+x7vQJ7529fF8UAH97QJ9NXg@mail.gmail.com
* Push dedicated BumpBlocks to the tail of the blocks listDavid Rowley2024-04-17
| | | | | | | | | | | | | | | | | | | | | | | BumpContext relies on using the head block from its 'blocks' field to use as the current block to allocate new chunks to. When we receive an allocation request larger than allocChunkLimit, we place these chunks on a new dedicated block and, until now, we pushed the block onto the *head* of the 'blocks' list. This behavior caused the previous bump block to no longer be available for new normal-sized (non-large) allocations and would result in blocks only being partially filled if a large allocation request arrived before the block became full. Here adjust the code to push these dedicated blocks onto the *tail* of the blocks list so that the head block remains intact and available to be used by normal allocation request sizes until it becomes full. In passing, make the elog(ERROR) calls for the unsupported callbacks consistent. Likewise for the header comments for those functions. Discussion: https://postgr.es/m/CAApHDvp9___r-ayJj0nZ6GD3MeCGwGZ0_6ZptWpwj+zqHtmwCw@mail.gmail.com Discussion: https://postgr.es/m/CAApHDvqerXpzUnuDQfUEi3DZA+9=Ud9WSt3ruxN5b6PcOosx2g@mail.gmail.com
* Remove unused BumpBlockIsValid macroDavid Rowley2024-04-10
| | | | | | | | | | | The bump allocator was recently added in 29f6a959c. Our other allocators have a similar macro to this, but seemingly the version of the macro for those allocators is only used in places where the chunk header is decoded. Since the bump allocator has no chunk header, none of those functions exist for bump therefore macro is unused. Remove it. Reported-by: Peter Eisentraut Discussion: https://postgr.es/m/5f724fb2-96e1-4f36-b65b-47b337ad432e@eisentraut.org
* Fix incorrect KeeperBlock macro in bump.cDavid Rowley2024-04-08
| | | | | | | | | | | | | The macro was missing a MAXALIGN around the sizeof(BumpContext) which would cause problems detecting the keeper block on 32-bit systems that have a MAXALIGN value of 8. Thank you to Andres Freund, Tomas Vondra and Tom Lane for investigating and testing. Reported-by: Melanie Plageman, Tomas Vondra Discussion: https://postgr.es/m/CAAKRu_Y6dZjiJEZghgNZp0Gjar1JVq-CH7XGDqExDVHnPgDjuw@mail.gmail.com Discussion: https://postgr.es/m/a4a10b89-6ba8-4abd-b449-019aafff04fc@enterprisedb.com
* Introduce a bump memory allocatorDavid Rowley2024-04-08
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This introduces a bump MemoryContext type. The bump context is best suited for short-lived memory contexts which require only allocations of memory and never a pfree or repalloc, which are unsupported. Memory palloc'd into a bump context has no chunk header. This makes bump a useful context type when lots of small allocations need to be done without any need to pfree those allocations. Allocation sizes are rounded up to the next MAXALIGN boundary, so with this and no chunk header, allocations are very compact indeed. Allocations are also very fast as bump does not check any freelists to try and make use of previously free'd chunks. It just checks if there is enough room on the current block, and if so it bumps the freeptr beyond this chunk and returns the value that the freeptr was previously pointing to. Simple and fast. A new block is malloc'd when there's not enough space in the current block. Code using the bump allocator must take care never to call any functions which could try to call realloc() (or variants), pfree(), GetMemoryChunkContext() or GetMemoryChunkSpace() on a bump allocated chunk. Due to lack of chunk headers, these operations are unsupported. To increase the chances of catching such issues, when compiled with MEMORY_CONTEXT_CHECKING, bump allocated chunks are given a header and any attempt to perform an unsupported operation will result in an ERROR. Without MEMORY_CONTEXT_CHECKING, code attempting an unsupported operation could result in a segfault. A follow-on commit will implement the first user of bump. Author: David Rowley Reviewed-by: Nathan Bossart Reviewed-by: Matthias van de Meent Reviewed-by: Tomas Vondra Reviewed-by: John Naylor Discussion: https://postgr.es/m/CAApHDvqGSpCU95TmM=Bp=6xjL_nLys4zdZOpfNyWBk97Xrdj2w@mail.gmail.com
* Enlarge bit-space for MemoryContextMethodIDDavid Rowley2024-04-07
| | | | | | | | | | | | | | | | Reserve 4 bits for MemoryContextMethodID rather than 3. 3 bits did technically allow a maximum of 8 memory context types, however, we've opted to reserve some bit patterns which left us with only 4 slots, all of which were used. Here we add another bit which frees up 8 slots for future memory context types. In passing, adjust the enum names in MemoryContextMethodID to make it more clear which ones can be used and which ones are reserved. Author: Matthias van de Meent, David Rowley Discussion: https://postgr.es/m/CAApHDvqGSpCU95TmM=Bp=6xjL_nLys4zdZOpfNyWBk97Xrdj2w@mail.gmail.com
* Allow specifying initial and maximum segment sizes for DSA.Masahiko Sawada2024-03-27
| | | | | | | | | | | | | | Previously, the DSA segment size always started with 1MB and grew up to DSA_MAX_SEGMENT_SIZE. It was inconvenient in certain scenarios, such as when the caller desired a soft constraint on the total DSA segment size, limiting it to less than 1MB. This commit introduces the capability to specify the initial and maximum DSA segment sizes when creating a DSA area, providing more flexibility and control over memory usage. Reviewed-by: John Naylor, Tomas Vondra Discussion: https://postgr.es/m/CAD21AoAYGGC1ePjVX0H%2Bpp9rH%3D9vuPK19nNOiu12NprdV5TVJA%40mail.gmail.com
* Make the order of the header file includes consistentPeter Eisentraut2024-03-13
| | | | | | | | Similar to commit 7e735035f20. Author: Richard Guo <guofenglinux@gmail.com> Reviewed-by: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/CAMbWs4-WhpCFMbXCjtJ%2BFzmjfPrp7Hw1pk4p%2BZpU95Kh3ofZ1A%40mail.gmail.com
* Avoid recursion in MemoryContext functionsAlexander Korotkov2024-03-08
| | | | | | | | | | | | | | | | | | | | You might run out of stack space with recursion, which is not nice in functions that might be used e.g. at cleanup after transaction abort. MemoryContext contains pointer to parent and siblings, so we can traverse a tree of contexts iteratively, without using stack. Refactor the functions to do that. MemoryContextStats() still recurses, but it now has a limit to how deep it recurses. Once the limit is reached, it prints just a summary of the rest of the hierarchy, similar to how it summarizes contexts with lots of children. That seems good anyway, because a context dump with hundreds of nested contexts isn't very readable. Report by Egor Chindyaskin and Alexander Lakhin. Discussion: https://postgr.es/m/1672760457.940462079%40f306.i.mail.ru Author: Heikki Linnakangas Reviewed-by: Robert Haas, Andres Freund, Alexander Korotkov, Tom Lane
* Add template for adaptive radix treeJohn Naylor2024-03-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This implements a radix tree data structure based on the design in "The Adaptive Radix Tree: ARTful Indexing for Main-Memory Databases" by Viktor Leis, Alfons Kemper, and ThomasNeumann, 2013. The main technique that makes it adaptive is using several different node types, each with a different capacity of elements, and a different algorithm for accessing them. The nodes start small and grow/shrink as needed. The main advantage over hash tables is efficient sorted iteration and better memory locality when successive keys are lexicographically close together. The implementation currently assumes 64-bit integer keys, and traversing the tree is in general slower than a linear probing hash table, so this is not a general-purpose associative array. The paper describes two other techniques not implemented here, namely "path compression" and "lazy expansion". These can further reduce memory usage and speed up traversal, but the former would add significant complexity and the latter requires storing the full key with the value. We do trivially compress the path when leading bytes of the key are zeros, however. For value storage, we use "combined pointer/value slots", as recommended in the paper. Values of size equal or smaller than the the platform's pointer type are stored in the array of child pointers in the last level node, while larger values are each stored in a separate allocation. This is for now fixed at compile time, but it would be fairly trivial to allow determining at runtime how variable-length values are stored. One innovation in our implementation compared to the ART paper is decoupling the notion of node "size class" from "kind". The size classes within a given node kind have the same underlying type, but a variable capacity for children, so we can introduce additional node sizes with little additional code. To enable different use cases to specialize for different value types and for shared/local memory, we use macro-templatized code generation in the same manner as simplehash.h and sort_template.h. Future commits will use this infrastructure for storing TIDs. Patch by Masahiko Sawada and John Naylor, but a substantial amount of credit is due to Andres Freund, whose proof-of-concept was a valuable source of coding idioms and awareness of performance pitfalls, and who reviewed earlier versions. Discussion: https://postgr.es/m/CAD21AoAfOZvmfR0j8VmZorZjL7RhTiQdVttNuC4W-Shdc2a-AA%40mail.gmail.com
* Remove unused #include's from backend .c filesPeter Eisentraut2024-03-04
| | | | | | | | | | | | | | | | | | | | | | | | | | | | as determined by include-what-you-use (IWYU) While IWYU also suggests to *add* a bunch of #include's (which is its main purpose), this patch does not do that. In some cases, a more specific #include replaces another less specific one. Some manual adjustments of the automatic result: - IWYU currently doesn't know about includes that provide global variable declarations (like -Wmissing-variable-declarations), so those includes are being kept manually. - All includes for port(ability) headers are being kept for now, to play it safe. - No changes of catalog/pg_foo.h to catalog/pg_foo_d.h, to keep the patch from exploding in size. Note that this patch touches just *.c files, so nothing declared in header files changes in hidden ways. As a small example, in src/backend/access/transam/rmgr.c, some IWYU pragma annotations are added to handle a special case there. Discussion: https://www.postgresql.org/message-id/flat/af837490-6b2f-46df-ba05-37ea6a6653fc%40eisentraut.org
* Optimize GenerationAlloc() and SlabAlloc()David Rowley2024-03-04
| | | | | | | | | | | | | | | | | | | | | | | | In a similar effort to 413c18401, separate out the hot and cold paths in GenerationAlloc() and SlabAlloc() to avoid having to setup the stack frame for the hot path. This additionally adjusts how we use the GenerationContext's freeblock. Freeblock, when set, is now always empty and we only switch to using it when the current allocation request finds the current block does not have enough space and the freeblock is large enough to accomodate the allocation. This commit also adjusts GenerationFree() so that if we pfree the final allocation in the current generation block, we now mark that block as empty and keep it as the current block. Previously we free'd that block and set the current block to NULL. Doing that meant we needed a special case in GenerationAlloc to check if GenerationContext.block was NULL. So this both reduces free/malloc calls and reduces the work done in GenerationAlloc(). In passing, improve some comments in aset.c Discussion: https://postgr.es/m/CAApHDvpHVSJqqb4B4OZLixr=CotKq-eKkbwZqvZVo_biYvUvQA@mail.gmail.com
* Fix integer underflow in shared memory debuggingDaniel Gustafsson2024-02-29
| | | | | | | | | | | dsa_dump would print a large negative number instead of zero for segment bin 0. Fix by explicitly checking for underflow and add special case for bin 0. Backpatch to all supported versions. Author: Ian Ilyasov <ianilyasov@outlook.com> Reviewed-by: Robert Haas <robertmhaas@gmail.com> Discussion: https://postgr.es/m/GV1P251MB1004E0D09D117D3CECF9256ECD502@GV1P251MB1004.EURP251.PROD.OUTLOOK.COM Backpatch-through: v12
* Refactor AllocSetAlloc(), separating hot and cold pathsDavid Rowley2024-02-28
| | | | | | | | | | | | | | | | | | Allocating from a free list or from a block which contains enough space already, we deem to be common code paths and want to optimize for those. Having to allocate a new block, either a normal block or a dedicated one for a large allocation, we deem to be less common, therefore we class that as "cold". Both cold paths require a malloc so are going to be slower as a result of that regardless. The main motivation here is to remove the calls to malloc() in the hot path and because of this, the compiler is now free to not bother setting up the stack frame in AllocSetAlloc(), thus making the hot path much cheaper. Author: Andres Freund Reviewed-by: David Rowley Discussion: https://postgr.es/m/20210719195950.gavgs6ujzmjfaiig@alap3.anarazel.de
* Adjust memory allocation functions to allow sibling callsDavid Rowley2024-02-27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Many modern compilers are able to optimize function calls to functions where the parameters of the called function match a leading subset of the calling function's parameters. If there are no instructions in the calling function after the function is called, then the compiler is free to avoid any stack frame setup and implement the function call as a "jmp" rather than a "call". This is called sibling call optimization. Here we adjust the memory allocation functions in mcxt.c to allow this optimization. This requires moving some responsibility into the memory context implementations themselves. It's now the responsibility of the MemoryContext to check for malloc failures. This is good as it both allows the sibling call optimization, but also because most small and medium allocations won't call malloc and just allocate memory to an existing block. That can't fail, so checking for NULLs in that case isn't required. Also, traditionally it's been the responsibility of palloc and the other allocation functions in mcxt.c to check for invalid allocation size requests. Here we also move the responsibility of checking that into the MemoryContext. This isn't to allow the sibling call optimization, but more because most of our allocators handle large allocations separately and we can just add the size check when doing large allocations. We no longer check this for non-large allocations at all. To make checking the allocation request sizes and ERROR handling easier, add some helper functions to mcxt.c for the allocators to use. Author: Andres Freund Reviewed-by: David Rowley Discussion: https://postgr.es/m/20210719195950.gavgs6ujzmjfaiig@alap3.anarazel.de
* Add EXPLAIN (MEMORY) to report planner memory consumptionAlvaro Herrera2024-01-29
| | | | | | | | | | | | | | | | | | | | This adds a new "Memory:" line under the "Planning:" group (which currently only has "Buffers:") when the MEMORY option is specified. In order to make the reporting reasonably accurate, we create a separate memory context for planner activities, to be used only when this option is given. The total amount of memory allocated by that context is reported as "allocated"; we subtract memory in the context's freelists from that and report that result as "used". We use MemoryContextStatsInternal() to obtain the quantities. The code structure to show buffer usage during planning was not in amazing shape, so I (Álvaro) modified the patch a bit to clean that up in passing. Author: Ashutosh Bapat Reviewed-by: David Rowley, Andrey Lepikhov, Jian He, Andy Fan Discussion: https://www.postgresql.org/message-id/CAExHW5sZA=5LJ_ZPpRO-w09ck8z9p7eaYAqq3Ks9GDfhrxeWBw@mail.gmail.com
* Update comment, generation mem contexts have a "keeper" blockHeikki Linnakangas2024-01-26
| | | | | The keeper block was introduced in commit 1b0d9aa4f7, but it forgot to update this comment.
* Update copyright for 2024Bruce Momjian2024-01-03
| | | | | | | | Reported-by: Michael Paquier Discussion: https://postgr.es/m/ZZKTDPxBBMt3C0J9@paquier.xyz Backpatch-through: 12
* Simplify newNode() by removing special casesHeikki Linnakangas2023-12-19
| | | | | | | | | | | | | | | | | | | | | - Remove MemoryContextAllocZeroAligned(). It was supposed to be a faster version of MemoryContextAllocZero(), but modern compilers turn the MemSetLoop() into a call to memset() anyway, making it more or less identical to MemoryContextAllocZero(). That was the only user of MemSetTest, MemSetLoop, so remove those too, as well as palloc0fast(). - Convert newNode() to a static inline function. When this was originally originally written, it was written as a macro because testing showed that gcc didn't inline the size check as we intended. Modern compiler versions do, and now that it just calls palloc0() there is no size-check to inline anyway. One nice effect is that the palloc0() takes one less argument than MemoryContextAllocZeroAligned(), which saves a few instructions in the callers of newNode(). Reviewed-by: Peter Eisentraut, Tom Lane, John Naylor, Thomas Munro Discussion: https://www.postgresql.org/message-id/b51f1fa7-7e6a-4ecc-936d-90a8a1659e7c@iki.fi
* Fix dsa.c with different resource owners.Heikki Linnakangas2023-11-15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | The comments in dsa.c suggested that areas were owned by resource owners, but it was not in fact tracked explicitly. The DSM attachments held by the dsa were owned by resource owners, but not the area itself. That led to confusion if you used one resource owner to attach or create the area, but then switched to a different resource owner before allocating or even just accessing the allocations in the area with dsa_get_address(). The additional DSM segments associated with the area would get owned by a different resource owner than the initial segment. To fix, add an explicit 'resowner' field to dsa_area. It replaces the 'mapping_pinned' flag; resowner == NULL now indicates that the mapping is pinned. This is arguably a bug fix, but I'm not backpatching because it doesn't seem to be a live bug in the back branches. In 'master', it is a bug because commit b8bff07daa made ResourceOwners more strict so that you are no longer allowed to remember new resources in a ResourceOwner after you have started to release it. Merely accessing a dsa pointer might need to attach a new DSM segment, and before this commit it was temporarily remembered in the current owner for a very brief period even if the DSA was pinned. And that could happen in AtEOXact_PgStat(), which is called after the owner is already released. Reported-by: Alexander Lakhin Reviewed-by: Alexander Lakhin, Thomas Munro, Andres Freund Discussion: https://www.postgresql.org/message-id/11b70743-c5f3-3910-8e5b-dd6c115ff829%40gmail.com
* Shrink memory contexts struct sizesDavid Rowley2023-07-17
| | | | | | | | | | | | | | | | | | | | | | | | | | | Here we reduce the block size fields in AllocSetContext, GenerationContext and SlabContext from Size down to uint32. Ever since c6e0fe1f2, blocks for non-dedicated palloc chunks can no longer be larger than 1GB, so there's no need to store the various block size fields as 64-bit values. 32 bits are enough to store 2^30. Here we also further reduce the memory context struct sizes by getting rid of the 'keeper' field which stores a pointer to the context's keeper block. All the context types which have this field always allocate the keeper block in the same allocation as the memory context itself, so the keeper block always comes right at the end of the context struct. Add some macros to calculate that address rather than storing it in the context. Overall, in AllocSetContext and GenerationContext, this saves 20 bytes on 64-bit builds which for ALLOCSET_SMALL_SIZES can sometimes mean the difference between having to allocate a 2nd block and storing all the required allocations on the keeper block alone. Such contexts are used in relcache to store cache entries for indexes, of which there can be a large number in a single backend. Author: Melih Mutlu Reviewed-by: David Rowley Discussion: https://postgr.es/m/CAGPVpCSOW3uJ1QJmsMR9_oE3X7fG_z4q0AoU4R_w+2RzvroPFg@mail.gmail.com
* Re-bin segment when memory pages are freed.Thomas Munro2023-07-04
| | | | | | | | | | | | | | | | | | | | | It's OK to be lazy about re-binning memory segments when allocating, because that can only leave segments in a bin that's too high. We'll search higher bins if necessary while allocating next time, and also eventually re-bin, so no memory can become unreachable that way. However, when freeing memory, the largest contiguous range of free pages might go up, so we should re-bin eagerly to make sure we don't leave the segment in a bin that is too low for get_best_segment() to find. The re-binning code is moved into a function of its own, so it can be called whenever free pages are returned to the segment's free page map. Back-patch to all supported releases. Author: Dongming Liu <ldming101@gmail.com> Reviewed-by: Robert Haas <robertmhaas@gmail.com> (earlier version) Reviewed-by: Thomas Munro <thomas.munro@gmail.com> Discussion: https://postgr.es/m/CAL1p7e8LzB2LSeAXo2pXCW4%2BRya9s0sJ3G_ReKOU%3DAjSUWjHWQ%40mail.gmail.com
* Pre-beta mechanical code beautification.Tom Lane2023-05-19
| | | | | | | | | | | | | | | Run pgindent, pgperltidy, and reformat-dat-files. This set of diffs is a bit larger than typical. We've updated to pg_bsd_indent 2.1.2, which properly indents variable declarations that have multi-line initialization expressions (the continuation lines are now indented one tab stop). We've also updated to perltidy version 20230309 and changed some of its settings, which reduces its desire to add whitespace to lines to make assignments etc. line up. Going forward, that should make for fewer random-seeming changes to existing code. Discussion: https://postgr.es/m/20230428092545.qfb3y5wcu4cm75ur@alvherre.pgsql
* Fix various typos and incorrect/outdated name referencesDavid Rowley2023-04-19
| | | | | Author: Alexander Lakhin Discussion: https://postgr.es/m/699beab4-a6ca-92c9-f152-f559caf6dc25@gmail.com
* Fix various typosDavid Rowley2023-04-18
| | | | | | | | | | | | This fixes many spelling mistakes in comments, but a few references to invalid parameter names, function names and option names too in comments and also some in string constants Also, fix an #undef that was undefining the incorrect definition Author: Alexander Lakhin Reviewed-by: Justin Pryzby Discussion: https://postgr.es/m/d5f68d19-c0fc-91a9-118d-7c6a5a3f5fad@gmail.com
* Adjust Valgrind macro usage to protect chunk headersDavid Rowley2023-04-15
| | | | | | | | | | | | | | Prior to this commit we only ever protected MemoryChunk's requested_size field with Valgrind NOACCESS. This means that if the hdrmask field is ever accessed accidentally then we're not going to get any warnings from Valgrind about it. Valgrind would have warned us about the problem fixed in 92957ed98 had we already been doing this. Per suggestion from Tom Lane Reviewed-by: Richard Guo Discussion: https://postgr.es/m/1650235.1672694719@sss.pgh.pa.us Discussion: https://postgr.es/m/CAApHDvr=FZNGbj252Z6M9BSFKoq6BMxgkQ2yEAGUYoo7RquqZg@mail.gmail.com