1 files changed, 145 insertions, 173 deletions
diff --git a/src/backend/utils/mmgr/README b/src/backend/utils/mmgr/README
index f97d7653de0..b83b29c268f 100644
--- a/src/backend/utils/mmgr/README
+++ b/src/backend/utils/mmgr/README
@@ -1,15 +1,7 @@
 src/backend/utils/mmgr/README
 
-Notes About Memory Allocation Redesign
-======================================
-
-Up through version 7.0, Postgres had serious problems with memory leakage
-during large queries that process a lot of pass-by-reference data.  There
-was no provision for recycling memory until end of query.  This needed to be
-fixed, even more so with the advent of TOAST which allows very large chunks
-of data to be passed around in the system.  This document describes the new
-memory management system implemented in 7.1.
-
+Memory Context System Design Overview
+=====================================
 
 Background
 ----------
@@ -38,10 +30,10 @@ to or get more memory from the same context the chunk was originally
 allocated in.
 
 At all times there is a "current" context denoted by the
-CurrentMemoryContext global variable.  The backend macro palloc()
-implicitly allocates space in that context.  The MemoryContextSwitchTo()
-operation selects a new current context (and returns the previous context,
-so that the caller can restore the previous context before exiting).
+CurrentMemoryContext global variable.  palloc() implicitly allocates space
+in that context.  The MemoryContextSwitchTo() operation selects a new current
+context (and returns the previous context, so that the caller can restore the
+previous context before exiting).
 
 The main advantage of memory contexts over plain use of malloc/free is
 that the entire contents of a memory context can be freed easily, without
@@ -60,8 +52,10 @@ The behavior of palloc and friends is similar to the standard C library's
 malloc and friends, but there are some deliberate differences too.  Here
 are some notes to clarify the behavior.
 
-* If out of memory, palloc and repalloc exit via elog(ERROR).  They never
-return NULL, and it is not necessary or useful to test for such a result.
+* If out of memory, palloc and repalloc exit via elog(ERROR).  They
+never return NULL, and it is not necessary or useful to test for such
+a result.  With palloc_extended() that behavior can be overridden
+using the MCXT_ALLOC_NO_OOM flag.
 
 * palloc(0) is explicitly a valid operation.  It does not return a NULL
 pointer, but a valid chunk of which no bytes may be used.  However, the
@@ -71,28 +65,18 @@ error.  Similarly, repalloc allows realloc'ing to zero size.
 * pfree and repalloc do not accept a NULL pointer.  This is intentional.
 
 
-pfree/repalloc No Longer Depend On CurrentMemoryContext
--------------------------------------------------------
-
-Since Postgres 7.1, pfree() and repalloc() can be applied to any chunk
-whether it belongs to CurrentMemoryContext or not --- the chunk's owning
-context will be invoked to handle the operation, regardless.  This is a
-change from the old requirement that CurrentMemoryContext must be set
-to the same context the memory was allocated from before one can use
-pfree() or repalloc().
-
-There was some consideration of getting rid of CurrentMemoryContext entirely,
-instead requiring the target memory context for allocation to be specified
-explicitly.  But we decided that would be too much notational overhead ---
-we'd have to pass an appropriate memory context to called routines in
-many places.  For example, the copyObject routines would need to be passed
-a context, as would function execution routines that return a
-pass-by-reference datatype.  And what of routines that temporarily
-allocate space internally, but don't return it to their caller?  We
-certainly don't want to clutter every call in the system with "here is
-a context to use for any temporary memory allocation you might want to
-do".  So there'd still need to be a global variable specifying a suitable
-temporary-allocation context.  That might as well be CurrentMemoryContext.
+The Current Memory Context
+--------------------------
+
+Because it would be too much notational overhead to always pass an
+appropriate memory context to called routines, there always exists the
+notion of a current memory context, CurrentMemoryContext.  Without it,
+for example, the copyObject routines would need to be passed a context, as
+would function execution routines that return a pass-by-reference
+datatype.  The same goes for routines that temporarily allocate space
+internally, but don't return it to their caller.  We certainly don't
+want to clutter every call in the system with "here is a context to
+use for any temporary memory allocation you might want to do".
 
 The upshot of that reasoning, though, is that CurrentMemoryContext should
 generally point at a short-lifespan context if at all possible.  During
@@ -102,42 +86,83 @@ context having greater than transaction lifespan, since doing so risks
 permanent memory leaks.
 
 
-Additions to the Memory-Context Mechanism
------------------------------------------
-
-Before 7.1 memory contexts were all independent, but it was too hard to
-keep track of them; with lots of contexts there needs to be explicit
-mechanism for that.
-
-We solved this by creating a tree of "parent" and "child" contexts.  When
-creating a memory context, the new context can be specified to be a child
-of some existing context.  A context can have many children, but only one
-parent.  In this way the contexts form a forest (not necessarily a single
-tree, since there could be more than one top-level context; although in
-current practice there is only one top context, TopMemoryContext).
-
-We then say that resetting or deleting any particular context resets or
-deletes all its direct and indirect children as well.  This feature allows
-us to manage a lot of contexts without fear that some will be leaked; we
-only need to keep track of one top-level context that we are going to
-delete at transaction end, and make sure that any shorter-lived contexts
-we create are descendants of that context.  Since the tree can have
-multiple levels, we can deal easily with nested lifetimes of storage,
-such as per-transaction, per-statement, per-scan, per-tuple.  Storage
-lifetimes that only partially overlap can be handled by allocating
-from different trees of the context forest (there are some examples
-in the next section).
-
-Actually, it turns out that resetting a given context should almost
-always imply deleting, not just resetting, any child contexts it has.
-So MemoryContextReset() means that, and if you really do want a tree of
-empty contexts you need to call MemoryContextResetOnly() plus
-MemoryContextResetChildren().
+pfree/repalloc Do Not Depend On CurrentMemoryContext
+----------------------------------------------------
+
+pfree() and repalloc() can be applied to any chunk whether it belongs
+to CurrentMemoryContext or not --- the chunk's owning context will be
+invoked to handle the operation, regardless.
+
+
+"Parent" and "Child" Contexts
+-----------------------------
+
+If all contexts were independent, it'd be hard to keep track of them,
+especially in error cases.  This is solved by creating a tree of
+"parent" and "child" contexts.  When creating a memory context, the
+new context can be specified to be a child of some existing context.
+A context can have many children, but only one parent.  In this way
+the contexts form a forest (not necessarily a single tree, since there
+could be more than one top-level context; although in current practice
+there is only one top context, TopMemoryContext).
+
+Deleting a context deletes all its direct and indirect children as
+well.  When resetting a context it's almost always more useful to
+delete its child contexts than to reset them, so MemoryContextReset()
+does that; if you really do want a tree of empty contexts you need to
+call MemoryContextResetOnly() plus MemoryContextResetChildren().
+
+These features allow us to manage a lot of contexts without fear that
+some will be leaked; we only need to keep track of one top-level
+context that we are going to delete at transaction end, and make sure
+that any shorter-lived contexts we create are descendants of that
+context.  Since the tree can have multiple levels, we can deal easily
+with nested lifetimes of storage, such as per-transaction,
+per-statement, per-scan, per-tuple.  Storage lifetimes that only
+partially overlap can be handled by allocating from different trees of
+the context forest (there are some examples in the next section).
 
 For convenience we also provide operations like "reset/delete all children
 of a given context, but don't reset or delete that context itself".
 
 
+Memory Context Reset/Delete Callbacks
+-------------------------------------
+
+A feature introduced in Postgres 9.5 allows memory contexts to be used
+for managing more resources than just plain palloc'd memory.  This is
+done by registering a "reset callback function" for a memory context.
+Such a function will be called, once, just before the context is next
+reset or deleted.  It can be used to give up resources that are in some
+sense associated with an object allocated within the context.  Possible
+use-cases include
+* closing open files associated with a tuplesort object;
+* releasing reference counts on long-lived cache objects that are held
+  by some object within the context being reset;
+* freeing malloc-managed memory associated with some palloc'd object.
+That last case would just represent bad programming practice for pure
+Postgres code; better to have made all the allocations using palloc,
+in the target context or some child context.  However, it could well
+come in handy for code that interfaces to non-Postgres libraries.
+
+Any number of reset callbacks can be established for a memory context;
+they are called in reverse order of registration.  Also, callbacks
+attached to child contexts are called before callbacks attached to
+parent contexts, if a tree of contexts is being reset or deleted.
+
+The API for this requires the caller to provide a MemoryContextCallback
+memory chunk to hold the state for a callback.  Typically this should be
+allocated in the same context it is logically attached to, so that it
+will be released automatically after use.  The reason for asking the
+caller to provide this memory is that in most usage scenarios, the caller
+will be creating some larger struct within the target context, and the
+MemoryContextCallback struct can be made "for free" without a separate
+palloc() call by including it in this larger struct.
+
+
+Memory Contexts in Practice
+===========================
+
 Globally Known Contexts
 -----------------------
 
@@ -325,83 +350,64 @@ copy step.
 Mechanisms to Allow Multiple Types of Contexts
 ----------------------------------------------
 
-We may want several different types of memory contexts with different
-allocation policies but similar external behavior.  To handle this,
-memory allocation functions will be accessed via function pointers,
-and we will require all context types to obey the conventions given here.
-(As of 2015, there's actually still just one context type; but interest in
-creating other types has never gone away entirely, so we retain this API.)
-
-A memory context is represented by an object like
-
-typedef struct MemoryContextData
-{
-    NodeTag        type;           /* identifies exact kind of context */
-    MemoryContextMethods methods;
-    MemoryContextData *parent;     /* NULL if no parent (toplevel context) */
-    MemoryContextData *firstchild; /* head of linked list of children */
-    MemoryContextData *nextchild;  /* next child of same parent */
-    char          *name;           /* context name (just for debugging) */
-} MemoryContextData, *MemoryContext;
-
-This is essentially an abstract superclass, and the "methods" pointer is
-its virtual function table.  Specific memory context types will use
+To efficiently allow for different allocation patterns, and for
+experimentation, we allow for different types of memory contexts with
+different allocation policies but similar external behavior.  To
+handle this, memory allocation functions are accessed via function
+pointers, and we require all context types to obey the conventions
+given here.
+
+A memory context is represented by struct MemoryContextData (see
+memnodes.h).  This struct identifies the exact type of the context, and
+contains information common to the different types of MemoryContext,
+such as the parent and child contexts, and the name of the
+context.
+
+This is essentially an abstract superclass; its behavior is determined
+by the "methods" pointer, which is its virtual function table
+(struct MemoryContextMethods).  Specific memory context types will use
 derived structs having these fields as their first fields.  All the
-contexts of a specific type will have methods pointers that point to the
-same static table of function pointers, which look like
-
-typedef struct MemoryContextMethodsData
-{
-    Pointer     (*alloc) (MemoryContext c, Size size);
-    void        (*free_p) (Pointer chunk);
-    Pointer     (*realloc) (Pointer chunk, Size newsize);
-    void        (*reset) (MemoryContext c);
-    void        (*delete) (MemoryContext c);
-} MemoryContextMethodsData, *MemoryContextMethods;
-
-Alloc, reset, and delete requests will take a MemoryContext pointer
-as parameter, so they'll have no trouble finding the method pointer
-to call.  Free and realloc are trickier.  To make those work, we
-require all memory context types to produce allocated chunks that
-are immediately preceded by a standard chunk header, which has the
-layout
-
-typedef struct StandardChunkHeader
-{
-    MemoryContext mycontext;         /* Link to owning context object */
-    Size          size;              /* Allocated size of chunk */
-};
-
-It turns out that the pre-existing aset.c memory context type did this
-already, and probably any other kind of context would need to have the
-same data available to support realloc, so this is not really creating
-any additional overhead.  (Note that if a context type needs more per-
-allocated-chunk information than this, it can make an additional
-nonstandard header that precedes the standard header.  So we're not
-constraining context-type designers very much.)
-
-Given this, the pfree routine looks something like
-
-    StandardChunkHeader * header =
-        (StandardChunkHeader *) ((char *) p - sizeof(StandardChunkHeader));
-
-    (*header->mycontext->methods->free_p) (p);
+contexts of a specific type will have methods pointers that point to
+the same static table of function pointers.
+
+While operations like allocating from and resetting a context take the
+relevant MemoryContext as a parameter, operations like free and
+realloc are trickier.  To make those work, we require all memory
+context types to produce allocated chunks that are immediately,
+without any padding, preceded by a pointer to the corresponding
+MemoryContext.
+
+If a type of allocator needs additional information about its chunks,
+such as the size of the allocation, that information can in turn
+precede the MemoryContext pointer.  This means the only overhead implied
+by the memory context mechanism is a pointer to its context, so we're
+not constraining context-type designers very much.
+
+Given this, routines like pfree determine their corresponding context
+with an operation like (although that is usually encapsulated in
+GetMemoryChunkContext())
+
+    MemoryContext context = *(MemoryContext*) (((char *) pointer) - sizeof(void *));
+
+and then invoke the corresponding method for the context
+
+    (*context->methods->free_p) (pointer);
 
 
 More Control Over aset.c Behavior
 ---------------------------------
 
-Previously, aset.c always allocated an 8K block upon the first allocation
-in a context, and doubled that size for each successive block request.
-That's good behavior for a context that might hold *lots* of data, and
-the overhead wasn't bad when we had only a few contexts in existence.
-With dozens if not hundreds of smaller contexts in the system, we need
-to be able to fine-tune things a little better.
+By default aset.c allocates an 8K block upon the first allocation
+in a context, and doubles that size for each successive block
+request.  That's good behavior for a context that might hold *lots*
+of data.  But if there are dozens if not hundreds of smaller
+contexts in the system, we need to be able to fine-tune things a
+little better.
 
-The creator of a context is now able to specify an initial block size
-and a maximum block size.  Selecting smaller values can prevent wastage
-of space in contexts that aren't expected to hold very much (an example is
-the relcache's per-relation contexts).
+The creator of a context is able to specify an initial block size and
+a maximum block size.  Selecting smaller values can prevent wastage of
+space in contexts that aren't expected to hold very much (an example
+is the relcache's per-relation contexts).
 
 Also, it is possible to specify a minimum context size.  If this
 value is greater than zero then a block of that size will be grabbed
@@ -414,37 +420,3 @@ will not allocate very much space per tuple cycle.  To make this usage
 pattern cheap, the first block allocated in a context is not given
 back to malloc() during reset, but just cleared.  This avoids malloc
 thrashing.
-
-
-Memory Context Reset/Delete Callbacks
--------------------------------------
-
-A feature introduced in Postgres 9.5 allows memory contexts to be used
-for managing more resources than just plain palloc'd memory.  This is
-done by registering a "reset callback function" for a memory context.
-Such a function will be called, once, just before the context is next
-reset or deleted.  It can be used to give up resources that are in some
-sense associated with an object allocated within the context.  Possible
-use-cases include
-* closing open files associated with a tuplesort object;
-* releasing reference counts on long-lived cache objects that are held
-  by some object within the context being reset;
-* freeing malloc-managed memory associated with some palloc'd object.
-That last case would just represent bad programming practice for pure
-Postgres code; better to have made all the allocations using palloc,
-in the target context or some child context.  However, it could well
-come in handy for code that interfaces to non-Postgres libraries.
-
-Any number of reset callbacks can be established for a memory context;
-they are called in reverse order of registration.  Also, callbacks
-attached to child contexts are called before callbacks attached to
-parent contexts, if a tree of contexts is being reset or deleted.
-
-The API for this requires the caller to provide a MemoryContextCallback
-memory chunk to hold the state for a callback.  Typically this should be
-allocated in the same context it is logically attached to, so that it
-will be released automatically after use.  The reason for asking the
-caller to provide this memory is that in most usage scenarios, the caller
-will be creating some larger struct within the target context, and the
-MemoryContextCallback struct can be made "for free" without a separate
-palloc() call by including it in this larger struct.