author     Peter Geoghegan <pg@bowt.ie>   2020-07-29 14:14:58 -0700
committer  Peter Geoghegan <pg@bowt.ie>   2020-07-29 14:14:58 -0700
commit     d6c08e29e7bc8bc3bf49764192c4a9c71fc0b097 (patch)
tree       8d0d2cdb7d18504b50a49433f9181130f74186c4 /doc/src
parent     6023b7ea717ca04cf1bd53709d9c862db07eaefb (diff)
Add hash_mem_multiplier GUC.

Add a GUC that acts as a multiplier on work_mem. It gets applied when
sizing executor node hash tables that were previously size constrained
using work_mem alone.
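
For illustration (a sketch, not part of this commit; the values are
arbitrary), a session with the settings below allows each hash-based
node up to work_mem * hash_mem_multiplier = 8MB, while sort-based
nodes remain limited to work_mem:

    -- illustrative values only
    SET work_mem = '4MB';
    SET hash_mem_multiplier = 2.0;
    -- hash table limit: 4MB * 2.0 = 8MB per hash-based node
    -- sort limit remains: 4MB per sort node
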
The new GUC can be used to preferentially give hash-based nodes more
memory than the generic work_mem limit. It is intended to enable admin
tuning of the executor's memory usage. Overall system throughput and
system responsiveness can be improved by giving hash-based executor
nodes more memory (especially over sort-based alternatives, which are
often much less sensitive to being memory constrained).

The default value for hash_mem_multiplier is 1.0, which is also the
minimum valid value. This means that hash-based nodes continue to apply
work_mem in the traditional way by default.
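
This default is easy to confirm from any session (a trivial check,
shown here only for completeness):

    SHOW hash_mem_multiplier;  -- expected to report 1 on a stock installation
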
hash_mem_multiplier is generally useful. However, it is being added now
due to concerns about hash aggregate performance stability for users
that upgrade to Postgres 13 (which added disk-based hash aggregation in
commit 1f39bce0). While the old hash aggregate behavior risked
out-of-memory errors, it is nevertheless likely that many users actually
benefited. Hash agg's previous indifference to work_mem during query
execution was not just faster; it also accidentally made aggregation
resilient to grouping estimate problems (at least in cases where this
didn't create destabilizing memory pressure).

hash_mem_multiplier can provide a certain kind of continuity with the
behavior of Postgres 12 hash aggregates in cases where the planner
incorrectly estimates that all groups (plus related allocations) will
fit in work_mem/hash_mem. This seems necessary because hash-based
aggregation is usually much slower when only a small fraction of all
groups can fit. Even when it isn't possible to totally avoid hash
aggregates that spill, giving hash aggregation more memory will reliably
improve performance (the same cannot be said for external sort
operations, which appear to be almost unaffected by memory availability
provided it's at least possible to get a single merge pass).
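
Whether a given hash aggregate spills can be checked with EXPLAIN
ANALYZE; in Postgres 13 a HashAggregate node reports "Batches" and
"Memory Usage" (plus "Disk Usage" if it spilled). A hypothetical
session follows -- the table, column, and all figures are invented
for illustration:

    EXPLAIN ANALYZE SELECT a, count(*) FROM t GROUP BY a;
    --   HashAggregate ... Batches: 5  Memory Usage: 4129kB  Disk Usage: 15232kB
    SET hash_mem_multiplier = 4.0;
    EXPLAIN ANALYZE SELECT a, count(*) FROM t GROUP BY a;
    --   HashAggregate ... Batches: 1  Memory Usage: 15361kB
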
The PostgreSQL 13 release notes should advise users that increasing
hash_mem_multiplier can help with performance regressions associated
with hash aggregation. That can be taken care of by a later commit.

Author: Peter Geoghegan
Reviewed-By: Álvaro Herrera, Jeff Davis
Discussion: https://postgr.es/m/20200625203629.7m6yvut7eqblgmfo@alap3.anarazel.de
Discussion: https://postgr.es/m/CAH2-WzmD%2Bi1pG6rc1%2BCjc4V6EaFJ_qSuKCCHVnH%3DoruqD-zqow%40mail.gmail.com
Backpatch: 13-, where disk-based hash aggregation was introduced.

Diffstat (limited to 'doc/src')
-rw-r--r--   doc/src/sgml/config.sgml            |  60
-rw-r--r--   doc/src/sgml/ref/postgres-ref.sgml  |   8
-rw-r--r--   doc/src/sgml/runtime.sgml           |  10
3 files changed, 61 insertions, 17 deletions
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 822bbf1f272..427947cf496 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -1690,22 +1690,64 @@ include_dir 'conf.d'
       </term>
       <listitem>
        <para>
-        Sets the maximum amount of memory to be used by a query operation
+        Sets the base maximum amount of memory to be used by a query operation
         (such as a sort or hash table) before writing to temporary disk files.
         If this value is specified without units, it is taken as kilobytes.
         The default value is four megabytes (<literal>4MB</literal>).
         Note that for a complex query, several sort or hash operations might be
-        running in parallel; each operation will be allowed to use as much memory
-        as this value specifies before it starts to write data into temporary
-        files.  Also, several running sessions could be doing such operations
-        concurrently.  Therefore, the total memory used could be many
-        times the value of <varname>work_mem</varname>; it is necessary to
-        keep this fact in mind when choosing the value.  Sort operations are
-        used for <literal>ORDER BY</literal>, <literal>DISTINCT</literal>, and
-        merge joins.
+        running in parallel; each operation will generally be allowed
+        to use as much memory as this value specifies before it starts
+        to write data into temporary files.  Also, several running
+        sessions could be doing such operations concurrently.
+        Therefore, the total memory used could be many times the value
+        of <varname>work_mem</varname>; it is necessary to keep this
+        fact in mind when choosing the value.  Sort operations are used
+        for <literal>ORDER BY</literal>, <literal>DISTINCT</literal>,
+        and merge joins.
         Hash tables are used in hash joins, hash-based aggregation, and
         hash-based processing of <literal>IN</literal> subqueries.
        </para>
+       <para>
+        Hash-based operations are generally more sensitive to memory
+        availability than equivalent sort-based operations.  The
+        memory available for hash tables is computed by multiplying
+        <varname>work_mem</varname> by
+        <varname>hash_mem_multiplier</varname>.  This makes it
+        possible for hash-based operations to use an amount of memory
+        that exceeds the usual <varname>work_mem</varname> base
+        amount.
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry id="guc-hash-mem-multiplier" xreflabel="hash_mem_multiplier">
+      <term><varname>hash_mem_multiplier</varname> (<type>floating point</type>)
+      <indexterm>
+       <primary><varname>hash_mem_multiplier</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Used to compute the maximum amount of memory that hash-based
+        operations can use.  The final limit is determined by
+        multiplying <varname>work_mem</varname> by
+        <varname>hash_mem_multiplier</varname>.  The default value is
+        1.0, which makes hash-based operations subject to the same
+        simple <varname>work_mem</varname> maximum as sort-based
+        operations.
+       </para>
+       <para>
+        Consider increasing <varname>hash_mem_multiplier</varname> in
+        environments where spilling by query operations is a regular
+        occurrence, especially when simply increasing
+        <varname>work_mem</varname> results in memory pressure (memory
+        pressure typically takes the form of intermittent out of
+        memory errors).  A setting of 1.5 or 2.0 may be effective with
+        mixed workloads.  Higher settings in the range of 2.0 - 8.0 or
+        more may be effective in environments where
+        <varname>work_mem</varname> has already been increased to 40MB
+        or more.
+       </para>
       </listitem>
      </varlistentry>
 
diff --git a/doc/src/sgml/ref/postgres-ref.sgml b/doc/src/sgml/ref/postgres-ref.sgml
index 5e5794bc90d..6e62f54c597 100644
--- a/doc/src/sgml/ref/postgres-ref.sgml
+++ b/doc/src/sgml/ref/postgres-ref.sgml
@@ -338,10 +338,10 @@ PostgreSQL documentation
      <term><option>-S</option> <replaceable class="parameter">work-mem</replaceable></term>
      <listitem>
       <para>
-        Specifies the amount of memory to be used by internal sorts and hashes
-        before resorting to temporary disk files.  See the description of the
-        <varname>work_mem</varname> configuration parameter in <xref
-        linkend="runtime-config-resource-memory"/>.
+        Specifies the base amount of memory to be used by sorts and
+        hash tables before resorting to temporary disk files.  See the
+        description of the <varname>work_mem</varname> configuration
+        parameter in <xref linkend="runtime-config-resource-memory"/>.
       </para>
      </listitem>
     </varlistentry>
diff --git a/doc/src/sgml/runtime.sgml b/doc/src/sgml/runtime.sgml
index e09cb55efcd..c8698898f32 100644
--- a/doc/src/sgml/runtime.sgml
+++ b/doc/src/sgml/runtime.sgml
@@ -1326,10 +1326,12 @@ Out of Memory: Killed process 12345 (postgres).
    system running out of memory, you can avoid the problem by changing
    your configuration. In some cases, it may help to lower memory-related
    configuration parameters, particularly
-   <link linkend="guc-shared-buffers"><varname>shared_buffers</varname></link>
-   and <link linkend="guc-work-mem"><varname>work_mem</varname></link>. In
-   other cases, the problem may be caused by allowing too many connections
-   to the database server itself. In many cases, it may be better to reduce
+   <link linkend="guc-shared-buffers"><varname>shared_buffers</varname></link>,
+   <link linkend="guc-work-mem"><varname>work_mem</varname></link>, and
+   <link linkend="guc-hash-mem-multiplier"><varname>hash_mem_multiplier</varname></link>.
+   In other cases, the problem may be caused by allowing too many
+   connections to the database server itself.  In many cases, it may
+   be better to reduce
    <link linkend="guc-max-connections"><varname>max_connections</varname></link>
    and instead make use of external connection-pooling software.
   </para>
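
As a worked example of the guidance added to config.sgml above (the
numbers are illustrative, not a recommendation made by this commit),
the multiplier can be raised cluster-wide without a restart, since
hash_mem_multiplier is an ordinary user-settable parameter:

    ALTER SYSTEM SET hash_mem_multiplier = 2.0;
    SELECT pg_reload_conf();  -- no restart required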