postgresql - postgresql mirror

diff options

author	Tomas Vondra <tomas.vondra@postgresql.org>	2025-02-19 20:29:26 +0100
committer	Tomas Vondra <tomas.vondra@postgresql.org>	2025-02-19 21:08:20 +0100
commit	a1b4f289beec19534e5e34e1a962c5828b306ce2 (patch)
tree	6069fb5bdf39c9b1185a9cd9f8bcdc5955c19e28 /src/backend/executor/execIndexing.c
parent	8b886a4e34886bf2d1dd4f5e7d95f4ddadc3afa8 (diff)
download	postgresql-a1b4f289beec19534e5e34e1a962c5828b306ce2.tar.gz postgresql-a1b4f289beec19534e5e34e1a962c5828b306ce2.zip

Consider BufFiles when adjusting hashjoin parameters

Until now ExecChooseHashTableSize() considered only the size of the in-memory hash table, and ignored the memory needed for the batch files. Which can be a significant amount, because each batch needs two BufFiles (each with a BLCKSZ buffer). The same issue applies to increasing the number of batches during execution. It's also possible to trigger a "batch explosion", e.g. due to duplicate values or skew. We've seen reports of joins with hundreds of thousands (or even millions) of batches, consuming gigabytes of memory, triggering OOM errors. These cases may be fairly rare, but it's clearly possible to hit them. These issues can't be prevented during planning. Even if we improve that, it does not help with execution-time batch explosion. We can however reduce the impact and use as little memory as possible. This patch improves the behavior by adjusting how the memory is divided between the hash table and batch files. It may be better to use fewer batch files, even if it means the hash table will exceed the limit. The capacity of the hash node may be increased either by doubling he number of batches, or doubling the size of the in-memory hash table. The outcome is the same, but the memory usage may be very different. For low nbatch values it's better to add batches, for high nbatch values it's better to allow a larger hash table. The patch considers both options, both during the initial sizing and then during execution, to minimize how much the limit gets exceeded. It might seem this patch is relaxing the memory limit - allowing it to be exceeded. But that's not really the case. It has always been like that, except the memory used by batches was ignored. Allowing the hash table to grow may also prevent the batch explosion. If there's a large batch that can't be split (due to hash collisions or duplicate values), at some point the memory limit will increase enough for the batch to fit into the hash table. This patch was in the works for a long time. The early versions were posted in 2019, and revived every year or two when we happened to get the next report of OOM due to a hashjoin batch explosion. Each of those patch versions were reviewed by a couple people. I'm mentioning only Melanie Plageman and Robert Haas, because they reviewed the last version, and the older patches are very different. Reviewed-by: Melanie Plageman, Robert Haas Discussion: https://postgr.es/m/7bed6c08-72a0-4ab9-a79c-e01fcdd0940f@vondra.me Discussion: https://postgr.es/m/20190504003414.bulcbnge3rhwhcsh%40development Discussion: https://postgr.es/m/20190428141901.5dsbge2ka3rxmpk6%40development

Diffstat (limited to 'src/backend/executor/execIndexing.c')

0 files changed, 0 insertions, 0 deletions


context:
space:
mode: