author     Tom Lane <tgl@sss.pgh.pa.us>  2013-08-21 13:38:16 -0400
committer  Tom Lane <tgl@sss.pgh.pa.us>  2013-08-21 13:38:34 -0400
commit     3454876314f0711894599f56e42ac99082b4e38f
tree       2c85e0a9c4389ce0b61363ee8114b57574ac3811
parent     5dcc48c2c76cf4b2b17c8e14fe3e588ae0c8eff3
Fix hash table size estimation error in choose_hashed_distinct().
We should account for the per-group hashtable entry overhead when considering whether to use a hash aggregate to implement DISTINCT. The comparable logic in choose_hashed_grouping() gets this right, but I think I omitted it here in the mistaken belief that there would be no overhead if there were no aggregate functions to be evaluated. This can result in more than 2X underestimate of the hash table size, if the tuples being aggregated aren't very wide.

Per report from Tomas Vondra. This bug is of long standing, but per discussion we'll only back-patch into 9.3. Changing the estimation behavior in stable branches seems to carry too much risk of destabilizing plan choices for already-tuned applications.
-rw-r--r-- src/backend/optimizer/plan/planner.c 4
1 files changed, 4 insertions, 0 deletions
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index bcc0d451f0d..99284cb6de1 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -2848,7 +2848,11 @@ choose_hashed_distinct(PlannerInfo *root,
* Don't do it if it doesn't look like the hashtable will fit into
* work_mem.
*/
+
+ /* Estimate per-hash-entry space at tuple width... */
hashentrysize = MAXALIGN(path_width) + MAXALIGN(sizeof(MinimalTupleData));
+ /* plus the per-hash-entry overhead */
+ hashentrysize += hash_agg_entry_size(0);
if (hashentrysize * dNumDistinctRows > work_mem * 1024L)
return false;