Fix ndistinct estimates with system attributes

When estimating the number of groups using extended statistics, the code
was discarding information about system attributes. This led to the
strange situation that

    SELECT 1 FROM t GROUP BY ctid;

could produce a higher estimate (equal to pg_class.reltuples) than

    SELECT 1 FROM t GROUP BY a, b, ctid;

with extended statistics on (a,b). Fixed by retaining information about
the system attribute.

Backpatch all the way to 10, where extended statistics were introduced.

Author: Tomas Vondra
Backpatch-through: 10
author: Tomas Vondra <tomas.vondra@postgresql.org> 2021-03-26 22:34:53 +0100
committer: Tomas Vondra <tomas.vondra@postgresql.org> 2021-03-26 22:34:58 +0100
commit: 33e52ad9a32929a6d14dfd98a8440d57028f2e3e (patch)
tree: fbf8df234e380eee684ed792a7e8ef9b1082bec4 /src
parent: a14a0118a1fecf4066e53af52ed0f188607d0c4b (diff)
2 files changed, 4 insertions, 4 deletions
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index 52314d3aa1c..2348d4a772a 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -3987,11 +3987,11 @@ estimate_multivariate_ndistinct(PlannerInfo *root, RelOptInfo *rel,
 
 			attnum = ((Var *) varinfo->var)->varattno;
 
-			if (!AttrNumberIsForUserDefinedAttr(attnum))
+			if (AttrNumberIsForUserDefinedAttr(attnum) &&
+				bms_is_member(attnum, matched))
 				continue;
 
-			if (!bms_is_member(attnum, matched))
-				newlist = lappend(newlist, varinfo);
+			newlist = lappend(newlist, varinfo);
 		}
 
 		*varinfos = newlist;
diff --git a/src/test/regress/expected/stats_ext.out b/src/test/regress/expected/stats_ext.out
index 431b3fa3de1..d80e6a3907c 100644
--- a/src/test/regress/expected/stats_ext.out
+++ b/src/test/regress/expected/stats_ext.out
@@ -260,7 +260,7 @@ SELECT s.stxkind, d.stxdndistinct
 SELECT * FROM check_estimated_rows('SELECT COUNT(*) FROM ndistinct GROUP BY ctid, a, b');
  estimated | actual 
 -----------+--------
-        11 |   1000
+      1000 |   1000
 (1 row)
 
 -- Hash Aggregate, thanks to estimates improved by the statistic
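
The effect of the selfuncs.c hunk above can be illustrated with a small standalone model. This is only a sketch, not the PostgreSQL code: plain int arrays stand in for the varinfos list and the matched bitmapset, and the names is_user_attr, is_matched, filter_before_fix and filter_after_fix are hypothetical. The one detail taken from PostgreSQL is that user columns have positive attribute numbers while system columns such as ctid have negative ones.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumption mirroring PostgreSQL's numbering: user columns start at 1,
 * system columns (ctid and friends) are negative. */
static bool is_user_attr(int attnum)
{
    return attnum > 0;
}

/* Hypothetical stand-in for bms_is_member() on the "matched" set. */
static bool is_matched(int attnum, const int *matched, size_t nmatched)
{
    for (size_t i = 0; i < nmatched; i++)
        if (matched[i] == attnum)
            return true;
    return false;
}

/* Old behavior: system attributes were skipped outright, so they never
 * reached the list of variables that still need an estimate. */
size_t filter_before_fix(const int *vars, size_t nvars,
                         const int *matched, size_t nmatched, int *out)
{
    size_t kept = 0;

    for (size_t i = 0; i < nvars; i++)
    {
        if (!is_user_attr(vars[i]))
            continue;           /* system attribute silently dropped */
        if (!is_matched(vars[i], matched, nmatched))
            out[kept++] = vars[i];
    }
    return kept;
}

/* New behavior: only user attributes covered by the statistic are removed;
 * everything else, including system attributes, is retained. */
size_t filter_after_fix(const int *vars, size_t nvars,
                        const int *matched, size_t nmatched, int *out)
{
    size_t kept = 0;

    for (size_t i = 0; i < nvars; i++)
    {
        if (is_user_attr(vars[i]) && is_matched(vars[i], matched, nmatched))
            continue;           /* already accounted for by the statistic */
        out[kept++] = vars[i];
    }
    return kept;
}
```

For GROUP BY ctid, a, b with a statistic on (a, b), modeled here as vars = {-1, 1, 2} and matched = {1, 2}, the old filter keeps nothing, so ctid no longer contributes to the group estimate, while the new filter keeps the ctid entry. That is consistent with the regression test estimate changing from 11 to 1000.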