Use Limit instead of Unique to implement DISTINCT, when possible

When all of the query's DISTINCT pathkeys have been marked as redundant due to EquivalenceClasses existing which contain constants, we can just implement the DISTINCT operation on a query by just limiting the number of returned rows to 1 instead of performing a Unique on all of the matching (duplicate) rows. This applies in cases such as: SELECT DISTINCT col,col2 FROM tab WHERE col = 1 AND col2 = 10; If there are any matching rows, then they must all be {1,10}. There's no point in fetching all of those and running a Unique operator on them to leave only a single row. Here we effectively just find the first row and then stop. We are obviously unable to apply this optimization if either the col = 1 or col2 = 10 were missing from the WHERE clause or if there were any additional columns in the SELECT clause. Such queries are probably not all that common, but detecting when we can apply this optimization amounts to checking if the distinct_pathkeys are NULL, which is very cheap indeed. Nothing is done here to check if the query already has a LIMIT clause. If it does then the plan may end up with 2 Limits nodes. There's no harm in that and it's probably not worth the complexity to unify them into a single Limit node. Author: David Rowley Reviewed-by: Richard Guo Discussion: https://postgr.es/m/CAApHDvqS0j8RUWRUSgCAXxOqnYjHUXmKwspRj4GzVfOO25ByHA@mail.gmail.com Discussion: https://postgr.es/m/MEYPR01MB7101CD5DA0A07C9DE2B74850A4239@MEYPR01MB7101.ausprd01.prod.outlook.com
author: David Rowley <drowley@postgresql.org> 2022-10-28 23:04:38 +1300
committer: David Rowley <drowley@postgresql.org> 2022-10-28 23:04:38 +1300
commit: 5543677ec90a15c73dab5ed4f0902b3b920f0b87 (patch)
tree: 002a4bc852b2c04d10e48e09cbce0d23e89a7c3d /src/backend/optimizer/plan/planner.c
parent: b1099eca8f38ff5cfaf0901bb91cb6a22f909bc6 (diff)
download: postgresql-5543677ec90a15c73dab5ed4f0902b3b920f0b87.tar.gz
postgresql-5543677ec90a15c73dab5ed4f0902b3b920f0b87.zip
1 files changed, 66 insertions, 11 deletions
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 78a81745348..493a3af0fa3 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -4780,11 +4780,46 @@ create_final_distinct_paths(PlannerInfo *root, RelOptInfo *input_rel,
 
 			if (pathkeys_contained_in(needed_pathkeys, path->pathkeys))
 			{
-				add_path(distinct_rel, (Path *)
-						 create_upper_unique_path(root, distinct_rel,
-												  path,
-												  list_length(root->distinct_pathkeys),
-												  numDistinctRows));
+				/*
+				 * distinct_pathkeys may have become empty if all of the
+				 * pathkeys were determined to be redundant.  If all of the
+				 * pathkeys are redundant then each DISTINCT target must only
+				 * allow a single value, therefore all resulting tuples must
+				 * be identical (or at least indistinguishable by an equality
+				 * check).  We can uniquify these tuples simply by just taking
+				 * the first tuple.  All we do here is add a path to do "LIMIT
+				 * 1" atop of 'path'.  When doing a DISTINCT ON we may still
+				 * have a non-NIL sort_pathkeys list, so we must still only do
+				 * this with paths which are correctly sorted by
+				 * sort_pathkeys.
+				 */
+				if (root->distinct_pathkeys == NIL)
+				{
+					Node	   *limitCount;
+
+					limitCount = (Node *) makeConst(INT8OID, -1, InvalidOid,
+													sizeof(int64),
+													Int64GetDatum(1), false,
+													FLOAT8PASSBYVAL);
+
+					/*
+					 * If the query already has a LIMIT clause, then we could
+					 * end up with a duplicate LimitPath in the final plan.
+					 * That does not seem worth troubling over too much.
+					 */
+					add_path(distinct_rel, (Path *)
+							 create_limit_path(root, distinct_rel, path, NULL,
+											   limitCount, LIMIT_OPTION_COUNT,
+											   0, 1));
+				}
+				else
+				{
+					add_path(distinct_rel, (Path *)
+							 create_upper_unique_path(root, distinct_rel,
+													  path,
+													  list_length(root->distinct_pathkeys),
+													  numDistinctRows));
+				}
 			}
 		}
 
@@ -4805,13 +4840,33 @@ create_final_distinct_paths(PlannerInfo *root, RelOptInfo *input_rel,
 			path = (Path *) create_sort_path(root, distinct_rel,
 											 path,
 											 needed_pathkeys,
-											 -1.0);
+											 root->distinct_pathkeys == NIL ?
+											 1.0 : -1.0);
 
-		add_path(distinct_rel, (Path *)
-				 create_upper_unique_path(root, distinct_rel,
-										  path,
-										  list_length(root->distinct_pathkeys),
-										  numDistinctRows));
+		/*
+		 * As above, use a LimitPath instead of a UniquePath when all of the
+		 * distinct_pathkeys are redundant and we're only going to get a
+		 * series of tuples all with the same values anyway.
+		 */
+		if (root->distinct_pathkeys == NIL)
+		{
+			Node	   *limitCount = (Node *) makeConst(INT8OID, -1, InvalidOid,
+														sizeof(int64),
+														Int64GetDatum(1), false,
+														FLOAT8PASSBYVAL);
+
+			add_path(distinct_rel, (Path *)
+					 create_limit_path(root, distinct_rel, path, NULL,
+									   limitCount, LIMIT_OPTION_COUNT, 0, 1));
+		}
+		else
+		{
+			add_path(distinct_rel, (Path *)
+					 create_upper_unique_path(root, distinct_rel,
+											  path,
+											  list_length(root->distinct_pathkeys),
+											  numDistinctRows));
+		}
 	}
 
 	/*
author	David Rowley <drowley@postgresql.org>	2022-10-28 23:04:38 +1300
committer	David Rowley <drowley@postgresql.org>	2022-10-28 23:04:38 +1300
commit	5543677ec90a15c73dab5ed4f0902b3b920f0b87 (patch)
tree	002a4bc852b2c04d10e48e09cbce0d23e89a7c3d /src/backend/optimizer/plan/planner.c
parent	b1099eca8f38ff5cfaf0901bb91cb6a22f909bc6 (diff)
download	postgresql-5543677ec90a15c73dab5ed4f0902b3b920f0b87.tar.gz postgresql-5543677ec90a15c73dab5ed4f0902b3b920f0b87.zip