From be45be9c33a85e72cdaeb9967e9f6d2d00199e09 Mon Sep 17 00:00:00 2001 From: Tomas Vondra Date: Thu, 18 Mar 2021 17:45:38 +0100 Subject: Implement GROUP BY DISTINCT With grouping sets, it's possible that some of the grouping sets are duplicate. This is especially common with CUBE and ROLLUP clauses. For example GROUP BY CUBE (a,b), CUBE (b,c) is equivalent to GROUP BY GROUPING SETS ( (a, b, c), (a, b, c), (a, b, c), (a, b), (a, b), (a, b), (a), (a), (a), (c, a), (c, a), (c, a), (c), (b, c), (b), () ) Some of the grouping sets are calculated multiple times, which is mostly unnecessary. This commit implements a new GROUP BY DISTINCT feature, as defined in the SQL standard, which eliminates the duplicate sets. Author: Vik Fearing Reviewed-by: Erik Rijkers, Georgios Kokolatos, Tomas Vondra Discussion: https://postgr.es/m/bf3805a8-d7d1-ae61-fece-761b7ff41ecc@postgresfriends.org --- doc/src/sgml/queries.sgml | 54 ++++++++++++++++++++++++++++++++++++++++++++ doc/src/sgml/ref/select.sgml | 9 +++++--- 2 files changed, 60 insertions(+), 3 deletions(-) (limited to 'doc/src') diff --git a/doc/src/sgml/queries.sgml b/doc/src/sgml/queries.sgml index bc0b3cc9fe1..834b83b5098 100644 --- a/doc/src/sgml/queries.sgml +++ b/doc/src/sgml/queries.sgml @@ -1372,6 +1372,55 @@ GROUP BY GROUPING SETS ( + + + ALL + GROUP BY ALL + + + DISTINCT + GROUP BY DISTINCT + + When specifying multiple grouping items together, the final set of grouping + sets might contain duplicates. For example: + +GROUP BY ROLLUP (a, b), ROLLUP (a, c) + + is equivalent to + +GROUP BY GROUPING SETS ( + (a, b, c), + (a, b), + (a, b), + (a, c), + (a), + (a), + (a, c), + (a), + () +) + + If these duplicates are undesirable, they can be removed using the + DISTINCT clause directly on the GROUP BY. + Therefore: + +GROUP BY DISTINCT ROLLUP (a, b), ROLLUP (a, c) + + is equivalent to + +GROUP BY GROUPING SETS ( + (a, b, c), + (a, b), + (a, c), + (a), + () +) + + This is not the same as using SELECT DISTINCT because the output + rows may still contain duplicates. If any of the ungrouped columns contains NULL, + it will be indistinguishable from the NULL used when that same column is grouped. + + The construct (a, b) is normally recognized in expressions as @@ -1560,8 +1609,13 @@ SELECT a "from", b + c AS sum FROM ... <literal>DISTINCT</literal> + + ALL + SELECT ALL + DISTINCT + SELECT DISTINCT diff --git a/doc/src/sgml/ref/select.sgml b/doc/src/sgml/ref/select.sgml index ab911055994..9c5cf50ef0c 100644 --- a/doc/src/sgml/ref/select.sgml +++ b/doc/src/sgml/ref/select.sgml @@ -37,7 +37,7 @@ SELECT [ ALL | DISTINCT [ ON ( expressionexpression [ [ AS ] output_name ] [, ...] ] [ FROM from_item [, ...] ] [ WHERE condition ] - [ GROUP BY grouping_element [, ...] ] + [ GROUP BY [ ALL | DISTINCT ] grouping_element [, ...] ] [ HAVING condition ] [ WINDOW window_name AS ( window_definition ) [, ...] ] [ { UNION | INTERSECT | EXCEPT } [ ALL | DISTINCT ] select ] @@ -778,7 +778,7 @@ WHERE condition The optional GROUP BY clause has the general form -GROUP BY grouping_element [, ...] +GROUP BY [ ALL | DISTINCT ] grouping_element [, ...] @@ -802,7 +802,10 @@ GROUP BY grouping_element [, ...] independent grouping sets. The effect of this is equivalent to constructing a UNION ALL between subqueries with the individual grouping sets as their - GROUP BY clauses. For further details on the handling + GROUP BY clauses. The optional DISTINCT + clause removes duplicate sets before processing; it does not + transform the UNION ALL into a UNION DISTINCT. + For further details on the handling of grouping sets see . -- cgit v1.2.3