Add support for multivariate MCV lists

Introduce a third extended statistic type, supported by the CREATE STATISTICS command - MCV lists, a generalization of the statistic already built and used for individual columns. Compared to the already supported types (n-distinct coefficients and functional dependencies), MCV lists are more complex, include column values and allow estimation of much wider range of common clauses (equality and inequality conditions, IS NULL, IS NOT NULL etc.). Similarly to the other types, a new pseudo-type (pg_mcv_list) is used. Author: Tomas Vondra Reviewed-by: Dean Rasheed, David Rowley, Mark Dilger, Alvaro Herrera Discussion: https://postgr.es/m/dfdac334-9cf2-2597-fb27-f0fb3753f435@2ndquadrant.com
author: Tomas Vondra <tomas.vondra@postgresql.org> 2019-03-27 18:32:18 +0100
committer: Tomas Vondra <tomas.vondra@postgresql.org> 2019-03-27 18:32:18 +0100
commit: 7300a699502fe5432b05fbc75baca534b080bebb (patch)
tree: 2fa5740b9cf8363068e8a575ae569ca172ffb66a /doc/src/sgml/ref/create_statistics.sgml
parent: 333ed246c6f351c4e8fe22c764b97793c4101b00 (diff)
download: postgresql-7300a699502fe5432b05fbc75baca534b080bebb.tar.gz
postgresql-7300a699502fe5432b05fbc75baca534b080bebb.zip
1 files changed, 33 insertions, 2 deletions
diff --git a/doc/src/sgml/ref/create_statistics.sgml b/doc/src/sgml/ref/create_statistics.sgml
index 539f5bded54..ae1d8024a4e 100644
--- a/doc/src/sgml/ref/create_statistics.sgml
+++ b/doc/src/sgml/ref/create_statistics.sgml
@@ -81,9 +81,10 @@ CREATE STATISTICS [ IF NOT EXISTS ] <replaceable class="parameter">statistics_na
      <para>
       A statistics kind to be computed in this statistics object.
       Currently supported kinds are
-      <literal>ndistinct</literal>, which enables n-distinct statistics, and
+      <literal>ndistinct</literal>, which enables n-distinct statistics,
       <literal>dependencies</literal>, which enables functional
-      dependency statistics.
+      dependency statistics, and <literal>mcv</literal> which enables
+      most-common values lists.
       If this clause is omitted, all supported statistics kinds are
       included in the statistics object.
       For more information, see <xref linkend="planner-stats-extended"/>
@@ -164,6 +165,36 @@ EXPLAIN ANALYZE SELECT * FROM t1 WHERE (a = 1) AND (b = 0);
    conditions are redundant and does not underestimate the row count.
   </para>
 
+  <para>
+   Create table <structname>t2</structname> with two perfectly correlated columns
+   (containing identical data), and a MCV list on those columns:
+
+<programlisting>
+CREATE TABLE t2 (
+    a   int,
+    b   int
+);
+
+INSERT INTO t2 SELECT mod(i,100), mod(i,100)
+                 FROM generate_series(1,1000000) s(i);
+
+CREATE STATISTICS s2 (mcv) ON (a, b) FROM t2;
+
+ANALYZE t2;
+
+-- valid combination (found in MCV)
+EXPLAIN ANALYZE SELECT * FROM t2 WHERE (a = 1) AND (b = 1);
+
+-- invalid combination (not found in MCV)
+EXPLAIN ANALYZE SELECT * FROM t2 WHERE (a = 1) AND (b = 2);
+</programlisting>
+
+   The MCV list gives the planner more detailed information about the
+   specific values that commonly appear in the table, as well as an upper
+   bound on the selectivities of combinations of values that do not appear
+   in the table, allowing it to generate better estimates in both cases.
+  </para>
+
  </refsect1>
 
  <refsect1>
author	Tomas Vondra <tomas.vondra@postgresql.org>	2019-03-27 18:32:18 +0100
committer	Tomas Vondra <tomas.vondra@postgresql.org>	2019-03-27 18:32:18 +0100
commit	7300a699502fe5432b05fbc75baca534b080bebb (patch)
tree	2fa5740b9cf8363068e8a575ae569ca172ffb66a /doc/src/sgml/ref/create_statistics.sgml
parent	333ed246c6f351c4e8fe22c764b97793c4101b00 (diff)
download	postgresql-7300a699502fe5432b05fbc75baca534b080bebb.tar.gz postgresql-7300a699502fe5432b05fbc75baca534b080bebb.zip