diff options
author | Tomas Vondra <tomas.vondra@postgresql.org> | 2019-03-27 18:32:18 +0100 |
---|---|---|
committer | Tomas Vondra <tomas.vondra@postgresql.org> | 2019-03-27 18:32:18 +0100 |
commit | 7300a699502fe5432b05fbc75baca534b080bebb (patch) | |
tree | 2fa5740b9cf8363068e8a575ae569ca172ffb66a /doc/src/sgml/ref/create_statistics.sgml | |
parent | 333ed246c6f351c4e8fe22c764b97793c4101b00 (diff) | |
download | postgresql-7300a699502fe5432b05fbc75baca534b080bebb.tar.gz postgresql-7300a699502fe5432b05fbc75baca534b080bebb.zip |
Add support for multivariate MCV lists
Introduce a third extended statistic type, supported by the CREATE
STATISTICS command - MCV lists, a generalization of the statistic
already built and used for individual columns.
Compared to the already supported types (n-distinct coefficients and
functional dependencies), MCV lists are more complex, include column
values and allow estimation of much wider range of common clauses
(equality and inequality conditions, IS NULL, IS NOT NULL etc.).
Similarly to the other types, a new pseudo-type (pg_mcv_list) is used.
Author: Tomas Vondra
Reviewed-by: Dean Rasheed, David Rowley, Mark Dilger, Alvaro Herrera
Discussion: https://postgr.es/m/dfdac334-9cf2-2597-fb27-f0fb3753f435@2ndquadrant.com
Diffstat (limited to 'doc/src/sgml/ref/create_statistics.sgml')
-rw-r--r-- | doc/src/sgml/ref/create_statistics.sgml | 35 |
1 files changed, 33 insertions, 2 deletions
diff --git a/doc/src/sgml/ref/create_statistics.sgml b/doc/src/sgml/ref/create_statistics.sgml index 539f5bded54..ae1d8024a4e 100644 --- a/doc/src/sgml/ref/create_statistics.sgml +++ b/doc/src/sgml/ref/create_statistics.sgml @@ -81,9 +81,10 @@ CREATE STATISTICS [ IF NOT EXISTS ] <replaceable class="parameter">statistics_na <para> A statistics kind to be computed in this statistics object. Currently supported kinds are - <literal>ndistinct</literal>, which enables n-distinct statistics, and + <literal>ndistinct</literal>, which enables n-distinct statistics, <literal>dependencies</literal>, which enables functional - dependency statistics. + dependency statistics, and <literal>mcv</literal> which enables + most-common values lists. If this clause is omitted, all supported statistics kinds are included in the statistics object. For more information, see <xref linkend="planner-stats-extended"/> @@ -164,6 +165,36 @@ EXPLAIN ANALYZE SELECT * FROM t1 WHERE (a = 1) AND (b = 0); conditions are redundant and does not underestimate the row count. </para> + <para> + Create table <structname>t2</structname> with two perfectly correlated columns + (containing identical data), and a MCV list on those columns: + +<programlisting> +CREATE TABLE t2 ( + a int, + b int +); + +INSERT INTO t2 SELECT mod(i,100), mod(i,100) + FROM generate_series(1,1000000) s(i); + +CREATE STATISTICS s2 (mcv) ON (a, b) FROM t2; + +ANALYZE t2; + +-- valid combination (found in MCV) +EXPLAIN ANALYZE SELECT * FROM t2 WHERE (a = 1) AND (b = 1); + +-- invalid combination (not found in MCV) +EXPLAIN ANALYZE SELECT * FROM t2 WHERE (a = 1) AND (b = 2); +</programlisting> + + The MCV list gives the planner more detailed information about the + specific values that commonly appear in the table, as well as an upper + bound on the selectivities of combinations of values that do not appear + in the table, allowing it to generate better estimates in both cases. + </para> + </refsect1> <refsect1> |