author | Tom Lane <tgl@sss.pgh.pa.us> | 2012-03-03 20:20:19 -0500 |
---|---|---|
committer | Tom Lane <tgl@sss.pgh.pa.us> | 2012-03-03 20:20:57 -0500 |
commit | 0e5e167aaea4ceb355a6e20eec96c4f7d05527ab (patch) | |
tree | 1b1b338461cba27a2d783db13b74d1b7b86b6681 /doc/src | |
parent | 34c978442c55dd13a3a8c6b90fd4380dad02f3da (diff) | |
Collect and use element-frequency statistics for arrays.

This patch improves selectivity estimation for the array <@, &&, and @>
(containment and overlaps) operators. It enables collection of statistics
about individual array element values by ANALYZE, and introduces
operator-specific estimators that use these stats. In addition,
ScalarArrayOpExpr constructs of the forms "const = ANY/ALL (array_column)"
and "const <> ANY/ALL (array_column)" are estimated by treating them as
variants of the containment operators.

Since we still collect scalar-style stats about the array values as a
whole, the pg_stats view is expanded to show both these stats and the
array-style stats in separate columns. This creates an incompatible change
in how stats for tsvector columns are displayed in pg_stats: the stats
about lexemes are now displayed in the array-related columns instead of the
original scalar-related columns.

There are a few loose ends here, notably that it'd be nice to be able to
suppress either the scalar-style stats or the array-element stats for
columns for which they're not useful. But the patch is in good enough
shape to commit for wider testing.

Alexander Korotkov, reviewed by Noah Misch and Nathan Boley
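
To illustrate the ScalarArrayOpExpr remark above: "const = ANY (array_column)" asks whether the array contains the constant, so its selectivity can be read off the per-element frequencies. The following Python sketch is only an illustration of that idea, not the estimator added by this patch. The function names and the fallback_cap parameter are hypothetical, NULL handling is ignored, and the input layout (per-element frequencies followed by their minimum, maximum, and an optional null-element frequency) is taken from the pg_stats documentation changed in this commit.

```python
def eq_any_selectivity(const, most_common_elems, most_common_elem_freqs,
                       fallback_cap=0.005):
    """Rough selectivity of `const = ANY (array_column)`.

    Assumption: the first len(most_common_elems) entries of
    most_common_elem_freqs are per-element frequencies, and the next
    entry is their minimum. fallback_cap is a hypothetical tuning knob.
    """
    n = len(most_common_elems)
    per_elem = most_common_elem_freqs[:n]
    min_freq = most_common_elem_freqs[n]
    if const in most_common_elems:
        # Fraction of rows whose array contains this element at least once.
        return per_elem[most_common_elems.index(const)]
    # An untracked element should be rarer than the rarest tracked one;
    # the exact fallback used here is an assumption of this sketch.
    return min(fallback_cap, min_freq / 2)


def ne_all_selectivity(const, most_common_elems, most_common_elem_freqs):
    """`const <> ALL (array_column)` is the negation of `= ANY`."""
    return 1.0 - eq_any_selectivity(const, most_common_elems,
                                    most_common_elem_freqs)
```

With most_common_elems = [2, 1] and most_common_elem_freqs = [0.75, 0.5, 0.5, 0.75, 0.25], the sketch estimates 0.75 for "2 = ANY (col)" and 0.25 for "2 <> ALL (col)".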
Diffstat (limited to 'doc/src')
-rw-r--r-- | doc/src/sgml/catalogs.sgml | 51 |
1 files changed, 40 insertions, 11 deletions
diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 180554b8e39..9564e012e66 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -5354,9 +5354,9 @@
  Column data values of the appropriate kind for the <replaceable>N</>th
  <quote>slot</quote>, or null if the slot kind does not store any
  data values. Each array's element
- values are actually of the specific column's data type, so there
- is no way to define these columns' type more specifically than
- <type>anyarray</>.
+ values are actually of the specific column's data type, or a related
+ type such as an array's element type, so there is no way to define
+ these columns' type more specifically than <type>anyarray</>.
  </entry>
  </row>
  </tbody>
@@ -8291,8 +8291,6 @@
  <entry>
  A list of the most common values in the column. (Null if no values
  seem to be more common than any others.)
- For some data types such as <type>tsvector</>, this is a list of
- the most common element values rather than values of the type itself.
  </entry>
  </row>
 
@@ -8301,12 +8299,9 @@
  <entry><type>real[]</type></entry>
  <entry></entry>
  <entry>
- A list of the frequencies of the most common values or elements,
+ A list of the frequencies of the most common values,
  i.e., number of occurrences of each divided by total number of rows.
  (Null when <structfield>most_common_vals</structfield> is.)
- For some data types such as <type>tsvector</>, it can also store some
- additional information, making it longer than the
- <structfield>most_common_vals</> array.
  </entry>
  </row>
 
@@ -8338,13 +8333,47 @@
  type does not have a <literal>&lt;</> operator.)
  </entry>
  </row>
+
+ <row>
+ <entry><structfield>most_common_elems</structfield></entry>
+ <entry><type>anyarray</type></entry>
+ <entry></entry>
+ <entry>
+ A list of non-null element values most often appearing within values of
+ the column. (Null for scalar types.)
+ </entry>
+ </row>
+
+ <row>
+ <entry><structfield>most_common_elem_freqs</structfield></entry>
+ <entry><type>real[]</type></entry>
+ <entry></entry>
+ <entry>
+ A list of the frequencies of the most common element values, i.e., the
+ fraction of rows containing at least one instance of the given value.
+ Two or three additional values follow the per-element frequencies;
+ these are the minimum and maximum of the preceding per-element
+ frequencies, and optionally the frequency of null elements.
+ (Null when <structfield>most_common_elems</structfield> is.)
+ </entry>
+ </row>
+
+ <row>
+ <entry><structfield>elem_count_histogram</structfield></entry>
+ <entry><type>real[]</type></entry>
+ <entry></entry>
+ <entry>
+ A histogram of the counts of distinct non-null element values within the
+ values of the column, followed by the average number of distinct
+ non-null elements. (Null for scalar types.)
+ </entry>
+ </row>
  </tbody>
  </tgroup>
  </table>
 
  <para>
- The maximum number of entries in the <structfield>most_common_vals</>
- and <structfield>histogram_bounds</> arrays can be set on a
+ The maximum number of entries in the array fields can be controlled on a
  column-by-column basis using the <command>ALTER TABLE SET STATISTICS</>
  command, or globally by setting the
  <xref linkend="guc-default-statistics-target"> run-time parameter.
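
The layout of the three new pg_stats columns documented in this diff can be mimicked with a small Python sketch. It is a toy model for reading the column descriptions, not the ANALYZE code from this patch. The function name element_stats, the num_mce and num_bins parameters, the descending-frequency ordering, the quantile-style histogram, and the reading of "frequency of null elements" as the fraction of rows containing a null element are all assumptions made for illustration.

```python
from collections import Counter


def element_stats(rows, num_mce=2, num_bins=4):
    """Toy model of most_common_elems, most_common_elem_freqs and
    elem_count_histogram for a sample of array values.

    rows: non-empty list of Python lists; None stands for a null element.
    """
    n = len(rows)
    containing = Counter()   # number of rows containing each non-null element
    rows_with_null = 0
    distinct_counts = []     # distinct non-null elements per row
    for arr in rows:
        distinct = {e for e in arr if e is not None}
        containing.update(distinct)  # each element counted once per row
        if any(e is None for e in arr):
            rows_with_null += 1
        distinct_counts.append(len(distinct))

    top = containing.most_common(num_mce)
    elems = [e for e, _ in top]
    freqs = [count / n for _, count in top]
    # Per-element frequencies, then their minimum and maximum, then the
    # (optional) null-element frequency.
    elem_freqs = freqs + [min(freqs), max(freqs), rows_with_null / n]

    # Quantile boundaries of the distinct-element counts (the binning is an
    # assumption), followed by the average count.
    ordered = sorted(distinct_counts)
    hist = [float(ordered[(i * (n - 1)) // num_bins])
            for i in range(num_bins + 1)]
    hist.append(sum(distinct_counts) / n)
    return elems, elem_freqs, hist
```

For the sample [[1, 2, 2], [2, 3], [2, None], [1]], element 2 appears in three of four rows, so its frequency is 0.75 even though it occurs four times in total: duplicates within one array are counted once.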