author Tom Lane <tgl@sss.pgh.pa.us> 2018-12-14 12:52:49 -0500
committer Tom Lane <tgl@sss.pgh.pa.us> 2018-12-14 12:52:49 -0500
commit 5e09280057a4c3f5db297348ea3e044c9c5f4ef8
tree a153ceede13d3b807d48d420896b6763d44c9086 /src/backend/utils/cache
parent 8fb569e978af3995f0dd6b0033758ec571aab0c1
Make pg_statistic and related code account more honestly for collations.
When we first put in collations support, we basically punted on teaching pg_statistic, ANALYZE, and the planner selectivity functions about that. They've just used DEFAULT_COLLATION_OID independently of the actual collation of the data. It's time to improve that, so:

* Add columns to pg_statistic that record the specific collation associated with each statistics slot.

* Teach ANALYZE to use the column's actual collation when comparing values for statistical purposes, and record this in the appropriate slot. (Note that type-specific typanalyze functions are now expected to fill stats->stacoll with the appropriate collation, too.)

* Teach assorted selectivity functions to use the actual collation of the stats they are looking at, instead of just assuming it's DEFAULT_COLLATION_OID.

This should give noticeably better results in selectivity estimates for columns with nondefault collations, at least for query clauses that use that same collation (which would be the default behavior in most cases). It's still true that comparisons with explicit COLLATE clauses different from the stored data's collation won't be well-estimated, but that's no worse than before. Also, this patch does make the first step towards doing better with that, which is that it's now theoretically possible to collect stats for a collation other than the column's own collation.

Patch by me; thanks to Peter Eisentraut for review.

Discussion: https://postgr.es/m/14706.1544630227@sss.pgh.pa.us
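To illustrate why the collation used while gathering statistics matters, here is a minimal standalone sketch. It is not PostgreSQL code: the two comparators are hypothetical stand-ins for a byte-order collation and a case-insensitive one, and `median_under` is an invented helper that plays the role of a one-boundary histogram. The point is only that the same sample yields a different boundary value depending on the ordering applied to it.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for a byte-order ("C"-like) collation. */
static int
cmp_bytewise(const void *a, const void *b)
{
    return strcmp(*(const char *const *) a, *(const char *const *) b);
}

/* Stand-in for a case-insensitive collation (hypothetical, ASCII only). */
static int
cmp_caseless(const void *a, const void *b)
{
    const unsigned char *s = (const unsigned char *) *(const char *const *) a;
    const unsigned char *t = (const unsigned char *) *(const char *const *) b;

    /* Advance while both strings agree after case folding. */
    while (*s && tolower(*s) == tolower(*t))
    {
        s++;
        t++;
    }
    return tolower(*s) - tolower(*t);
}

/*
 * Sort the sample in place under the given ordering and return its median,
 * which acts here as a single histogram boundary.
 */
static const char *
median_under(const char **sample, size_t n,
             int (*cmp) (const void *, const void *))
{
    assert(n > 0);
    qsort(sample, n, sizeof(sample[0]), cmp);
    return sample[n / 2];
}
```

For the sample {"apple", "Banana", "cherry", "Date", "egg"}, the byte-order comparator sorts the capitalized words first and the median is "apple", while the case-insensitive comparator gives "cherry". An estimate built from the first boundary would be misleading for a query that compares under the second ordering.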
Diffstat (limited to 'src/backend/utils/cache')
-rw-r--r-- src/backend/utils/cache/lsyscache.c | 19
-rw-r--r-- src/backend/utils/cache/typcache.c | 1
2 files changed, 20 insertions, 0 deletions
diff --git a/src/backend/utils/cache/lsyscache.c b/src/backend/utils/cache/lsyscache.c
index 7a263cc1fdc..33b5b1649c2 100644
--- a/src/backend/utils/cache/lsyscache.c
+++ b/src/backend/utils/cache/lsyscache.c
@@ -2881,6 +2881,7 @@ get_attavgwidth(Oid relid, AttrNumber attnum)
*
* If a matching slot is found, true is returned, and *sslot is filled thus:
* staop: receives the actual STAOP value.
+ * stacoll: receives the actual STACOLL value.
* valuetype: receives actual datatype of the elements of stavalues.
* values: receives pointer to an array of the slot's stavalues.
* nvalues: receives number of stavalues.
@@ -2893,6 +2894,10 @@ get_attavgwidth(Oid relid, AttrNumber attnum)
*
* If no matching slot is found, false is returned, and *sslot is zeroed.
*
+ * Note that the current API doesn't allow for searching for a slot with
+ * a particular collation. If we ever actually support recording more than
+ * one collation, we'll have to extend the API, but for now simple is good.
+ *
* The data referred to by the fields of sslot is locally palloc'd and
* is independent of the original pg_statistic tuple. When the caller
* is done with it, call free_attstatsslot to release the palloc'd data.
@@ -2927,6 +2932,20 @@ get_attstatsslot(AttStatsSlot *sslot, HeapTuple statstuple,
return false; /* not there */
sslot->staop = (&stats->staop1)[i];
+ sslot->stacoll = (&stats->stacoll1)[i];
+
+ /*
+ * XXX Hopefully-temporary hack: if stacoll isn't set, inject the default
+ * collation. This won't matter for non-collation-aware datatypes. For
+ * those that are, this covers cases where stacoll has not been set. In
+ * the short term we need this because some code paths involving type NAME
+ * do not pass any collation to prefix_selectivity and related functions.
+ * Even when that's been fixed, it's likely that some add-on typanalyze
+ * functions won't get the word right away about filling stacoll during
+ * ANALYZE, so we'll probably need this for awhile.
+ */
+ if (sslot->stacoll == InvalidOid)
+ sslot->stacoll = DEFAULT_COLLATION_OID;
if (flags & ATTSTATSSLOT_VALUES)
{
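The fallback added to get_attstatsslot in the hunk above can be sketched in isolation as follows. This is a simplified mock under stated assumptions, not the real catalog code: `MockStats`, `find_slot`, and `slot_collation` are hypothetical names, the per-slot fields are modeled as arrays rather than the numbered `stacoll1`... columns, the lookup matches only on slot kind, and the `Oid` typedef and constants are redefined locally (100 is the value assumed here for DEFAULT_COLLATION_OID).

```c
#include <assert.h>

typedef unsigned int Oid;

#define InvalidOid              ((Oid) 0)
#define DEFAULT_COLLATION_OID   ((Oid) 100) /* assumed catalog value */
#define STATISTIC_NUM_SLOTS     5

/* Hypothetical, array-based stand-in for one pg_statistic row. */
typedef struct MockStats
{
    short   stakind[STATISTIC_NUM_SLOTS];
    Oid     staop[STATISTIC_NUM_SLOTS];
    Oid     stacoll[STATISTIC_NUM_SLOTS];
} MockStats;

/* Return the index of the first slot of the requested kind, or -1. */
static int
find_slot(const MockStats *stats, int reqkind)
{
    for (int i = 0; i < STATISTIC_NUM_SLOTS; i++)
    {
        if (stats->stakind[i] == reqkind)
            return i;
    }
    return -1;                  /* not there */
}

/*
 * Report the collation recorded for slot i, substituting the default
 * collation when none was stored (for example by an older or add-on
 * typanalyze function that never filled it in).
 */
static Oid
slot_collation(const MockStats *stats, int i)
{
    assert(i >= 0 && i < STATISTIC_NUM_SLOTS);

    Oid     coll = stats->stacoll[i];

    return (coll == InvalidOid) ? DEFAULT_COLLATION_OID : coll;
}
```

A caller in this sketch would first locate the slot with `find_slot` and then pass the result of `slot_collation` to its comparison routine, so that a slot without a recorded collation is treated as it was before this patch.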
diff --git a/src/backend/utils/cache/typcache.c b/src/backend/utils/cache/typcache.c
index 1a96cc9b98f..c540a39c15d 100644
--- a/src/backend/utils/cache/typcache.c
+++ b/src/backend/utils/cache/typcache.c
@@ -388,6 +388,7 @@ lookup_type_cache(Oid type_id, int flags)
typentry->typtype = typtup->typtype;
typentry->typrelid = typtup->typrelid;
typentry->typelem = typtup->typelem;
+ typentry->typcollation = typtup->typcollation;
/* If it's a domain, immediately thread it into the domain cache list */
if (typentry->typtype == TYPTYPE_DOMAIN)