From 77b88cd1bb9041a735f24072150cacfa06c699a3 Mon Sep 17 00:00:00 2001 From: Tomas Vondra Date: Fri, 26 Mar 2021 13:35:29 +0100 Subject: BRIN bloom indexes Adds a BRIN opclass using a Bloom filter to summarize the range. Indexes using the new opclasses allow only equality queries (similar to hash indexes), but that works fine for data like UUID, MAC addresses etc. for which range queries are not very common. This also means the indexes work for data that is not well correlated to physical location within the table, or perhaps even entirely random (which is a common issue with existing BRIN minmax opclasses). It's possible to specify opclass parameters with the usual Bloom filter parameters, i.e. the desired false-positive rate and the expected number of distinct values per page range. CREATE TABLE t (a int); CREATE INDEX ON t USING brin (a int4_bloom_ops(false_positive_rate = 0.05, n_distinct_per_range = 100)); The opclasses do not operate on the indexed values directly, but compute a 32-bit hash first, and the Bloom filter is built on the hash value. Collisions should not be a huge issue though, as the number of distinct values in a page ranges is usually fairly small. Bump catversion, due to various catalog changes. Author: Tomas Vondra Reviewed-by: Alvaro Herrera Reviewed-by: Alexander Korotkov Reviewed-by: Sokolov Yura Reviewed-by: Nico Williams Reviewed-by: John Naylor Discussion: https://postgr.es/m/c1138ead-7668-f0e1-0638-c3be3237e812@2ndquadrant.com Discussion: https://postgr.es/m/5d78b774-7e9c-c94e-12cf-fef51cc89b1a%402ndquadrant.com --- doc/src/sgml/brin.sgml | 226 ++++++++++++++++++++++++++++++++++++++++++++++++- 1 file changed, 225 insertions(+), 1 deletion(-) (limited to 'doc/src') diff --git a/doc/src/sgml/brin.sgml b/doc/src/sgml/brin.sgml index 4f15081674b..9524ef55d34 100644 --- a/doc/src/sgml/brin.sgml +++ b/doc/src/sgml/brin.sgml @@ -115,7 +115,8 @@ LOG: request for BRIN range summarization for index "brin_wi_idx" page 128 was operator classes store the minimum and the maximum values appearing in the indexed column within the range. The inclusion operator classes store a value which includes the values in the indexed - column within the range. + column within the range. The bloom operator + classes build a Bloom filter for all values in the range. @@ -154,6 +155,11 @@ LOG: request for BRIN range summarization for index "brin_wi_idx" page 128 was |&> (box,box)|>> (box,box) + + bpchar_bloom_ops + = (character,character) + + bpchar_minmax_ops = (character,character) @@ -163,6 +169,11 @@ LOG: request for BRIN range summarization for index "brin_wi_idx" page 128 was > (character,character) >= (character,character) + + bytea_bloom_ops + = (bytea,bytea) + + bytea_minmax_ops = (bytea,bytea) @@ -172,6 +183,11 @@ LOG: request for BRIN range summarization for index "brin_wi_idx" page 128 was > (bytea,bytea) >= (bytea,bytea) + + char_bloom_ops + = ("char","char") + + char_minmax_ops = ("char","char") @@ -181,6 +197,11 @@ LOG: request for BRIN range summarization for index "brin_wi_idx" page 128 was > ("char","char") >= ("char","char") + + date_bloom_ops + = (date,date) + + date_minmax_ops = (date,date) @@ -190,6 +211,11 @@ LOG: request for BRIN range summarization for index "brin_wi_idx" page 128 was > (date,date) >= (date,date) + + float4_bloom_ops + = (float4,float4) + + float4_minmax_ops = (float4,float4) @@ -199,6 +225,11 @@ LOG: request for BRIN range summarization for index "brin_wi_idx" page 128 was <= (float4,float4) >= (float4,float4) + + float8_bloom_ops + = (float8,float8) + + float8_minmax_ops = (float8,float8) @@ -218,6 +249,11 @@ LOG: request for BRIN range summarization for index "brin_wi_idx" page 128 was = (inet,inet) && (inet,inet) + + inet_bloom_ops + = (inet,inet) + + inet_minmax_ops = (inet,inet) @@ -227,6 +263,11 @@ LOG: request for BRIN range summarization for index "brin_wi_idx" page 128 was > (inet,inet) >= (inet,inet) + + int2_bloom_ops + = (int2,int2) + + int2_minmax_ops = (int2,int2) @@ -236,6 +277,11 @@ LOG: request for BRIN range summarization for index "brin_wi_idx" page 128 was <= (int2,int2) >= (int2,int2) + + int4_bloom_ops + = (int4,int4) + + int4_minmax_ops = (int4,int4) @@ -245,6 +291,11 @@ LOG: request for BRIN range summarization for index "brin_wi_idx" page 128 was <= (int4,int4) >= (int4,int4) + + int8_bloom_ops + = (bigint,bigint) + + int8_minmax_ops = (bigint,bigint) @@ -254,6 +305,11 @@ LOG: request for BRIN range summarization for index "brin_wi_idx" page 128 was <= (bigint,bigint) >= (bigint,bigint) + + interval_bloom_ops + = (interval,interval) + + interval_minmax_ops = (interval,interval) @@ -263,6 +319,11 @@ LOG: request for BRIN range summarization for index "brin_wi_idx" page 128 was > (interval,interval) >= (interval,interval) + + macaddr_bloom_ops + = (macaddr,macaddr) + + macaddr_minmax_ops = (macaddr,macaddr) @@ -272,6 +333,11 @@ LOG: request for BRIN range summarization for index "brin_wi_idx" page 128 was > (macaddr,macaddr) >= (macaddr,macaddr) + + macaddr8_bloom_ops + = (macaddr8,macaddr8) + + macaddr8_minmax_ops = (macaddr8,macaddr8) @@ -281,6 +347,11 @@ LOG: request for BRIN range summarization for index "brin_wi_idx" page 128 was > (macaddr8,macaddr8) >= (macaddr8,macaddr8) + + name_bloom_ops + = (name,name) + + name_minmax_ops = (name,name) @@ -290,6 +361,11 @@ LOG: request for BRIN range summarization for index "brin_wi_idx" page 128 was > (name,name) >= (name,name) + + numeric_bloom_ops + = (numeric,numeric) + + numeric_minmax_ops = (numeric,numeric) @@ -299,6 +375,11 @@ LOG: request for BRIN range summarization for index "brin_wi_idx" page 128 was > (numeric,numeric) >= (numeric,numeric) + + oid_bloom_ops + = (oid,oid) + + oid_minmax_ops = (oid,oid) @@ -308,6 +389,11 @@ LOG: request for BRIN range summarization for index "brin_wi_idx" page 128 was <= (oid,oid) >= (oid,oid) + + pg_lsn_bloom_ops + = (pg_lsn,pg_lsn) + + pg_lsn_minmax_ops = (pg_lsn,pg_lsn) @@ -335,6 +421,11 @@ LOG: request for BRIN range summarization for index "brin_wi_idx" page 128 was &> (anyrange,anyrange) -|- (anyrange,anyrange) + + text_bloom_ops + = (text,text) + + text_minmax_ops = (text,text) @@ -344,6 +435,11 @@ LOG: request for BRIN range summarization for index "brin_wi_idx" page 128 was > (text,text) >= (text,text) + + tid_bloom_ops + = (tid,tid) + + tid_minmax_ops = (tid,tid) @@ -353,6 +449,11 @@ LOG: request for BRIN range summarization for index "brin_wi_idx" page 128 was <= (tid,tid) >= (tid,tid) + + timestamp_bloom_ops + = (timestamp,timestamp) + + timestamp_minmax_ops = (timestamp,timestamp) @@ -362,6 +463,11 @@ LOG: request for BRIN range summarization for index "brin_wi_idx" page 128 was > (timestamp,timestamp) >= (timestamp,timestamp) + + timestamptz_bloom_ops + = (timestamptz,timestamptz) + + timestamptz_minmax_ops = (timestamptz,timestamptz) @@ -371,6 +477,11 @@ LOG: request for BRIN range summarization for index "brin_wi_idx" page 128 was > (timestamptz,timestamptz) >= (timestamptz,timestamptz) + + time_bloom_ops + = (time,time) + + time_minmax_ops = (time,time) @@ -380,6 +491,11 @@ LOG: request for BRIN range summarization for index "brin_wi_idx" page 128 was > (time,time) >= (time,time) + + timetz_bloom_ops + = (timetz,timetz) + + timetz_minmax_ops = (timetz,timetz) @@ -389,6 +505,11 @@ LOG: request for BRIN range summarization for index "brin_wi_idx" page 128 was > (timetz,timetz) >= (timetz,timetz) + + uuid_bloom_ops + = (uuid,uuid) + + uuid_minmax_ops = (uuid,uuid) @@ -409,6 +530,55 @@ LOG: request for BRIN range summarization for index "brin_wi_idx" page 128 was
+ + + Operator Class Parameters + + + Some of the built-in operator classes allow specifying parameters affecting + behavior of the operator class. Each operator class has its own set of + allowed parameters. Only the bloom operator class + allows specifying parameters: + + + + bloom operator classes accept these parameters: + + + + + n_distinct_per_range + + + Defines the estimated number of distinct non-null values in the block + range, used by BRIN bloom indexes for sizing of the + Bloom filter. It behaves similarly to n_distinct option + for . When set to a positive value, + each block range is assumed to contain this number of distinct non-null + values. When set to a negative value, which must be greater than or + equal to -1, the number of distinct non-null is assumed linear with + the maximum possible number of tuples in the block range (about 290 + rows per block). The default value is -0.1, and + the minimum number of distinct non-null values is 16. + + + + + + false_positive_rate + + + Defines the desired false positive rate used by BRIN + bloom indexes for sizing of the Bloom filter. The values must be + between 0.0001 and 0.25. The default value is 0.01, which is 1% false + positive rate. + + + + + + + @@ -794,6 +964,60 @@ typedef struct BrinOpcInfo function can improve index performance. + + To write an operator class for a data type that implements only an equality + operator and supports hashing, it is possible to use the bloom support procedures + alongside the corresponding operators, as shown in + . + All operator class members (procedures and operators) are mandatory. + + + + Procedure and Support Numbers for Bloom Operator Classes + + + + Operator class member + Object + + + + + Support Procedure 1 + internal function brin_bloom_opcinfo() + + + Support Procedure 2 + internal function brin_bloom_add_value() + + + Support Procedure 3 + internal function brin_bloom_consistent() + + + Support Procedure 4 + internal function brin_bloom_union() + + + Support Procedure 11 + function to compute hash of an element + + + Operator Strategy 1 + operator equal-to + + + +
+ + + Support procedure numbers 1-10 are reserved for the BRIN internal + functions, so the SQL level functions start with number 11. Support + function number 11 is the main function required to build the index. + It should accept one argument with the same data type as the operator class, + and return a hash of the value. + + Both minmax and inclusion operator classes support cross-data-type operators, though with these the dependencies become more complicated. -- cgit v1.2.3