aboutsummaryrefslogtreecommitdiff
path: root/doc/src
diff options
context:
space:
mode:
authorTom Lane <tgl@sss.pgh.pa.us>2013-04-09 01:05:55 -0400
committerTom Lane <tgl@sss.pgh.pa.us>2013-04-09 01:06:54 -0400
commit3ccae48f44d993351e1f881761bd6c556ebd6638 (patch)
treeec365a856f56aae32745644164d0a8b5889ee5fd /doc/src
parente60d20a35e436cef3c454bfeab34d8ea71b54910 (diff)
downloadpostgresql-3ccae48f44d993351e1f881761bd6c556ebd6638.tar.gz
postgresql-3ccae48f44d993351e1f881761bd6c556ebd6638.zip
Support indexing of regular-expression searches in contrib/pg_trgm.
This works by extracting trigrams from the given regular expression, in generally the same spirit as the previously-existing support for LIKE searches, though of course the details are far more complicated. Currently, only GIN indexes are supported. We might be able to make it work with GiST indexes later. The implementation includes adding API functions to backend/regex/ to provide a view of the search NFA created from a regular expression. These functions are meant to be generic enough to be supportable in a standalone version of the regex library, should that ever happen. Alexander Korotkov, reviewed by Heikki Linnakangas and Tom Lane
Diffstat (limited to 'doc/src')
-rw-r--r--doc/src/sgml/pgtrgm.sgml43
1 files changed, 38 insertions, 5 deletions
diff --git a/doc/src/sgml/pgtrgm.sgml b/doc/src/sgml/pgtrgm.sgml
index 30e535557c3..4572750f4d7 100644
--- a/doc/src/sgml/pgtrgm.sgml
+++ b/doc/src/sgml/pgtrgm.sgml
@@ -9,7 +9,7 @@
<para>
The <filename>pg_trgm</filename> module provides functions and operators
- for determining the similarity of <acronym>ASCII</>
+ for determining the similarity of
alphanumeric text based on trigram matching, as
well as index operator classes that support fast searching for similar
strings.
@@ -28,7 +28,9 @@
<note>
<para>
- A string is considered to have two spaces
+ <filename>pg_trgm</filename> ignores non-word characters
+ (non-alphanumerics) when extracting trigrams from a string.
+ Each word is considered to have two spaces
prefixed and one space suffixed when determining the set
of trigrams contained in the string.
For example, the set of trigrams in the string
@@ -37,6 +39,16 @@
<quote><literal> ca</literal></quote>,
<quote><literal>cat</literal></quote>, and
<quote><literal>at </literal></quote>.
+ The set of trigrams in the string
+ <quote><literal>foo|bar</literal></quote> is
+ <quote><literal> f</literal></quote>,
+ <quote><literal> fo</literal></quote>,
+ <quote><literal>foo</literal></quote>,
+ <quote><literal>oo </literal></quote>,
+ <quote><literal> b</literal></quote>,
+ <quote><literal> ba</literal></quote>,
+ <quote><literal>bar</literal></quote>, and
+ <quote><literal>ar </literal></quote>.
</para>
</note>
</sect2>
@@ -145,9 +157,10 @@
operator classes that allow you to create an index over a text column for
the purpose of very fast similarity searches. These index types support
the above-described similarity operators, and additionally support
- trigram-based index searches for <literal>LIKE</> and <literal>ILIKE</>
- queries. (These indexes do not support equality nor simple comparison
- operators, so you may need a regular B-tree index too.)
+ trigram-based index searches for <literal>LIKE</>, <literal>ILIKE</>,
+ <literal>~</> and <literal>~*</> queries. (These indexes do not
+ support equality nor simple comparison operators, so you may need a
+ regular B-tree index too.)
</para>
<para>
@@ -203,6 +216,26 @@ SELECT * FROM test_trgm WHERE t LIKE '%foo%bar';
</para>
<para>
+ Beginning in <productname>PostgreSQL</> 9.3, <filename>pg_trgm</filename>
+ GIN indexes also support index searches for regular-expression matches
+ (<literal>~</> and <literal>~*</> operators), for example
+<programlisting>
+SELECT * FROM test_trgm WHERE t ~ '(foo|bar)';
+</programlisting>
+ The index search works by extracting trigrams from the regular expression
+ and then looking these up in the index. The more trigrams that can be
+ extracted from the regular expression, the more effective the index search
+ is. Unlike B-tree based searches, the search string need not be
+ left-anchored.
+ </para>
+
+ <para>
+ For both <literal>LIKE</> and regular-expression searches, keep in mind
+ that a pattern with no extractable trigrams will degenerate to a full-index
+ scan.
+ </para>
+
+ <para>
The choice between GiST and GIN indexing depends on the relative
performance characteristics of GiST and GIN, which are discussed elsewhere.
As a rule of thumb, a GIN index is faster to search than a GiST index, but