aboutsummaryrefslogtreecommitdiff
path: root/doc/src
diff options
context:
space:
mode:
Diffstat (limited to 'doc/src')
-rw-r--r--doc/src/sgml/charset.sgml2
-rw-r--r--doc/src/sgml/func.sgml52
2 files changed, 48 insertions, 6 deletions
diff --git a/doc/src/sgml/charset.sgml b/doc/src/sgml/charset.sgml
index f5e115e8d6e..00e1986849a 100644
--- a/doc/src/sgml/charset.sgml
+++ b/doc/src/sgml/charset.sgml
@@ -1197,7 +1197,7 @@ CREATE COLLATION ignore_accents (provider = icu, locale = 'und-u-ks-level1-kc-tr
to a performance penalty. Note, in particular, that B-tree cannot use
deduplication with indexes that use a nondeterministic collation. Also,
certain operations are not possible with nondeterministic collations,
- such as pattern matching operations. Therefore, they should be used
+ such as some pattern matching operations. Therefore, they should be used
only in cases where they are specifically wanted.
</para>
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 13ccbe7d78c..8b81106fa23 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -5414,9 +5414,10 @@ cast(-44 as bit(12)) <lineannotation>111111010100</lineannotation>
</caution>
<para>
- The pattern matching operators of all three kinds do not support
- nondeterministic collations. If required, apply a different collation to
- the expression to work around this limitation.
+ <function>SIMILAR TO</function> and <acronym>POSIX</acronym>-style regular
+ expressions do not support nondeterministic collations. If required, use
+ <function>LIKE</function> or apply a different collation to the expression
+ to work around this limitation.
</para>
<sect2 id="functions-like">
@@ -5463,6 +5464,46 @@ cast(-44 as bit(12)) <lineannotation>111111010100</lineannotation>
</para>
<para>
+ <function>LIKE</function> pattern matching supports nondeterministic
+ collations (see <xref linkend="collation-nondeterministic"/>), such as
+ case-insensitive collations or collations that, say, ignore punctuation.
+ So with a case-insensitive collation, one could have:
+<programlisting>
+'AbC' LIKE 'abc' COLLATE case_insensitive <lineannotation>true</lineannotation>
+'AbC' LIKE 'a%' COLLATE case_insensitive <lineannotation>true</lineannotation>
+</programlisting>
+ With collations that ignore certain characters or in general that consider
+ strings of different lengths equal, the semantics can become a bit more
+ complicated. Consider these examples:
+<programlisting>
+'.foo.' LIKE 'foo' COLLATE ign_punct <lineannotation>true</lineannotation>
+'.foo.' LIKE 'f_o' COLLATE ign_punct <lineannotation>true</lineannotation>
+'.foo.' LIKE '_oo' COLLATE ign_punct <lineannotation>false</lineannotation>
+</programlisting>
+ The way the matching works is that the pattern is partitioned into
+ sequences of wildcards and non-wildcard strings (wildcards being
+ <literal>_</literal> and <literal>%</literal>). For example, the pattern
+ <literal>f_o</literal> is partitioned into <literal>f, _, o</literal>, the
+ pattern <literal>_oo</literal> is partitioned into <literal>_,
+ oo</literal>. The input string matches the pattern if it can be
+ partitioned in such a way that the wildcards match one character or any
+ number of characters respectively and the non-wildcard partitions are
+ equal under the applicable collation. So for example, <literal>'.foo.'
+ LIKE 'f_o' COLLATE ign_punct</literal> is true because one can partition
+ <literal>.foo.</literal> into <literal>.f, o, o.</literal>, and then
+ <literal>'.f' = 'f' COLLATE ign_punct</literal>, <literal>'o'</literal>
+ matches the <literal>_</literal> wildcard, and <literal>'o.' = 'o' COLLATE
+ ign_punct</literal>. But <literal>'.foo.' LIKE '_oo' COLLATE
+ ign_punct</literal> is false because <literal>.foo.</literal> cannot be
+ partitioned in a way that the first character is any character and the
+ rest of the string compares equal to <literal>oo</literal>. (Note that
+ the single-character wildcard always matches exactly one character,
+ independent of the collation. So in this example, the
+ <literal>_</literal> would match <literal>.</literal>, but then the rest
+ of the input string won't match the rest of the pattern.)
+ </para>
+
+ <para>
<function>LIKE</function> pattern matching always covers the entire
string. Therefore, if it's desired to match a sequence anywhere within
a string, the pattern must start and end with a percent sign.
@@ -5503,8 +5544,9 @@ cast(-44 as bit(12)) <lineannotation>111111010100</lineannotation>
<para>
The key word <token>ILIKE</token> can be used instead of
- <token>LIKE</token> to make the match case-insensitive according
- to the active locale. This is not in the <acronym>SQL</acronym> standard but is a
+ <token>LIKE</token> to make the match case-insensitive according to the
+ active locale. (But this does not support nondeterministic collations.)
+ This is not in the <acronym>SQL</acronym> standard but is a
<productname>PostgreSQL</productname> extension.
</para>