diff options
Diffstat (limited to 'doc/src')
-rw-r--r-- | doc/src/sgml/charset.sgml | 2 | ||||
-rw-r--r-- | doc/src/sgml/func.sgml | 52 |
2 files changed, 48 insertions, 6 deletions
diff --git a/doc/src/sgml/charset.sgml b/doc/src/sgml/charset.sgml index f5e115e8d6e..00e1986849a 100644 --- a/doc/src/sgml/charset.sgml +++ b/doc/src/sgml/charset.sgml @@ -1197,7 +1197,7 @@ CREATE COLLATION ignore_accents (provider = icu, locale = 'und-u-ks-level1-kc-tr to a performance penalty. Note, in particular, that B-tree cannot use deduplication with indexes that use a nondeterministic collation. Also, certain operations are not possible with nondeterministic collations, - such as pattern matching operations. Therefore, they should be used + such as some pattern matching operations. Therefore, they should be used only in cases where they are specifically wanted. </para> diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index 13ccbe7d78c..8b81106fa23 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -5414,9 +5414,10 @@ cast(-44 as bit(12)) <lineannotation>111111010100</lineannotation> </caution> <para> - The pattern matching operators of all three kinds do not support - nondeterministic collations. If required, apply a different collation to - the expression to work around this limitation. + <function>SIMILAR TO</function> and <acronym>POSIX</acronym>-style regular + expressions do not support nondeterministic collations. If required, use + <function>LIKE</function> or apply a different collation to the expression + to work around this limitation. </para> <sect2 id="functions-like"> @@ -5463,6 +5464,46 @@ cast(-44 as bit(12)) <lineannotation>111111010100</lineannotation> </para> <para> + <function>LIKE</function> pattern matching supports nondeterministic + collations (see <xref linkend="collation-nondeterministic"/>), such as + case-insensitive collations or collations that, say, ignore punctuation. + So with a case-insensitive collation, one could have: +<programlisting> +'AbC' LIKE 'abc' COLLATE case_insensitive <lineannotation>true</lineannotation> +'AbC' LIKE 'a%' COLLATE case_insensitive <lineannotation>true</lineannotation> +</programlisting> + With collations that ignore certain characters or in general that consider + strings of different lengths equal, the semantics can become a bit more + complicated. Consider these examples: +<programlisting> +'.foo.' LIKE 'foo' COLLATE ign_punct <lineannotation>true</lineannotation> +'.foo.' LIKE 'f_o' COLLATE ign_punct <lineannotation>true</lineannotation> +'.foo.' LIKE '_oo' COLLATE ign_punct <lineannotation>false</lineannotation> +</programlisting> + The way the matching works is that the pattern is partitioned into + sequences of wildcards and non-wildcard strings (wildcards being + <literal>_</literal> and <literal>%</literal>). For example, the pattern + <literal>f_o</literal> is partitioned into <literal>f, _, o</literal>, the + pattern <literal>_oo</literal> is partitioned into <literal>_, + oo</literal>. The input string matches the pattern if it can be + partitioned in such a way that the wildcards match one character or any + number of characters respectively and the non-wildcard partitions are + equal under the applicable collation. So for example, <literal>'.foo.' + LIKE 'f_o' COLLATE ign_punct</literal> is true because one can partition + <literal>.foo.</literal> into <literal>.f, o, o.</literal>, and then + <literal>'.f' = 'f' COLLATE ign_punct</literal>, <literal>'o'</literal> + matches the <literal>_</literal> wildcard, and <literal>'o.' = 'o' COLLATE + ign_punct</literal>. But <literal>'.foo.' LIKE '_oo' COLLATE + ign_punct</literal> is false because <literal>.foo.</literal> cannot be + partitioned in a way that the first character is any character and the + rest of the string compares equal to <literal>oo</literal>. (Note that + the single-character wildcard always matches exactly one character, + independent of the collation. So in this example, the + <literal>_</literal> would match <literal>.</literal>, but then the rest + of the input string won't match the rest of the pattern.) + </para> + + <para> <function>LIKE</function> pattern matching always covers the entire string. Therefore, if it's desired to match a sequence anywhere within a string, the pattern must start and end with a percent sign. @@ -5503,8 +5544,9 @@ cast(-44 as bit(12)) <lineannotation>111111010100</lineannotation> <para> The key word <token>ILIKE</token> can be used instead of - <token>LIKE</token> to make the match case-insensitive according - to the active locale. This is not in the <acronym>SQL</acronym> standard but is a + <token>LIKE</token> to make the match case-insensitive according to the + active locale. (But this does not support nondeterministic collations.) + This is not in the <acronym>SQL</acronym> standard but is a <productname>PostgreSQL</productname> extension. </para> |