aboutsummaryrefslogtreecommitdiff
path: root/doc/src
diff options
context:
space:
mode:
authorTom Lane <tgl@sss.pgh.pa.us>2019-09-07 14:21:59 -0400
committerTom Lane <tgl@sss.pgh.pa.us>2019-09-07 14:21:59 -0400
commitca70bdaefea5188066b3c2a6eaaaa1cb8cb8ce06 (patch)
tree256506db68ed8fb1dce26496d7cebfbdfecc4f1c /doc/src
parentc5bc7050aff1d1bba6532377fe37b351581661a8 (diff)
downloadpostgresql-ca70bdaefea5188066b3c2a6eaaaa1cb8cb8ce06.tar.gz
postgresql-ca70bdaefea5188066b3c2a6eaaaa1cb8cb8ce06.zip
Fix issues around strictness of SIMILAR TO.
As a result of some long-ago quick hacks, the SIMILAR TO operator and the corresponding flavor of substring() interpreted "ESCAPE NULL" as selecting the default escape character '\'. This is both surprising and not per spec: the standard is clear that these functions should return NULL for NULL input. Additionally, because of inconsistency of the strictness markings of 3-argument substring() and similar_escape(), the planner could not inline the SQL definition of substring(), resulting in a substantial performance penalty compared to the underlying POSIX substring() function. The simplest fix for this would be to change the strictness marking of similar_escape(), but if we do that we risk breaking existing views that depend on that function. Hence, leave similar_escape() as-is as a compatibility function, and instead invent a new function similar_to_escape() that comes in two strict variants. There are a couple of other behaviors in this area that are also not per spec, but they are documented and seem generally at least as sane as the spec's definition, so leave them alone. But improve the documentation to describe them fully. Patch by me; thanks to Álvaro Herrera and Andrew Gierth for review and discussion. Discussion: https://postgr.es/m/14047.1557708214@sss.pgh.pa.us
Diffstat (limited to 'doc/src')
-rw-r--r--doc/src/sgml/func.sgml50
1 files changed, 39 insertions, 11 deletions
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index c878a0ba4de..f2e545ed87f 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -4122,6 +4122,14 @@ cast(-44 as bit(12)) <lineannotation>111111010100</lineannotation>
</para>
<para>
+ According to the SQL standard, omitting <literal>ESCAPE</literal>
+ means there is no escape character (rather than defaulting to a
+ backslash), and a zero-length <literal>ESCAPE</literal> value is
+ disallowed. <productname>PostgreSQL</productname>'s behavior in
+ this regard is therefore slightly nonstandard.
+ </para>
+
+ <para>
The key word <token>ILIKE</token> can be used instead of
<token>LIKE</token> to make the match case-insensitive according
to the active locale. This is not in the <acronym>SQL</acronym> standard but is a
@@ -4139,9 +4147,9 @@ cast(-44 as bit(12)) <lineannotation>111111010100</lineannotation>
</para>
<para>
- There is also the prefix operator <literal>^@</literal> and corresponding
- <function>starts_with</function> function which covers cases when only
- searching by beginning of the string is needed.
+ Also see the prefix operator <literal>^@</literal> and corresponding
+ <function>starts_with</function> function, which are useful in cases
+ where simply matching the beginning of a string is needed.
</para>
</sect2>
@@ -4172,7 +4180,7 @@ cast(-44 as bit(12)) <lineannotation>111111010100</lineannotation>
It is similar to <function>LIKE</function>, except that it
interprets the pattern using the SQL standard's definition of a
regular expression. SQL regular expressions are a curious cross
- between <function>LIKE</function> notation and common regular
+ between <function>LIKE</function> notation and common (POSIX) regular
expression notation.
</para>
@@ -4256,18 +4264,38 @@ cast(-44 as bit(12)) <lineannotation>111111010100</lineannotation>
</para>
<para>
- As with <function>LIKE</function>, a backslash disables the special meaning
- of any of these metacharacters; or a different escape character can
- be specified with <literal>ESCAPE</literal>.
+ As with <function>LIKE</function>, a backslash disables the special
+ meaning of any of these metacharacters. A different escape character
+ can be specified with <literal>ESCAPE</literal>, or the escape
+ capability can be disabled by writing <literal>ESCAPE ''</literal>.
+ </para>
+
+ <para>
+ According to the SQL standard, omitting <literal>ESCAPE</literal>
+ means there is no escape character (rather than defaulting to a
+ backslash), and a zero-length <literal>ESCAPE</literal> value is
+ disallowed. <productname>PostgreSQL</productname>'s behavior in
+ this regard is therefore slightly nonstandard.
+ </para>
+
+ <para>
+ Another nonstandard extension is that following the escape character
+ with a letter or digit provides access to the escape sequences
+ defined for POSIX regular expressions; see
+ <xref linkend="posix-character-entry-escapes-table"/>,
+ <xref linkend="posix-class-shorthand-escapes-table"/>, and
+ <xref linkend="posix-constraint-escapes-table"/> below.
</para>
<para>
Some examples:
<programlisting>
-'abc' SIMILAR TO 'abc' <lineannotation>true</lineannotation>
-'abc' SIMILAR TO 'a' <lineannotation>false</lineannotation>
-'abc' SIMILAR TO '%(b|d)%' <lineannotation>true</lineannotation>
-'abc' SIMILAR TO '(b|c)%' <lineannotation>false</lineannotation>
+'abc' SIMILAR TO 'abc' <lineannotation>true</lineannotation>
+'abc' SIMILAR TO 'a' <lineannotation>false</lineannotation>
+'abc' SIMILAR TO '%(b|d)%' <lineannotation>true</lineannotation>
+'abc' SIMILAR TO '(b|c)%' <lineannotation>false</lineannotation>
+'-abc-' SIMILAR TO '%\mabc\M%' <lineannotation>true</lineannotation>
+'xabcy' SIMILAR TO '%\mabc\M%' <lineannotation>false</lineannotation>
</programlisting>
</para>