Allow empty replacement strings in contrib/unaccent.

This is useful in languages where diacritic signs are represented as separate characters; it's also one step towards letting unaccent be used for arbitrary substring substitutions.

In passing, improve the user documentation for unaccent, which was sadly vague about some important details.

Mohammad Alhashash, reviewed by Abhijit Menon-Sen
author: Tom Lane <tgl@sss.pgh.pa.us> 2014-06-30 20:51:26 -0400
committer: Tom Lane <tgl@sss.pgh.pa.us> 2014-06-30 20:51:30 -0400
commit: 97c40ce61465582b96944e41ed6ec06c2016b95c (patch)
tree: 16f8fc36e2d2ae810f2e5ecba457b69826944298 /doc/src
parent: 55863274d98556acf57013f64f545d9a1e640bba (diff)
download: postgresql-97c40ce61465582b96944e41ed6ec06c2016b95c.tar.gz postgresql-97c40ce61465582b96944e41ed6ec06c2016b95c.zip
1 files changed, 31 insertions, 5 deletions
diff --git a/doc/src/sgml/unaccent.sgml b/doc/src/sgml/unaccent.sgml
index af9cad5d8c7..aef0031dcbc 100644
--- a/doc/src/sgml/unaccent.sgml
+++ b/doc/src/sgml/unaccent.sgml
@@ -45,9 +45,9 @@
   <itemizedlist>
    <listitem>
     <para>
-     Each line represents a pair, consisting of a character with accent
-     followed by a character without accent.  The first is translated into
-     the second.  For example,
+     Each line represents one translation rule, consisting of a character with
+     accent followed by a character without accent.  The first is translated
+     into the second.  For example,
 <programlisting>
 &Agrave;        A
 &Aacute;        A
@@ -57,6 +57,27 @@
 &Aring;        A
 &AElig;        A
 </programlisting>
+     The two characters must be separated by whitespace, and any leading or
+     trailing whitespace on a line is ignored.
+    </para>
+   </listitem>
+
+   <listitem>
+    <para>
+     Alternatively, if only one character is given on a line, instances of
+     that character are deleted; this is useful in languages where accents
+     are represented by separate characters.
+    </para>
+   </listitem>
+
+   <listitem>
+    <para>
+     As with other <productname>PostgreSQL</> text search configuration files,
+     the rules file must be stored in UTF-8 encoding.  The data is
+     automatically translated into the current database's encoding when
+     loaded.  Any lines containing untranslatable characters are silently
+     ignored, so that rules files can contain rules that are not applicable in
+     the current encoding.
     </para>
    </listitem>
   </itemizedlist>
@@ -132,8 +153,8 @@ mydb=# select ts_headline('fr','H&ocirc;tel de la Mer',to_tsquery('fr','Hotels')
 
  <para>
   The <function>unaccent()</> function removes accents (diacritic signs) from
-  a given string.  Basically, it's a wrapper around the
-  <filename>unaccent</> dictionary, but it can be used outside normal
+  a given string.  Basically, it's a wrapper around
+  <filename>unaccent</>-type dictionaries, but it can be used outside normal
   text search contexts.
  </para>
 
@@ -146,6 +167,11 @@ unaccent(<optional><replaceable class="PARAMETER">dictionary</replaceable>, </op
 </synopsis>
 
  <para>
+  If the <replaceable class="PARAMETER">dictionary</replaceable> argument is
+  omitted, <literal>unaccent</> is assumed.
+ </para>
+
+ <para>
   For example:
 <programlisting>
 SELECT unaccent('unaccent', 'H&ocirc;tel');
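
The rules-file format documented in this patch (a "source target" pair separated by whitespace, or a lone source character meaning deletion, with leading and trailing whitespace ignored) can be illustrated with a small sketch. This is not the contrib/unaccent implementation, which is written in C; the function names parse_rules and apply_rules are hypothetical, and the sketch does not model the encoding conversion or the silent skipping of untranslatable lines.

```python
def parse_rules(text):
    """Parse unaccent-style rules (hypothetical helper, not the real parser).

    'src dst' maps src to dst; a line holding only 'src' maps it to the
    empty string, i.e. the character is deleted.
    """
    rules = {}
    for line in text.splitlines():
        # split() with no arguments splits on whitespace and also drops
        # leading and trailing whitespace on the line.
        fields = line.split()
        if len(fields) == 1:
            rules[fields[0]] = ""          # empty replacement: delete
        elif len(fields) == 2:
            rules[fields[0]] = fields[1]   # ordinary translation rule
        # Blank or otherwise malformed lines are skipped in this sketch.
    return rules


def apply_rules(rules, s):
    """Translate s one character at a time using the parsed rules."""
    return "".join(rules.get(ch, ch) for ch in s)


# Assumed sample rules: U+00C0 (A with grave) becomes "A", and the
# combining acute accent U+0301 is deleted.
sample = "\u00c0\tA\n   \u0301   \n"
rules = parse_rules(sample)
print(apply_rules(rules, "He\u0301tel"))  # prints "Hetel"
```

The deletion case is what the empty-replacement change enables: a combining accent stored as its own character can simply be dropped, leaving the base letter in place.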