author | Jeff Davis <jdavis@postgresql.org> | 2024-03-02 13:37:43 -0800 |
---|---|---|
committer | Jeff Davis <jdavis@postgresql.org> | 2024-03-02 13:37:43 -0800 |
commit | 875e46a0a246e416b12a9debe084ede9d02f1b5d (patch) | |
tree | ec4b814224b9906bc730849650257159cce25831 /doc/src | |
parent | 1e013746544bd1f9df70f5547894fd72719c4b85 (diff) | |
Documentation update for Standard Collations.
Correct out-of-date text that said the "default" collation is always
based on LC_COLLATE and LC_CTYPE.
Also reformat into a list to make it easier to understand and compare
the available collations, and briefly document the stability
characteristics of each one.
Discussion: https://postgr.es/m/4a69d067374d2f6bfb66f5bfb2ab9a020493d49f.camel@j-davis.com
Diffstat (limited to 'doc/src')
-rw-r--r-- | doc/src/sgml/charset.sgml | 72 |
1 files changed, 45 insertions, 27 deletions
diff --git a/doc/src/sgml/charset.sgml b/doc/src/sgml/charset.sgml
index 74783d148fe..4fc143025ef 100644
--- a/doc/src/sgml/charset.sgml
+++ b/doc/src/sgml/charset.sgml
@@ -788,37 +788,19 @@ SELECT * FROM test1 ORDER BY a || b COLLATE "fr_FR";
     <title>Standard Collations</title>
 
    <para>
-    On all platforms, the collations named <literal>default</literal>,
-    <literal>C</literal>, and <literal>POSIX</literal> are available. Additional
-    collations may be available depending on operating system support.
-    The <literal>default</literal> collation selects the <symbol>LC_COLLATE</symbol>
-    and <symbol>LC_CTYPE</symbol> values specified at database creation time.
-    The <literal>C</literal> and <literal>POSIX</literal> collations both specify
-    <quote>traditional C</quote> behavior, in which only the ASCII letters
-    <quote><literal>A</literal></quote> through <quote><literal>Z</literal></quote>
-    are treated as letters, and sorting is done strictly by character
-    code byte values.
-   </para>
-
-   <note>
-    <para>
-     The <literal>C</literal> and <literal>POSIX</literal> locales may behave
-     differently depending on the database encoding.
-    </para>
-   </note>
-
-   <para>
-    Additionally, two SQL standard collation names are available:
+    On all platforms, the following collations are supported:
 
    <variablelist>
     <varlistentry>
      <term><literal>unicode</literal></term>
      <listitem>
       <para>
-       This collation sorts using the Unicode Collation Algorithm with the
-       Default Unicode Collation Element Table. It is available in all
-       encodings. ICU support is required to use this collation. (This
-       collation has the same behavior as the ICU root locale; see <xref
+       This SQL standard collation sorts using the Unicode Collation
+       Algorithm with the Default Unicode Collation Element Table. It is
+       available in all encodings. ICU support is required to use this
+       collation, and behavior may change if Postgres is built with a
+       different version of ICU. (This collation has the same behavior as
+       the ICU root locale; see <xref
        linkend="collation-managing-predefined-icu-und-x-icu"/>.)
       </para>
      </listitem>
@@ -828,15 +810,51 @@ SELECT * FROM test1 ORDER BY a || b COLLATE "fr_FR";
      <term><literal>ucs_basic</literal></term>
      <listitem>
       <para>
-       This collation sorts by Unicode code point. It is only available for
-       encoding <literal>UTF8</literal>. (This collation has the same
+       This SQL standard collation sorts using the Unicode code point values
+       rather than natural language order, and only the ASCII letters
+       <quote><literal>A</literal></quote> through
+       <quote><literal>Z</literal></quote> are treated as letters. The
+       behavior is efficient and stable across all versions. Only available
+       for encoding <literal>UTF8</literal>. (This collation has the same
        behavior as the libc locale specification <literal>C</literal> in
        <literal>UTF8</literal> encoding.)
       </para>
      </listitem>
     </varlistentry>
+
+    <varlistentry>
+     <term><literal>C</literal> (equivalent to <literal>POSIX</literal>)</term>
+     <listitem>
+      <para>
+       The <literal>C</literal> and <literal>POSIX</literal> collations are
+       based on <quote>traditional C</quote> behavior. They sort by byte
+       values rather than natural language order, and only the ASCII letters
+       <quote><literal>A</literal></quote> through
+       <quote><literal>Z</literal></quote> are treated as letters. The
+       behavior is efficient and stable across all versions for a given
+       database encoding, but behavior may vary between different database
+       encodings.
+      </para>
+     </listitem>
+    </varlistentry>
+
+    <varlistentry>
+     <term><literal>default</literal></term>
+     <listitem>
+      <para>
+       The <literal>default</literal> collation selects the locale specified
+       at database creation time.
+      </para>
+     </listitem>
+    </varlistentry>
    </variablelist>
   </para>
+
+  <para>
+   Additional collations may be available depending on operating system
+   support. The efficiency and stability of these additional collations
+   depend on the collation provider, the provider version, and the locale.
+  </para>
  </sect3>
 
  <sect3 id="collation-managing-predefined">
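
The ordering rules that the patch text describes for the C/POSIX and ucs_basic collations can be illustrated outside the database. The following is a minimal Python sketch, not PostgreSQL code: the helper names `sort_by_bytes` and `sort_by_code_point` and the sample words are hypothetical, chosen only to show why sorting by UTF-8 byte values and sorting by Unicode code point give the same order, which is consistent with the note that ucs_basic behaves like the libc C locale in UTF8 encoding.

```python
# Illustration only: mimics the ordering rules described in the patch text.
# This is NOT how PostgreSQL implements collations; all names are hypothetical.


def sort_by_bytes(words, encoding="utf-8"):
    """Order strings by their encoded byte values (the rule described for C/POSIX)."""
    return sorted(words, key=lambda w: w.encode(encoding))


def sort_by_code_point(words):
    """Order strings by Unicode code point values (the rule described for ucs_basic)."""
    return sorted(words, key=lambda w: [ord(ch) for ch in w])


words = ["zebra", "Apple", "apple", "éclair", "Zebra"]

# Uppercase ASCII sorts before lowercase ASCII, and non-ASCII letters sort
# last, because only raw values are compared, not natural language order.
print(sort_by_bytes(words))
print(sort_by_code_point(words))
```

Both calls print the same list here because UTF-8 preserves code point order at the byte level. With a different `encoding` argument the byte-based order is not guaranteed to match, which mirrors the caveat that C/POSIX behavior may vary between database encodings.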