Commit 875e46a

Documentation update for Standard Collations.
Correct out-of-date text that said the "default" collation is always based on LC_COLLATE and LC_CTYPE. Also reformat into a list to make it easier to understand and compare the available collations, and briefly document the stability characteristics of each one.

Discussion: https://postgr.es/m/4a69d067374d2f6bfb66f5bfb2ab9a020493d49f.camel@j-davis.com
1 parent 1e01374 commit 875e46a

1 file changed, +45 -27 lines changed

doc/src/sgml/charset.sgml

@@ -788,37 +788,19 @@ SELECT * FROM test1 ORDER BY a || b COLLATE "fr_FR";
 <title>Standard Collations</title>
 
 <para>
-On all platforms, the collations named <literal>default</literal>,
-<literal>C</literal>, and <literal>POSIX</literal> are available. Additional
-collations may be available depending on operating system support.
-The <literal>default</literal> collation selects the <symbol>LC_COLLATE</symbol>
-and <symbol>LC_CTYPE</symbol> values specified at database creation time.
-The <literal>C</literal> and <literal>POSIX</literal> collations both specify
-<quote>traditional C</quote> behavior, in which only the ASCII letters
-<quote><literal>A</literal></quote> through <quote><literal>Z</literal></quote>
-are treated as letters, and sorting is done strictly by character
-code byte values.
-</para>
-
-<note>
-<para>
-The <literal>C</literal> and <literal>POSIX</literal> locales may behave
-differently depending on the database encoding.
-</para>
-</note>
-
-<para>
-Additionally, two SQL standard collation names are available:
+On all platforms, the following collations are supported:
 
 <variablelist>
 <varlistentry>
 <term><literal>unicode</literal></term>
 <listitem>
 <para>
-This collation sorts using the Unicode Collation Algorithm with the
-Default Unicode Collation Element Table. It is available in all
-encodings. ICU support is required to use this collation. (This
-collation has the same behavior as the ICU root locale; see <xref
+This SQL standard collation sorts using the Unicode Collation
+Algorithm with the Default Unicode Collation Element Table. It is
+available in all encodings. ICU support is required to use this
+collation, and behavior may change if Postgres is built with a
+different version of ICU. (This collation has the same behavior as
+the ICU root locale; see <xref
 linkend="collation-managing-predefined-icu-und-x-icu"/>.)
 </para>
 </listitem>
@@ -828,15 +810,51 @@ SELECT * FROM test1 ORDER BY a || b COLLATE "fr_FR";
 <term><literal>ucs_basic</literal></term>
 <listitem>
 <para>
-This collation sorts by Unicode code point. It is only available for
-encoding <literal>UTF8</literal>. (This collation has the same
+This SQL standard collation sorts using the Unicode code point values
+rather than natural language order, and only the ASCII letters
+<quote><literal>A</literal></quote> through
+<quote><literal>Z</literal></quote> are treated as letters. The
+behavior is efficient and stable across all versions. Only available
+for encoding <literal>UTF8</literal>. (This collation has the same
 behavior as the libc locale specification <literal>C</literal> in
 <literal>UTF8</literal> encoding.)
 </para>
 </listitem>
 </varlistentry>
+
+<varlistentry>
+<term><literal>C</literal> (equivalent to <literal>POSIX</literal>)</term>
+<listitem>
+<para>
+The <literal>C</literal> and <literal>POSIX</literal> collations are
+based on <quote>traditional C</quote> behavior. They sort by byte
+values rather than natural language order, and only the ASCII letters
+<quote><literal>A</literal></quote> through
+<quote><literal>Z</literal></quote> are treated as letters. The
+behavior is efficient and stable across all versions for a given
+database encoding, but behavior may vary between different database
+encodings.
+</para>
+</listitem>
+</varlistentry>
+
+<varlistentry>
+<term><literal>default</literal></term>
+<listitem>
+<para>
+The <literal>default</literal> collation selects the locale specified
+at database creation time.
+</para>
+</listitem>
+</varlistentry>
 </variablelist>
 </para>
+
+<para>
+Additional collations may be available depending on operating system
+support. The efficiency and stability of these additional collations
+depend on the collation provider, the provider version, and the locale.
+</para>
 </sect3>
 
 <sect3 id="collation-managing-predefined">
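
Reviewer's illustration (not part of the commit): the new text says that ucs_basic sorts by code point and matches C in UTF8, while C and POSIX sort by byte values and may vary between database encodings. The sketch below models those two ordering rules in plain Python to show why both statements can hold. It does not call PostgreSQL; the helper names and sample strings are hypothetical, and KOI8-R is used only as an example of a non-UTF8 encoding.

```python
# Illustrative stand-in only: this models the two ordering rules in plain
# Python. It does not call PostgreSQL or reproduce its implementation.

def sort_by_code_point(words):
    """Order strings by Unicode code point (the rule described for ucs_basic)."""
    return sorted(words, key=lambda w: [ord(ch) for ch in w])


def sort_by_bytes(words, encoding):
    """Order strings by encoded byte values (the rule described for C/POSIX)."""
    return sorted(words, key=lambda w: w.encode(encoding))


# "\u00e9clair" is "eclair" with a precomposed e-acute as its first letter.
words = ["zebra", "Apple", "\u00e9clair", "apple"]

# UTF-8 preserves code point order, so the two rules agree in UTF-8.
by_code_point = sort_by_code_point(words)
by_utf8_bytes = sort_by_bytes(words, "utf-8")

# U+0432 (Cyrillic ve) and U+0433 (Cyrillic ghe): KOI8-R assigns them byte
# values in the opposite order, so byte-wise sorting depends on the encoding.
cyrillic = ["\u0432", "\u0433"]
utf8_order = sort_by_bytes(cyrillic, "utf-8")
koi8r_order = sort_by_bytes(cyrillic, "koi8_r")

print(by_code_point == by_utf8_bytes)  # True
print(utf8_order == koi8r_order)       # False
```

Neither rule gives natural language order: uppercase "Apple" sorts before lowercase "apple", and the accented word sorts after "zebra".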
