1
- <!-- $Header: /cvsroot/pgsql/doc/src/sgml/charset.sgml,v 2.5 2000/12/22 21:51:57 petere Exp $ -->
1
+ <!-- $Header: /cvsroot/pgsql/doc/src/sgml/charset.sgml,v 2.6 2001/01/19 04:47:50 tgl Exp $ -->
2
2
3
3
<chapter id="charset">
4
4
<title>Localization</>
54
54
cultural preferences regarding alphabets, sorting, number
55
55
formatting, etc. <productname>PostgreSQL</> uses the standard ISO
56
56
C and POSIX-like locale facilities provided by the server operating
57
- system. For additional information refer the documentation of your
57
+ system. For additional information refer to the documentation of your
58
58
system.
59
59
</para>
60
60
61
61
<sect2>
62
62
<title>Overview</>
63
63
64
64
<para>
65
- Locale support is not build into <productname>PostgreSQL</> by
65
+ Locale support is not built into <productname>PostgreSQL</> by
66
66
default; to enable it, supply the <option>--enable-locale</> option
67
67
to the <filename>configure</> script:
68
68
<informalexample>
@@ -95,7 +95,7 @@ export LANG=sv_SE
95
95
96
96
<para>
97
97
Occasionally it is useful to mix rules from several locales, e.g.,
98
- use U.S. rules but Spanish messages. To do that a set of
98
+ use U.S. collation rules but Spanish messages. To do that a set of
99
99
environment variables exist that override the default of
100
100
<envar>LANG</> for a particular category:
101
101
@@ -141,14 +141,23 @@ export LANG=sv_SE
141
141
</para>
142
142
143
143
<para>
144
- Once you have chosen a set of localization rules this way you must
145
- keep them fixed for any particular database cluster. That means
146
- that the locales that were active when you ran <filename>initdb</>
147
- must be kept the same when you start the postmaster. Otherwise,
148
- the changed sort order can corrupt indexes or make your data
149
- disappear mysteriously. It is currently not possible to change the
150
- locales after database initialization or to use more than one set
151
- of locales for a given database cluster.
144
+ Note that the locale behavior is determined by the environment
145
+ variables seen by the server, not by the environment of any client.
146
+ Therefore, be careful to set these variables before starting the
147
+ postmaster.
148
+ </para>
149
+
150
+ <para>
151
+ The <envar>LC_COLLATE</> and <envar>LC_CTYPE</> variables affect the
152
+ sort order of indexes. Therefore, these values must be kept fixed
153
+ for any particular database cluster, or indexes on text columns will
154
+ become corrupt. <productname>Postgres</productname> enforces this
155
+ by recording the values of <envar>LC_COLLATE</> and <envar>LC_CTYPE</>
156
+ that are seen by <command>initdb</>. The server automatically adopts
157
+ those two values when it is started; only the other <envar>LC_</>
158
+ categories can be set from the environment at server startup.
159
+ In short, only one collation order can be used in a database cluster,
160
+ and it is chosen at <command>initdb</> time.
152
161
</para>
153
162
</sect2>
154
163
@@ -183,7 +192,10 @@ export LANG=sv_SE
183
192
<para>
184
193
The only severe drawback of using the locale support in
185
194
<productname>PostgreSQL</> is its speed. So use locale only if you
186
- actually need it.
195
+ actually need it. It should be noted in particular that selecting
196
+ a non-C locale disables index optimizations for <literal>LIKE</> and
197
+ <literal>~</> operators, which can make a huge difference in the
198
+ speed of searches that use those operators.
187
199
</para>
188
200
</sect2>
189
201
@@ -261,7 +273,7 @@ perl: warning: Falling back to the standard locale ("C").
261
273
262
274
<para>
263
275
<acronym>MB</acronym> also fixes some problems concerning 8-bit single byte
264
- character sets including ISO8859. (I would not say all of problems
276
+ character sets including ISO8859. (I would not say all problems
265
277
have been fixed. I just confirmed that the regression test ran fine
266
278
and a few French characters could be used with the patch. Please let
267
279
me know if you find any problem while using 8-bit characters.)
@@ -271,7 +283,7 @@ perl: warning: Falling back to the standard locale ("C").
271
283
<title>Enabling MB</title>
272
284
273
285
<para>
274
- Run configure with a multibyte option:
286
+ Run configure with the multibyte option:
275
287
276
288
<programlisting>
277
289
% ./configure --enable-multibyte[=<replaceable>encoding_system</replaceable>]
@@ -383,11 +395,11 @@ perl: warning: Falling back to the standard locale ("C").
383
395
% initdb -E EUC_JP
384
396
</programlisting>
385
397
386
- sets the default encoding to EUC_JP(Extended Unix Code for Japanese).
398
+ sets the default encoding to EUC_JP (Extended Unix Code for Japanese).
387
399
Note that you can use "--encoding" instead of "-E" if you prefer
388
400
to type longer option strings.
389
401
If no -E or --encoding option is given, the encoding
390
- specified at the compile time is used.
402
+ specified at configure time is used.
391
403
</para>
392
404
393
405
<para>
@@ -397,8 +409,8 @@ perl: warning: Falling back to the standard locale ("C").
397
409
% createdb -E EUC_KR korean
398
410
</programlisting>
399
411
400
- will create a database named "korean" with EUC_KR encoding. The
401
- another way to accomplish this is to use a SQL command:
412
+ will create a database named "korean" with EUC_KR encoding.
413
+ Another way to accomplish this is to use a SQL command:
402
414
403
415
<programlisting>
404
416
CREATE DATABASE korean WITH ENCODING = 'EUC_KR';
@@ -527,20 +539,11 @@ char *pg_encoding_to_char(int <replaceable>encoding_id</replaceable>)
527
539
</para>
528
540
</listitem>
529
541
530
- <listitem>
531
- <para>
532
- Using <envar>PGCLIENTENCODING</envar>.
533
-
534
- If an environment variable <envar>PGCLIENTENCODING</envar> is defined in the
535
- frontend, an automatic encoding translation is done by the backend.
536
- </para>
537
- </listitem>
538
-
539
542
<listitem>
540
543
<para>
541
544
Using <command>SET CLIENT_ENCODING TO</command>.
542
545
543
- Setting the frontend side encoding can be done a SQL command:
546
+ Setting the frontend side encoding can be done by this SQL command:
544
547
545
548
<programlisting>
546
549
SET CLIENT_ENCODING TO 'encoding';
@@ -552,7 +555,7 @@ SET CLIENT_ENCODING TO 'encoding';
552
555
SET NAMES 'encoding';
553
556
</programlisting>
554
557
555
- To query the current the frontend encoding:
558
+ To query the current frontend encoding:
556
559
557
560
<programlisting>
558
561
SHOW CLIENT_ENCODING;
@@ -565,6 +568,17 @@ RESET CLIENT_ENCODING;
565
568
</programlisting>
566
569
</para>
567
570
</listitem>
571
+
572
+ <listitem>
573
+ <para>
574
+ Using <envar>PGCLIENTENCODING</envar>.
575
+
576
+ If environment variable <envar>PGCLIENTENCODING</envar> is defined
577
+ in the client's environment, that client encoding is automatically
578
+ selected when a backend connection is made. (This can subsequently
579
+ be overridden using any of the other methods mentioned above.)
580
+ </para>
581
+ </listitem>
568
582
</itemizedlist>
569
583
</para>
570
584
</sect2>
@@ -588,7 +602,7 @@ RESET CLIENT_ENCODING;
588
602
<para>
589
603
Suppose you choose EUC_JP for the backend, LATIN1 for the frontend,
590
604
then some Japanese characters could not be translated into LATIN1. In
591
- this case, a letter cannot be represented in the LATIN1 character set,
605
+ this case, a letter that cannot be represented in the LATIN1 character set
592
606
would be transformed as:
593
607
594
608
<programlisting>
@@ -601,7 +615,7 @@ RESET CLIENT_ENCODING;
601
615
<title>References</title>
602
616
603
617
<para>
604
- These are good sources to start learning various kind of encoding
618
+ These are good sources to start learning about various kinds of encoding
605
619
systems.
606
620
607
621
<itemizedlist>
@@ -724,8 +738,7 @@ Mar 1, 1998 PL1 released
724
738
<para>
725
739
<!--
726
740
[Here is a good documentation explaining how to use WIN1250 on
727
- Windows/ODBC from Pavel Behal. Please note that Installation step 1)
728
- is not necceary in 6.5.1 - Tatsuo]
741
+ Windows/ODBC from Pavel Behal]
729
742
730
743
Version: 0.91 for PgSQL 6.5
731
744
Author: Pavel Behal
@@ -815,20 +828,14 @@ Sorry for my Eglish and C code, I'm not native :-)
815
828
<title>WIN1250 on Windows/ODBC</title>
816
829
<step>
817
830
<para>
818
- Change the three relevant files in the source directories.
819
- </para>
820
- </step>
821
-
822
- <step>
823
- <para>
824
- Compile <productname>Postgres</productname> with local enabled
831
+ Compile <productname>Postgres</productname> with locale enabled
825
832
and the multibyte encoding set to <literal>LATIN2</literal>.
826
833
</para>
827
834
</step>
828
835
829
836
<step>
830
837
<para>
831
- Set up your instalation . Do not forget to create locale
838
+ Set up your installation. Do not forget to create locale
832
839
variables in your profile (environment). For example (this may
833
840
not be correct for <emphasis>your</emphasis> environment):
834
841
@@ -936,16 +943,16 @@ HostCharset <replaceable>host_spec</> <replaceable>host_charset</>
936
943
<para>
937
944
The <filename>charset.conf</> file is always processed up to the
938
945
end, so you can easily specify exceptions from the previous
939
- rules. In the src/data you will find charset.conf example and a few
940
- recoding tables.
946
+ rules. In the <filename>src/data/</> directory you will find an
947
+ example <filename>charset.conf</> and a few recoding tables.
941
948
</para>
942
949
943
950
<para>
944
951
As this solution is based on the client's IP address and character
945
952
set mapping there are obviously some restrictions as well. You
946
953
cannot use different encodings on the same host at the same
947
954
time. It is also inconvenient when you boot your client hosts into
948
- more operating systems. Nevertheless, when these restrictions are
955
+ multiple operating systems. Nevertheless, when these restrictions are
949
956
not limiting and you do not need multi-byte characters then it is a
950
957
simple and effective solution.
951
958
</para>