@@ -1,3 +1,5 @@
+<!-- $PostgreSQL: pgsql/doc/src/sgml/unaccent.sgml,v 1.6 2010/08/25 02:12:00 tgl Exp $ -->
+
 <sect1 id="unaccent">
 <title>unaccent</title>
 
@@ -6,24 +8,24 @@
 </indexterm>
 
 <para>
- <filename>unaccent</> removes accents (diacritic signs) from a lexeme.
- It's a filtering dictionary, that means its output is
- always passed to the next dictionary (if any), contrary to the standard
- behavior. Currently, it supports most important accents from European
- languages.
+ <filename>unaccent</> is a text search dictionary that removes accents
+ (diacritic signs) from lexemes.
+ It's a filtering dictionary, which means its output is
+ always passed to the next dictionary (if any), unlike the normal
+ behavior of dictionaries. This allows accent-insensitive processing
+ for full text search.
 </para>
 
 <para>
- Limitation: Current implementation of <filename>unaccent</>
- dictionary cannot be used as a normalizing dictionary for
- <filename>thesaurus</filename> dictionary.
+ The current implementation of <filename>unaccent</> cannot be used as a
+ normalizing dictionary for the <filename>thesaurus</filename> dictionary.
 </para>
-
+
 <sect2>
 <title>Configuration</title>
 
 <para>
- A <literal>unaccent</> dictionary accepts the following options:
+ An <literal>unaccent</> dictionary accepts the following options:
 </para>
 <itemizedlist>
 <listitem>
@@ -43,105 +45,109 @@
 <itemizedlist>
 <listitem>
 <para>
- Each line represents pair: character_with_accent character_without_accent
+ Each line represents a pair, consisting of a character with accent
+ followed by a character without accent. The first is translated into
+ the second. For example,
 <programlisting>
 À A
 Á A
-Â A
+Â A
 Ã A
-Ä A
-Å A
-Æ A
+Ä A
+Å A
+Æ A
 </programlisting>
 </para>
 </listitem>
 </itemizedlist>
 
 <para>
- Look at <filename>unaccent.rules</>, which is installed in
- <filename>$SHAREDIR/tsearch_data/</>, for an example.
+ A more complete example, which is directly useful for most European
+ languages, can be found in <filename>unaccent.rules</>, which is installed
+ in <filename>$SHAREDIR/tsearch_data/</> when the <filename>unaccent</>
+ module is installed.
 </para>
 </sect2>
 
 <sect2>
 <title>Usage</title>
 
 <para>
- Running the installation script creates a text search template
- <literal>unaccent</> and a dictionary <literal>unaccent</>
+ Running the installation script <filename>unaccent.sql</> creates a text
+ search template <literal>unaccent</> and a dictionary <literal>unaccent</>
 based on it, with default parameters. You can alter the
 parameters, for example
 
 <programlisting>
-=# ALTER TEXT SEARCH DICTIONARY unaccent (RULES='my_rules');
+mydb=# ALTER TEXT SEARCH DICTIONARY unaccent (RULES='my_rules');
 </programlisting>
 
 or create new dictionaries based on the template.
 </para>
 
 <para>
- To test the dictionary, you can try
-
+ To test the dictionary, you can try:
 <programlisting>
-=# select ts_lexize('unaccent','Hôtel');
- ts_lexize
+mydb=# select ts_lexize('unaccent','Hôtel');
+ ts_lexize
 -----------
 {Hotel}
 (1 row)
 </programlisting>
 </para>
-
+
 <para>
- Filtering dictionary are useful for correct work of
- <function>ts_headline</function> function.
+ Here is an example showing how to insert the
+ <filename>unaccent</> dictionary into a text search configuration:
 <programlisting>
-=# CREATE TEXT SEARCH CONFIGURATION fr ( COPY = french );
-=# ALTER TEXT SEARCH CONFIGURATION fr
+mydb=# CREATE TEXT SEARCH CONFIGURATION fr ( COPY = french );
+mydb=# ALTER TEXT SEARCH CONFIGURATION fr
 ALTER MAPPING FOR hword, hword_part, word
 WITH unaccent, french_stem;
-=# select to_tsvector('fr','Hôtels de la Mer');
- to_tsvector
+mydb=# select to_tsvector('fr','Hôtels de la Mer');
+ to_tsvector
 -------------------
 'hotel':1 'mer':4
 (1 row)
 
-=# select to_tsvector('fr','Hôtel de la Mer') @@ to_tsquery('fr','Hotels');
- ?column?
+mydb=# select to_tsvector('fr','Hôtel de la Mer') @@ to_tsquery('fr','Hotels');
+ ?column?
 ----------
 t
 (1 row)
-=# select ts_headline('fr','Hôtel de la Mer',to_tsquery('fr','Hotels'));
- ts_headline
+
+mydb=# select ts_headline('fr','Hôtel de la Mer',to_tsquery('fr','Hotels'));
+ ts_headline
 ------------------------
- <b>Hôtel</b>de la Mer
+ <b>Hôtel</b> de la Mer
 (1 row)
-
 </programlisting>
 </para>
 </sect2>
 
 <sect2>
- <title>Function</title>
+ <title>Functions</title>
 
 <para>
- <function>unaccent</> function removes accents (diacritic signs) from
- argument string. Basically, it's a wrapper around
- <filename>unaccent</> dictionary.
+ The <function>unaccent()</> function removes accents (diacritic signs) from
+ a given string. Basically, it's a wrapper around the
+ <filename>unaccent</> dictionary, but it can be used outside normal
+ text search contexts.
 </para>
 
 <indexterm>
 <primary>unaccent</primary>
 </indexterm>
 
 <synopsis>
-unaccent(<optional><replaceable class="PARAMETER">dictionary</replaceable>, </optional> <replaceable class="PARAMETER">string</replaceable>)
-returns <type>text</type>
+unaccent(<optional><replaceable class="PARAMETER">dictionary</replaceable>, </optional> <replaceable class="PARAMETER">string</replaceable>) returns <type>text</type>
 </synopsis>
 
 <para>
+ For example:
 <programlisting>
-SELECT unaccent('unaccent', 'Hôtel');
-SELECT unaccent('Hôtel');
+SELECT unaccent('unaccent', 'Hôtel');
+SELECT unaccent('Hôtel');
 </programlisting>
 </para>
 </sect2>