Commit ebc11d4

author
Artur Zakirov
committed
shared_ispell: added documentation
1 parent 9f3a11b commit ebc11d4

3 files changed

+197
-0
lines changed

doc/src/sgml/contrib.sgml

Lines changed: 1 addition & 0 deletions
@@ -137,6 +137,7 @@ CREATE EXTENSION <replaceable>module_name</> FROM unpackaged;
137137
&postgres-fdw;
138138
&seg;
139139
&sepgsql;
140+
&shared-ispell;
140141
&contrib-spi;
141142
&sr-plan;
142143
&sslinfo;

doc/src/sgml/filelist.sgml

Lines changed: 1 addition & 0 deletions
@@ -142,6 +142,7 @@
142142
<!ENTITY seg SYSTEM "seg.sgml">
143143
<!ENTITY contrib-spi SYSTEM "contrib-spi.sgml">
144144
<!ENTITY sepgsql SYSTEM "sepgsql.sgml">
145+
<!ENTITY shared-ispell SYSTEM "shared-ispell.sgml">
145146
<!ENTITY sr-plan SYSTEM "sr_plan.sgml">
146147
<!ENTITY sslinfo SYSTEM "sslinfo.sgml">
147148
<!ENTITY tablefunc SYSTEM "tablefunc.sgml">

doc/src/sgml/shared-ispell.sgml

Lines changed: 195 additions & 0 deletions
@@ -0,0 +1,195 @@
1+
<!-- doc/src/sgml/shared-ispell.sgml -->
2+
3+
<sect1 id="shared-ispell" xreflabel="shared_ispell">
4+
<title>shared_ispell</title>
5+
6+
<indexterm zone="shared-ispell">
7+
<primary>shared_ispell</primary>
8+
</indexterm>
9+
10+
<para>
11+
The <filename>shared_ispell</filename> module provides a shared ispell
12+
dictionary, i.e. a dictionary that is stored in a shared memory segment. With the
13+
traditional ispell implementation, each session initializes and stores the
14+
dictionary on its own, which wastes a lot of CPU time and memory.
15+
</para>
16+
17+
<para>
18+
This extension allocates an area in the shared memory segment (you have to choose the
19+
size in advance) and then loads the dictionary into it when it's used for the
20+
first time.
21+
</para>
22+
23+
<sect2>
24+
<title>Functions</title>
25+
26+
<para>
27+
The functions provided by the <filename>shared_ispell</filename> module
28+
are shown in <xref linkend="shared-ispell-func-table">.
29+
</para>
30+
31+
<table id="shared-ispell-func-table">
32+
<title><filename>shared_ispell</filename> Functions</title>
33+
<tgroup cols="3">
34+
<thead>
35+
<row>
36+
<entry>Function</entry>
37+
<entry>Returns</entry>
38+
<entry>Description</entry>
39+
</row>
40+
</thead>
41+
42+
<tbody>
43+
<row>
44+
<entry><function>shared_ispell_reset()</function><indexterm><primary>shared_ispell_reset</primary></indexterm></entry>
45+
<entry><type>void</type></entry>
46+
<entry>
47+
Resets the dictionaries (e.g. so that you can reload the updated files
48+
from disk). The sessions that already use the dictionaries will be forced
49+
to reinitialize them.
50+
</entry>
51+
</row>
52+
<row>
53+
<entry><function>shared_ispell_mem_used()</function><indexterm><primary>shared_ispell_mem_used</primary></indexterm></entry>
54+
<entry><type>int</type></entry>
55+
<entry>
56+
Returns the amount of memory in the shared segment used by the loaded shared
57+
dictionaries, in bytes.
58+
</entry>
59+
</row>
60+
<row>
61+
<entry><function>shared_ispell_mem_available()</function><indexterm><primary>shared_ispell_mem_available</primary></indexterm></entry>
62+
<entry><type>int</type></entry>
63+
<entry>
64+
Returns the amount of available memory in the shared segment.
65+
</entry>
66+
</row>
67+
<row>
68+
<entry><function>shared_ispell_dicts()</function><indexterm><primary>shared_ispell_dicts</primary></indexterm></entry>
69+
<entry><type>setof(dict_name varchar, affix_name varchar, words int, affixes int, bytes int)</type></entry>
70+
<entry>
71+
Returns a list of dictionaries loaded in the shared segment.
72+
</entry>
73+
</row>
74+
<row>
75+
<entry><function>shared_ispell_stoplists()</function><indexterm><primary>shared_ispell_stoplists</primary></indexterm></entry>
76+
<entry><type>setof(stop_name varchar, words int, bytes int)</type></entry>
77+
<entry>
78+
Returns a list of stop word lists loaded in the shared segment.
79+
</entry>
80+
</row>
81+
</tbody>
82+
</tgroup>
83+
</table>
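<para>
As a brief illustration of the functions above (a sketch that assumes the
module is preloaded and the extension has been created in the current
database; the output depends on which dictionaries are loaded, so none is
shown):

<programlisting>
-- list the dictionaries and stop word lists currently held in the shared segment
SELECT dict_name, affix_name, words, affixes, bytes FROM shared_ispell_dicts();
SELECT stop_name, words, bytes FROM shared_ispell_stoplists();

-- discard the loaded dictionaries so that updated files are read on next use
SELECT shared_ispell_reset();
</programlisting>
</para>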
84+
</sect2>
85+
86+
<sect2>
87+
<title>GUC Parameters</title>
88+
89+
<variablelist>
90+
<varlistentry id="guc-shared-ispell-max-size" xreflabel="shared_ispell.max_size">
91+
<term>
92+
<varname>shared_ispell.max_size</> (<type>int</type>)
93+
<indexterm>
94+
<primary><varname>shared_ispell.max_size</> configuration parameter</primary>
95+
</indexterm>
96+
</term>
97+
<listitem>
98+
<para>
99+
Defines the maximum size of the shared segment. This is a hard limit: the
100+
shared segment is not extensible, so you need to set it so that all the
101+
dictionaries fit into it without wasting much memory.
102+
</para>
103+
</listitem>
104+
</varlistentry>
105+
</variablelist>
106+
</sect2>
107+
108+
<sect2>
109+
<title>Using the dictionary</title>
110+
111+
<para>
112+
The module needs to allocate space in the shared memory segment. So add this
113+
to the config file (or update the current values):
114+
115+
<programlisting>
116+
# libraries to load
117+
shared_preload_libraries = 'shared_ispell'
118+
119+
# config of the shared memory
120+
shared_ispell.max_size = 32MB
121+
</programlisting>
122+
</para>
123+
124+
<para>
125+
To find out how much memory you actually need, use a large value (e.g. 200MB)
126+
and load all the dictionaries you want to use. Then use the
127+
<function>shared_ispell_mem_used()</function> function to find out how much
128+
memory was actually used (and set the <varname>shared_ispell.max_size</varname>
129+
GUC variable accordingly).
130+
</para>
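<para>
For instance, once the dictionaries have been loaded, a query along these
lines reports the current usage (a sketch; <function>pg_size_pretty</function>
is the standard PostgreSQL size-formatting function, and the casts to
<type>bigint</type> are only there to match its signature):

<programlisting>
SELECT pg_size_pretty(shared_ispell_mem_used()::bigint)      AS used,
       pg_size_pretty(shared_ispell_mem_available()::bigint) AS available;
</programlisting>
</para>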
131+
132+
<para>
133+
Don't set it to exactly that value; leave some free space so that you
134+
can reload the dictionaries without changing the <varname>shared_ispell.max_size</varname> limit
135+
(which requires a restart of the database). Something like 512kB should be just fine.
136+
</para>
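<para>
One way to turn that advice into a number (a sketch only; the 512kB of
headroom is the figure suggested above, not a value computed by the module,
and <function>pg_size_pretty</function> rounds its result):

<programlisting>
-- current usage plus 512kB of headroom, as a candidate for shared_ispell.max_size
SELECT pg_size_pretty(shared_ispell_mem_used()::bigint + 512 * 1024) AS suggested_max_size;
</programlisting>
</para>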
137+
138+
<para>
139+
The extension defines a <literal>shared_ispell</literal> template that you
140+
may use to define custom dictionaries. For example:
141+
142+
<programlisting>
143+
CREATE TEXT SEARCH DICTIONARY english_shared (
144+
TEMPLATE = shared_ispell,
145+
DictFile = en_us,
146+
AffFile = en_us,
147+
StopWords = english
148+
);
149+
150+
CREATE TEXT SEARCH CONFIGURATION public.english_shared
151+
( COPY = pg_catalog.simple );
152+
153+
ALTER TEXT SEARCH CONFIGURATION english_shared
154+
ALTER MAPPING FOR asciiword, asciihword, hword_asciipart,
155+
word, hword, hword_part
156+
WITH english_shared, english_stem;
157+
</programlisting>
158+
</para>
159+
160+
<para>
161+
We can test the created configuration:
162+
163+
<programlisting>
164+
SELECT * FROM ts_debug('english_shared', 'abilities');
165+
alias | description | token | dictionaries | dictionary | lexemes
166+
-----------+-----------------+-----------+-------------------------------+----------------+-----------
167+
asciiword | Word, all ASCII | abilities | {english_shared,english_stem} | english_shared | {ability}
168+
(1 row)
169+
</programlisting>
170+
</para>
171+
172+
<para>
173+
Or you can update your own text search configuration. For example, suppose you have
174+
the <literal>public.english</literal> configuration. You can update it to use
175+
the <literal>english_shared</literal> dictionary, which is based on the <literal>shared_ispell</literal> template:
176+
177+
<programlisting>
178+
ALTER TEXT SEARCH CONFIGURATION public.english
179+
ALTER MAPPING FOR asciiword, asciihword, hword_asciipart,
180+
word, hword, hword_part
181+
WITH english_shared, english_stem;
182+
</programlisting>
183+
</para>
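<para>
The dictionary can also be checked on its own, without going through a
configuration, using the standard <function>ts_lexize</function> function
(a sketch that assumes the <literal>english_shared</literal> dictionary
defined earlier in this section; the output is not shown because it depends
on the dictionary files):

<programlisting>
SELECT ts_lexize('english_shared', 'abilities');
</programlisting>
</para>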
184+
185+
</sect2>
186+
187+
<sect2>
188+
<title>Author</title>
189+
190+
<para>
191+
Tomas Vondra <email>tomas.vondra@2ndquadrant.com</email>, Prague, Czech Republic
192+
</para>
193+
</sect2>
194+
195+
</sect1>