Commit 3b9494b

shared_ispell module added

Author: Artur Zakirov
1 parent b82f06c

10 files changed: +1557 -0 lines changed

contrib/shared_ispell/LICENSE

Lines changed: 25 additions & 0 deletions
@@ -0,0 +1,25 @@
Copyright 2012, Tomas Vondra (tv@fuzzy.cz). All rights reserved.

Redistribution and use in source and binary forms, with or without modification, are
permitted provided that the following conditions are met:

1. Redistributions of source code must retain the above copyright notice, this list of
conditions and the following disclaimer.

2. Redistributions in binary form must reproduce the above copyright notice, this list
of conditions and the following disclaimer in the documentation and/or other materials
provided with the distribution.

THIS SOFTWARE IS PROVIDED BY TOMAS VONDRA ''AS IS'' AND ANY EXPRESS OR IMPLIED
WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND
FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL TOMAS VONDRA OR
CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON
ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF
ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

The views and conclusions contained in the software and documentation are those of the
authors and should not be interpreted as representing official policies, either expressed
or implied, of Tomas Vondra.

contrib/shared_ispell/META.json

Lines changed: 34 additions & 0 deletions
@@ -0,0 +1,34 @@
{
    "name": "shared_ispell",
    "abstract": "Provides a shared ispell dictionary - initialized once and stored in shared segment.",
    "description": "Allows you to allocate area within a shared segment and use it for ispell dictionaries.",
    "version": "1.0.0",
    "maintainer": "Tomas Vondra <tv@fuzzy.cz>",
    "license": "bsd",
    "prereqs": {
        "runtime": {
            "requires": {
                "PostgreSQL": "8.4.0"
            }
        }
    },
    "provides": {
        "shared_ispell": {
            "file": "shared_ispell--1.0.0.sql",
            "version": "1.0.0"
        }
    },
    "resources": {
        "repository": {
            "url": "https://github.com/tvondra/shared_ispell.git",
            "web": "http://github.com/tvondra/shared_ispell",
            "type": "git"
        }
    },
    "tags" : ["ispell", "shared", "fulltext", "dictionary"],
    "meta-spec": {
        "version": "1.0.0",
        "url": "http://pgxn.org/meta/spec.txt"
    },
    "release_status" : "testing"
}

contrib/shared_ispell/Makefile

Lines changed: 20 additions & 0 deletions
@@ -0,0 +1,20 @@
# contrib/shared_ispell/Makefile

MODULE_big = shared_ispell
OBJS = src/shared_ispell.o

EXTENSION = shared_ispell
DATA = sql/shared_ispell--1.1.0.sql

REGRESS = shared_ispell

ifdef USE_PGXS
PG_CONFIG = pg_config
PGXS := $(shell $(PG_CONFIG) --pgxs)
include $(PGXS)
else
subdir = contrib/shared_ispell
top_builddir = ../..
include $(top_builddir)/src/Makefile.global
include $(top_srcdir)/contrib/contrib-global.mk
endif

contrib/shared_ispell/README.md

Lines changed: 138 additions & 0 deletions
@@ -0,0 +1,138 @@
Shared ISpell Dictionary
========================
This PostgreSQL extension provides a shared ispell dictionary, i.e.
a dictionary that's stored in a shared segment. The traditional ispell
implementation means that each session initializes and stores the
dictionary on its own, which means a lot of CPU/RAM is wasted.

This extension allocates an area in the shared segment (you have to
choose the size in advance) and then loads the dictionary into it
when it's used for the first time.

If you need just snowball-type dictionaries, this extension is not
really interesting for you. But if you really need an ispell
dictionary, this may save you a lot of resources.


Install
-------
Installing the extension is quite simple, especially if you're on 9.1.
In that case all you need to do is this:

    $ make install

and then (after connecting to the database)

    db=# CREATE EXTENSION shared_ispell;

If you're on a pre-9.1 version, you'll have to do the second part manually
by running the SQL script (shared_ispell--x.y.sql) in the database. If
needed, replace MODULE_PATHNAME by $libdir.


Config
------
Now the functions are created, but you still need to load the shared
module. This needs to be done from postgresql.conf, as the module
needs to allocate space in the shared memory segment. So add this to
the config file (or update the current values):

    # libraries to load
    shared_preload_libraries = 'shared_ispell'

    # known GUC prefixes
    custom_variable_classes = 'shared_ispell'

    # config of the shared memory
    shared_ispell.max_size = 32MB

Yes, there's a single GUC variable that defines the maximum size of
the shared segment. This is a hard limit: the shared segment is not
extensible, so you need to set it so that all the dictionaries fit
into it and not much memory is wasted.

To find out how much memory you actually need, use a large value
(e.g. 200MB) and load all the dictionaries you want to use. Then use
the shared_ispell_mem_used() function to find out how much memory
was actually used (and set the max_size GUC variable accordingly).

Don't set it exactly to that value; leave some free space
so that you can reload the dictionaries without changing the
max_size GUC limit (which requires a restart of the DB). Something
like 512kB should be just fine.

The shared segment can contain several dictionaries at the same time;
the amount of memory is the only limit. There's no limit on the number
of dictionaries / words etc. Just the max_size GUC variable.
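
The sizing advice above comes down to a little arithmetic: take the value reported by shared_ispell_mem_used(), add some headroom for reloads, and round up. The Python sketch below illustrates one way to do that. The helper name `recommend_max_size` and the choice to round up to whole megabytes are assumptions made for illustration; they are not part of the extension.

```python
KB = 1024
MB = 1024 * KB


def recommend_max_size(used_bytes: int, headroom_bytes: int = 512 * KB) -> str:
    """Suggest a value for shared_ispell.max_size, e.g. '20MB'.

    used_bytes is the number reported by shared_ispell_mem_used() after all
    dictionaries have been loaded; headroom_bytes is the spare room kept for
    reloads (the text above suggests something like 512kB). Rounding up to
    whole megabytes is an assumption of this sketch, not a requirement.
    """
    if used_bytes < 0 or headroom_bytes < 0:
        raise ValueError("sizes must be non-negative")
    total = used_bytes + headroom_bytes
    megabytes = max(1, -(-total // MB))  # integer ceiling division
    return f"{megabytes}MB"


if __name__ == "__main__":
    # Hypothetical measurement of roughly 19.4 MB in use.
    print("shared_ispell.max_size =", recommend_max_size(20_341_680))
```

The printed line can be pasted into postgresql.conf; remember that changing max_size still requires a restart.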


Using the dictionary
--------------------
Technically, the extension defines a 'shared_ispell' template that
you may use to define custom dictionaries. E.g. you may do this

    CREATE TEXT SEARCH DICTIONARY czech_shared (
        TEMPLATE = shared_ispell,
        DictFile = czech,
        AffFile = czech,
        StopWords = czech
    );

    CREATE TEXT SEARCH CONFIGURATION public.czech_shared
        ( COPY = pg_catalog.simple );

    ALTER TEXT SEARCH CONFIGURATION czech_shared
        ALTER MAPPING FOR asciiword, asciihword, hword_asciipart,
                          word, hword, hword_part
        WITH czech_shared;

and then do the usual stuff, e.g.

    db=# SELECT ts_lexize('czech_shared', 'automobile');

or whatever you want.


Available functions
-------------------
The extension provides five management functions that allow you to
manage and get info about the preloaded dictionaries. The first two
functions

    shared_ispell_mem_used()
    shared_ispell_mem_available()

allow you to get info about the shared segment (used and free memory)
e.g. to properly size the segment (max_size). Then there are functions
that return the list of dictionaries / stop lists loaded in the shared
segment

    shared_ispell_dicts()
    shared_ispell_stoplists()

e.g. like this

    db=# SELECT * FROM shared_ispell_dicts();

     dict_name | affix_name | words | affixes |  bytes
    -----------+------------+-------+---------+----------
     bulgarian | bulgarian  | 79267 |      12 |  7622128
     czech     | czech      | 96351 |    2544 | 12715000
    (2 rows)


    db=# SELECT * FROM shared_ispell_stoplists();

     stop_name | words | bytes
    -----------+-------+-------
     czech     |   259 |  4552
    (1 row)
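
To see which dictionaries dominate the segment, the bytes columns from the two listings can be added up. The Python sketch below parses psql's default aligned output and sums that column; the helper name `sum_bytes_column` is hypothetical, and the README does not say whether this sum matches shared_ispell_mem_used() exactly, so treat the result as a rough cross-check only.

```python
def sum_bytes_column(psql_output: str) -> int:
    """Sum the 'bytes' column of psql output in the default aligned format."""
    lines = [ln for ln in psql_output.strip().splitlines() if ln.strip()]
    header = [cell.strip() for cell in lines[0].split("|")]
    col = header.index("bytes")
    total = 0
    for ln in lines[2:]:  # skip the header row and the ---+--- separator
        if "|" not in ln:  # skip the "(N rows)" footer
            continue
        cells = [cell.strip() for cell in ln.split("|")]
        total += int(cells[col])
    return total


# Sample output copied from the listings above.
DICTS = """
 dict_name | affix_name | words | affixes |  bytes
-----------+------------+-------+---------+----------
 bulgarian | bulgarian  | 79267 |      12 |  7622128
 czech     | czech      | 96351 |    2544 | 12715000
(2 rows)
"""

STOPLISTS = """
 stop_name | words | bytes
-----------+-------+-------
 czech     |   259 |  4552
(1 row)
"""

if __name__ == "__main__":
    print(sum_bytes_column(DICTS) + sum_bytes_column(STOPLISTS))
```

For the sample data this adds up to 20341680 bytes, a little under 20MB.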

The last function allows you to reset the dictionary (e.g. so that you
can reload the updated files from disk). The sessions that already use
the dictionaries will be forced to reinitialize them (the first one
will rebuild and copy them into the shared segment, the other ones will
use this prepared data).

    db=# SELECT shared_ispell_reset();

That's all for now ...
