4
4
5
5
## Introduction
6
6
7
- The ** rum** module provides access method to work with RUM index. It is based
8
- on the GIN access methods code.
7
+ The ** rum** module provides access method to work with ` RUM ` index. It is based
8
+ on the ` GIN ` access methods code.
9
+
10
+ ` GIN ` index allows to perform fast full text search using ` tsvector ` and
11
+ ` tsquery ` types. But full text search with GIN index has several problems:
12
+
13
+ - Slow ranking. It is need position information about lexems to ranking. ` GIN `
14
+ index doesn't store positions of lexems. So after index scan we need additional
15
+ heap scan to retreive lexems positions.
16
+ - There isn't phrase search with ` GIN ` index. This problem relates with previous
17
+ problem. It is need position information to perform phrase search.
18
+ - Slow ordering by timestamp. ` GIN ` index can't store some related information
19
+ in index with lexemes. So it is necessary to perform additional heap scan.
20
+
21
+ ` RUM ` solves this problems by storing additional information in posting tree.
22
+ For example, positional information of lexemes or timestamps. You can get an
23
+ idea of ` RUM ` by the following picture:
24
+
25
+ ![ How RUM stores additional information] ( img/gin_rum.png )
26
+
27
+ Drawback of ` RUM ` is that it has slower build and insert time than ` GIN ` .
28
+ It is because we need to store additional information besides keys and because
29
+ ` RUM ` uses generic WAL.
9
30
10
31
## License
11
32
@@ -16,7 +37,7 @@ This module available under the same license as
16
37
17
38
Before build and install ** rum** you should ensure following:
18
39
19
- * PostgreSQL version is 9.6.
40
+ * PostgreSQL version is 9.6+ .
20
41
21
42
Typical installation procedure may look like this:
22
43
@@ -27,18 +48,27 @@ Typical installation procedure may look like this:
27
48
$ make USE_PGXS=1 installcheck
28
49
$ psql DB -c "CREATE EXTENSION rum;"
29
50
30
- ## New access method and operator class
31
-
32
- The ** rum** module provides the access method ** rum** and the operator class
33
- ** rum_tsvector_ops** .
51
+ ## Common operators and functions
34
52
35
- The module provides new operators.
53
+ ** rum ** module provides next operators.
36
54
37
55
| Operator | Returns | Description
38
56
| -------------------- | ------- | ----------------------------------------------
39
57
| tsvector < ; => ; tsquery | float4 | Returns distance between tsvector and tsquery.
58
+ | timestamp < ; => ; timestamp | float8 | Returns distance between two timestamps.
59
+ | timestamp < ; =| ; timestamp | float8 | Returns distance only for left timestamps.
60
+ | timestamp | ; => ; timestamp | float8 | Returns distance only for right timestamps.
61
+
62
+ ## Operator classes
40
63
41
- ## Examples
64
+ ** rum** provides next operator classes
65
+
66
+ ### rum_tsvector_ops
67
+
68
+ For type: ` tsvector `
69
+
70
+ This operator class stores ` tsvector ` lexemes with positional information. Supports
71
+ ordering by ` <=> ` operator and prefix search. There is the example.
42
72
43
73
Let us assume we have the table:
44
74
@@ -70,9 +100,9 @@ And we can execute the following queries:
70
100
71
101
``` sql
72
102
= # SELECT t, a <=> to_tsquery('english', 'beautiful | place') AS rank
73
- FROM test_rum
74
- WHERE a @@ to_tsquery(' english' , ' beautiful | place' )
75
- ORDER BY a <=> to_tsquery(' english' , ' beautiful | place' );
103
+ FROM test_rum
104
+ WHERE a @@ to_tsquery(' english' , ' beautiful | place' )
105
+ ORDER BY a <=> to_tsquery(' english' , ' beautiful | place' );
76
106
t | rank
77
107
-- -------------------------------+-----------
78
108
The situation is most beautiful | 0 .0303964
@@ -81,16 +111,128 @@ And we can execute the following queries:
81
111
(3 rows)
82
112
83
113
= # SELECT t, a <=> to_tsquery('english', 'place | situation') AS rank
84
- FROM test_rum
85
- WHERE a @@ to_tsquery(' english' , ' place | situation' )
86
- ORDER BY a <=> to_tsquery(' english' , ' place | situation' );
114
+ FROM test_rum
115
+ WHERE a @@ to_tsquery(' english' , ' place | situation' )
116
+ ORDER BY a <=> to_tsquery(' english' , ' place | situation' );
87
117
t | rank
88
118
-- -------------------------------+-----------
89
119
The situation is most beautiful | 0 .0303964
90
120
It looks like a beautiful place | 0 .0303964
91
121
(2 rows)
92
122
```
93
123
124
+ ### rum_tsvector_hash_ops
125
+
126
+ For type: ` tsvector `
127
+
128
+ This operator class stores hash of ` tsvector ` lexemes with positional information.
129
+ Supports ordering by ` <=> ` operator. But ** doesn't** support prefix search.
130
+
131
+ ### rum_timestamp_ops
132
+
133
+ For type: ` timestamp `
134
+
135
+ Operator class provides fast search and ordering by timestamp fields. Supports
136
+ ordering by ` <=> ` , ` <=| ` and ` |=> ` operators. Can be used
137
+ with ` rum_tsvector_timestamp_ops ` operator class.
138
+
139
+ ### rum_timestamptz_ops
140
+
141
+ For type: ` timestamptz `
142
+
143
+ Operator class provides fast search and ordering by timestamptz fields. Supports
144
+ ordering by ` <=> ` , ` <=| ` and ` |=> ` operators. Can be used
145
+ with ` rum_tsvector_timestamptz_ops ` operator class.
146
+
147
+ ### rum_tsvector_timestamp_ops
148
+
149
+ For type: ` tsvector `
150
+
151
+ This operator class stores ` tsvector ` lexems with timestamp field. There is the example:
152
+
153
+ Let us assume we have the table:
154
+ ``` sql
155
+ CREATE TABLE tsts (id int , t tsvector, d timestamp );
156
+
157
+ \copy tsts from ' rum/data/tsts.data'
158
+
159
+ CREATE INDEX tsts_idx ON tsts USING rum (t rum_tsvector_timestamp_ops, d)
160
+ WITH (attach = ' d' , to = ' t' );
161
+ ```
162
+
163
+ Now we can execute the following queries:
164
+ ``` sql
165
+ EXPLAIN (costs off)
166
+ SELECT id, d, d <=> ' 2016-05-16 14:21:25' FROM tsts WHERE t @@ ' wr&qh' ORDER BY d <=> ' 2016-05-16 14:21:25' LIMIT 5 ;
167
+ QUERY PLAN
168
+ -- ---------------------------------------------------------------------------------
169
+ Limit
170
+ - > Index Scan using tsts_idx on tsts
171
+ Index Cond: (t @@ ' ' ' wr' ' & ' ' qh' ' ' ::tsquery)
172
+ Order By : (d <=> ' Mon May 16 14:21:25 2016' ::timestamp without time zone )
173
+ (4 rows)
174
+
175
+ SELECT id, d, d <=> ' 2016-05-16 14:21:25' FROM tsts WHERE t @@ ' wr&qh' ORDER BY d <=> ' 2016-05-16 14:21:25' LIMIT 5 ;
176
+ id | d | ?column?
177
+ -- ---+---------------------------------+---------------
178
+ 355 | Mon May 16 14 :21 :22 .326724 2016 | 2 .673276
179
+ 354 | Mon May 16 13 :21 :22 .326724 2016 | 3602 .673276
180
+ 371 | Tue May 17 06 :21 :22 .326724 2016 | 57597 .326724
181
+ 406 | Wed May 18 17 :21 :22 .326724 2016 | 183597 .326724
182
+ 415 | Thu May 19 02 :21 :22 .326724 2016 | 215997 .326724
183
+ (5 rows)
184
+ ```
185
+
186
+ ### rum_tsvector_timestamptz_ops
187
+
188
+ For type: ` tsvector `
189
+
190
+ See comments for ` rum_tsvector_timestamp_ops ` operator class.
191
+
192
+ ### rum_tsvector_hash_timestamp_ops
193
+
194
+ For type: ` tsvector `
195
+
196
+ This operator class stores hash of ` tsvector ` lexems with timestamp field.
197
+ ** Doesn't** support prefix search.
198
+
199
+ ### rum_tsvector_hash_timestamptz_ops
200
+
201
+ For type: ` tsvector `
202
+
203
+ This operator class stores hash of ` tsvector ` lexems with timestamptz field.
204
+ ** Doesn't** support prefix search.
205
+
206
+ ### rum_tsquery_ops
207
+
208
+ For type: ` tsquery `
209
+
210
+ Stores branches of query tree in additional information. For example we have the table:
211
+ ``` sql
212
+ CREATE TABLE query (q tsquery, tag text );
213
+
214
+ INSERT INTO query VALUES (' supernova & star' , ' sn' ),
215
+ (' black' , ' color' ),
216
+ (' big & bang & black & hole' , ' bang' ),
217
+ (' spiral & galaxy' , ' shape' ),
218
+ (' black & hole' , ' color' );
219
+
220
+ CREATE INDEX query_idx ON query USING rum(q);
221
+ ```
222
+
223
+ Now we can execute the following fast query:
224
+ ``` sql
225
+ SELECT * FROM query
226
+ WHERE to_tsvector(' black holes never exists before we think about them' ) @@ q;
227
+ q | tag
228
+ -- ----------------+-------
229
+ ' black' | color
230
+ ' black' & ' hole' | color
231
+ (2 rows)
232
+
233
+
234
+ ```
235
+
94
236
## Authors
95
237
96
238
Alexander Korotkov <a.korotkov@postgrespro.ru > Postgres Professional Ltd., Russia
0 commit comments