1
+ [ ![ Postgres Professional] ( img/PGpro-logo.png )] ( https://postgrespro.com/ )
2
+
1
3
# RUM - RUM access method
2
4
3
5
## Introduction
4
6
5
- The ** rum** module provides access method to work with RUM index. It is based
6
- on the GIN access methods code.
7
+ The ** rum** module provides access method to work with ` RUM ` index. It is based
8
+ on the ` GIN ` access methods code.
9
+
10
+ ` GIN ` index allows to perform fast full text search using ` tsvector ` and
11
+ ` tsquery ` types. But full text search with GIN index has several problems:
12
+
13
+ - Slow ranking. It is need position information about lexems to ranking. ` GIN `
14
+ index doesn't store positions of lexems. So after index scan we need additional
15
+ heap scan to retreive lexems positions.
16
+ - Slow phrase search with ` GIN ` index. This problem relates with previous
17
+ problem. It is need position information to perform phrase search.
18
+ - Slow ordering by timestamp. ` GIN ` index can't store some related information
19
+ in index with lexemes. So it is necessary to perform additional heap scan.
20
+
21
+ ` RUM ` solves this problems by storing additional information in posting tree.
22
+ For example, positional information of lexemes or timestamps. You can get an
23
+ idea of ` RUM ` by the following picture:
24
+
25
+ ![ How RUM stores additional information] ( img/gin_rum.png )
26
+
27
+ Drawback of ` RUM ` is that it has slower build and insert time than ` GIN ` .
28
+ It is because we need to store additional information besides keys and because
29
+ ` RUM ` uses generic WAL.
7
30
8
31
## License
9
32
@@ -14,7 +37,7 @@ This module available under the same license as
14
37
15
38
Before build and install ** rum** you should ensure following:
16
39
17
- * PostgreSQL version is 9.6.
40
+ * PostgreSQL version is 9.6+ .
18
41
19
42
Typical installation procedure may look like this:
20
43
@@ -25,18 +48,27 @@ Typical installation procedure may look like this:
25
48
$ make USE_PGXS=1 installcheck
26
49
$ psql DB -c "CREATE EXTENSION rum;"
27
50
28
- ## New access method and operator class
29
-
30
- The ** rum** module provides the access method ** rum** and the operator class
31
- ** rum_tsvector_ops** .
51
+ ## Common operators and functions
32
52
33
- The module provides new operators.
53
+ ** rum ** module provides next operators.
34
54
35
55
| Operator | Returns | Description
36
56
| -------------------- | ------- | ----------------------------------------------
37
57
| tsvector < ; => ; tsquery | float4 | Returns distance between tsvector and tsquery.
58
+ | timestamp < ; => ; timestamp | float8 | Returns distance between two timestamps.
59
+ | timestamp < ; =| ; timestamp | float8 | Returns distance only for left timestamps.
60
+ | timestamp | ; => ; timestamp | float8 | Returns distance only for right timestamps.
38
61
39
- ## Examples
62
+ ## Operator classes
63
+
64
+ ** rum** provides next operator classes.
65
+
66
+ ### rum_tsvector_ops
67
+
68
+ For type: ` tsvector `
69
+
70
+ This operator class stores ` tsvector ` lexemes with positional information. Supports
71
+ ordering by ` <=> ` operator and prefix search. There is the example.
40
72
41
73
Let us assume we have the table:
42
74
@@ -67,28 +99,146 @@ CREATE INDEX rumidx ON test_rum USING rum (a rum_tsvector_ops);
67
99
And we can execute the following queries:
68
100
69
101
``` sql
70
- = # SELECT t, a <=> to_tsquery('english', 'beautiful | place') AS rank
71
- FROM test_rum
72
- WHERE a @@ to_tsquery(' english' , ' beautiful | place' )
73
- ORDER BY a <=> to_tsquery(' english' , ' beautiful | place' );
102
+ SELECT t, a <=> to_tsquery(' english' , ' beautiful | place' ) AS rank
103
+ FROM test_rum
104
+ WHERE a @@ to_tsquery(' english' , ' beautiful | place' )
105
+ ORDER BY a <=> to_tsquery(' english' , ' beautiful | place' );
74
106
t | rank
75
107
-- -------------------------------+-----------
76
108
The situation is most beautiful | 0 .0303964
77
109
It is a beautiful | 0 .0303964
78
110
It looks like a beautiful place | 0 .0607927
79
111
(3 rows)
80
112
81
- = # SELECT t, a <=> to_tsquery('english', 'place | situation') AS rank
82
- FROM test_rum
83
- WHERE a @@ to_tsquery(' english' , ' place | situation' )
84
- ORDER BY a <=> to_tsquery(' english' , ' place | situation' );
113
+ SELECT t, a <=> to_tsquery(' english' , ' place | situation' ) AS rank
114
+ FROM test_rum
115
+ WHERE a @@ to_tsquery(' english' , ' place | situation' )
116
+ ORDER BY a <=> to_tsquery(' english' , ' place | situation' );
85
117
t | rank
86
118
-- -------------------------------+-----------
87
119
The situation is most beautiful | 0 .0303964
88
120
It looks like a beautiful place | 0 .0303964
89
121
(2 rows)
90
122
```
91
123
124
+ ### rum_tsvector_hash_ops
125
+
126
+ For type: ` tsvector `
127
+
128
+ This operator class stores hash of ` tsvector ` lexemes with positional information.
129
+ Supports ordering by ` <=> ` operator. But ** doesn't** support prefix search.
130
+
131
+ ### rum_timestamp_ops
132
+
133
+ For type: ` timestamp `
134
+
135
+ Operator class provides fast search and ordering by timestamp fields. Supports
136
+ ordering by ` <=> ` , ` <=| ` and ` |=> ` operators. Can be used with
137
+ ` rum_tsvector_timestamp_ops ` operator class.
138
+
139
+ ### rum_timestamptz_ops
140
+
141
+ For type: ` timestamptz `
142
+
143
+ Operator class provides fast search and ordering by timestamptz fields. Supports
144
+ ordering by ` <=> ` , ` <=| ` and ` |=> ` operators. Can be used with
145
+ ` rum_tsvector_timestamptz_ops ` operator class.
146
+
147
+ ### rum_tsvector_timestamp_ops
148
+
149
+ For type: ` tsvector `
150
+
151
+ This operator class stores ` tsvector ` lexems with timestamp field. There is the example.
152
+
153
+ Let us assume we have the table:
154
+ ``` sql
155
+ CREATE TABLE tsts (id int , t tsvector, d timestamp );
156
+
157
+ \copy tsts from ' rum/data/tsts.data'
158
+
159
+ CREATE INDEX tsts_idx ON tsts USING rum (t rum_tsvector_timestamp_ops, d)
160
+ WITH (attach = ' d' , to = ' t' );
161
+ ```
162
+
163
+ Now we can execute the following queries:
164
+ ``` sql
165
+ EXPLAIN (costs off)
166
+ SELECT id, d, d <=> ' 2016-05-16 14:21:25' FROM tsts WHERE t @@ ' wr&qh' ORDER BY d <=> ' 2016-05-16 14:21:25' LIMIT 5 ;
167
+ QUERY PLAN
168
+ -- ---------------------------------------------------------------------------------
169
+ Limit
170
+ - > Index Scan using tsts_idx on tsts
171
+ Index Cond: (t @@ ' ' ' wr' ' & ' ' qh' ' ' ::tsquery)
172
+ Order By : (d <=> ' Mon May 16 14:21:25 2016' ::timestamp without time zone )
173
+ (4 rows)
174
+
175
+ SELECT id, d, d <=> ' 2016-05-16 14:21:25' FROM tsts WHERE t @@ ' wr&qh' ORDER BY d <=> ' 2016-05-16 14:21:25' LIMIT 5 ;
176
+ id | d | ?column?
177
+ -- ---+---------------------------------+---------------
178
+ 355 | Mon May 16 14 :21 :22 .326724 2016 | 2 .673276
179
+ 354 | Mon May 16 13 :21 :22 .326724 2016 | 3602 .673276
180
+ 371 | Tue May 17 06 :21 :22 .326724 2016 | 57597 .326724
181
+ 406 | Wed May 18 17 :21 :22 .326724 2016 | 183597 .326724
182
+ 415 | Thu May 19 02 :21 :22 .326724 2016 | 215997 .326724
183
+ (5 rows)
184
+ ```
185
+
186
+ ### rum_tsvector_timestamptz_ops
187
+
188
+ For type: ` tsvector `
189
+
190
+ See comments for ` rum_tsvector_timestamp_ops ` operator class.
191
+
192
+ ### rum_tsvector_hash_timestamp_ops
193
+
194
+ For type: ` tsvector `
195
+
196
+ This operator class stores hash of ` tsvector ` lexems with timestamp field.
197
+ ** Doesn't** support prefix search.
198
+
199
+ ### rum_tsvector_hash_timestamptz_ops
200
+
201
+ For type: ` tsvector `
202
+
203
+ This operator class stores hash of ` tsvector ` lexems with timestamptz field.
204
+ ** Doesn't** support prefix search.
205
+
206
+ ### rum_tsquery_ops
207
+
208
+ For type: ` tsquery `
209
+
210
+ Stores branches of query tree in additional information. For example we have the table:
211
+ ``` sql
212
+ CREATE TABLE query (q tsquery, tag text );
213
+
214
+ INSERT INTO query VALUES (' supernova & star' , ' sn' ),
215
+ (' black' , ' color' ),
216
+ (' big & bang & black & hole' , ' bang' ),
217
+ (' spiral & galaxy' , ' shape' ),
218
+ (' black & hole' , ' color' );
219
+
220
+ CREATE INDEX query_idx ON query USING rum(q);
221
+ ```
222
+
223
+ Now we can execute the following fast query:
224
+ ``` sql
225
+ SELECT * FROM query
226
+ WHERE to_tsvector(' black holes never exists before we think about them' ) @@ q;
227
+ q | tag
228
+ -- ----------------+-------
229
+ ' black' | color
230
+ ' black' & ' hole' | color
231
+ (2 rows)
232
+ ```
233
+
234
+ ## Todo
235
+
236
+ - Allow multiple additional information (lexemes positions + timestamp).
237
+ - Add support for arrays.
238
+ - Improve ranking function to support TF/IDF.
239
+ - Improve insert time.
240
+ - Improve GENERIC WAL to support shift (PostgreSQL core changes).
241
+
92
242
## Authors
93
243
94
244
Alexander Korotkov <a.korotkov@postgrespro.ru > Postgres Professional Ltd., Russia
0 commit comments