Skip to content

Commit 80101ab

Browse files
author
Zakirov Artur
committed
Added more information in README. Listed all operator classes
1 parent af9b870 commit 80101ab

File tree

2 files changed

+157
-15
lines changed

2 files changed

+157
-15
lines changed

README.md

Lines changed: 157 additions & 15 deletions
Original file line numberDiff line numberDiff line change
@@ -4,8 +4,29 @@
44

55
## Introduction
66

7-
The **rum** module provides access method to work with RUM index. It is based
8-
on the GIN access methods code.
7+
The **rum** module provides access method to work with `RUM` index. It is based
8+
on the `GIN` access methods code.
9+
10+
`GIN` index allows to perform fast full text search using `tsvector` and
11+
`tsquery` types. But full text search with GIN index has several problems:
12+
13+
- Slow ranking. It is need position information about lexems to ranking. `GIN`
14+
index doesn't store positions of lexems. So after index scan we need additional
15+
heap scan to retreive lexems positions.
16+
- There isn't phrase search with `GIN` index. This problem relates with previous
17+
problem. It is need position information to perform phrase search.
18+
- Slow ordering by timestamp. `GIN` index can't store some related information
19+
in index with lexemes. So it is necessary to perform additional heap scan.
20+
21+
`RUM` solves this problems by storing additional information in posting tree.
22+
For example, positional information of lexemes or timestamps. You can get an
23+
idea of `RUM` by the following picture:
24+
25+
![How RUM stores additional information](img/gin_rum.png)
26+
27+
Drawback of `RUM` is that it has slower build and insert time than `GIN`.
28+
It is because we need to store additional information besides keys and because
29+
`RUM` uses generic WAL.
930

1031
## License
1132

@@ -16,7 +37,7 @@ This module available under the same license as
1637

1738
Before build and install **rum** you should ensure following:
1839

19-
* PostgreSQL version is 9.6.
40+
* PostgreSQL version is 9.6+.
2041

2142
Typical installation procedure may look like this:
2243

@@ -27,18 +48,27 @@ Typical installation procedure may look like this:
2748
$ make USE_PGXS=1 installcheck
2849
$ psql DB -c "CREATE EXTENSION rum;"
2950

30-
## New access method and operator class
31-
32-
The **rum** module provides the access method **rum** and the operator class
33-
**rum_tsvector_ops**.
51+
## Common operators and functions
3452

35-
The module provides new operators.
53+
**rum** module provides next operators.
3654

3755
| Operator | Returns | Description
3856
| -------------------- | ------- | ----------------------------------------------
3957
| tsvector <=> tsquery | float4 | Returns distance between tsvector and tsquery.
58+
| timestamp <=> timestamp | float8 | Returns distance between two timestamps.
59+
| timestamp <=| timestamp | float8 | Returns distance only for left timestamps.
60+
| timestamp |=> timestamp | float8 | Returns distance only for right timestamps.
61+
62+
## Operator classes
4063

41-
## Examples
64+
**rum** provides next operator classes
65+
66+
### rum_tsvector_ops
67+
68+
For type: `tsvector`
69+
70+
This operator class stores `tsvector` lexemes with positional information. Supports
71+
ordering by `<=>` operator and prefix search. There is the example.
4272

4373
Let us assume we have the table:
4474

@@ -70,9 +100,9 @@ And we can execute the following queries:
70100

71101
```sql
72102
=# SELECT t, a <=> to_tsquery('english', 'beautiful | place') AS rank
73-
FROM test_rum
74-
WHERE a @@ to_tsquery('english', 'beautiful | place')
75-
ORDER BY a <=> to_tsquery('english', 'beautiful | place');
103+
FROM test_rum
104+
WHERE a @@ to_tsquery('english', 'beautiful | place')
105+
ORDER BY a <=> to_tsquery('english', 'beautiful | place');
76106
t | rank
77107
---------------------------------+-----------
78108
The situation is most beautiful | 0.0303964
@@ -81,16 +111,128 @@ And we can execute the following queries:
81111
(3 rows)
82112

83113
=# SELECT t, a <=> to_tsquery('english', 'place | situation') AS rank
84-
FROM test_rum
85-
WHERE a @@ to_tsquery('english', 'place | situation')
86-
ORDER BY a <=> to_tsquery('english', 'place | situation');
114+
FROM test_rum
115+
WHERE a @@ to_tsquery('english', 'place | situation')
116+
ORDER BY a <=> to_tsquery('english', 'place | situation');
87117
t | rank
88118
---------------------------------+-----------
89119
The situation is most beautiful | 0.0303964
90120
It looks like a beautiful place | 0.0303964
91121
(2 rows)
92122
```
93123

124+
### rum_tsvector_hash_ops
125+
126+
For type: `tsvector`
127+
128+
This operator class stores hash of `tsvector` lexemes with positional information.
129+
Supports ordering by `&lt;=&gt;` operator. But **doesn't** support prefix search.
130+
131+
### rum_timestamp_ops
132+
133+
For type: `timestamp`
134+
135+
Operator class provides fast search and ordering by timestamp fields. Supports
136+
ordering by `&lt;=&gt;`, `&lt;=&#124;` and `&#124;=&gt;` operators. Can be used
137+
with `rum_tsvector_timestamp_ops` operator class.
138+
139+
### rum_timestamptz_ops
140+
141+
For type: `timestamptz`
142+
143+
Operator class provides fast search and ordering by timestamptz fields. Supports
144+
ordering by `&lt;=&gt;`, `&lt;=&#124;` and `&#124;=&gt;` operators. Can be used
145+
with `rum_tsvector_timestamptz_ops` operator class.
146+
147+
### rum_tsvector_timestamp_ops
148+
149+
For type: `tsvector`
150+
151+
This operator class stores `tsvector` lexems with timestamp field. There is the example:
152+
153+
Let us assume we have the table:
154+
```sql
155+
CREATE TABLE tsts (id int, t tsvector, d timestamp);
156+
157+
\copy tsts from 'rum/data/tsts.data'
158+
159+
CREATE INDEX tsts_idx ON tsts USING rum (t rum_tsvector_timestamp_ops, d)
160+
WITH (attach = 'd', to = 't');
161+
```
162+
163+
Now we can execute the following queries:
164+
```sql
165+
EXPLAIN (costs off)
166+
SELECT id, d, d <=> '2016-05-16 14:21:25' FROM tsts WHERE t @@ 'wr&qh' ORDER BY d <=> '2016-05-16 14:21:25' LIMIT 5;
167+
QUERY PLAN
168+
-----------------------------------------------------------------------------------
169+
Limit
170+
-> Index Scan using tsts_idx on tsts
171+
Index Cond: (t @@ '''wr'' & ''qh'''::tsquery)
172+
Order By: (d <=> 'Mon May 16 14:21:25 2016'::timestamp without time zone)
173+
(4 rows)
174+
175+
SELECT id, d, d <=> '2016-05-16 14:21:25' FROM tsts WHERE t @@ 'wr&qh' ORDER BY d <=> '2016-05-16 14:21:25' LIMIT 5;
176+
id | d | ?column?
177+
-----+---------------------------------+---------------
178+
355 | Mon May 16 14:21:22.326724 2016 | 2.673276
179+
354 | Mon May 16 13:21:22.326724 2016 | 3602.673276
180+
371 | Tue May 17 06:21:22.326724 2016 | 57597.326724
181+
406 | Wed May 18 17:21:22.326724 2016 | 183597.326724
182+
415 | Thu May 19 02:21:22.326724 2016 | 215997.326724
183+
(5 rows)
184+
```
185+
186+
### rum_tsvector_timestamptz_ops
187+
188+
For type: `tsvector`
189+
190+
See comments for `rum_tsvector_timestamp_ops` operator class.
191+
192+
### rum_tsvector_hash_timestamp_ops
193+
194+
For type: `tsvector`
195+
196+
This operator class stores hash of `tsvector` lexems with timestamp field.
197+
**Doesn't** support prefix search.
198+
199+
### rum_tsvector_hash_timestamptz_ops
200+
201+
For type: `tsvector`
202+
203+
This operator class stores hash of `tsvector` lexems with timestamptz field.
204+
**Doesn't** support prefix search.
205+
206+
### rum_tsquery_ops
207+
208+
For type: `tsquery`
209+
210+
Stores branches of query tree in additional information. For example we have the table:
211+
```sql
212+
CREATE TABLE query (q tsquery, tag text);
213+
214+
INSERT INTO query VALUES ('supernova & star', 'sn'),
215+
('black', 'color'),
216+
('big & bang & black & hole', 'bang'),
217+
('spiral & galaxy', 'shape'),
218+
('black & hole', 'color');
219+
220+
CREATE INDEX query_idx ON query USING rum(q);
221+
```
222+
223+
Now we can execute the following fast query:
224+
```sql
225+
SELECT * FROM query
226+
WHERE to_tsvector('black holes never exists before we think about them') @@ q;
227+
q | tag
228+
------------------+-------
229+
'black' | color
230+
'black' & 'hole' | color
231+
(2 rows)
232+
233+
234+
```
235+
94236
## Authors
95237

96238
Alexander Korotkov <a.korotkov@postgrespro.ru> Postgres Professional Ltd., Russia

img/gin_rum.png

18.7 KB
Loading

0 commit comments

Comments
 (0)