Skip to content

Commit cd38d4a

Browse files
committed
Merge commit '1111fce351cb161ef71ed8f18bac80211466de75' into PGPROEE9_6
2 parents a5990c0 + 1111fce commit cd38d4a

38 files changed

+2509
-107
lines changed

contrib/rum/Makefile

Lines changed: 7 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -1,17 +1,18 @@
11
# contrib/rum/Makefile
22

33
MODULE_big = rum
4-
OBJS = rumsort.o rum_ts_utils.o rumtsquery.o \
5-
rumbtree.o rumbulk.o rumdatapage.o \
6-
rumentrypage.o rumget.o ruminsert.o \
7-
rumscan.o rumutil.o rumvacuum.o rumvalidate.o \
8-
rum_timestamp.o $(WIN32RES)
4+
OBJS = src/rumsort.o src/rum_ts_utils.o src/rumtsquery.o \
5+
src/rumbtree.o src/rumbulk.o src/rumdatapage.o \
6+
src/rumentrypage.o src/rumget.o src/ruminsert.o \
7+
src/rumscan.o src/rumutil.o src/rumvacuum.o src/rumvalidate.o \
8+
src/rum_timestamp.o $(WIN32RES)
99

1010
EXTENSION = rum
1111
DATA = rum--1.0.sql
1212
PGFILEDESC = "RUM index access method"
1313

14-
REGRESS = rum ruminv timestamp orderby altorder
14+
REGRESS = rum rum_hash ruminv timestamp orderby orderby_hash altorder \
15+
altorder_hash
1516

1617
LDFLAGS_SL += $(filter -lm, $(LIBS))
1718

contrib/rum/README.md

Lines changed: 167 additions & 17 deletions
Original file line numberDiff line numberDiff line change
@@ -1,9 +1,32 @@
1+
[![Postgres Professional](img/PGpro-logo.png)](https://postgrespro.com/)
2+
13
# RUM - RUM access method
24

35
## Introduction
46

5-
The **rum** module provides access method to work with RUM index. It is based
6-
on the GIN access methods code.
7+
The **rum** module provides access method to work with `RUM` index. It is based
8+
on the `GIN` access methods code.
9+
10+
`GIN` index allows to perform fast full text search using `tsvector` and
11+
`tsquery` types. But full text search with GIN index has several problems:
12+
13+
- Slow ranking. It is need position information about lexems to ranking. `GIN`
14+
index doesn't store positions of lexems. So after index scan we need additional
15+
heap scan to retreive lexems positions.
16+
- Slow phrase search with `GIN` index. This problem relates with previous
17+
problem. It is need position information to perform phrase search.
18+
- Slow ordering by timestamp. `GIN` index can't store some related information
19+
in index with lexemes. So it is necessary to perform additional heap scan.
20+
21+
`RUM` solves this problems by storing additional information in posting tree.
22+
For example, positional information of lexemes or timestamps. You can get an
23+
idea of `RUM` by the following picture:
24+
25+
![How RUM stores additional information](img/gin_rum.png)
26+
27+
Drawback of `RUM` is that it has slower build and insert time than `GIN`.
28+
It is because we need to store additional information besides keys and because
29+
`RUM` uses generic WAL.
730

831
## License
932

@@ -14,7 +37,7 @@ This module available under the same license as
1437

1538
Before build and install **rum** you should ensure following:
1639

17-
* PostgreSQL version is 9.6.
40+
* PostgreSQL version is 9.6+.
1841

1942
Typical installation procedure may look like this:
2043

@@ -25,18 +48,27 @@ Typical installation procedure may look like this:
2548
$ make USE_PGXS=1 installcheck
2649
$ psql DB -c "CREATE EXTENSION rum;"
2750

28-
## New access method and operator class
29-
30-
The **rum** module provides the access method **rum** and the operator class
31-
**rum_tsvector_ops**.
51+
## Common operators and functions
3252

33-
The module provides new operators.
53+
**rum** module provides next operators.
3454

3555
| Operator | Returns | Description
3656
| -------------------- | ------- | ----------------------------------------------
3757
| tsvector <=> tsquery | float4 | Returns distance between tsvector and tsquery.
58+
| timestamp <=> timestamp | float8 | Returns distance between two timestamps.
59+
| timestamp <=| timestamp | float8 | Returns distance only for left timestamps.
60+
| timestamp |=> timestamp | float8 | Returns distance only for right timestamps.
3861

39-
## Examples
62+
## Operator classes
63+
64+
**rum** provides next operator classes.
65+
66+
### rum_tsvector_ops
67+
68+
For type: `tsvector`
69+
70+
This operator class stores `tsvector` lexemes with positional information. Supports
71+
ordering by `<=>` operator and prefix search. There is the example.
4072

4173
Let us assume we have the table:
4274

@@ -67,28 +99,146 @@ CREATE INDEX rumidx ON test_rum USING rum (a rum_tsvector_ops);
6799
And we can execute the following queries:
68100

69101
```sql
70-
=# SELECT t, a <=> to_tsquery('english', 'beautiful | place') AS rank
71-
FROM test_rum
72-
WHERE a @@ to_tsquery('english', 'beautiful | place')
73-
ORDER BY a <=> to_tsquery('english', 'beautiful | place');
102+
SELECT t, a <=> to_tsquery('english', 'beautiful | place') AS rank
103+
FROM test_rum
104+
WHERE a @@ to_tsquery('english', 'beautiful | place')
105+
ORDER BY a <=> to_tsquery('english', 'beautiful | place');
74106
t | rank
75107
---------------------------------+-----------
76108
The situation is most beautiful | 0.0303964
77109
It is a beautiful | 0.0303964
78110
It looks like a beautiful place | 0.0607927
79111
(3 rows)
80112

81-
=# SELECT t, a <=> to_tsquery('english', 'place | situation') AS rank
82-
FROM test_rum
83-
WHERE a @@ to_tsquery('english', 'place | situation')
84-
ORDER BY a <=> to_tsquery('english', 'place | situation');
113+
SELECT t, a <=> to_tsquery('english', 'place | situation') AS rank
114+
FROM test_rum
115+
WHERE a @@ to_tsquery('english', 'place | situation')
116+
ORDER BY a <=> to_tsquery('english', 'place | situation');
85117
t | rank
86118
---------------------------------+-----------
87119
The situation is most beautiful | 0.0303964
88120
It looks like a beautiful place | 0.0303964
89121
(2 rows)
90122
```
91123

124+
### rum_tsvector_hash_ops
125+
126+
For type: `tsvector`
127+
128+
This operator class stores hash of `tsvector` lexemes with positional information.
129+
Supports ordering by `<=>` operator. But **doesn't** support prefix search.
130+
131+
### rum_timestamp_ops
132+
133+
For type: `timestamp`
134+
135+
Operator class provides fast search and ordering by timestamp fields. Supports
136+
ordering by `<=>`, `<=|` and `|=>` operators. Can be used with
137+
`rum_tsvector_timestamp_ops` operator class.
138+
139+
### rum_timestamptz_ops
140+
141+
For type: `timestamptz`
142+
143+
Operator class provides fast search and ordering by timestamptz fields. Supports
144+
ordering by `<=>`, `<=|` and `|=>` operators. Can be used with
145+
`rum_tsvector_timestamptz_ops` operator class.
146+
147+
### rum_tsvector_timestamp_ops
148+
149+
For type: `tsvector`
150+
151+
This operator class stores `tsvector` lexems with timestamp field. There is the example.
152+
153+
Let us assume we have the table:
154+
```sql
155+
CREATE TABLE tsts (id int, t tsvector, d timestamp);
156+
157+
\copy tsts from 'rum/data/tsts.data'
158+
159+
CREATE INDEX tsts_idx ON tsts USING rum (t rum_tsvector_timestamp_ops, d)
160+
WITH (attach = 'd', to = 't');
161+
```
162+
163+
Now we can execute the following queries:
164+
```sql
165+
EXPLAIN (costs off)
166+
SELECT id, d, d <=> '2016-05-16 14:21:25' FROM tsts WHERE t @@ 'wr&qh' ORDER BY d <=> '2016-05-16 14:21:25' LIMIT 5;
167+
QUERY PLAN
168+
-----------------------------------------------------------------------------------
169+
Limit
170+
-> Index Scan using tsts_idx on tsts
171+
Index Cond: (t @@ '''wr'' & ''qh'''::tsquery)
172+
Order By: (d <=> 'Mon May 16 14:21:25 2016'::timestamp without time zone)
173+
(4 rows)
174+
175+
SELECT id, d, d <=> '2016-05-16 14:21:25' FROM tsts WHERE t @@ 'wr&qh' ORDER BY d <=> '2016-05-16 14:21:25' LIMIT 5;
176+
id | d | ?column?
177+
-----+---------------------------------+---------------
178+
355 | Mon May 16 14:21:22.326724 2016 | 2.673276
179+
354 | Mon May 16 13:21:22.326724 2016 | 3602.673276
180+
371 | Tue May 17 06:21:22.326724 2016 | 57597.326724
181+
406 | Wed May 18 17:21:22.326724 2016 | 183597.326724
182+
415 | Thu May 19 02:21:22.326724 2016 | 215997.326724
183+
(5 rows)
184+
```
185+
186+
### rum_tsvector_timestamptz_ops
187+
188+
For type: `tsvector`
189+
190+
See comments for `rum_tsvector_timestamp_ops` operator class.
191+
192+
### rum_tsvector_hash_timestamp_ops
193+
194+
For type: `tsvector`
195+
196+
This operator class stores hash of `tsvector` lexems with timestamp field.
197+
**Doesn't** support prefix search.
198+
199+
### rum_tsvector_hash_timestamptz_ops
200+
201+
For type: `tsvector`
202+
203+
This operator class stores hash of `tsvector` lexems with timestamptz field.
204+
**Doesn't** support prefix search.
205+
206+
### rum_tsquery_ops
207+
208+
For type: `tsquery`
209+
210+
Stores branches of query tree in additional information. For example we have the table:
211+
```sql
212+
CREATE TABLE query (q tsquery, tag text);
213+
214+
INSERT INTO query VALUES ('supernova & star', 'sn'),
215+
('black', 'color'),
216+
('big & bang & black & hole', 'bang'),
217+
('spiral & galaxy', 'shape'),
218+
('black & hole', 'color');
219+
220+
CREATE INDEX query_idx ON query USING rum(q);
221+
```
222+
223+
Now we can execute the following fast query:
224+
```sql
225+
SELECT * FROM query
226+
WHERE to_tsvector('black holes never exists before we think about them') @@ q;
227+
q | tag
228+
------------------+-------
229+
'black' | color
230+
'black' & 'hole' | color
231+
(2 rows)
232+
```
233+
234+
## Todo
235+
236+
- Allow multiple additional information (lexemes positions + timestamp).
237+
- Add support for arrays.
238+
- Improve ranking function to support TF/IDF.
239+
- Improve insert time.
240+
- Improve GENERIC WAL to support shift (PostgreSQL core changes).
241+
92242
## Authors
93243

94244
Alexander Korotkov <a.korotkov@postgrespro.ru> Postgres Professional Ltd., Russia

contrib/rum/expected/altorder.out

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -323,3 +323,4 @@ SELECT id, d FROM atsts WHERE t @@ 'wr&qh' AND d >= '2016-05-16 14:21:25' ORDER
323323
496 | Sun May 22 11:21:22.326724 2016
324324
(8 rows)
325325

326+
DROP TABLE atsts CASCADE;

contrib/rum/expected/altorder_1.out

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -322,3 +322,4 @@ SELECT id, d FROM atsts WHERE t @@ 'wr&qh' AND d >= '2016-05-16 14:21:25' ORDER
322322
496 | Sun May 22 11:21:22.326724 2016
323323
(8 rows)
324324

325+
DROP TABLE atsts CASCADE;

0 commit comments

Comments
 (0)