<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
- <link type="text/css" rel="stylesheet" href="/~megera/postgres/gist/tsearch/tsearch.css">
<title>tsearch2 guide</title>
</head>
<body>
<h1 align=center>The tsearch2 Guide</h1>

<p align=center>
Brandon Craig Rhodes<br>30 June 2003
+ <br>Updated for the 8.2 release by Oleg Bartunov, October 2006
<p>
This Guide introduces the reader to the PostgreSQL tsearch2 module,
version 2.
More formal descriptions of the module's types and functions
are provided in the <a href="tsearch2-ref.html">tsearch2 Reference</a>,
which is a companion to this document.
- You can retrieve a beta copy of the tsearch2 module from the
- <a href="http://www.sai.msu.su/~megera/postgres/gist/">GiST for PostgreSQL</a>
- page — look under the section entitled <i>Development History</i>
- for the current version.
<p>
First we will examine the <tt>tsvector</tt> and <tt>tsquery</tt> types
and how they are used to search documents;
@@ -32,15 +28,40 @@ <h1 align=center>The tsearch2 Guide</h1>
<hr>
<h2>Table of Contents</h2>
<blockquote>
+ <a href="#intro">Introduction to FTS with tsearch2</a><br>
<a href="#vectors_queries">Vectors and Queries</a><br>
<a href="#simple_search">A Simple Search Engine</a><br>
<a href="#weights">Ranking and Position Weights</a><br>
<a href="#casting">Casting Vectors and Queries</a><br>
<a href="#parsing_lexing">Parsing and Lexing</a><br>
+ <a href="#ref">Additional information</a>
</blockquote>

<hr>

+
+ <h2><a name="intro">Introduction to FTS with tsearch2</a></h2>
+ The purpose of full-text search (FTS) is to
+ find <b>documents</b> that satisfy a <b>query</b> and, optionally, to return
+ them in some <b>order</b>.
+ The most common case is to find documents containing all query terms and return them in order
+ of their similarity to the query. A document in a database can be
+ any text attribute, or a combination of text attributes from one or many tables
+ (using joins).
+ Text search operators have existed for years; in PostgreSQL they are
+ <tt><b>~, ~*, LIKE, ILIKE</b></tt>, but they lack linguistic support,
+ tend to be slow, and have no relevance ranking. The idea behind tsearch2
+ is rather simple: preprocess the document at index time to save time at the search stage.
+ Preprocessing includes
+ <ul>
+ <li>parsing the document into words
+ <li>normalizing the words linguistically to obtain lexemes
+ <li>storing the document in a form optimized for searching
+ </ul>
+ Tsearch2, in a nutshell, provides an FTS operator (contains) for two new data types,
+ which represent the document and the query: <tt>tsquery @@ tsvector</tt>.
+
+ <P>
<h2><a name=vectors_queries>Vectors and Queries</a></h2>

<blockquote>
@@ -79,6 +100,8 @@ <h2><a name=vectors_queries>Vectors and Queries</a></h2>
on the <tt>tsvector</tt> column of a table,
which implements a form of the Berkeley
<a href="http://gist.cs.berkeley.edu/"><i>Generalized Search Tree</i></a>.
+ Since PostgreSQL 8.2, tsearch2 also supports the <a href="http://www.sigaev.ru/gin/">GIN</a> index,
+ an inverted index of the kind commonly used in search engines, which improves the scalability of tsearch2.
</ul>
Once your documents are indexed,
performing a search involves:
@@ -251,7 +274,7 @@ <h2><a name=vectors_queries>Vectors and Queries</a></h2>

<pre>
=# <b>SELECT to_tsquery('the')</b>
- NOTICE: Query contains only stopword(s) or doesn't contain lexeme(s), ignored
+ NOTICE: Query contains only stopword(s) or doesn't contain lexem(s), ignored
to_tsquery
------------

@@ -483,8 +506,8 @@ <h2><a name=weights>Ranking and Position Weights</a></h2>
and has the feature that you can assign different weights
to words from different sections of your document.
The <tt>rank_cd()</tt> uses a recent technique for weighting results
- but does not allow different weight to be given
- to different sections of your document.
+ and also allows different weights to be given
+ to different sections of your document (since 8.2).
<p>
Both ranking functions allow you to specify,
as an optional last argument,
@@ -511,9 +534,6 @@ <h2><a name=weights>Ranking and Position Weights</a></h2>
see the <a href="tsearch2-ref.html#ranking">section on ranking</a>
in the Reference.
<p>
- The <tt>rank()</tt> function offers more flexibility
- because it pays attention to the <i>weights</i>
- with which you have labelled lexeme positions.
Currently tsearch2 supports four different weight labels:
<tt>'D'</tt>, the default weight;
and <tt>'A'</tt>, <tt>'B'</tt>, and <tt>'C'</tt>.
@@ -730,7 +750,7 @@ <h2><a name=casting>Casting Vectors and Queries</a></h2>
are important <i>both</i> to PostgreSQL when it is interpreting a string,
<i>and</i> to the <tt>tsvector</tt> conversion function.
You may want to review section
- <a href="http://www.postgresql.org/docs/view.php?version=7.3&idoc=0&file=sql-syntax.html#SQL-SYNTAX-STRINGS">1.1.2.1,
+ <a href="http://www.postgresql.org/docs/current/static/sql-syntax.html#SQL-SYNTAX-STRINGS">
“String Constants”</a>
in the PostgreSQL documentation before proceeding.
<p>
@@ -1051,6 +1071,14 @@ <h2><a name=parsing_lexing>Parsing and Lexing</a></h2>
with the difference that the query parser recognizes as special
the boolean operators that separate query words.

+
+ <h2><a name="ref">Additional information</a></h2>
+ More information about tsearch2 is available from the
+ <a href="http://www.sai.msu.su/~megera/postgres/gist/tsearch/V2">tsearch2</a> page.
+ It is also worth checking the
+ <a href="http://www.sai.msu.su/~megera/wiki/Tsearch2">tsearch2 wiki</a> pages.
+
+
</body>
</html>
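
A note on the new Introduction section: the three preprocessing steps it lists (parse the document into words, normalize the words to lexemes, store the result in a searchable form) and the "find documents containing all query terms" case can be illustrated with a toy inverted index. This is only a rough sketch in Python, not how tsearch2 is implemented: the stopword list, the plural-stripping rule, and the function names `to_lexemes`, `build_index`, and `search` are all made up for illustration, whereas tsearch2 uses its own configurable parsers and dictionaries.

```python
import re
from collections import defaultdict

# Hypothetical stopword list, for illustration only;
# tsearch2 takes stopwords from its configured dictionaries.
STOPWORDS = {"the", "a", "an", "on", "and", "of"}


def to_lexemes(text):
    """Parse text into words and normalize them into crude 'lexemes'."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    lexemes = []
    for word in words:
        if word in STOPWORDS:
            continue
        # Toy normalization: strip a trailing plural "s" (not a real stemmer).
        if len(word) > 3 and word.endswith("s"):
            word = word[:-1]
        lexemes.append(word)
    return lexemes


def build_index(documents):
    """Build an inverted index mapping each lexeme to a set of document ids."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for lexeme in to_lexemes(text):
            index[lexeme].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents that contain every lexeme of the query."""
    lexemes = to_lexemes(query)
    if not lexemes:
        return set()  # the query held only stopwords
    result = set(index.get(lexemes[0], set()))
    for lexeme in lexemes[1:]:
        result &= index.get(lexeme, set())
    return result


documents = {
    1: "The cat sat on the mat",
    2: "Cats and dogs",
    3: "A dog on the mat",
}
index = build_index(documents)
print(sorted(search(index, "cats")))     # [1, 2]
print(sorted(search(index, "dog mat")))  # [3]
```

Because the normalization work happens once, when the index is built, a search only intersects precomputed sets. That is the same trade-off the Introduction describes: extra effort at index time to save time at the search stage.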