Commit bf028fa

Add description of new features
1 parent 7e63445 commit bf028fa

3 files changed, with 503 additions and 90 deletions

contrib/tsearch2/docs/tsearch-V2-intro.html

Lines changed: 3 additions & 3 deletions
@@ -427,9 +427,9 @@ <h3>INDEXING FIELDS IN A TABLE</h3>
427427
<p>We need to create the index on the column idxFTI. Keep in mind
428428
that the database will update the index when some action is taken.
429429
In this case we _need_ the index (The whole point of Full Text
430-
INDEXINGi ;-)), so don't worry about any indexing overhead. We will
431-
create an index based on the gist function. GiST is an index
432-
structure for Generalized Search Tree.</p>
430+
INDEXING ;-)), so don't worry about any indexing overhead. We will
431+
create an index based on the gist or gin function. GiST is an index
432+
structure for Generalized Search Tree; GIN is an inverted index (see <a href="tsearch2-ref.html#indexes">The tsearch2 Reference: Indexes</a>).</p>
433433
<pre>
434434
CREATE INDEX idxFTI_idx ON tblMessages USING gist(idxFTI);
435435
VACUUM FULL ANALYZE;
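
For comparison with the GiST statement above, a GIN index on the same column might be created as sketched below. This assumes PostgreSQL 8.2 or later with tsearch2 installed, and the tblMessages table with its idxFTI tsvector column from the introduction. The index name idxFTI_gin_idx is hypothetical and not part of this commit.

```sql
-- Hypothetical index name; the table and column come from the documentation being changed.
CREATE INDEX idxFTI_gin_idx ON tblMessages USING gin(idxFTI);
-- Refresh planner statistics for the table after building the index.
ANALYZE tblMessages;
```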

contrib/tsearch2/docs/tsearch2-guide.html

Lines changed: 40 additions & 12 deletions
@@ -1,24 +1,20 @@
11
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
22
<html>
33
<head>
4-
<link type="text/css" rel="stylesheet" href="/~megera/postgres/gist/tsearch/tsearch.css">
54
<title>tsearch2 guide</title>
65
</head>
76
<body>
87
<h1 align=center>The tsearch2 Guide</h1>
98

109
<p align=center>
1110
Brandon Craig Rhodes<br>30 June 2003
11+
<br>Updated for the 8.2 release by Oleg Bartunov, October 2006
1212
<p>
1313
This Guide introduces the reader to the PostgreSQL tsearch2 module,
1414
version&nbsp;2.
1515
More formal descriptions of the module's types and functions
1616
are provided in the <a href="tsearch2-ref.html">tsearch2 Reference</a>,
1717
which is a companion to this document.
18-
You can retrieve a beta copy of the tsearch2 module from the
19-
<a href="http://www.sai.msu.su/~megera/postgres/gist/">GiST for PostgreSQL</a>
20-
page &mdash; look under the section entitled <i>Development History</i>
21-
for the current version.
2218
<p>
2319
First we will examine the <tt>tsvector</tt> and <tt>tsquery</tt> types
2420
and how they are used to search documents;
@@ -32,15 +28,40 @@ <h1 align=center>The tsearch2 Guide</h1>
3228
<hr>
3329
<h2>Table of Contents</h2>
3430
<blockquote>
31+
<a href="#intro">Introduction to FTS with tsearch2</a><br>
3532
<a href="#vectors_queries">Vectors and Queries</a><br>
3633
<a href="#simple_search">A Simple Search Engine</a><br>
3734
<a href="#weights">Ranking and Position Weights</a><br>
3835
<a href="#casting">Casting Vectors and Queries</a><br>
3936
<a href="#parsing_lexing">Parsing and Lexing</a><br>
37+
<a href="#ref">Additional information</a>
4038
</blockquote>
4139

4240
<hr>
4341

42+
43+
<h2><a name="intro">Introduction to FTS with tsearch2</a></h2>
44+
The purpose of full text search (FTS) is to
45+
find <b>documents</b> that satisfy a <b>query</b> and optionally return
46+
them in some <b>order</b>.
47+
The most common case: find documents containing all query terms and return them in order
48+
of their similarity to the query. A document in a database can be
49+
any text attribute, or a combination of text attributes from one or many tables
50+
(using joins).
51+
Text search operators have existed for years; in PostgreSQL they are
52+
<tt><b>~, ~*, LIKE, ILIKE</b></tt>, but they lack linguistic support,
53+
tend to be slow, and have no relevance ranking. The idea behind tsearch2
54+
is rather simple: preprocess the document at index time to save time at the search stage.
55+
Preprocessing includes
56+
<ul>
57+
<li>parsing the document into words
58+
<li>linguistic processing: normalizing words to obtain lexemes
59+
<li>storing the document in a form optimized for searching
60+
</ul>
61+
Tsearch2, in a nutshell, provides an FTS operator (contains) for two new data types,
62+
which represent the document and the query: <tt>tsquery @@ tsvector</tt>.
63+
64+
<P>
4465
<h2><a name=vectors_queries>Vectors and Queries</a></h2>
4566

4667
<blockquote>
@@ -79,6 +100,8 @@ <h2><a name=vectors_queries>Vectors and Queries</a></h2>
79100
on the <tt>tsvector</tt> column of a table,
80101
which implements a form of the Berkeley
81102
<a href="http://gist.cs.berkeley.edu/"><i>Generalized Search Tree</i></a>.
103+
Since PostgreSQL 8.2, tsearch2 also supports the <a href="http://www.sigaev.ru/gin/">GIN</a> index,
104+
an inverted index of the kind commonly used in search engines, which improves tsearch2's scalability.
82105
</ul>
83106
Once your documents are indexed,
84107
performing a search involves:
@@ -251,7 +274,7 @@ <h2><a name=vectors_queries>Vectors and Queries</a></h2>
251274

252275
<pre>
253276
=# <b>SELECT to_tsquery('the')</b>
254-
NOTICE: Query contains only stopword(s) or doesn't contain lexeme(s), ignored
277+
NOTICE: Query contains only stopword(s) or doesn't contain lexem(s), ignored
255278
to_tsquery
256279
------------
257280

@@ -483,8 +506,8 @@ <h2><a name=weights>Ranking and Position Weights</a></h2>
483506
and has the feature that you can assign different weights
484507
to words from different sections of your document.
485508
The <tt>rank_cd()</tt> uses a recent technique for weighting results
486-
but does not allow different weight to be given
487-
to different sections of your document.
509+
and also allows different weights to be given
510+
to different sections of your document (since 8.2).
488511
<p>
489512
Both ranking functions allow you to specify,
490513
as an optional last argument,
@@ -511,9 +534,6 @@ <h2><a name=weights>Ranking and Position Weights</a></h2>
511534
see the <a href="tsearch2-ref.html#ranking">section on ranking</a>
512535
in the Reference.
513536
<p>
514-
The <tt>rank()</tt> function offers more flexibility
515-
because it pays attention to the <i>weights</i>
516-
with which you have labelled lexeme positions.
517537
Currently tsearch2 supports four different weight labels:
518538
<tt>'D'</tt>, the default weight;
519539
and <tt>'A'</tt>, <tt>'B'</tt>, and <tt>'C'</tt>.
@@ -730,7 +750,7 @@ <h2><a name=casting>Casting Vectors and Queries</a></h2>
730750
are important <i>both</i> to PostgreSQL when it is interpreting a string,
731751
<i>and</i> to the <tt>tsvector</tt> conversion function.
732752
You may want to review section
733-
<a href="http://www.postgresql.org/docs/view.php?version=7.3&idoc=0&file=sql-syntax.html#SQL-SYNTAX-STRINGS">1.1.2.1,
753+
<a href="http://www.postgresql.org/docs/current/static/sql-syntax.html#SQL-SYNTAX-STRINGS">
734754
&ldquo;String Constants&rdquo;</a>
735755
in the PostgreSQL documentation before proceeding.
736756
<p>
@@ -1051,6 +1071,14 @@ <h2><a name=parsing_lexing>Parsing and Lexing</a></h2>
10511071
with the difference that the query parser recognizes as special
10521072
the boolean operators that separate query words.
10531073

1074+
1075+
<h2><a name="ref">Additional information</a></h2>
1076+
More information about tsearch2 is available on the
1077+
<a href="http://www.sai.msu.su/~megera/postgres/gist/tsearch/V2">tsearch2</a> page.
1078+
It is also worth checking the
1079+
<a href="http://www.sai.msu.su/~megera/wiki/Tsearch2">tsearch2 wiki</a> pages.
1080+
1081+
10541082
</body>
10551083
</html>
10561084
