Commit 68af452

tglsfdc authored and dmpgpro committed
Adjust text search documentation for recent commits.
Fix some now-obsolete statements that were overlooked in commits 6734a1c, 3dbbd0f, 028350f. Document the behavior of <0>. Also do a little bit of rearranging and copy-editing for clarity.

Conflicts:
doc/src/sgml/datatype.sgml
doc/src/sgml/textsearch.sgml
1 parent 7cb279a commit 68af452

2 files changed: +73 -43 lines changed

doc/src/sgml/datatype.sgml

Lines changed: 51 additions & 32 deletions
@@ -3861,12 +3861,12 @@ SELECT 'a:1A fat:2B,4C cat:5D'::tsvector;

 <para>
 It is important to understand that the
-<type>tsvector</type> type itself does not perform any normalization;
-it assumes the words it is given are normalized appropriately
-for the application. For example,
+<type>tsvector</type> type itself does not perform any word
+normalization; it assumes the words it is given are normalized
+appropriately for the application. For example,

 <programlisting>
-select 'The Fat Rats'::tsvector;
+SELECT 'The Fat Rats'::tsvector;
 tsvector
 --------------------
 'Fat' 'Rats' 'The'
@@ -3899,11 +3899,26 @@ SELECT to_tsvector('english', 'The Fat Rats');

 <para>
 A <type>tsquery</type> value stores lexemes that are to be
-searched for, and combines them honoring the Boolean operators
-<literal>&amp;</literal> (AND), <literal>|</literal> (OR),
-<literal>!</> (NOT) and <literal>&lt;-&gt;</> (FOLLOWED BY) phrase search
-operator. Parentheses can be used to enforce grouping
-of the operators:
+searched for, and can combine them using the Boolean operators
+<literal>&amp;</literal> (AND), <literal>|</literal> (OR), and
+<literal>!</> (NOT), as well as the phrase search operator
+<literal>&lt;-&gt;</> (FOLLOWED BY). There is also a variant
+<literal>&lt;<replaceable>N</>&gt;</literal> of the FOLLOWED BY
+operator, where <replaceable>N</> is an integer constant that
+specifies the distance between the two lexemes being searched
+for. <literal>&lt;-&gt;</> is equivalent to <literal>&lt;1&gt;</>.
+</para>
+
+<para>
+Parentheses can be used to enforce grouping of these operators.
+In the absence of parentheses, <literal>!</> (NOT) binds most tightly,
+<literal>&lt;-&gt;</literal> (FOLLOWED BY) next most tightly, then
+<literal>&amp;</literal> (AND), with <literal>|</literal> (OR) binding
+the least tightly.
+</para>
+
+<para>
+Here are some examples:

 <programlisting>
 SELECT 'fat &amp; rat'::tsquery;
@@ -3920,17 +3935,21 @@ SELECT 'fat &amp; rat &amp; ! cat'::tsquery;
 tsquery
 ------------------------
 'fat' &amp; 'rat' &amp; !'cat'
+
+SELECT '(fat | rat) &lt;-&gt; cat'::tsquery;
+tsquery
+-----------------------------------
+'fat' &lt;-&gt; 'cat' | 'rat' &lt;-&gt; 'cat'
 </programlisting>

-In the absence of parentheses, <literal>!</> (NOT) binds most tightly,
-and <literal>&amp;</literal> (AND) and <literal>&lt;-&gt;</literal> (FOLLOWED BY)
-both bind more tightly than <literal>|</literal> (OR).
+The last example demonstrates that <type>tsquery</type> sometimes
+rearranges nested operators into a logically equivalent formulation.
 </para>

 <para>
 Optionally, lexemes in a <type>tsquery</type> can be labeled with
 one or more weight letters, which restricts them to match only
-<type>tsvector</> lexemes with matching weights:
+<type>tsvector</> lexemes with one of those weights:

 <programlisting>
 SELECT 'fat:ab &amp; cat'::tsquery;
@@ -3950,25 +3969,7 @@ SELECT 'super:*'::tsquery;
 'super':*
 </programlisting>
 This query will match any word in a <type>tsvector</> that begins
-with <quote>super</>. Note that prefixes are first processed by
-text search configurations, which means this comparison returns
-true:
-<programlisting>
-SELECT to_tsvector( 'postgraduate' ) @@ to_tsquery( 'postgres:*' );
-?column?
-----------
-t
-(1 row)
-</programlisting>
-because <literal>postgres</> gets stemmed to <literal>postgr</>:
-<programlisting>
-SELECT to_tsquery('postgres:*');
-to_tsquery
-------------
-'postgr':*
-(1 row)
-</programlisting>
-which then matches <literal>postgraduate</>.
+with <quote>super</>.
 </para>

 <para>
@@ -3984,6 +3985,24 @@ SELECT to_tsquery('Fat:ab &amp; Cats');
 ------------------
 'fat':AB &amp; 'cat'
 </programlisting>
+
+Note that <function>to_tsquery</> will process prefixes in the same way
+as other words, which means this comparison returns true:
+
+<programlisting>
+SELECT to_tsvector( 'postgraduate' ) @@ to_tsquery( 'postgres:*' );
+?column?
+----------
+t
+</programlisting>
+because <literal>postgres</> gets stemmed to <literal>postgr</>:
+<programlisting>
+SELECT to_tsvector( 'postgraduate' ), to_tsquery( 'postgres:*' );
+to_tsvector | to_tsquery
+---------------+------------
+'postgradu':1 | 'postgr':*
+</programlisting>
+which will match the stemmed form of <literal>postgraduate</>.
 </para>

</sect2>
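
The distance and precedence rules described in the datatype.sgml changes above can be tried out with a few queries. This is only a minimal sketch, not part of the commit: the sample strings are invented for illustration, and the 'simple' configuration is assumed so that no stemming or stop-word removal interferes. The queries are written as plain SQL for psql, so the operators appear unescaped rather than as SGML entities.

```sql
-- <-> is documented as equivalent to <1>, so these two queries
-- should give the same answer (adjacent lexemes match).
SELECT to_tsvector('simple', 'fatal error') @@ to_tsquery('simple', 'fatal <-> error');
SELECT to_tsvector('simple', 'fatal error') @@ to_tsquery('simple', 'fatal <1> error');

-- <2> asks for a position difference of two, i.e. exactly one
-- other lexeme ('disk', an invented filler word) in between.
SELECT to_tsvector('simple', 'fatal disk error') @@ to_tsquery('simple', 'fatal <2> error');

-- Precedence: ! binds most tightly, then <->, then &, then |.
-- Compare how the unparenthesized and parenthesized forms are displayed.
SELECT 'a | b & c <-> d'::tsquery;
SELECT '(a | b) & c <-> d'::tsquery;
```

Comparing the displayed form of the last two queries is a quick way to check how a given server version groups the operators.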

doc/src/sgml/textsearch.sgml

Lines changed: 22 additions & 11 deletions
@@ -351,8 +351,7 @@ text @@ text
 match. Similarly, the <literal>|</literal> (OR) operator specifies that
 at least one of its arguments must appear, while the <literal>!</> (NOT)
 operator specifies that its argument must <emphasis>not</> appear in
-order to have a match. Parentheses can be used to control nesting of
-these operators.
+order to have a match.
 </para>

 <para>
@@ -375,10 +374,10 @@ SELECT to_tsvector('error is not fatal') @@ to_tsquery('fatal &lt;-&gt; error');

 There is a more general version of the FOLLOWED BY operator having the
 form <literal>&lt;<replaceable>N</>&gt;</literal>,
-where <replaceable>N</> is an integer standing for the exact distance
-allowed between the matching lexemes. <literal>&lt;1&gt;</literal> is
+where <replaceable>N</> is an integer standing for the difference between
+the positions of the matching lexemes. <literal>&lt;1&gt;</literal> is
 the same as <literal>&lt;-&gt;</>, while <literal>&lt;2&gt;</literal>
-allows one other lexeme to appear between the matches, and so
+allows exactly one other lexeme to appear between the matches, and so
 on. The <literal>phraseto_tsquery</> function makes use of this
 operator to construct a <literal>tsquery</> that can match a multi-word
 phrase when some of the words are stop words. For example:
@@ -395,9 +394,17 @@ SELECT phraseto_tsquery('the cats ate the rats');
 'cat' &lt;-&gt; 'ate' &lt;2&gt; 'rat'
 </programlisting>
 </para>
+
+<para>
+A special case that's sometimes useful is that <literal>&lt;0&gt;</literal>
+can be used to require that two patterns match the same word.
+</para>
+
 <para>
-The precedence of tsquery operators is as follows: <literal>|</literal>, <literal>&amp;</literal>,
-<literal>&lt;-&gt;</literal>, <literal>!</literal>.
+Parentheses can be used to control nesting of the <type>tsquery</>
+operators. Without parentheses, <literal>|</literal> binds least tightly,
+then <literal>&amp;</literal>, then <literal>&lt;-&gt;</literal>,
+and <literal>!</literal> most tightly.
 </para>
 </sect2>

@@ -1455,10 +1462,14 @@ FROM (SELECT id, body, q, ts_rank_cd(ti, q) AS rank

 <listitem>
 <para>
-Returns a vector which lists the same lexemes as the given vector, but
-which lacks any position or weight information. While the returned
-vector is much less useful than an unstripped vector for relevance
-ranking, it will usually be much smaller.
+Returns a vector that lists the same lexemes as the given vector, but
+lacks any position or weight information. The result is usually much
+smaller than an unstripped vector, but it is also less useful.
+Relevance ranking does not work as well on stripped vectors as
+unstripped ones. Also,
+the <literal>&lt;-&gt;</> (FOLLOWED BY) <type>tsquery</> operator
+will never match stripped input, since it cannot determine the
+distance between lexeme occurrences.
 </para>
</listitem>
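
The two behaviors newly documented in textsearch.sgml, the <0> special case and the effect of strip() on phrase search, can be sketched as follows. This is an illustrative example rather than text from the commit: the input words are invented, and the 'simple' configuration is assumed so that lexemes are stored unstemmed. As plain SQL, the operators are written unescaped.

```sql
-- <0> requires both patterns to match the same word; here two
-- prefix patterns are both intended to match the single word 'supernova'.
SELECT to_tsvector('simple', 'supernova') @@ to_tsquery('simple', 'super:* <0> supern:*');

-- With positions intact, the phrase query can match.
SELECT to_tsvector('simple', 'fatal error') @@ to_tsquery('simple', 'fatal <-> error');

-- After strip(), positions are gone, so per the text above
-- FOLLOWED BY can no longer match the same input.
SELECT strip(to_tsvector('simple', 'fatal error')) @@ to_tsquery('simple', 'fatal <-> error');

-- A plain AND query does not depend on positions, so it is
-- unaffected by stripping.
SELECT strip(to_tsvector('simple', 'fatal error')) @@ to_tsquery('simple', 'fatal & error');
```

If phrase search is needed, the unstripped vector has to be kept; strip() only suits columns that are queried with the Boolean operators alone.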