Commit 4242a71

Adjust text search documentation for recent commits.
Fix some now-obsolete statements that were overlooked in commits 6734a1c, 3dbbd0f, 028350f. Document the behavior of <0>. Also do a little bit of rearranging and copy-editing for clarity.
1 parent 8dee039 commit 4242a71

2 files changed: 58 additions, 38 deletions

doc/src/sgml/datatype.sgml

Lines changed: 41 additions & 29 deletions
@@ -3885,12 +3885,12 @@ SELECT 'a:1A fat:2B,4C cat:5D'::tsvector;
 
 <para>
 It is important to understand that the
-<type>tsvector</type> type itself does not perform any normalization;
-it assumes the words it is given are normalized appropriately
-for the application. For example,
+<type>tsvector</type> type itself does not perform any word
+normalization; it assumes the words it is given are normalized
+appropriately for the application. For example,
 
 <programlisting>
-select 'The Fat Rats'::tsvector;
+SELECT 'The Fat Rats'::tsvector;
 tsvector
 --------------------
 'Fat' 'Rats' 'The'
@@ -3929,12 +3929,20 @@ SELECT to_tsvector('english', 'The Fat Rats');
 <literal>&lt;-&gt;</> (FOLLOWED BY). There is also a variant
 <literal>&lt;<replaceable>N</>&gt;</literal> of the FOLLOWED BY
 operator, where <replaceable>N</> is an integer constant that
-specifies a maximum distance between the two lexemes being searched
+specifies the distance between the two lexemes being searched
 for. <literal>&lt;-&gt;</> is equivalent to <literal>&lt;1&gt;</>.
 </para>
 
 <para>
-Parentheses can be used to enforce grouping of the operators:
+Parentheses can be used to enforce grouping of these operators.
+In the absence of parentheses, <literal>!</> (NOT) binds most tightly,
+<literal>&lt;-&gt;</literal> (FOLLOWED BY) next most tightly, then
+<literal>&amp;</literal> (AND), with <literal>|</literal> (OR) binding
+the least tightly.
+</para>
+
+<para>
+Here are some examples:
 
 <programlisting>
 SELECT 'fat &amp; rat'::tsquery;
@@ -3951,17 +3959,21 @@ SELECT 'fat &amp; rat &amp; ! cat'::tsquery;
 tsquery
 ------------------------
 'fat' &amp; 'rat' &amp; !'cat'
+
+SELECT '(fat | rat) &lt;-&gt; cat'::tsquery;
+tsquery
+-----------------------------------
+'fat' &lt;-&gt; 'cat' | 'rat' &lt;-&gt; 'cat'
 </programlisting>
 
-In the absence of parentheses, <literal>!</> (NOT) binds most tightly,
-and <literal>&amp;</literal> (AND) and <literal>&lt;-&gt;</literal> (FOLLOWED BY)
-both bind more tightly than <literal>|</literal> (OR).
+The last example demonstrates that <type>tsquery</type> sometimes
+rearranges nested operators into a logically equivalent formulation.
 </para>
 
 <para>
 Optionally, lexemes in a <type>tsquery</type> can be labeled with
 one or more weight letters, which restricts them to match only
-<type>tsvector</> lexemes with matching weights:
+<type>tsvector</> lexemes with one of those weights:
 
 <programlisting>
 SELECT 'fat:ab &amp; cat'::tsquery;
@@ -3981,25 +3993,7 @@ SELECT 'super:*'::tsquery;
 'super':*
 </programlisting>
 This query will match any word in a <type>tsvector</> that begins
-with <quote>super</>. Note that prefixes are first processed by
-text search configurations, which means this comparison returns
-true:
-<programlisting>
-SELECT to_tsvector( 'postgraduate' ) @@ to_tsquery( 'postgres:*' );
-?column?
------------
-t
-(1 row)
-</programlisting>
-because <literal>postgres</> gets stemmed to <literal>postgr</>:
-<programlisting>
-SELECT to_tsquery('postgres:*');
-to_tsquery
-------------
-'postgr':*
-(1 row)
-</programlisting>
-which then matches <literal>postgraduate</>.
+with <quote>super</>.
 </para>
 
 <para>
@@ -4015,6 +4009,24 @@ SELECT to_tsquery('Fat:ab &amp; Cats');
 ------------------
 'fat':AB &amp; 'cat'
 </programlisting>
+
+Note that <function>to_tsquery</> will process prefixes in the same way
+as other words, which means this comparison returns true:
+
+<programlisting>
+SELECT to_tsvector( 'postgraduate' ) @@ to_tsquery( 'postgres:*' );
+?column?
+----------
+t
+</programlisting>
+because <literal>postgres</> gets stemmed to <literal>postgr</>:
+<programlisting>
+SELECT to_tsvector( 'postgraduate' ), to_tsquery( 'postgres:*' );
+to_tsvector | to_tsquery
+---------------+------------
+'postgradu':1 | 'postgr':*
+</programlisting>
+which will match the stemmed form of <literal>postgraduate</>.
 </para>
 
 </sect2>
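
The binding order stated in the new datatype.sgml text (NOT tightest, then FOLLOWED BY, then AND, with OR loosest) can be illustrated with a small precedence-climbing parser. This is only a sketch under stated assumptions: it is not PostgreSQL's tsquery parser, the names `tokenize`, `parse`, `parse_operand`, and `group` are hypothetical, and it handles only bare words, `!`, `&`, `|`, `<->`, and parentheses (no weights, prefixes, or `<N>`).

```python
import re

# Binding strength taken from the documentation text: | loosest, then &,
# then <->. The unary ! is handled in parse_operand, so it binds tightest.
BINARY_PRECEDENCE = {"|": 1, "&": 2, "<->": 3}
TOKEN = re.compile(r"\s*(<->|[|&!()]|\w+)")


def tokenize(text):
    """Split a query string into operator, parenthesis, and word tokens."""
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected input at offset {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def parse(tokens, min_prec=1):
    """Precedence climbing; returns a fully parenthesized string."""
    left = parse_operand(tokens)
    while tokens and BINARY_PRECEDENCE.get(tokens[0], 0) >= min_prec:
        op = tokens.pop(0)
        # Requiring a strictly higher precedence on the right-hand side
        # makes each binary operator left-associative.
        right = parse(tokens, BINARY_PRECEDENCE[op] + 1)
        left = f"({left} {op} {right})"
    return left


def parse_operand(tokens):
    """Parse a word, a negated operand, or a parenthesized subquery."""
    token = tokens.pop(0)
    if token == "!":
        return f"!{parse_operand(tokens)}"
    if token == "(":
        inner = parse(tokens)
        tokens.pop(0)  # discard the closing parenthesis
        return inner
    return token


def group(text):
    """Show how an unparenthesized query groups under the stated order."""
    return parse(tokenize(text))


print(group("fat | rat & cat"))      # (fat | (rat & cat))
print(group("fat & rat <-> cat"))    # (fat & (rat <-> cat))
print(group("!fat <-> cat"))         # (!fat <-> cat)
```

The sketch only shows grouping. It does not reproduce the rewriting that the `(fat | rat) <-> cat` example in the diff demonstrates.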

doc/src/sgml/textsearch.sgml

Lines changed: 17 additions & 9 deletions
@@ -322,8 +322,7 @@ text @@ text
 match. Similarly, the <literal>|</literal> (OR) operator specifies that
 at least one of its arguments must appear, while the <literal>!</> (NOT)
 operator specifies that its argument must <emphasis>not</> appear in
-order to have a match. Parentheses can be used to control nesting of
-these operators.
+order to have a match.
 </para>
 
 <para>
@@ -346,10 +345,10 @@ SELECT to_tsvector('error is not fatal') @@ to_tsquery('fatal &lt;-&gt; error');
 
 There is a more general version of the FOLLOWED BY operator having the
 form <literal>&lt;<replaceable>N</>&gt;</literal>,
-where <replaceable>N</> is an integer standing for the exact distance
-allowed between the matching lexemes. <literal>&lt;1&gt;</literal> is
+where <replaceable>N</> is an integer standing for the difference between
+the positions of the matching lexemes. <literal>&lt;1&gt;</literal> is
 the same as <literal>&lt;-&gt;</>, while <literal>&lt;2&gt;</literal>
-allows one other lexeme to appear between the matches, and so
+allows exactly one other lexeme to appear between the matches, and so
 on. The <literal>phraseto_tsquery</> function makes use of this
 operator to construct a <literal>tsquery</> that can match a multi-word
 phrase when some of the words are stop words. For example:
@@ -366,9 +365,17 @@ SELECT phraseto_tsquery('the cats ate the rats');
 'cat' &lt;-&gt; 'ate' &lt;2&gt; 'rat'
 </programlisting>
 </para>
+
+<para>
+A special case that's sometimes useful is that <literal>&lt;0&gt;</literal>
+can be used to require that two patterns match the same word.
+</para>
+
 <para>
-The precedence of tsquery operators is as follows: <literal>|</literal>, <literal>&amp;</literal>,
-<literal>&lt;-&gt;</literal>, <literal>!</literal>.
+Parentheses can be used to control nesting of the <type>tsquery</>
+operators. Without parentheses, <literal>|</literal> binds least tightly,
+then <literal>&amp;</literal>, then <literal>&lt;-&gt;</literal>,
+and <literal>!</literal> most tightly.
 </para>
 </sect2>
 
@@ -1423,9 +1430,10 @@ FROM (SELECT id, body, q, ts_rank_cd(ti, q) AS rank
 lacks any position or weight information. The result is usually much
 smaller than an unstripped vector, but it is also less useful.
 Relevance ranking does not work as well on stripped vectors as
-unstripped ones. Also, when given stripped input,
+unstripped ones. Also,
 the <literal>&lt;-&gt;</> (FOLLOWED BY) <type>tsquery</> operator
-effectively degenerates to a simple <literal>&amp;</> (AND) test.
+will never match stripped input, since it cannot determine the
+distance between lexeme occurrences.
 </para>
 </listitem>
 
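
The textsearch.sgml changes describe `<N>` as a difference between lexeme positions, `<0>` as matching at the same word, and stripped vectors as never matching FOLLOWED BY. A toy model makes that reading concrete. It is a sketch only, not PostgreSQL's implementation: the dict-of-positions representation and the name `followed_by` are assumptions made for illustration, and the positions are chosen to be consistent with the `phraseto_tsquery('the cats ate the rats')` result shown in the diff.

```python
def followed_by(vector, left, right, distance=1):
    """Toy model of `left <distance> right` for two plain lexemes.

    `vector` maps each lexeme to a list of integer positions. A stripped
    vector is modeled by empty position lists.
    """
    left_positions = vector.get(left, [])
    right_positions = set(vector.get(right, []))
    # A match needs an occurrence of `right` exactly `distance` positions
    # after some occurrence of `left`.
    return any(p + distance in right_positions for p in left_positions)


# Positions for 'the cats ate the rats' with the stop word "the" dropped.
doc = {"cat": [2], "ate": [3], "rat": [5]}
stripped = {lexeme: [] for lexeme in doc}

print(followed_by(doc, "cat", "ate"))       # <->, the same as <1>
print(followed_by(doc, "ate", "rat", 2))    # <2>: exactly one position in between
print(followed_by(doc, "ate", "rat", 1))    # <1> needs adjacent positions
print(followed_by(doc, "cat", "cat", 0))    # <0>: both sides at the same position
print(followed_by(stripped, "cat", "ate"))  # no positions to compare
```

In this model the stripped case fails because there are no positions to subtract, which mirrors the reason given in the revised text.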