Commit 2971180

Doc: improve documentation about ts_headline() function.
Now that I've had my nose in that code, I thought the docs about it left something to be desired.
1 parent 5f7247b commit 2971180

doc/src/sgml/textsearch.sgml

Lines changed: 58 additions & 47 deletions
@@ -1221,63 +1221,75 @@ ts_headline(<optional> <replaceable class="PARAMETER">config</replaceable> <type
12211221
<itemizedlist spacing="compact" mark="bullet">
12221222
<listitem>
12231223
<para>
1224-
<literal>StartSel</>, <literal>StopSel</literal>: the strings with
1225-
which to delimit query words appearing in the document, to distinguish
1226-
them from other excerpted words. You must double-quote these strings
1227-
if they contain spaces or commas.
1224+
<literal>MaxWords</literal>, <literal>MinWords</literal> (integers):
1225+
these numbers determine the longest and shortest headlines to output.
1226+
The default values are 35 and 15.
12281227
</para>
12291228
</listitem>
12301229
<listitem>
12311230
<para>
1232-
<literal>MaxWords</>, <literal>MinWords</literal>: these numbers
1233-
determine the longest and shortest headlines to output.
1231+
<literal>ShortWord</literal> (integer): words of this length or less
1232+
will be dropped at the start and end of a headline, unless they are
1233+
query terms. The default value of three eliminates common English
1234+
articles.
12341235
</para>
12351236
</listitem>
12361237
<listitem>
12371238
<para>
1238-
<literal>ShortWord</literal>: words of this length or less will be
1239-
dropped at the start and end of a headline. The default
1240-
value of three eliminates common English articles.
1239+
<literal>HighlightAll</literal> (boolean): if
1240+
<literal>true</literal> the whole document will be used as the
1241+
headline, ignoring the preceding three parameters. The default
1242+
is <literal>false</literal>.
12411243
</para>
12421244
</listitem>
12431245
<listitem>
12441246
<para>
1245-
<literal>HighlightAll</literal>: Boolean flag; if
1246-
<literal>true</literal> the whole document will be used as the
1247-
headline, ignoring the preceding three parameters.
1247+
<literal>MaxFragments</literal> (integer): maximum number of text
1248+
fragments to display. The default value of zero selects a
1249+
non-fragment-based headline generation method. A value greater
1250+
than zero selects fragment-based headline generation (see below).
12481251
</para>
12491252
</listitem>
12501253
<listitem>
12511254
<para>
1252-
<literal>MaxFragments</literal>: maximum number of text excerpts
1253-
or fragments to display. The default value of zero selects a
1254-
non-fragment-oriented headline generation method. A value greater than
1255-
zero selects fragment-based headline generation. This method
1256-
finds text fragments with as many query words as possible and
1257-
stretches those fragments around the query words. As a result
1258-
query words are close to the middle of each fragment and have words on
1259-
each side. Each fragment will be of at most <literal>MaxWords</> and
1260-
words of length <literal>ShortWord</> or less are dropped at the start
1261-
and end of each fragment. If not all query words are found in the
1262-
document, then a single fragment of the first <literal>MinWords</>
1263-
in the document will be displayed.
1255+
<literal>StartSel</literal>, <literal>StopSel</literal> (strings):
1256+
the strings with which to delimit query words appearing in the
1257+
document, to distinguish them from other excerpted words. The
1258+
default values are <quote><literal>&lt;b&gt;</literal></quote> and
1259+
<quote><literal>&lt;/b&gt;</literal></quote>, which can be suitable
1260+
for HTML output.
12641261
</para>
12651262
</listitem>
12661263
<listitem>
12671264
<para>
1268-
<literal>FragmentDelimiter</literal>: When more than one fragment is
1269-
displayed, the fragments will be separated by this string.
1265+
<literal>FragmentDelimiter</literal> (string): When more than one
1266+
fragment is displayed, the fragments will be separated by this string.
1267+
The default is <quote><literal> ... </literal></quote>.
12701268
</para>
12711269
</listitem>
12721270
</itemizedlist>
12731271

1274-
Any unspecified options receive these defaults:
1272+
These option names are recognized case-insensitively.
1273+
You must double-quote string values if they contain spaces or commas.
1274+
</para>
12751275

1276-
<programlisting>
1277-
StartSel=&lt;b&gt;, StopSel=&lt;/b&gt;,
1278-
MaxWords=35, MinWords=15, ShortWord=3, HighlightAll=FALSE,
1279-
MaxFragments=0, FragmentDelimiter=" ... "
1280-
</programlisting>
1276+
<para>
1277+
In non-fragment-based headline
1278+
generation, <function>ts_headline</function> locates matches for the
1279+
given <replaceable class="parameter">query</replaceable> and chooses a
1280+
single one to display, preferring matches that have more query words
1281+
within the allowed headline length.
1282+
In fragment-based headline generation, <function>ts_headline</function>
1283+
locates the query matches and splits each match
1284+
into <quote>fragments</quote> of no more than <literal>MaxWords</literal>
1285+
words each, preferring fragments with more query words, and when
1286+
possible <quote>stretching</quote> fragments to include surrounding
1287+
words. The fragment-based mode is thus more useful when the query
1288+
matches span large sections of the document, or when it's desirable to
1289+
display multiple matches.
1290+
In either mode, if no query matches can be identified, then a single
1291+
fragment of the first <literal>MinWords</literal> words in the document
1292+
will be displayed.
12811293
</para>
12821294

12831295
<para>
@@ -1289,25 +1301,24 @@ SELECT ts_headline('english',
12891301
is to find all documents containing given query terms
12901302
and return them in order of their similarity to the
12911303
query.',
1292-
to_tsquery('query &amp; similarity'));
1293-
ts_headline
1304+
to_tsquery('english', 'query &amp; similarity'));
1305+
ts_headline
12941306
------------------------------------------------------------
1295-
containing given &lt;b&gt;query&lt;/b&gt; terms
1296-
and return them in order of their &lt;b&gt;similarity&lt;/b&gt; to the
1307+
containing given &lt;b&gt;query&lt;/b&gt; terms +
1308+
and return them in order of their &lt;b&gt;similarity&lt;/b&gt; to the+
12971309
&lt;b&gt;query&lt;/b&gt;.
12981310

12991311
SELECT ts_headline('english',
1300-
'The most common type of search
1301-
is to find all documents containing given query terms
1302-
and return them in order of their similarity to the
1303-
query.',
1304-
to_tsquery('query &amp; similarity'),
1305-
'StartSel = &lt;, StopSel = &gt;');
1306-
ts_headline
1307-
-------------------------------------------------------
1308-
containing given &lt;query&gt; terms
1309-
and return them in order of their &lt;similarity&gt; to the
1310-
&lt;query&gt;.
1312+
'Search terms may occur
1313+
many times in a document,
1314+
requiring ranking of the search matches to decide which
1315+
occurrences to display in the result.',
1316+
to_tsquery('english', 'search &amp; term'),
1317+
'MaxFragments=10, MaxWords=7, MinWords=3, StartSel=&lt;&lt;, StopSel=&gt;&gt;');
1318+
ts_headline
1319+
------------------------------------------------------------
1320+
&lt;&lt;Search&gt;&gt; &lt;&lt;terms&gt;&gt; may occur +
1321+
many times ... ranking of the &lt;&lt;search&gt;&gt; matches to decide
13111322
</screen>
13121323
</para>
13131324

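The option-string rules added in this change (option names matched case-insensitively, string values double-quoted when they contain spaces or commas) could be exercised with a query along the following lines. This is only a sketch and not part of the commit: the delimiter and selector strings are made up for illustration, and no output is shown because the query was not run here.

```sql
-- Hypothetical illustration, not taken from the commit.
-- Option names are deliberately written in mixed case, and the values
-- containing spaces or commas are wrapped in double quotes.
SELECT ts_headline('english',
  'Search terms may occur
many times in a document,
requiring ranking of the search matches to decide which
occurrences to display in the result.',
  to_tsquery('english', 'search & term'),
  'maxfragments=10, MAXWORDS=7, minwords=3, FragmentDelimiter=" [cut], ", StartSel="<< ", StopSel=" >>"');
```

Because MaxFragments is greater than zero, this would be expected to use the fragment-based mode described in the new text, with fragments separated by the quoted delimiter.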