Commit 06bce4d

doc: Warn that ts_headline() output is not HTML-safe.
Add a documentation warning to ts_headline() pointing out that, when working with untrusted input documents, the output is not guaranteed to be safe for direct inclusion in web pages. This is because, while it does remove some XML tags from the input, it doesn't remove all HTML markup, and so the result may be unsafe (e.g., it might permit XSS attacks). To guard against that, all HTML markup should be removed from the input, making it plain text, or the output should be passed through an HTML sanitizer.

In addition, document precisely what the default text search parser recognises as valid XML tags, since that's what determines which XML tags ts_headline() will remove.

Reported-by: Richard Neill <richard.neill@telos.digital>
Author: Dean Rasheed <dean.a.rasheed@gmail.com>
Reviewed-by: Noah Misch <noah@leadboat.com>
Backpatch-through: 13

1 parent 9da548d commit 06bce4d

1 file changed: +31 -4 lines changed

doc/src/sgml/textsearch.sgml

Lines changed: 31 additions & 4 deletions
@@ -1339,7 +1339,7 @@ ts_headline(<optional> <replaceable class="parameter">config</replaceable> <type
 document, to distinguish them from other excerpted words. The
 default values are <quote><literal>&lt;b&gt;</literal></quote> and
 <quote><literal>&lt;/b&gt;</literal></quote>, which can be suitable
-for HTML output.
+for HTML output (but see the warning below).
 </para>
 </listitem>
 <listitem>
@@ -1351,6 +1351,21 @@ ts_headline(<optional> <replaceable class="parameter">config</replaceable> <type
 </listitem>
 </itemizedlist>
 
+<warning>
+<title>Warning: Cross-site scripting (XSS) safety</title>
+<para>
+The output from <function>ts_headline</function> is not guaranteed to
+be safe for direct inclusion in web pages. When
+<literal>HighlightAll</literal> is <literal>false</literal> (the
+default), some simple XML tags are removed from the document, but this
+is not guaranteed to remove all HTML markup. Therefore, this does not
+provide an effective defense against attacks such as cross-site
+scripting (XSS) attacks, when working with untrusted input. To guard
+against such attacks, all HTML markup should be removed from the input
+document, or an HTML sanitizer should be used on the output.
+</para>
+</warning>
+
 These option names are recognized case-insensitively.
 You must double-quote string values if they contain spaces or commas.
 </para>
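
Reviewer note (illustrative, not part of the patch): one possible way to act on the warning above on the application side is to escape the whole headline and only then restore the highlight tags. The sketch below is Python and assumes the query passed plain-text marker strings as `StartSel` and `StopSel` instead of the default `<b>` and `</b>`. The marker strings and the function name `render_headline` are hypothetical, and the approach assumes the markers do not occur in the documents themselves. It escapes markup rather than removing it, so it is a substitute for, not an instance of, the HTML sanitizer the warning mentions.

```python
import html

# Hypothetical plain-text markers, assumed to be passed to ts_headline()
# as StartSel/StopSel in place of the default <b> and </b>.
START_MARK = "[[HL]]"
STOP_MARK = "[[/HL]]"


def render_headline(raw_headline: str) -> str:
    """Escape every HTML-special character, then restore only the highlight tags."""
    # html.escape() rewrites &, <, > and (with quote=True) both quote characters,
    # so any markup that survived ts_headline() is displayed as text.
    escaped = html.escape(raw_headline, quote=True)
    # The markers contain no HTML-special characters, so they pass through
    # escaping unchanged and can be swapped for real tags afterwards.
    return escaped.replace(START_MARK, "<b>").replace(STOP_MARK, "</b>")


if __name__ == "__main__":
    print(render_headline("[[HL]]cat[[/HL]] <script>alert(1)</script>"))
```

If an untrusted document happens to contain the marker text, the worst outcome under these assumptions is a stray or unbalanced `<b>` tag, since everything else has already been escaped.
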
@@ -2218,9 +2233,21 @@ LIMIT 10;
 
 <para>
 <literal>email</literal> does not support all valid email characters as
-defined by RFC 5322. Specifically, the only non-alphanumeric
-characters supported for email user names are period, dash, and
-underscore.
+defined by <ulink url="https://datatracker.ietf.org/doc/html/rfc5322">RFC 5322</ulink>.
+Specifically, the only non-alphanumeric characters supported for
+email user names are period, dash, and underscore.
+</para>
+
+<para>
+<literal>tag</literal> does not support all valid tag names as defined by
+<ulink url="https://www.w3.org/TR/xml/">W3C Recommendation, XML</ulink>.
+Specifically, the only tag names supported are those starting with an
+ASCII letter, underscore, or colon, and containing only letters, digits,
+hyphens, underscores, periods, and colons. <literal>tag</literal> also
+includes XML comments starting with <literal>&lt;!--</literal> and ending
+with <literal>--&gt;</literal>, and XML declarations (but note that this
+includes anything starting with <literal>&lt;?x</literal> and ending with
+<literal>&gt;</literal>).
 </para>
 </note>
 
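
Reviewer note (illustrative, not part of the patch): the tag-name rule in the new `tag` paragraph can be approximated with a regular expression. The Python sketch below models only the rules as worded in the documentation text, not the parser's actual state machine. It assumes that "letters" and "digits" mean ASCII throughout, which the text states explicitly only for the first character. The function names are hypothetical.

```python
import re

# First character: ASCII letter, underscore, or colon.
# Remaining characters: letters, digits, hyphens, underscores, periods, colons.
# Treating "letters" and "digits" as ASCII-only is an assumption.
TAG_NAME = re.compile(r"[A-Za-z_:][A-Za-z0-9._:-]*")


def is_recognized_tag_name(name: str) -> bool:
    """Return True if the name fits the documented tag-name rule."""
    return TAG_NAME.fullmatch(name) is not None


def is_comment_or_declaration(text: str) -> bool:
    """Return True for the comment and declaration shapes described in the text."""
    # XML comment: starts with <!-- and ends with -->, without the two overlapping.
    if len(text) >= 7 and text.startswith("<!--") and text.endswith("-->"):
        return True
    # "XML declaration" as documented: anything from <?x up to a closing >.
    return text.startswith("<?x") and text.endswith(">")


if __name__ == "__main__":
    for candidate in ["b", "svg:rect", "_x-1.y", "1abc", "-a"]:
        print(candidate, is_recognized_tag_name(candidate))
```

Names that fail this check, such as one starting with a digit, fall outside what the paragraph says the parser treats as a tag, so markup using them would not be removed by `ts_headline()`. That gap is what the XSS warning in this commit is about.
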