Commit ccae096

Doc: fix thinko in description of how to escape a backslash in bytea.
Also clean up some discussion that had been left in a very confused state thanks to half-hearted adjustments for the change to standard_conforming_strings being the default.

Discussion: https://postgr.es/m/154954987367.1297.4358910045409218@wrigleys.postgresql.org
1 parent 876fd37 commit ccae096

1 file changed: +26 -32 lines changed

doc/src/sgml/datatype.sgml

Lines changed: 26 additions & 32 deletions
@@ -1294,9 +1294,9 @@ SELECT b, char_length(b) FROM test2;
 per byte, most significant nibble first. The entire string is
 preceded by the sequence <literal>\x</literal> (to distinguish it
 from the escape format). In some contexts, the initial backslash may
-need to be escaped by doubling it, in the same cases in which backslashes
-have to be doubled in escape format; details appear below.
-The hexadecimal digits can
+need to be escaped by doubling it
+(see <xref linkend="sql-syntax-strings">).
+For input, the hexadecimal digits can
 be either upper or lower case, and whitespace is permitted between
 digit pairs (but not within a digit pair nor in the starting
 <literal>\x</literal> sequence).
@@ -1338,9 +1338,7 @@ SELECT '\xDEADBEEF';
 values <emphasis>must</emphasis> be escaped, while all octet
 values <emphasis>can</emphasis> be escaped. In
 general, to escape an octet, convert it into its three-digit
-octal value and precede it
-by a backslash (or two backslashes, if writing the value as a
-literal using escape string syntax).
+octal value and precede it by a backslash.
 Backslash itself (octet decimal value 92) can alternatively be represented by
 double backslashes.
 <xref linkend="datatype-binary-sqlesc">
@@ -1357,7 +1355,7 @@ SELECT '\xDEADBEEF';
 <entry>Description</entry>
 <entry>Escaped Input Representation</entry>
 <entry>Example</entry>
-<entry>Output Representation</entry>
+<entry>Hex Representation</entry>
 </row>
 </thead>
 
@@ -1381,7 +1379,7 @@ SELECT '\xDEADBEEF';
 <row>
 <entry>92</entry>
 <entry>backslash</entry>
-<entry><literal>'\'</literal> or <literal>'\\134'</literal></entry>
+<entry><literal>'\\'</literal> or <literal>'\134'</literal></entry>
 <entry><literal>SELECT '\\'::bytea;</literal></entry>
 <entry><literal>\x5c</literal></entry>
 </row>
@@ -1401,39 +1399,35 @@ SELECT '\xDEADBEEF';
 <para>
 The requirement to escape <emphasis>non-printable</emphasis> octets
 varies depending on locale settings. In some instances you can get away
-with leaving them unescaped. Note that the result in each of the examples
-in <xref linkend="datatype-binary-sqlesc"> was exactly one octet in
-length, even though the output representation is sometimes
-more than one character.
+with leaving them unescaped.
 </para>
 
 <para>
-The reason multiple backslashes are required, as shown
-in <xref linkend="datatype-binary-sqlesc">, is that an input
-string written as a string literal must pass through two parse
-phases in the <productname>PostgreSQL</productname> server.
-The first backslash of each pair is interpreted as an escape
-character by the string-literal parser (assuming escape string
-syntax is used) and is therefore consumed, leaving the second backslash of the
-pair. (Dollar-quoted strings can be used to avoid this level
-of escaping.) The remaining backslash is then recognized by the
-<type>bytea</type> input function as starting either a three
-digit octal value or escaping another backslash. For example,
-a string literal passed to the server as <literal>'\001'</literal>
-becomes <literal>\001</literal> after passing through the
-escape string parser. The <literal>\001</literal> is then sent
-to the <type>bytea</type> input function, where it is converted
-to a single octet with a decimal value of 1. Note that the
-single-quote character is not treated specially by <type>bytea</type>,
-so it follows the normal rules for string literals. (See also
-<xref linkend="sql-syntax-strings">.)
+The reason that single quotes must be doubled, as shown
+in <xref linkend="datatype-binary-sqlesc">, is that this
+is true for any string literal in a SQL command. The generic
+string-literal parser consumes the outermost single quotes
+and reduces any pair of single quotes to one data character.
+What the <type>bytea</type> input function sees is just one
+single quote, which it treats as a plain data character.
+However, the <type>bytea</type> input function treats
+backslashes as special, and the other behaviors shown in
+<xref linkend="datatype-binary-sqlesc"> are implemented by
+that function.
+</para>
+
+<para>
+In some contexts, backslashes must be doubled compared to what is
+shown above, because the generic string-literal parser will also
+reduce pairs of backslashes to one data character;
+see <xref linkend="sql-syntax-strings">.
 </para>
 
 <para>
 <type>Bytea</type> octets are output in <literal>hex</literal>
 format by default. If you change <xref linkend="guc-bytea-output">
 to <literal>escape</literal>,
-<quote>non-printable</quote> octet are converted to
+<quote>non-printable</quote> octets are converted to their
 equivalent three-digit octal value and preceded by one backslash.
 Most <quote>printable</quote> octets are output by their standard
 representation in the client character set, e.g.:
