Skip to content

Commit 7702337

Browse files
committed
Do not treat \. as an EOF marker in CSV mode for COPY IN.
Since backslash is (typically) not special in CSV data, we should not be treating \. as special either. The server historically did this to keep CSV and TEXT modes more alike and to support V2 protocol; but V2 protocol is long dead, and the inconsistency with CSV standards is annoying. Remove that behavior in CopyReadLineText, and make some minor consequent code simplifications. On the client side, we need to fix psql so that it does not check for \. except when reading data from STDIN (that is, the script source). We must do that regardless of TEXT/CSV mode or there is no way to end the COPY short of script EOF. Also, be careful not to send the \. to the server in that case. This is a small compatibility break in that other applications beside psql may need similar adjustment. Also, using an older version of psql with a v18 server may result in misbehavior during CSV-mode COPY IN. Daniel Vérité, reviewed by vignesh C, Robert Haas, and myself Discussion: https://postgr.es/m/ed659f37-a9dd-42a7-82b9-0da562cc4006@manitou-mail.org
1 parent a19f83f commit 7702337

File tree

9 files changed

+105
-92
lines changed

9 files changed

+105
-92
lines changed

doc/src/sgml/libpq.sgml

Lines changed: 3 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -7381,8 +7381,9 @@ int PQputline(PGconn *conn,
73817381
<literal>\.</literal> as a final line to indicate to the server that it had
73827382
finished sending <command>COPY</command> data. While this still works, it is deprecated and the
73837383
special meaning of <literal>\.</literal> can be expected to be removed in a
7384-
future release. It is sufficient to call <xref linkend="libpq-PQendcopy"/> after
7385-
having sent the actual data.
7384+
future release. (It already will misbehave in <literal>CSV</literal>
7385+
mode.) It is sufficient to call <xref linkend="libpq-PQendcopy"/>
7386+
after having sent the actual data.
73867387
</para>
73877388
</note>
73887389
</listitem>

doc/src/sgml/protocol.sgml

Lines changed: 3 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -7606,8 +7606,9 @@ psql "dbname=postgres replication=database" -c "IDENTIFY_SYSTEM;"
76067606
is a well-defined way to recover from errors during <command>COPY</command>. The special
76077607
<quote><literal>\.</literal></quote> last line is not needed anymore, and is not sent
76087608
during <command>COPY OUT</command>.
7609-
(It is still recognized as a terminator during <command>COPY IN</command>, but its use is
7610-
deprecated and will eventually be removed.) Binary <command>COPY</command> is supported.
7609+
(It is still recognized as a terminator during text-mode <command>COPY
7610+
IN</command>, but not in CSV mode. The text-mode behavior is
7611+
deprecated and may eventually be removed.) Binary <command>COPY</command> is supported.
76117612
The CopyInResponse and CopyOutResponse messages include fields indicating
76127613
the number of columns and the format of each column.
76137614
</para>

doc/src/sgml/ref/copy.sgml

Lines changed: 25 additions & 11 deletions
Original file line numberDiff line numberDiff line change
@@ -646,11 +646,16 @@ COPY <replaceable class="parameter">count</replaceable>
646646
</para>
647647

648648
<para>
649-
End of data can be represented by a single line containing just
649+
End of data can be represented by a line containing just
650650
backslash-period (<literal>\.</literal>). An end-of-data marker is
651651
not necessary when reading from a file, since the end of file
652-
serves perfectly well; it is needed only when copying data to or from
653-
client applications using pre-3.0 client protocol.
652+
serves perfectly well; in that context this provision exists only for
653+
backward compatibility. However, <application>psql</application>
654+
uses <literal>\.</literal> to terminate a <literal>COPY FROM
655+
STDIN</literal> operation (that is, reading
656+
in-line <command>COPY</command> data in a SQL script). In that
657+
context the rule is needed to be able to end the operation before the
658+
end of the script.
654659
</para>
655660

656661
<para>
@@ -811,16 +816,25 @@ COPY <replaceable class="parameter">count</replaceable>
811816

812817
<para>
813818
Because backslash is not a special character in the <literal>CSV</literal>
814-
format, <literal>\.</literal>, the end-of-data marker, could also appear
815-
as a data value. To avoid any misinterpretation, a <literal>\.</literal>
816-
data value appearing as a lone entry on a line is automatically
817-
quoted on output, and on input, if quoted, is not interpreted as the
818-
end-of-data marker. If you are loading a file created by another
819-
application that has a single unquoted column and might have a
820-
value of <literal>\.</literal>, you might need to quote that value in the
821-
input file.
819+
format, the end-of-data marker used in text mode (<literal>\.</literal>)
820+
is not normally treated as special when reading <literal>CSV</literal>
821+
data. An exception is that <application>psql</application> will terminate
822+
a <literal>COPY FROM STDIN</literal> operation (that is, reading
823+
in-line <command>COPY</command> data in a SQL script) at a line containing
824+
only <literal>\.</literal>, whether it is text or <literal>CSV</literal>
825+
mode.
822826
</para>
823827

828+
<note>
829+
<para>
830+
<productname>PostgreSQL</productname> versions before v18 always
831+
recognized unquoted <literal>\.</literal> as an end-of-data marker,
832+
even when reading from a separate file. For compatibility with older
833+
versions, <command>COPY TO</command> will quote <literal>\.</literal>
834+
when it's alone on a line, even though this is no longer necessary.
835+
</para>
836+
</note>
837+
824838
<note>
825839
<para>
826840
In <literal>CSV</literal> format, all characters are significant. A quoted value

doc/src/sgml/ref/psql-ref.sgml

Lines changed: 2 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -1135,7 +1135,8 @@ SELECT $1 \parse stmt1
11351135

11361136
<para>
11371137
For <literal>\copy ... from stdin</literal>, data rows are read from the same
1138-
source that issued the command, continuing until <literal>\.</literal>
1138+
source that issued the command, continuing until a line containing
1139+
only <literal>\.</literal>
11391140
is read or the stream reaches <acronym>EOF</acronym>. This option is useful
11401141
for populating tables in-line within an SQL script file.
11411142
For <literal>\copy ... to stdout</literal>, output is sent to the same place
@@ -1179,10 +1180,6 @@ SELECT $1 \parse stmt1
11791180
destination, because all data must pass through the client/server
11801181
connection. For large amounts of data the <acronym>SQL</acronym>
11811182
command might be preferable.
1182-
Also, because of this pass-through method, <literal>\copy
1183-
... from</literal> in <acronym>CSV</acronym> mode will erroneously
1184-
treat a <literal>\.</literal> data value alone on a line as an
1185-
end-of-input marker.
11861183
</para>
11871184
</tip>
11881185

src/backend/commands/copyfromparse.c

Lines changed: 19 additions & 61 deletions
Original file line numberDiff line numberDiff line change
@@ -136,14 +136,6 @@ if (1) \
136136
} \
137137
} else ((void) 0)
138138

139-
/* Undo any read-ahead and jump out of the block. */
140-
#define NO_END_OF_COPY_GOTO \
141-
if (1) \
142-
{ \
143-
input_buf_ptr = prev_raw_ptr + 1; \
144-
goto not_end_of_copy; \
145-
} else ((void) 0)
146-
147139
/* NOTE: there's a copy of this in copyto.c */
148140
static const char BinarySignature[11] = "PGCOPY\n\377\r\n\0";
149141

@@ -1182,7 +1174,6 @@ CopyReadLineText(CopyFromState cstate)
11821174
bool result = false;
11831175

11841176
/* CSV variables */
1185-
bool first_char_in_line = true;
11861177
bool in_quote = false,
11871178
last_was_esc = false;
11881179
char quotec = '\0';
@@ -1268,12 +1259,12 @@ CopyReadLineText(CopyFromState cstate)
12681259
if (cstate->opts.csv_mode)
12691260
{
12701261
/*
1271-
* If character is '\\' or '\r', we may need to look ahead below.
1272-
* Force fetch of the next character if we don't already have it.
1273-
* We need to do this before changing CSV state, in case one of
1274-
* these characters is also the quote or escape character.
1262+
* If character is '\r', we may need to look ahead below. Force
1263+
* fetch of the next character if we don't already have it. We
1264+
* need to do this before changing CSV state, in case '\r' is also
1265+
* the quote or escape character.
12751266
*/
1276-
if (c == '\\' || c == '\r')
1267+
if (c == '\r')
12771268
{
12781269
IF_NEED_REFILL_AND_NOT_EOF_CONTINUE(0);
12791270
}
@@ -1377,10 +1368,10 @@ CopyReadLineText(CopyFromState cstate)
13771368
}
13781369

13791370
/*
1380-
* In CSV mode, we only recognize \. alone on a line. This is because
1381-
* \. is a valid CSV data value.
1371+
* Process backslash, except in CSV mode where backslash is a normal
1372+
* character.
13821373
*/
1383-
if (c == '\\' && (!cstate->opts.csv_mode || first_char_in_line))
1374+
if (c == '\\' && !cstate->opts.csv_mode)
13841375
{
13851376
char c2;
13861377

@@ -1398,12 +1389,6 @@ CopyReadLineText(CopyFromState cstate)
13981389
if (c2 == '.')
13991390
{
14001391
input_buf_ptr++; /* consume the '.' */
1401-
1402-
/*
1403-
* Note: if we loop back for more data here, it does not
1404-
* matter that the CSV state change checks are re-executed; we
1405-
* will come back here with no important state changed.
1406-
*/
14071392
if (cstate->eol_type == EOL_CRNL)
14081393
{
14091394
/* Get the next character */
@@ -1412,23 +1397,13 @@ CopyReadLineText(CopyFromState cstate)
14121397
c2 = copy_input_buf[input_buf_ptr++];
14131398

14141399
if (c2 == '\n')
1415-
{
1416-
if (!cstate->opts.csv_mode)
1417-
ereport(ERROR,
1418-
(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
1419-
errmsg("end-of-copy marker does not match previous newline style")));
1420-
else
1421-
NO_END_OF_COPY_GOTO;
1422-
}
1400+
ereport(ERROR,
1401+
(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
1402+
errmsg("end-of-copy marker does not match previous newline style")));
14231403
else if (c2 != '\r')
1424-
{
1425-
if (!cstate->opts.csv_mode)
1426-
ereport(ERROR,
1427-
(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
1428-
errmsg("end-of-copy marker corrupt")));
1429-
else
1430-
NO_END_OF_COPY_GOTO;
1431-
}
1404+
ereport(ERROR,
1405+
(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
1406+
errmsg("end-of-copy marker corrupt")));
14321407
}
14331408

14341409
/* Get the next character */
@@ -1437,14 +1412,9 @@ CopyReadLineText(CopyFromState cstate)
14371412
c2 = copy_input_buf[input_buf_ptr++];
14381413

14391414
if (c2 != '\r' && c2 != '\n')
1440-
{
1441-
if (!cstate->opts.csv_mode)
1442-
ereport(ERROR,
1443-
(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
1444-
errmsg("end-of-copy marker corrupt")));
1445-
else
1446-
NO_END_OF_COPY_GOTO;
1447-
}
1415+
ereport(ERROR,
1416+
(errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
1417+
errmsg("end-of-copy marker corrupt")));
14481418

14491419
if ((cstate->eol_type == EOL_NL && c2 != '\n') ||
14501420
(cstate->eol_type == EOL_CRNL && c2 != '\n') ||
@@ -1467,31 +1437,19 @@ CopyReadLineText(CopyFromState cstate)
14671437
result = true; /* report EOF */
14681438
break;
14691439
}
1470-
else if (!cstate->opts.csv_mode)
1440+
else
14711441
{
14721442
/*
14731443
* If we are here, it means we found a backslash followed by
14741444
* something other than a period. In non-CSV mode, anything
14751445
* after a backslash is special, so we skip over that second
14761446
* character too. If we didn't do that \\. would be
14771447
* considered an eof-of copy, while in non-CSV mode it is a
1478-
* literal backslash followed by a period. In CSV mode,
1479-
* backslashes are not special, so we want to process the
1480-
* character after the backslash just like a normal character,
1481-
* so we don't increment in those cases.
1448+
* literal backslash followed by a period.
14821449
*/
14831450
input_buf_ptr++;
14841451
}
14851452
}
1486-
1487-
/*
1488-
* This label is for CSV cases where \. appears at the start of a
1489-
* line, but there is more text after it, meaning it was a data value.
1490-
* We are more strict for \. in CSV mode because \. could be a data
1491-
* value, while in non-CSV mode, \. cannot be a data value.
1492-
*/
1493-
not_end_of_copy:
1494-
first_char_in_line = false;
14951453
} /* end of outer loop */
14961454

14971455
/*

src/backend/commands/copyto.c

Lines changed: 5 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -1160,8 +1160,11 @@ CopyAttributeOutCSV(CopyToState cstate, const char *string,
11601160
if (!use_quote)
11611161
{
11621162
/*
1163-
* Because '\.' can be a data value, quote it if it appears alone on a
1164-
* line so it is not interpreted as the end-of-data marker.
1163+
* Quote '\.' if it appears alone on a line, so that it will not be
1164+
* interpreted as an end-of-data marker. (PG 18 and up will not
1165+
* interpret '\.' in CSV that way, except in embedded-in-SQL data; but
1166+
* we want the data to be loadable by older versions too. Also, this
1167+
* avoids breaking clients that are still using PQgetline().)
11651168
*/
11661169
if (single_attr && strcmp(ptr, "\\.") == 0)
11671170
use_quote = true;

src/bin/psql/copy.c

Lines changed: 18 additions & 9 deletions
Original file line numberDiff line numberDiff line change
@@ -620,20 +620,29 @@ handleCopyIn(PGconn *conn, FILE *copystream, bool isbinary, PGresult **res)
620620
/* current line is done? */
621621
if (buf[buflen - 1] == '\n')
622622
{
623-
/* check for EOF marker, but not on a partial line */
624-
if (at_line_begin)
623+
/*
624+
* When at the beginning of the line and the data is
625+
* inlined, check for EOF marker. If the marker is found,
626+
* we must stop at this point. If not, the \. line can be
627+
* sent to the server, and we let it decide whether it's
628+
* an EOF or not depending on the format: in TEXT mode, \.
629+
* will be interpreted as an EOF, in CSV, it will not.
630+
*/
631+
if (at_line_begin && copystream == pset.cur_cmd_source)
625632
{
626-
/*
627-
* This code erroneously assumes '\.' on a line alone
628-
* inside a quoted CSV string terminates the \copy.
629-
* https://www.postgresql.org/message-id/E1TdNVQ-0001ju-GO@wrigleys.postgresql.org
630-
*
631-
* https://www.postgresql.org/message-id/bfcd57e4-8f23-4c3e-a5db-2571d09208e2@beta.fastmail.com
632-
*/
633633
if ((linelen == 3 && memcmp(fgresult, "\\.\n", 3) == 0) ||
634634
(linelen == 4 && memcmp(fgresult, "\\.\r\n", 4) == 0))
635635
{
636636
copydone = true;
637+
638+
/*
639+
* Remove the EOF marker from the data sent. In
640+
* CSV mode, the EOF marker must be removed,
641+
* otherwise it would be interpreted by the server
642+
* as valid data.
643+
*/
644+
*fgresult = '\0';
645+
buflen -= linelen;
637646
}
638647
}
639648

src/test/regress/expected/copy.out

Lines changed: 18 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -32,6 +32,24 @@ select * from copytest except select * from copytest2;
3232
-------+------+--------
3333
(0 rows)
3434

35+
--- test unquoted \. as data inside CSV
36+
-- do not use copy out to export the data, as it would quote \.
37+
\o :filename
38+
\qecho line1
39+
\qecho '\\.'
40+
\qecho line2
41+
\o
42+
-- get the data back in with copy
43+
truncate copytest2;
44+
copy copytest2(test) from :'filename' csv;
45+
select test from copytest2 order by test collate "C";
46+
test
47+
-------
48+
\.
49+
line1
50+
line2
51+
(3 rows)
52+
3553
-- test header line feature
3654
create temp table copytest3 (
3755
c1 int,

src/test/regress/sql/copy.sql

Lines changed: 12 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -38,6 +38,18 @@ copy copytest2 from :'filename' csv quote '''' escape E'\\';
3838

3939
select * from copytest except select * from copytest2;
4040

41+
--- test unquoted \. as data inside CSV
42+
-- do not use copy out to export the data, as it would quote \.
43+
\o :filename
44+
\qecho line1
45+
\qecho '\\.'
46+
\qecho line2
47+
\o
48+
-- get the data back in with copy
49+
truncate copytest2;
50+
copy copytest2(test) from :'filename' csv;
51+
select test from copytest2 order by test collate "C";
52+
4153

4254
-- test header line feature
4355

0 commit comments

Comments
 (0)