Commit cb50ee2

Tightened up syntax checking of array input processing considerably. Junk that was previously allowed in odd places with odd results now causes an ERROR. Also changed behavior with respect to whitespace -- trailing whitespace is now ignored as well as leading whitespace (which has always been ignored). Documentation updated to reflect change in whitespace handling. Also reordered several paragraphs into what I believe is a more sensible order.
1 parent 988d84f commit cb50ee2
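
The whitespace change described above can be illustrated with a minimal sketch. It is not the code from this commit: `trim_unquoted_item` is a hypothetical helper that only shows the intended effect on an unquoted element, where leading and trailing whitespace are dropped but whitespace between non-whitespace characters is kept.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/*
 * Hypothetical helper (not part of arrayfuncs.c): trim an unquoted
 * array item in place. Returns a pointer to the first non-space
 * character; the string is cut after the last non-space character.
 */
static char *
trim_unquoted_item(char *item)
{
    char *end;

    assert(item != NULL);

    /* skip leading whitespace (this has always been ignored) */
    while (isspace((unsigned char) *item))
        item++;

    /* drop trailing whitespace (newly ignored as of this commit) */
    end = item + strlen(item);
    while (end > item && isspace((unsigned char) end[-1]))
        end--;
    *end = '\0';

    return item;
}
```

Under this sketch an item written as `  a b  ` would be read as `a b`. A double-quoted item would bypass the helper entirely, since whitespace inside quotes is significant.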

2 files changed, 187 additions and 51 deletions

doc/src/sgml/array.sgml

Lines changed: 39 additions & 33 deletions
@@ -1,4 +1,4 @@
-<!-- $PostgreSQL: pgsql/doc/src/sgml/array.sgml,v 1.36 2004/08/05 03:29:11 joe Exp $ -->
+<!-- $PostgreSQL: pgsql/doc/src/sgml/array.sgml,v 1.37 2004/08/08 05:01:51 joe Exp $ -->
 
 <sect1 id="arrays">
 <title>Arrays</title>
@@ -95,10 +95,12 @@ CREATE TABLE tictactoe (
 </synopsis>
 where <replaceable>delim</replaceable> is the delimiter character
 for the type, as recorded in its <literal>pg_type</literal> entry.
-(For all built-in types, this is the comma character
-<quote><literal>,</literal></>.) Each
-<replaceable>val</replaceable> is either a constant of the array
-element type, or a subarray. An example of an array constant is
+Among the standard data types provided in the
+<productname>PostgreSQL</productname> distribution, type
+<literal>box</> uses a semicolon (<literal>;</>) but all the others
+use comma (<literal>,</>). Each <replaceable>val</replaceable> is
+either a constant of the array element type, or a subarray. An example
+of an array constant is
 <programlisting>
 '{{1,2,3},{4,5,6},{7,8,9}}'
 </programlisting>
@@ -161,7 +163,7 @@ SELECT * FROM sal_emp;
 </para>
 
 <para>
-The <literal>ARRAY</literal> expression syntax may also be used:
+The <literal>ARRAY</> constructor syntax may also be used:
 <programlisting>
 INSERT INTO sal_emp
 VALUES ('Bill',
@@ -176,8 +178,8 @@ INSERT INTO sal_emp
 Notice that the array elements are ordinary SQL constants or
 expressions; for instance, string literals are single quoted, instead of
 double quoted as they would be in an array literal. The <literal>ARRAY</>
-expression syntax is discussed in more detail in <xref
-linkend="sql-syntax-array-constructors">.
+constructor syntax is discussed in more detail in
+<xref linkend="sql-syntax-array-constructors">.
 </para>
 </sect2>
 
@@ -524,10 +526,17 @@ SELECT * FROM sal_emp WHERE 10000 = ALL (pay_by_quarter);
 use comma.) In a multidimensional array, each dimension (row, plane,
 cube, etc.) gets its own level of curly braces, and delimiters
 must be written between adjacent curly-braced entities of the same level.
-You may write whitespace before a left brace, after a right
-brace, or before any individual item string. Whitespace after an item
-is not ignored, however: after skipping leading whitespace, everything
-up to the next right brace or delimiter is taken as the item value.
+</para>
+
+<para>
+The array output routine will put double quotes around element values
+if they are empty strings or contain curly braces, delimiter characters,
+double quotes, backslashes, or white space. Double quotes and backslashes
+embedded in element values will be backslash-escaped. For numeric
+data types it is safe to assume that double quotes will never appear, but
+for textual data types one should be prepared to cope with either presence
+or absence of quotes. (This is a change in behavior from pre-7.2
+<productname>PostgreSQL</productname> releases.)
 </para>
 
 <para>
@@ -573,26 +582,22 @@ SELECT f1[1][-2][3] AS e1, f1[1][-1][5] AS e2
 
 <para>
 As shown previously, when writing an array value you may write double
-quotes around any individual array
-element. You <emphasis>must</> do so if the element value would otherwise
-confuse the array-value parser. For example, elements containing curly
-braces, commas (or whatever the delimiter character is), double quotes,
-backslashes, or leading white space must be double-quoted. To put a double
-quote or backslash in a quoted array element value, precede it with a
-backslash.
-Alternatively, you can use backslash-escaping to protect all data characters
-that would otherwise be taken as array syntax or ignorable white space.
+quotes around any individual array element. You <emphasis>must</> do so
+if the element value would otherwise confuse the array-value parser.
+For example, elements containing curly braces, commas (or whatever the
+delimiter character is), double quotes, backslashes, or leading white
+space must be double-quoted. To put a double quote or backslash in a
+quoted array element value, precede it with a backslash. Alternatively,
+you can use backslash-escaping to protect all data characters that would
+otherwise be taken as array syntax.
 </para>
 
 <para>
-The array output routine will put double quotes around element values
-if they are empty strings or contain curly braces, delimiter characters,
-double quotes, backslashes, or white space. Double quotes and backslashes
-embedded in element values will be backslash-escaped. For numeric
-data types it is safe to assume that double quotes will never appear, but
-for textual data types one should be prepared to cope with either presence
-or absence of quotes. (This is a change in behavior from pre-7.2
-<productname>PostgreSQL</productname> releases.)
+You may write whitespace before a left brace or after a right
+brace. You may also write whitespace before or after any individual item
+string. In all of these cases the whitespace will be ignored. However,
+whitespace within double quoted elements, or surrounded on both sides by
+non-whitespace characters of an element, are not ignored.
 </para>
 
 <note>
@@ -616,10 +621,11 @@ INSERT ... VALUES ('{"\\\\","\\""}');
 
 <tip>
 <para>
-The <literal>ARRAY</> constructor syntax is often easier to work with
-than the array-literal syntax when writing array values in SQL commands.
-In <literal>ARRAY</>, individual element values are written the same way
-they would be written when not members of an array.
+The <literal>ARRAY</> constructor syntax (see
+<xref linkend="sql-syntax-array-constructors">) is often easier to work
+with than the array-literal syntax when writing array values in SQL
+commands. In <literal>ARRAY</>, individual element values are written the
+same way they would be written when not members of an array.
 </para>
 </tip>
 </sect2>
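
The output quoting rule stated in the documentation hunk above (quote an element if it is empty or contains braces, the delimiter, double quotes, backslashes, or white space, and backslash-escape embedded quotes and backslashes) can be sketched as follows. This is an illustrative reading of the documented rule, not the actual output routine; `elem_needs_quotes` and `quote_elem` are hypothetical names.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical: does this element text need double quotes on output? */
static bool
elem_needs_quotes(const char *val, char delim)
{
    assert(val != NULL);

    if (*val == '\0')
        return true;            /* empty strings are always quoted */

    for (; *val; val++)
    {
        unsigned char c = (unsigned char) *val;

        if (c == '{' || c == '}' || c == '"' || c == '\\' ||
            c == (unsigned char) delim || isspace(c))
            return true;
    }
    return false;
}

/*
 * Hypothetical: write the output form of val into out (capacity outsz).
 * Returns the length written, or -1 if the buffer is too small.
 */
static int
quote_elem(const char *val, char delim, char *out, size_t outsz)
{
    bool   quote = elem_needs_quotes(val, delim);
    size_t n = 0;

    if (outsz == 0)
        return -1;
    if (quote)
    {
        if (n + 1 >= outsz)
            return -1;
        out[n++] = '"';
    }
    for (; *val; val++)
    {
        /* embedded double quotes and backslashes get a backslash */
        if (*val == '"' || *val == '\\')
        {
            if (n + 1 >= outsz)
                return -1;
            out[n++] = '\\';
        }
        if (n + 1 >= outsz)
            return -1;
        out[n++] = *val;
    }
    if (quote)
    {
        if (n + 1 >= outsz)
            return -1;
        out[n++] = '"';
    }
    out[n] = '\0';
    return (int) n;
}
```

Passing the delimiter as a parameter reflects the point made earlier in the documentation: type `box` uses a semicolon, so a comma inside a `box` element would not by itself force quoting.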

src/backend/utils/adt/arrayfuncs.c

Lines changed: 148 additions & 18 deletions
@@ -8,7 +8,7 @@
  *
  *
  * IDENTIFICATION
- *    $PostgreSQL: pgsql/src/backend/utils/adt/arrayfuncs.c,v 1.106 2004/08/05 03:29:37 joe Exp $
+ *    $PostgreSQL: pgsql/src/backend/utils/adt/arrayfuncs.c,v 1.107 2004/08/08 05:01:55 joe Exp $
  *
  *-------------------------------------------------------------------------
  */
@@ -351,25 +351,40 @@ array_in(PG_FUNCTION_ARGS)
  * The syntax for array input is C-like nested curly braces
  *-----------------------------------------------------------------------------
  */
+typedef enum
+{
+    ARRAY_NO_LEVEL,
+    ARRAY_LEVEL_STARTED,
+    ARRAY_ELEM_STARTED,
+    ARRAY_ELEM_COMPLETED,
+    ARRAY_QUOTED_ELEM_STARTED,
+    ARRAY_QUOTED_ELEM_COMPLETED,
+    ARRAY_ELEM_DELIMITED,
+    ARRAY_LEVEL_COMPLETED,
+    ARRAY_LEVEL_DELIMITED
+} ArrayParseState;
+
 static int
 ArrayCount(char *str, int *dim, char typdelim)
 {
-    int nest_level = 0,
-    i;
-    int ndim = 1,
-    temp[MAXDIM],
-    nelems[MAXDIM],
-    nelems_last[MAXDIM];
-    bool scanning_string = false;
-    bool eoArray = false;
-    char *ptr;
+    int nest_level = 0,
+    i;
+    int ndim = 1,
+    temp[MAXDIM],
+    nelems[MAXDIM],
+    nelems_last[MAXDIM];
+    bool scanning_string = false;
+    bool eoArray = false;
+    char *ptr;
+    ArrayParseState parse_state = ARRAY_NO_LEVEL;
 
     for (i = 0; i < MAXDIM; ++i)
     {
         temp[i] = dim[i] = 0;
         nelems_last[i] = nelems[i] = 1;
     }
 
+    /* special case for an empty array */
     if (strncmp(str, "{}", 2) == 0)
         return 0;
 
@@ -389,6 +404,20 @@ ArrayCount(char *str, int *dim, char typdelim)
                             errmsg("malformed array literal: \"%s\"", str)));
                     break;
                 case '\\':
+                    /*
+                     * An escape must be after a level start, after an
+                     * element start, or after an element delimiter. In any
+                     * case we now must be past an element start.
+                     */
+                    if (parse_state != ARRAY_LEVEL_STARTED &&
+                        parse_state != ARRAY_ELEM_STARTED &&
+                        parse_state != ARRAY_QUOTED_ELEM_STARTED &&
+                        parse_state != ARRAY_ELEM_DELIMITED)
+                        ereport(ERROR,
+                                (errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+                                 errmsg("malformed array literal: \"%s\"", str)));
+                    if (parse_state != ARRAY_QUOTED_ELEM_STARTED)
+                        parse_state = ARRAY_ELEM_STARTED;
                     /* skip the escaped character */
                     if (*(ptr + 1))
                         ptr++;
@@ -398,11 +427,38 @@ ArrayCount(char *str, int *dim, char typdelim)
                             errmsg("malformed array literal: \"%s\"", str)));
                     break;
                 case '\"':
+                    /*
+                     * A quote must be after a level start, after a quoted
+                     * element start, or after an element delimiter. In any
+                     * case we now must be past an element start.
+                     */
+                    if (parse_state != ARRAY_LEVEL_STARTED &&
+                        parse_state != ARRAY_QUOTED_ELEM_STARTED &&
+                        parse_state != ARRAY_ELEM_DELIMITED)
+                        ereport(ERROR,
+                                (errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+                                 errmsg("malformed array literal: \"%s\"", str)));
                     scanning_string = !scanning_string;
+                    if (scanning_string)
+                        parse_state = ARRAY_QUOTED_ELEM_STARTED;
+                    else
+                        parse_state = ARRAY_QUOTED_ELEM_COMPLETED;
                     break;
                 case '{':
                     if (!scanning_string)
                     {
+                        /*
+                         * A left brace can occur if no nesting has
+                         * occurred yet, after a level start, or
+                         * after a level delimiter.
+                         */
+                        if (parse_state != ARRAY_NO_LEVEL &&
+                            parse_state != ARRAY_LEVEL_STARTED &&
+                            parse_state != ARRAY_LEVEL_DELIMITED)
+                            ereport(ERROR,
+                                    (errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+                                     errmsg("malformed array literal: \"%s\"", str)));
+                        parse_state = ARRAY_LEVEL_STARTED;
                         if (nest_level >= MAXDIM)
                             ereport(ERROR,
                                     (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
@@ -417,6 +473,19 @@ ArrayCount(char *str, int *dim, char typdelim)
                 case '}':
                     if (!scanning_string)
                     {
+                        /*
+                         * A right brace can occur after an element start,
+                         * an element completion, a quoted element completion,
+                         * or a level completion.
+                         */
+                        if (parse_state != ARRAY_ELEM_STARTED &&
+                            parse_state != ARRAY_ELEM_COMPLETED &&
+                            parse_state != ARRAY_QUOTED_ELEM_COMPLETED &&
+                            parse_state != ARRAY_LEVEL_COMPLETED)
+                            ereport(ERROR,
+                                    (errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+                                     errmsg("malformed array literal: \"%s\"", str)));
+                        parse_state = ARRAY_LEVEL_COMPLETED;
                         if (nest_level == 0)
                             ereport(ERROR,
                                     (errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
@@ -445,10 +514,45 @@ ArrayCount(char *str, int *dim, char typdelim)
                     }
                     break;
                 default:
-                    if (*ptr == typdelim && !scanning_string)
+                    if (!scanning_string)
                     {
-                        itemdone = true;
-                        nelems[nest_level - 1]++;
+                        if (*ptr == typdelim)
+                        {
+                            /*
+                             * Delimiters can occur after an element start,
+                             * an element completion, a quoted element
+                             * completion, or a level completion.
+                             */
+                            if (parse_state != ARRAY_ELEM_STARTED &&
+                                parse_state != ARRAY_ELEM_COMPLETED &&
+                                parse_state != ARRAY_QUOTED_ELEM_COMPLETED &&
+                                parse_state != ARRAY_LEVEL_COMPLETED)
+                                ereport(ERROR,
+                                        (errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+                                         errmsg("malformed array literal: \"%s\"", str)));
+                            if (parse_state == ARRAY_LEVEL_COMPLETED)
+                                parse_state = ARRAY_LEVEL_DELIMITED;
+                            else
+                                parse_state = ARRAY_ELEM_DELIMITED;
+                            itemdone = true;
+                            nelems[nest_level - 1]++;
+                        }
+                        else if (!isspace(*ptr))
+                        {
+                            /*
+                             * Other non-space characters must be after a level
+                             * start, after an element start, or after an element
+                             * delimiter. In any case we now must be past an
+                             * element start.
+                             */
+                            if (parse_state != ARRAY_LEVEL_STARTED &&
+                                parse_state != ARRAY_ELEM_STARTED &&
+                                parse_state != ARRAY_ELEM_DELIMITED)
+                                ereport(ERROR,
+                                        (errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+                                         errmsg("malformed array literal: \"%s\"", str)));
+                            parse_state = ARRAY_ELEM_STARTED;
+                        }
                     }
                     break;
             }
@@ -511,12 +615,15 @@ ReadArrayStr(char *arrayStr,
     while (!eoArray)
     {
         bool itemdone = false;
+        bool itemquoted = false;
         int i = -1;
         char *itemstart;
+        char *eptr;
 
         /* skip leading whitespace */
         while (isspace((unsigned char) *ptr))
             ptr++;
+
         itemstart = ptr;
 
         while (!itemdone)
@@ -547,11 +654,15 @@ ReadArrayStr(char *arrayStr,
                         char *cptr;
 
                         scanning_string = !scanning_string;
-                        /* Crunch the string on top of the quote. */
-                        for (cptr = ptr; *cptr != '\0'; cptr++)
-                            *cptr = *(cptr + 1);
-                        /* Back up to not miss following character. */
-                        ptr--;
+                        if (scanning_string)
+                        {
+                            itemquoted = true;
+                            /* Crunch the string on top of the first quote. */
+                            for (cptr = ptr; *cptr != '\0'; cptr++)
+                                *cptr = *(cptr + 1);
+                            /* Back up to not miss following character. */
+                            ptr--;
+                        }
                         break;
                     }
                 case '{':
@@ -615,6 +726,25 @@ ReadArrayStr(char *arrayStr,
                     (errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
                      errmsg("malformed array literal: \"%s\"", arrayStr)));
 
+        /*
+         * skip trailing whitespace
+         */
+        eptr = ptr - 1;
+        if (!itemquoted)
+        {
+            /* skip to last non-NULL, non-space, character */
+            while ((*eptr == '\0') || (isspace((unsigned char) *eptr)))
+                eptr--;
+            *(++eptr) = '\0';
+        }
+        else
+        {
+            /* skip to last quote character */
+            while (*eptr != '"')
+                eptr--;
+            *eptr = '\0';
+        }
+
         values[i] = FunctionCall3(inputproc,
                                   CStringGetDatum(itemstart),
                                   ObjectIdGetDatum(typioparam),
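
The state checks added to `ArrayCount` can be read as a small state machine. The following is a standalone sketch of that idea, assuming a simplified setting: it returns a boolean instead of calling `ereport`, does not count dimensions, treats exactly `{}` as the empty array, and rejects any non-space text after the outermost closing brace. The function name and the shortened state names are hypothetical; the transitions follow the comments in the diff above.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

typedef enum
{
    NO_LEVEL, LEVEL_STARTED, ELEM_STARTED,
    QUOTED_ELEM_STARTED, QUOTED_ELEM_COMPLETED,
    ELEM_DELIMITED, LEVEL_COMPLETED, LEVEL_DELIMITED
} SketchState;

/* Hypothetical validator modelled on the parse_state checks above. */
static bool
array_literal_is_well_formed(const char *str, char delim)
{
    SketchState st = NO_LEVEL;
    bool        in_quotes = false;
    int         depth = 0;
    const char *p;

    assert(str != NULL);
    if (strcmp(str, "{}") == 0)
        return true;            /* special case for an empty array */

    for (p = str; *p; p++)
    {
        char c = *p;

        if (st == LEVEL_COMPLETED && depth == 0)
        {
            /* sketch assumption: only whitespace after the final brace */
            if (!isspace((unsigned char) c))
                return false;
        }
        else if (c == '\\')
        {
            if (st != LEVEL_STARTED && st != ELEM_STARTED &&
                st != QUOTED_ELEM_STARTED && st != ELEM_DELIMITED)
                return false;
            if (st != QUOTED_ELEM_STARTED)
                st = ELEM_STARTED;
            if (p[1] == '\0')
                return false;   /* nothing left to escape */
            p++;                /* skip the escaped character */
        }
        else if (c == '"')
        {
            if (st != LEVEL_STARTED && st != QUOTED_ELEM_STARTED &&
                st != ELEM_DELIMITED)
                return false;
            in_quotes = !in_quotes;
            st = in_quotes ? QUOTED_ELEM_STARTED : QUOTED_ELEM_COMPLETED;
        }
        else if (in_quotes)
            continue;           /* everything else is data inside quotes */
        else if (c == '{')
        {
            if (st != NO_LEVEL && st != LEVEL_STARTED && st != LEVEL_DELIMITED)
                return false;
            st = LEVEL_STARTED;
            depth++;
        }
        else if (c == '}' || c == delim)
        {
            if (st != ELEM_STARTED && st != QUOTED_ELEM_COMPLETED &&
                st != LEVEL_COMPLETED)
                return false;
            if (c == '}')
            {
                st = LEVEL_COMPLETED;
                depth--;
            }
            else
                st = (st == LEVEL_COMPLETED) ? LEVEL_DELIMITED : ELEM_DELIMITED;
        }
        else if (!isspace((unsigned char) c))
        {
            if (st != LEVEL_STARTED && st != ELEM_STARTED && st != ELEM_DELIMITED)
                return false;
            st = ELEM_STARTED;
        }
    }
    return !in_quotes && depth == 0 && st == LEVEL_COMPLETED;
}
```

With this sketch, input such as `{"a"b}` is rejected because an ordinary character may not follow a completed quoted element, which is the kind of junk the commit message says now raises an error.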
