Commit a57695d

Fix detection of unfinished Unicode surrogate pair at end of string.
The U&'...' and U&"..." syntaxes silently discarded a surrogate pair start (that is, a code between U+D800 and U+DBFF) if it occurred at the very end of the string. This seems like an obvious oversight, since we throw an error for every other invalid combination of surrogate characters, including the very same situation in E'...' syntax.

This has been wrong since the pair processing was added (in 9.0), so back-patch to all supported branches.

Discussion: https://postgr.es/m/19113.1482337898@sss.pgh.pa.us

1 parent 1f2cfd2 commit a57695d

1 file changed, +8 -0 lines changed

src/backend/parser/scan.l

Lines changed: 8 additions & 0 deletions
@@ -1395,7 +1395,15 @@ litbuf_udeescape(unsigned char escape, core_yyscan_t yyscanner)
         }
     }
 
+    /* unfinished surrogate pair? */
+    if (pair_first)
+    {
+        ADVANCE_YYLLOC(in - litbuf + 3);    /* 3 for U&" */
+        yyerror("invalid Unicode surrogate pair");
+    }
+
     *out = '\0';
+
     /*
      * We could skip pg_verifymbstr if we didn't process any non-7-bit-ASCII
      * codes; but it's probably not worth the trouble, since this isn't
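
The check added above can be illustrated outside the scanner with a small standalone sketch. This is not the code in scan.l: it works on an array of already-decoded 16-bit escape values rather than on litbuf, and the names sp_status, is_first_surrogate, is_second_surrogate, combine_surrogates and combine_pairs are hypothetical, chosen only for this example. The point it tries to show is that the pending-surrogate state has to be tested once more after the loop, otherwise a trailing U+D800..U+DBFF code is dropped without an error.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical status codes for this sketch; not PostgreSQL identifiers. */
typedef enum { SP_OK = 0, SP_INVALID_PAIR = 1 } sp_status;

/* First (high) surrogates are U+D800..U+DBFF. */
int is_first_surrogate(uint32_t c)
{
    return c >= 0xD800 && c <= 0xDBFF;
}

/* Second (low) surrogates are U+DC00..U+DFFF. */
int is_second_surrogate(uint32_t c)
{
    return c >= 0xDC00 && c <= 0xDFFF;
}

/* Standard UTF-16 rule for turning a surrogate pair into one code point. */
uint32_t combine_surrogates(uint32_t first, uint32_t second)
{
    return ((first & 0x3FF) << 10) + 0x10000 + (second & 0x3FF);
}

/*
 * Copy codes[0..n) to out[], merging surrogate pairs into single code points.
 * out must have room for n entries. On success, stores the number of output
 * code points in *out_len and returns SP_OK.
 */
sp_status combine_pairs(const uint32_t *codes, size_t n,
                        uint32_t *out, size_t *out_len)
{
    uint32_t pair_first = 0;    /* pending first surrogate, 0 if none */
    size_t   len = 0;

    assert(codes != NULL && out != NULL && out_len != NULL);

    for (size_t i = 0; i < n; i++)
    {
        uint32_t c = codes[i];

        if (pair_first)
        {
            /* A first surrogate must be followed by a second one. */
            if (!is_second_surrogate(c))
                return SP_INVALID_PAIR;
            out[len++] = combine_surrogates(pair_first, c);
            pair_first = 0;
        }
        else if (is_second_surrogate(c))
            return SP_INVALID_PAIR;     /* second surrogate with no first */
        else if (is_first_surrogate(c))
            pair_first = c;             /* remember it, wait for the second */
        else
            out[len++] = c;
    }

    /* unfinished surrogate pair? This mirrors the check the commit adds. */
    if (pair_first)
        return SP_INVALID_PAIR;

    *out_len = len;
    return SP_OK;
}
```

With this sketch, the sequence {0x0061, 0xD83D, 0xDE00} is accepted and yields two code points, while {0x0061, 0xD83D} is rejected by the final check. Removing that final check would make the second sequence succeed with the dangling 0xD83D silently discarded, which is the behavior the commit message describes.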
