@@ -5970,6 +5970,145 @@ SELECT regexp_match('abc01234xyz', '(?:(.*?)(\d+)(.*)){1,1}');
5970
5970
5971
5971
<!-- end re_syntax.n man page -->
5972
5972
5973
+ <sect3 id="posix-vs-xquery">
5974
+ <title>Differences From XQuery (<literal>LIKE_REGEX</literal>)</title>
5975
+
5976
+ <indexterm zone="posix-vs-xquery">
5977
+ <primary><literal>LIKE_REGEX</literal></primary>
5978
+ </indexterm>
5979
+
5980
+ <indexterm zone="posix-vs-xquery">
5981
+ <primary>XQuery regular expressions</primary>
5982
+ </indexterm>
5983
+
5984
+ <para>
5985
+ Since SQL:2008, the SQL standard includes
5986
+ a <literal>LIKE_REGEX</literal> operator that performs pattern
5987
+ matching according to the XQuery regular expression
5988
+ standard. <productname>PostgreSQL</productname> does not yet
5989
+ implement this operator, but you can get very similar behavior using
5990
+ the <function>regexp_match()</function> function, since XQuery
5991
+ regular expressions are quite close to the ARE syntax described above.
5992
+ </para>
5993
+
5994
+ <para>
5995
+ Notable differences between the existing POSIX-based
5996
+ regular-expression feature and XQuery regular expressions include:
5997
+
5998
+ <itemizedlist>
5999
+ <listitem>
6000
+ <para>
6001
+ XQuery character class subtraction is not supported. An example of
6002
+ this feature is using the following to match only English
6003
+ consonants: <literal>[a-z-[aeiou]]</literal>.
6004
+ </para>
6005
+ </listitem>
6006
+ <listitem>
6007
+ <para>
6008
+ XQuery character class shorthands <literal>\c</literal>,
6009
+ <literal>\C</literal>, <literal>\i</literal>,
6010
+ and <literal>\I</literal> are not supported.
6011
+ </para>
6012
+ </listitem>
6013
+ <listitem>
6014
+ <para>
6015
+ XQuery character class elements
6016
+ using <literal>\p{UnicodeProperty}</literal> or the
6017
+ inverse <literal>\P{UnicodeProperty}</literal> are not supported.
6018
+ </para>
6019
+ </listitem>
6020
+ <listitem>
6021
+ <para>
6022
+ POSIX interprets character classes such as <literal>\w</literal>
6023
+ (see <xref linkend="posix-class-shorthand-escapes-table"/>)
6024
+ according to the prevailing locale (which you can control by
6025
+ attaching a <literal>COLLATE</literal> clause to the operator or
6026
+ function). XQuery specifies these classes by reference to Unicode
6027
+ character properties, so equivalent behavior is obtained only with
6028
+ a locale that follows the Unicode rules.
6029
+ </para>
6030
+ </listitem>
6031
+ <listitem>
6032
+ <para>
6033
+ The SQL standard (not XQuery itself) attempts to cater for more
6034
+ variants of <quote>newline</quote> than POSIX does. The
6035
+ newline-sensitive matching options described above consider only
6036
+ ASCII NL (<literal>\n</literal>) to be a newline, but SQL would have
6037
+ us treat CR (<literal>\r</literal>), CRLF (<literal>\r\n</literal>)
6038
+ (a Windows-style newline), and some Unicode-only characters like
6039
+ LINE SEPARATOR (U+2028) as newlines as well.
6040
+ Notably, <literal>.</literal> and <literal>\s</literal> should
6041
+ count <literal>\r\n</literal> as one character not two according to
6042
+ SQL.
6043
+ </para>
6044
+ </listitem>
6045
+ <listitem>
6046
+ <para>
6047
+ Of the character-entry escapes described in
6048
+ <xref linkend="posix-character-entry-escapes-table"/>,
6049
+ XQuery supports only <literal>\n</literal>, <literal>\r</literal>,
6050
+ and <literal>\t</literal>.
6051
+ </para>
6052
+ </listitem>
6053
+ <listitem>
6054
+ <para>
6055
+ XQuery does not support
6056
+ the <literal>[:<replaceable>name</replaceable>:]</literal> syntax
6057
+ for character classes within bracket expressions.
6058
+ </para>
6059
+ </listitem>
6060
+ <listitem>
6061
+ <para>
6062
+ XQuery does not have lookahead or lookbehind constraints,
6063
+ nor any of the constraint escapes described in
6064
+ <xref linkend="posix-constraint-escapes-table"/>.
6065
+ </para>
6066
+ </listitem>
6067
+ <listitem>
6068
+ <para>
6069
+ The metasyntax forms described in <xref linkend="posix-metasyntax"/>
6070
+ do not exist in XQuery.
6071
+ </para>
6072
+ </listitem>
6073
+ <listitem>
6074
+ <para>
6075
+ The regular expression flag letters defined by XQuery are
6076
+ related to but not the same as the option letters for POSIX
6077
+ (<xref linkend="posix-embedded-options-table"/>). While the
6078
+ <literal>i</literal> and <literal>q</literal> options behave the
6079
+ same, others do not:
6080
+ <itemizedlist>
6081
+ <listitem>
6082
+ <para>
6083
+ XQuery's <literal>s</literal> (allow dot to match newline)
6084
+ and <literal>m</literal> (allow <literal>^</literal>
6085
+ and <literal>$</literal> to match at newlines) flags provide
6086
+ access to the same behaviors as
6087
+ POSIX's <literal>n</literal>, <literal>p</literal>
6088
+ and <literal>w</literal> flags, but they
6089
+ do <emphasis>not</emphasis> match the behavior of
6090
+ POSIX's <literal>s</literal> and <literal>m</literal> flags.
6091
+ Note in particular that dot-matches-newline is the default
6092
+ behavior in POSIX but not XQuery.
6093
+ </para>
6094
+ </listitem>
6095
+ <listitem>
6096
+ <para>
6097
+ XQuery's <literal>x</literal> (ignore whitespace in pattern) flag
6098
+ is noticeably different from POSIX's expanded-mode flag.
6099
+ POSIX's <literal>x</literal> flag also
6100
+ allows <literal>#</literal> to begin a comment in the pattern,
6101
+ and POSIX will not ignore a whitespace character after a
6102
+ backslash.
6103
+ </para>
6104
+ </listitem>
6105
+ </itemizedlist>
6106
+ </para>
6107
+ </listitem>
6108
+ </itemizedlist>
6109
+ </para>
6110
+
6111
+ </sect3>
5973
6112
</sect2>
5974
6113
</sect1>
5975
6114
@@ -11793,6 +11932,14 @@ table2-mapping
11793
11932
</programlisting>
11794
11933
</para>
11795
11934
</listitem>
11935
+
11936
+ <listitem>
11937
+ <para>
11938
+ There are minor differences in the interpretation of regular
11939
+ expression patterns used in <literal>like_regex</literal> filters, as
11940
+ described in <xref linkend="jsonpath-regular-expressions"/>.
11941
+ </para>
11942
+ </listitem>
11796
11943
</itemizedlist>
11797
11944
11798
11945
<sect3 id="strict-and-lax-modes">
@@ -11872,6 +12019,63 @@ table2-mapping
11872
12019
11873
12020
</sect3>
11874
12021
12022
+ <sect3 id="jsonpath-regular-expressions">
12023
+ <title>Regular Expressions</title>
12024
+
12025
+ <indexterm zone="jsonpath-regular-expressions">
12026
+ <primary><literal>LIKE_REGEX</literal></primary>
12027
+ <secondary>in SQL/JSON</secondary>
12028
+ </indexterm>
12029
+
12030
+ <para>
12031
+ SQL/JSON path expressions allow matching text to a regular expression
12032
+ with the <literal>like_regex</literal> filter. For example, the
12033
+ following SQL/JSON path query would case-insensitively match all
12034
+ strings in an array that start with an English vowel:
12035
+ <programlisting>
12036
+ '$[*] ? (@ like_regex "^[aeiou]" flag "i")'
12037
+ </programlisting>
12038
+ </para>
12039
+
12040
+ <para>
12041
+ The optional <literal>flag</literal> string may include one or more of
12042
+ the characters
12043
+ <literal>i</literal> for case-insensitive match,
12044
+ <literal>m</literal> to allow <literal>^</literal>
12045
+ and <literal>$</literal> to match at newlines,
12046
+ <literal>s</literal> to allow <literal>.</literal> to match a newline,
12047
+ and <literal>q</literal> to quote the whole pattern (reducing the
12048
+ behavior to a simple substring match).
12049
+ </para>
12050
+
12051
+ <para>
12052
+ The SQL/JSON standard borrows its definition for regular expressions
12053
+ from the <literal>LIKE_REGEX</literal> operator, which in turn uses the
12054
+ XQuery standard. PostgreSQL does not currently support the
12055
+ <literal>LIKE_REGEX</literal> operator. Therefore,
12056
+ the <literal>like_regex</literal> filter is implemented using the
12057
+ POSIX regular expression engine described in
12058
+ <xref linkend="functions-posix-regexp"/>. This leads to various minor
12059
+ discrepancies from standard SQL/JSON behavior, which are cataloged in
12060
+ <xref linkend="posix-vs-xquery"/>.
12061
+ Note, however, that the flag-letter incompatibilities described there
12062
+ do not apply to SQL/JSON, as it translates the XQuery flag letters to
12063
+ match what the POSIX engine expects.
12064
+ </para>
12065
+
12066
+ <para>
12067
+ Keep in mind that the pattern argument of <literal>like_regex</literal>
12068
+ is a JSON path string literal, written according to the rules given in
12069
+ <xref linkend="datatype-jsonpath"/>. This means in particular that any
12070
+ backslashes you want to use in the regular expression must be doubled.
12071
+ For example, to match strings that contain only digits:
12072
+ <programlisting>
12073
+ '$ ? (@ like_regex "^\\d+$")'
12074
+ </programlisting>
12075
+ </para>
12076
+
12077
+ </sect3>
12078
+
11875
12079
<sect3 id="functions-sqljson-path-operators">
11876
12080
<title>SQL/JSON Path Operators and Methods</title>
11877
12081
@@ -12113,10 +12317,11 @@ table2-mapping
12113
12317
<row>
12114
12318
<entry><literal>like_regex</literal></entry>
12115
12319
<entry>
12116
- Tests pattern matching with POSIX regular expressions
12117
- (see <xref linkend="functions-posix-regexp"/>). Supported flags
12118
- are <literal>i</literal>, <literal>s</literal>, <literal>m</literal>,
12119
- <literal>x</literal>, and <literal>q</literal>.</entry>
12320
+ Tests whether the first operand matches the regular expression
12321
+ given by the second operand, optionally with modifications
12322
+ described by a string of <literal>flag</literal> characters (see
12323
+ <xref linkend="jsonpath-regular-expressions"/>)
12324
+ </entry>
12120
12325
<entry><literal>["abc", "abd", "aBdC", "abdacb", "babc"]</literal></entry>
12121
12326
<entry><literal>$[*] ? (@ like_regex "^ab.*c" flag "i")</literal></entry>
12122
12327
<entry><literal>"abc", "aBdC", "abdacb"</literal></entry>
0 commit comments