@@ -5966,6 +5966,145 @@ SELECT regexp_match('abc01234xyz', '(?:(.*?)(\d+)(.*)){1,1}');
5966
5966
5967
5967
<!-- end re_syntax.n man page -->
5968
5968
5969
+ <sect3 id="posix-vs-xquery">
5970
+ <title>Differences From XQuery (<literal>LIKE_REGEX</literal>)</title>
5971
+
5972
+ <indexterm zone="posix-vs-xquery">
5973
+ <primary><literal>LIKE_REGEX</literal></primary>
5974
+ </indexterm>
5975
+
5976
+ <indexterm zone="posix-vs-xquery">
5977
+ <primary>XQuery regular expressions</primary>
5978
+ </indexterm>
5979
+
5980
+ <para>
5981
+ Since SQL:2008, the SQL standard includes
5982
+ a <literal>LIKE_REGEX</literal> operator that performs pattern
5983
+ matching according to the XQuery regular expression
5984
+ standard. <productname>PostgreSQL</productname> does not yet
5985
+ implement this operator, but you can get very similar behavior using
5986
+ the <function>regexp_match()</function> function, since XQuery
5987
+ regular expressions are quite close to the ARE syntax described above.
5988
+ </para>
5989
+
5990
+ <para>
5991
+ Notable differences between the existing POSIX-based
5992
+ regular-expression feature and XQuery regular expressions include:
5993
+
5994
+ <itemizedlist>
5995
+ <listitem>
5996
+ <para>
5997
+ XQuery character class subtraction is not supported. An example of
5998
+ this feature is using the following to match only English
5999
+ consonants: <literal>[a-z-[aeiou]]</literal>.
6000
+ </para>
6001
+ </listitem>
6002
+ <listitem>
6003
+ <para>
6004
+ XQuery character class shorthands <literal>\c</literal>,
6005
+ <literal>\C</literal>, <literal>\i</literal>,
6006
+ and <literal>\I</literal> are not supported.
6007
+ </para>
6008
+ </listitem>
6009
+ <listitem>
6010
+ <para>
6011
+ XQuery character class elements
6012
+ using <literal>\p{UnicodeProperty}</literal> or the
6013
+ inverse <literal>\P{UnicodeProperty}</literal> are not supported.
6014
+ </para>
6015
+ </listitem>
6016
+ <listitem>
6017
+ <para>
6018
+ POSIX interprets character classes such as <literal>\w</literal>
6019
+ (see <xref linkend="posix-class-shorthand-escapes-table"/>)
6020
+ according to the prevailing locale (which you can control by
6021
+ attaching a <literal>COLLATE</literal> clause to the operator or
6022
+ function). XQuery specifies these classes by reference to Unicode
6023
+ character properties, so equivalent behavior is obtained only with
6024
+ a locale that follows the Unicode rules.
6025
+ </para>
6026
+ </listitem>
6027
+ <listitem>
6028
+ <para>
6029
+ The SQL standard (not XQuery itself) attempts to cater for more
6030
+ variants of <quote>newline</quote> than POSIX does. The
6031
+ newline-sensitive matching options described above consider only
6032
+ ASCII NL (<literal>\n</literal>) to be a newline, but SQL would have
6033
+ us treat CR (<literal>\r</literal>), CRLF (<literal>\r\n</literal>)
6034
+ (a Windows-style newline), and some Unicode-only characters like
6035
+ LINE SEPARATOR (U+2028) as newlines as well.
6036
+ Notably, <literal>.</literal> and <literal>\s</literal> should
6037
+ count <literal>\r\n</literal> as one character not two according to
6038
+ SQL.
6039
+ </para>
6040
+ </listitem>
6041
+ <listitem>
6042
+ <para>
6043
+ Of the character-entry escapes described in
6044
+ <xref linkend="posix-character-entry-escapes-table"/>,
6045
+ XQuery supports only <literal>\n</literal>, <literal>\r</literal>,
6046
+ and <literal>\t</literal>.
6047
+ </para>
6048
+ </listitem>
6049
+ <listitem>
6050
+ <para>
6051
+ XQuery does not support
6052
+ the <literal>[:<replaceable>name</replaceable>:]</literal> syntax
6053
+ for character classes within bracket expressions.
6054
+ </para>
6055
+ </listitem>
6056
+ <listitem>
6057
+ <para>
6058
+ XQuery does not have lookahead or lookbehind constraints,
6059
+ nor any of the constraint escapes described in
6060
+ <xref linkend="posix-constraint-escapes-table"/>.
6061
+ </para>
6062
+ </listitem>
6063
+ <listitem>
6064
+ <para>
6065
+ The metasyntax forms described in <xref linkend="posix-metasyntax"/>
6066
+ do not exist in XQuery.
6067
+ </para>
6068
+ </listitem>
6069
+ <listitem>
6070
+ <para>
6071
+ The regular expression flag letters defined by XQuery are
6072
+ related to but not the same as the option letters for POSIX
6073
+ (<xref linkend="posix-embedded-options-table"/>). While the
6074
+ <literal>i</literal> and <literal>q</literal> options behave the
6075
+ same, others do not:
6076
+ <itemizedlist>
6077
+ <listitem>
6078
+ <para>
6079
+ XQuery's <literal>s</literal> (allow dot to match newline)
6080
+ and <literal>m</literal> (allow <literal>^</literal>
6081
+ and <literal>$</literal> to match at newlines) flags provide
6082
+ access to the same behaviors as
6083
+ POSIX's <literal>n</literal>, <literal>p</literal>
6084
+ and <literal>w</literal> flags, but they
6085
+ do <emphasis>not</emphasis> match the behavior of
6086
+ POSIX's <literal>s</literal> and <literal>m</literal> flags.
6087
+ Note in particular that dot-matches-newline is the default
6088
+ behavior in POSIX but not XQuery.
6089
+ </para>
6090
+ </listitem>
6091
+ <listitem>
6092
+ <para>
6093
+ XQuery's <literal>x</literal> (ignore whitespace in pattern) flag
6094
+ is noticeably different from POSIX's expanded-mode flag.
6095
+ POSIX's <literal>x</literal> flag also
6096
+ allows <literal>#</literal> to begin a comment in the pattern,
6097
+ and POSIX will not ignore a whitespace character after a
6098
+ backslash.
6099
+ </para>
6100
+ </listitem>
6101
+ </itemizedlist>
6102
+ </para>
6103
+ </listitem>
6104
+ </itemizedlist>
6105
+ </para>
6106
+
6107
+ </sect3>
5969
6108
</sect2>
5970
6109
</sect1>
5971
6110
@@ -11739,6 +11878,14 @@ table2-mapping
11739
11878
</programlisting>
11740
11879
</para>
11741
11880
</listitem>
11881
+
11882
+ <listitem>
11883
+ <para>
11884
+ There are minor differences in the interpretation of regular
11885
+ expression patterns used in <literal>like_regex</literal> filters, as
11886
+ described in <xref linkend="jsonpath-regular-expressions"/>.
11887
+ </para>
11888
+ </listitem>
11742
11889
</itemizedlist>
11743
11890
11744
11891
<sect3 id="strict-and-lax-modes">
@@ -11818,6 +11965,63 @@ table2-mapping
11818
11965
11819
11966
</sect3>
11820
11967
11968
+ <sect3 id="jsonpath-regular-expressions">
11969
+ <title>Regular Expressions</title>
11970
+
11971
+ <indexterm zone="jsonpath-regular-expressions">
11972
+ <primary><literal>LIKE_REGEX</literal></primary>
11973
+ <secondary>in SQL/JSON</secondary>
11974
+ </indexterm>
11975
+
11976
+ <para>
11977
+ SQL/JSON path expressions allow matching text to a regular expression
11978
+ with the <literal>like_regex</literal> filter. For example, the
11979
+ following SQL/JSON path query would case-insensitively match all
11980
+ strings in an array that start with an English vowel:
11981
+ <programlisting>
11982
+ '$[*] ? (@ like_regex "^[aeiou]" flag "i")'
11983
+ </programlisting>
11984
+ </para>
11985
+
11986
+ <para>
11987
+ The optional <literal>flag</literal> string may include one or more of
11988
+ the characters
11989
+ <literal>i</literal> for case-insensitive match,
11990
+ <literal>m</literal> to allow <literal>^</literal>
11991
+ and <literal>$</literal> to match at newlines,
11992
+ <literal>s</literal> to allow <literal>.</literal> to match a newline,
11993
+ and <literal>q</literal> to quote the whole pattern (reducing the
11994
+ behavior to a simple substring match).
11995
+ </para>
11996
+
11997
+ <para>
11998
+ The SQL/JSON standard borrows its definition for regular expressions
11999
+ from the <literal>LIKE_REGEX</literal> operator, which in turn uses the
12000
+ XQuery standard. PostgreSQL does not currently support the
12001
+ <literal>LIKE_REGEX</literal> operator. Therefore,
12002
+ the <literal>like_regex</literal> filter is implemented using the
12003
+ POSIX regular expression engine described in
12004
+ <xref linkend="functions-posix-regexp"/>. This leads to various minor
12005
+ discrepancies from standard SQL/JSON behavior, which are cataloged in
12006
+ <xref linkend="posix-vs-xquery"/>.
12007
+ Note, however, that the flag-letter incompatibilities described there
12008
+ do not apply to SQL/JSON, as it translates the XQuery flag letters to
12009
+ match what the POSIX engine expects.
12010
+ </para>
12011
+
12012
+ <para>
12013
+ Keep in mind that the pattern argument of <literal>like_regex</literal>
12014
+ is a JSON path string literal, written according to the rules given in
12015
+ <xref linkend="datatype-jsonpath"/>. This means in particular that any
12016
+ backslashes you want to use in the regular expression must be doubled.
12017
+ For example, to match strings that contain only digits:
12018
+ <programlisting>
12019
+ '$ ? (@ like_regex "^\\d+$")'
12020
+ </programlisting>
12021
+ </para>
12022
+
12023
+ </sect3>
12024
+
11821
12025
<sect3 id="functions-sqljson-path-operators">
11822
12026
<title>SQL/JSON Path Operators and Methods</title>
11823
12027
@@ -12059,10 +12263,11 @@ table2-mapping
12059
12263
<row>
12060
12264
<entry><literal>like_regex</literal></entry>
12061
12265
<entry>
12062
- Tests pattern matching with POSIX regular expressions
12063
- (see <xref linkend="functions-posix-regexp"/>). Supported flags
12064
- are <literal>i</literal>, <literal>s</literal>, <literal>m</literal>,
12065
- <literal>x</literal>, and <literal>q</literal>.</entry>
12266
+ Tests whether the first operand matches the regular expression
12267
+ given by the second operand, optionally with modifications
12268
+ described by a string of <literal>flag</literal> characters (see
12269
+ <xref linkend="jsonpath-regular-expressions"/>)
12270
+ </entry>
12066
12271
<entry><literal>["abc", "abd", "aBdC", "abdacb", "babc"]</literal></entry>
12067
12272
<entry><literal>$[*] ? (@ like_regex "^ab.*c" flag "i")</literal></entry>
12068
12273
<entry><literal>"abc", "aBdC", "abdacb"</literal></entry>
0 commit comments