Skip to content

Commit 9eb78be

Browse files
author
Neil Conway
committed
Add three new regexp functions: regexp_matches, regexp_split_to_array,
and regexp_split_to_table. These functions provide access to the capture groups resulting from a POSIX regular expression match, and provide the ability to split a string on a POSIX regular expression, respectively. Patch from Jeremy Drake; code review by Neil Conway, additional comments and suggestions from Tom and Peter E. This patch bumps the catversion, adds some regression tests, and updates the docs.
1 parent 5e96b04 commit 9eb78be

File tree

7 files changed

+1001
-70
lines changed

7 files changed

+1001
-70
lines changed

doc/src/sgml/func.sgml

Lines changed: 176 additions & 8 deletions
Original file line numberDiff line numberDiff line change
@@ -1,4 +1,4 @@
1-
<!-- $PostgreSQL: pgsql/doc/src/sgml/func.sgml,v 1.369 2007/02/20 19:59:04 momjian Exp $ -->
1+
<!-- $PostgreSQL: pgsql/doc/src/sgml/func.sgml,v 1.370 2007/03/20 05:44:59 neilc Exp $ -->
22

33
<chapter id="functions">
44
<title>Functions and Operators</title>
@@ -1468,17 +1468,52 @@
14681468
</row>
14691469

14701470
<row>
1471-
<entry><literal><function>regexp_replace</function>(<parameter>string</parameter> <type>text</type>, <parameter>pattern</parameter> <type>text</type>, <parameter>replacement</parameter> <type>text</type> [,<parameter>flags</parameter> <type>text</type>])</literal></entry>
1471+
<entry><literal><function>regexp_matches</function>(<parameter>string</parameter> <type>text</type>, <parameter>pattern</parameter> <type>text</type> [, <parameter>flags</parameter> <type>text</type>])</literal></entry>
1472+
<entry><type>setof text[]</type></entry>
1473+
<entry>
1474+
Return all capture groups resulting from matching POSIX regular
1475+
expression against the <parameter>string</parameter>. See
1476+
<xref linkend="functions-posix-regexp"> for more information.
1477+
</entry>
1478+
<entry><literal>regexp_matches('foobarbequebaz', '(bar)(beque)')</literal></entry>
1479+
<entry><literal>{bar,beque}</literal></entry>
1480+
</row>
1481+
1482+
<row>
1483+
<entry><literal><function>regexp_replace</function>(<parameter>string</parameter> <type>text</type>, <parameter>pattern</parameter> <type>text</type>, <parameter>replacement</parameter> <type>text</type> [, <parameter>flags</parameter> <type>text</type>])</literal></entry>
14721484
<entry><type>text</type></entry>
14731485
<entry>
14741486
Replace substring matching POSIX regular expression. See
1475-
<xref linkend="functions-matching"> for more information on pattern
1476-
matching.
1487+
<xref linkend="functions-posix-regexp"> for more information.
14771488
</entry>
14781489
<entry><literal>regexp_replace('Thomas', '.[mN]a.', 'M')</literal></entry>
14791490
<entry><literal>ThM</literal></entry>
14801491
</row>
14811492

1493+
<row>
1494+
<entry><literal><function>regexp_split_to_array</function>(<parameter>string</parameter> <type>text</type>, <parameter>pattern</parameter> <type>text</type> [, <parameter>flags</parameter> <type>text</type> ])</literal></entry>
1495+
<entry><type>text[]</type></entry>
1496+
<entry>
1497+
Split <parameter>string</parameter> using POSIX regular expression as
1498+
the delimiter. See <xref linkend="functions-posix-regexp"> for more
1499+
information.
1500+
</entry>
1501+
<entry><literal>regexp_split_to_array('hello world', E'\\s+')</literal></entry>
1502+
<entry><literal>{hello,world}</literal></entry>
1503+
</row>
1504+
1505+
<row>
1506+
<entry><literal><function>regexp_split_to_table</function>(<parameter>string</parameter> <type>text</type>, <parameter>pattern</parameter> <type>text</type> [, <parameter>flags</parameter> <type>text</type>])</literal></entry>
1507+
<entry><type>setof text</type></entry>
1508+
<entry>
1509+
Split <parameter>string</parameter> using POSIX regular expression as
1510+
the delimiter. See <xref linkend="functions-posix-regexp"> for more
1511+
information.
1512+
</entry>
1513+
<entry><literal>regexp_split_to_table('hello world', E'\\s+')</literal></entry>
1514+
<entry><literal>hello</literal><para><literal>world</literal></para> (2 rows)</entry>
1515+
</row>
1516+
14821517
<row>
14831518
<entry><literal><function>repeat</function>(<parameter>string</parameter> <type>text</type>, <parameter>number</parameter> <type>int</type>)</literal></entry>
14841519
<entry><type>text</type></entry>
@@ -2883,9 +2918,6 @@ cast(-44 as bit(12)) <lineannotation>111111010100</lineannotation>
28832918
<indexterm>
28842919
<primary>substring</primary>
28852920
</indexterm>
2886-
<indexterm>
2887-
<primary>regexp_replace</primary>
2888-
</indexterm>
28892921

28902922
<synopsis>
28912923
<replaceable>string</replaceable> SIMILAR TO <replaceable>pattern</replaceable> <optional>ESCAPE <replaceable>escape-character</replaceable></optional>
@@ -3004,6 +3036,21 @@ substring('foobar' from '#"o_b#"%' for '#') <lineannotation>NULL</lineannotat
30043036
<primary>regular expression</primary>
30053037
<seealso>pattern matching</seealso>
30063038
</indexterm>
3039+
<indexterm>
3040+
<primary>substring</primary>
3041+
</indexterm>
3042+
<indexterm>
3043+
<primary>regexp_replace</primary>
3044+
</indexterm>
3045+
<indexterm>
3046+
<primary>regexp_matches</primary>
3047+
</indexterm>
3048+
<indexterm>
3049+
<primary>regexp_split_to_table</primary>
3050+
</indexterm>
3051+
<indexterm>
3052+
<primary>regexp_split_to_array</primary>
3053+
</indexterm>
30073054

30083055
<para>
30093056
<xref linkend="functions-posix-table"> lists the available
@@ -3134,7 +3181,10 @@ substring('foobar' from 'o(.)b') <lineannotation>o</lineannotation>
31343181
string containing zero or more single-letter flags that change the
31353182
function's behavior. Flag <literal>i</> specifies case-insensitive
31363183
matching, while flag <literal>g</> specifies replacement of each matching
3137-
substring rather than only the first one.
3184+
substring rather than only the first one. Other supported flags are
3185+
<literal>m</>, <literal>n</>, <literal>p</>, <literal>w</> and
3186+
<literal>x</>, whose meanings correspond to those shown in
3187+
<xref linkend="posix-embedded-options-table">.
31383188
</para>
31393189

31403190
<para>
@@ -3149,6 +3199,124 @@ regexp_replace('foobarbaz', 'b(..)', E'X\\1Y', 'g')
31493199
</programlisting>
31503200
</para>
31513201

3202+
<para>
3203+
The <function>regexp_matches</> function returns all of the capture
3204+
groups resulting from matching a POSIX regular expression pattern.
3205+
It has the syntax
3206+
<function>regexp_matches</function>(<replaceable>string</>, <replaceable>pattern</>
3207+
<optional>, <replaceable>flags</> </optional>).
3208+
If there is no match to the <replaceable>pattern</>, the function returns no rows.
3209+
If there is a match, the function returns the contents of all of the capture groups
3210+
in a text array, or if there were no capture groups in the pattern, it returns the
3211+
contents of the entire match as a single-element text array.
3212+
The <replaceable>flags</> parameter is an optional text
3213+
string containing zero or more single-letter flags that change the
3214+
function's behavior. Flag <literal>i</> specifies case-insensitive
3215+
matching, while flag <literal>g</> causes the return of each matching
3216+
substring rather than only the first one. Other supported
3217+
flags are <literal>m</>, <literal>n</>, <literal>p</>, <literal>w</> and
3218+
<literal>x</>, whose meanings are described in
3219+
<xref linkend="posix-embedded-options-table">.
3220+
</para>
3221+
3222+
<para>
3223+
Some examples:
3224+
<programlisting>
3225+
SELECT regexp_matches('foobarbequebaz', '(bar)(beque)');
3226+
regexp_matches
3227+
----------------
3228+
{bar,beque}
3229+
(1 row)
3230+
3231+
SELECT regexp_matches('foobarbequebazilbarfbonk', '(b[^b]+)(b[^b]+)', 'g');
3232+
regexp_matches
3233+
----------------
3234+
{bar,beque}
3235+
{bazil,barf}
3236+
(2 rows)
3237+
3238+
SELECT regexp_matches('foobarbequebaz', 'barbeque');
3239+
regexp_matches
3240+
----------------
3241+
{barbeque}
3242+
(1 row)
3243+
</programlisting>
3244+
</para>
3245+
3246+
<para>
3247+
The <function>regexp_split_to_table</> function splits a string using a POSIX
3248+
regular expression pattern as a delimiter. It has the syntax
3249+
<function>regexp_split_to_table</function>(<replaceable>string</>, <replaceable>pattern</>
3250+
<optional>, <replaceable>flags</> </optional>).
3251+
If there is no match to the <replaceable>pattern</>, the function returns the
3252+
<replaceable>string</>. If there is at least one match, for each match it returns
3253+
the text from the end of the last match (or the beginning of the string)
3254+
to the beginning of the match. When there are no more matches, it
3255+
returns the text from the end of the last match to the end of the string.
3256+
The <replaceable>flags</> parameter is an optional text string containing
3257+
zero or more single-letter flags that change the function's behavior.
3258+
<function>regexp_split_to_table</function> supports the flags <literal>i</>,
3259+
<literal>m</>, <literal>n</>, <literal>p</>, <literal>w</> and
3260+
<literal>x</>, whose meanings are described in
3261+
<xref linkend="posix-embedded-options-table">.
3262+
</para>
3263+
3264+
<para>
3265+
The <function>regexp_split_to_array</> function behaves the same as
3266+
<function>regexp_split_to_table</>, except that <function>regexp_split_to_array</>
3267+
returns its results as a <type>text[]</>. It has the syntax
3268+
<function>regexp_split_to_array</function>(<replaceable>string</>, <replaceable>pattern</>
3269+
<optional>, <replaceable>flags</> </optional>).
3270+
The parameters are the same as for <function>regexp_split_to_table</>.
3271+
</para>
3272+
3273+
<para>
3274+
Some examples:
3275+
<programlisting>
3276+
3277+
SELECT foo FROM regexp_split_to_table('the quick brown fox jumped over the lazy dog', E'\\\s+') AS foo;
3278+
foo
3279+
--------
3280+
the
3281+
quick
3282+
brown
3283+
fox
3284+
jumped
3285+
over
3286+
the
3287+
lazy
3288+
dog
3289+
(9 rows)
3290+
3291+
SELECT regexp_split_to_array('the quick brown fox jumped over the lazy dog', E'\\s+');
3292+
regexp_split_to_array
3293+
------------------------------------------------
3294+
{the,quick,brown,fox,jumped,over,the,lazy,dog}
3295+
(1 row)
3296+
3297+
SELECT foo FROM regexp_split_to_table('the quick brown fox', E'\\s*') AS foo;
3298+
foo
3299+
-----
3300+
t
3301+
h
3302+
e
3303+
q
3304+
u
3305+
i
3306+
c
3307+
k
3308+
b
3309+
r
3310+
o
3311+
w
3312+
n
3313+
f
3314+
o
3315+
x
3316+
(16 rows)
3317+
</programlisting>
3318+
</para>
3319+
31523320
<para>
31533321
<productname>PostgreSQL</productname>'s regular expressions are implemented
31543322
using a package written by Henry Spencer. Much of

0 commit comments

Comments
 (0)