Commit 1b70619

Code review for regexp_matches/regexp_split patch. Refactor to avoid assuming
that cached compiled patterns will still be there when the function is next called. Clean up looping logic, thereby fixing bug identified by Pavel Stehule. Share setup code between the two functions, add some comments, and avoid risky mixing of int and size_t variables. Clean up the documentation a tad, and accept all the flag characters mentioned in table 9-19 rather than just a subset.
1 parent d0e5c0c commit 1b70619

4 files changed: +395 -330 lines changed

doc/src/sgml/func.sgml

Lines changed: 33 additions & 32 deletions
@@ -1,4 +1,4 @@
-<!-- $PostgreSQL: pgsql/doc/src/sgml/func.sgml,v 1.383 2007/07/18 03:12:42 momjian Exp $ -->
+<!-- $PostgreSQL: pgsql/doc/src/sgml/func.sgml,v 1.384 2007/08/11 03:56:24 tgl Exp $ -->
 
 <chapter id="functions">
 <title>Functions and Operators</title>
@@ -1499,7 +1499,7 @@
 <entry><literal><function>regexp_matches</function>(<parameter>string</parameter> <type>text</type>, <parameter>pattern</parameter> <type>text</type> [, <parameter>flags</parameter> <type>text</type>])</literal></entry>
 <entry><type>setof text[]</type></entry>
 <entry>
-Return all capture groups resulting from matching POSIX regular
+Return all captured substrings resulting from matching a POSIX regular
 expression against the <parameter>string</parameter>. See
 <xref linkend="functions-posix-regexp"> for more information.
 </entry>
@@ -1511,7 +1511,7 @@
 <entry><literal><function>regexp_replace</function>(<parameter>string</parameter> <type>text</type>, <parameter>pattern</parameter> <type>text</type>, <parameter>replacement</parameter> <type>text</type> [, <parameter>flags</parameter> <type>text</type>])</literal></entry>
 <entry><type>text</type></entry>
 <entry>
-Replace substring matching POSIX regular expression. See
+Replace substring(s) matching a POSIX regular expression. See
 <xref linkend="functions-posix-regexp"> for more information.
 </entry>
 <entry><literal>regexp_replace('Thomas', '.[mN]a.', 'M')</literal></entry>
@@ -1522,7 +1522,7 @@
 <entry><literal><function>regexp_split_to_array</function>(<parameter>string</parameter> <type>text</type>, <parameter>pattern</parameter> <type>text</type> [, <parameter>flags</parameter> <type>text</type> ])</literal></entry>
 <entry><type>text[]</type></entry>
 <entry>
-Split <parameter>string</parameter> using POSIX regular expression as
+Split <parameter>string</parameter> using a POSIX regular expression as
 the delimiter. See <xref linkend="functions-posix-regexp"> for more
 information.
 </entry>
@@ -1534,7 +1534,7 @@
 <entry><literal><function>regexp_split_to_table</function>(<parameter>string</parameter> <type>text</type>, <parameter>pattern</parameter> <type>text</type> [, <parameter>flags</parameter> <type>text</type>])</literal></entry>
 <entry><type>setof text</type></entry>
 <entry>
-Split <parameter>string</parameter> using POSIX regular expression as
+Split <parameter>string</parameter> using a POSIX regular expression as
 the delimiter. See <xref linkend="functions-posix-regexp"> for more
 information.
 </entry>
@@ -2856,11 +2856,9 @@ cast(-44 as bit(12)) <lineannotation>111111010100</lineannotation>
 <acronym>SQL</acronym> <function>LIKE</function> operator, the
 more recent <function>SIMILAR TO</function> operator (added in
 SQL:1999), and <acronym>POSIX</acronym>-style regular
-expressions.
-Additionally, a pattern matching function,
-<function>substring</function>, is available, using either
-<function>SIMILAR TO</function>-style or POSIX-style regular
-expressions.
+expressions. Aside from the basic <quote>does this string match
+this pattern?</> operators, functions are available to extract
+or replace matching substrings and to split a string at the matches.
 </para>
 
 <tip>
@@ -3186,15 +3184,20 @@ substring('foobar' from '#"o_b#"%' for '#') <lineannotation>NULL</lineannotat
 end of the string.
 </para>
 
-<para>
-Some examples:
+<para>
+Some examples:
 <programlisting>
 'abc' ~ 'abc' <lineannotation>true</lineannotation>
 'abc' ~ '^a' <lineannotation>true</lineannotation>
 'abc' ~ '(b|d)' <lineannotation>true</lineannotation>
 'abc' ~ '^(b|c)' <lineannotation>false</lineannotation>
 </programlisting>
-</para>
+</para>
+
+<para>
+The <acronym>POSIX</acronym> pattern language is described in much
+greater detail below.
+</para>
 
 <para>
 The <function>substring</> function with two parameters,
@@ -3246,9 +3249,7 @@ substring('foobar' from 'o(.)b') <lineannotation>o</lineannotation>
 function's behavior. Flag <literal>i</> specifies case-insensitive
 matching, while flag <literal>g</> specifies replacement of each matching
 substring rather than only the first one. Other supported flags are
-<literal>m</>, <literal>n</>, <literal>p</>, <literal>w</> and
-<literal>x</>, whose meanings correspond to those shown in
-<xref linkend="posix-embedded-options-table">.
+described in <xref linkend="posix-embedded-options-table">.
 </para>
 
 <para>
@@ -3264,23 +3265,25 @@ regexp_replace('foobarbaz', 'b(..)', E'X\\1Y', 'g')
 </para>
 
 <para>
-The <function>regexp_matches</> function returns all of the capture
-groups resulting from matching a POSIX regular expression pattern.
+The <function>regexp_matches</> function returns all of the captured
+substrings resulting from matching a POSIX regular expression pattern.
 It has the syntax
 <function>regexp_matches</function>(<replaceable>string</>, <replaceable>pattern</>
 <optional>, <replaceable>flags</> </optional>).
-If there is no match to the <replaceable>pattern</>, the function returns no rows.
-If there is a match, the function returns the contents of all of the capture groups
-in a text array, or if there were no capture groups in the pattern, it returns the
-contents of the entire match as a single-element text array.
+If there is no match to the <replaceable>pattern</>, the function returns
+no rows. If there is a match, the function returns a text array whose
+<replaceable>n</>'th element is the substring matching the
+<replaceable>n</>'th parenthesized subexpression of the pattern
+(not counting <quote>non-capturing</> parentheses; see below for
+details). If the pattern does not contain any parenthesized
+subexpressions, then the result is a single-element text array containing
+the substring matching the whole pattern.
 The <replaceable>flags</> parameter is an optional text
 string containing zero or more single-letter flags that change the
-function's behavior. Flag <literal>i</> specifies case-insensitive
-matching, while flag <literal>g</> causes the return of each matching
-substring rather than only the first one. Other supported
-flags are <literal>m</>, <literal>n</>, <literal>p</>, <literal>w</> and
-<literal>x</>, whose meanings are described in
-<xref linkend="posix-embedded-options-table">.
+function's behavior. Flag <literal>g</> causes the function to find
+each match in the string, not only the first one, and return a row for
+each such match. Other supported
+flags are described in <xref linkend="posix-embedded-options-table">.
 </para>
 
 <para>
@@ -3319,16 +3322,14 @@ SELECT regexp_matches('foobarbequebaz', 'barbeque');
 returns the text from the end of the last match to the end of the string.
 The <replaceable>flags</> parameter is an optional text string containing
 zero or more single-letter flags that change the function's behavior.
-<function>regexp_split_to_table</function> supports the flags <literal>i</>,
-<literal>m</>, <literal>n</>, <literal>p</>, <literal>w</> and
-<literal>x</>, whose meanings are described in
+<function>regexp_split_to_table</function> supports the flags described in
 <xref linkend="posix-embedded-options-table">.
 </para>
 
 <para>
 The <function>regexp_split_to_array</> function behaves the same as
 <function>regexp_split_to_table</>, except that <function>regexp_split_to_array</>
-returns its results as a <type>text[]</>. It has the syntax
+returns its result as an array of <type>text</>. It has the syntax
 <function>regexp_split_to_array</function>(<replaceable>string</>, <replaceable>pattern</>
 <optional>, <replaceable>flags</> </optional>).
 The parameters are the same as for <function>regexp_split_to_table</>.
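
Reviewer note: the revised regexp_matches wording in this diff describes three behaviors: one array element per parenthesized subexpression, a single-element array holding the whole match when the pattern has no subexpressions, and one row per match when the g flag is given. The sketch below is a rough Python emulation of just those row semantics, not the C code touched by this commit. It assumes Python's re engine (whose syntax differs from POSIX advanced regular expressions), handles only the i and g flags, and the helper name and list-of-lists return shape are hypothetical choices made for illustration.

```python
import re


def regexp_matches(string, pattern, flags=""):
    """Hypothetical emulation of the documented regexp_matches row semantics.

    Returns a list of rows; each row is a list of captured substrings.
    Only the 'i' and 'g' flags are modeled (an assumption of this sketch).
    """
    re_flags = re.IGNORECASE if "i" in flags else 0
    compiled = re.compile(pattern, re_flags)

    rows = []
    for match in compiled.finditer(string):
        if compiled.groups:
            # n'th element is the substring matched by the n'th
            # capturing parenthesized subexpression.
            rows.append(list(match.groups()))
        else:
            # No capturing subexpressions: single-element row holding
            # the substring that matched the whole pattern.
            rows.append([match.group(0)])
        if "g" not in flags:
            # Without 'g', only the first match produces a row.
            break
    return rows


if __name__ == "__main__":
    print(regexp_matches("foobarbequebaz", "(bar)(beque)"))
    print(regexp_matches("foobarbequebaz", "barbeque"))
    print(regexp_matches("foobarbequebazilbarfbonk", "(b[^b]+)(b[^b]+)", "g"))
```

A pattern that does not match yields an empty list, mirroring the "returns no rows" case in the text above. Non-capturing parentheses, written (?:...) in both syntaxes, do not contribute an element.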
