Commit c65c182

Upgrade to version 3.92.

Author: Andrei Zmievski
1 parent b434843, commit c65c182

39 files changed: 11003 additions, 7068 deletions

ext/pcre/config.m4

Lines changed: 2 additions & 2 deletions
@@ -13,7 +13,7 @@ PHP_ARG_WITH(pcre-regex,for PCRE support,

 if test "$PHP_PCRE_REGEX" != "no"; then
 if test "$PHP_PCRE_REGEX" = "yes"; then
-PHP_NEW_EXTENSION(pcre, pcrelib/maketables.c pcrelib/get.c pcrelib/study.c pcrelib/pcre.c php_pcre.c, $ext_shared,,-DSUPPORT_UTF8 -I@ext_srcdir@/pcrelib)
+PHP_NEW_EXTENSION(pcre, pcrelib/maketables.c pcrelib/get.c pcrelib/study.c pcrelib/pcre.c php_pcre.c, $ext_shared,,-DSUPPORT_UTF8 -DLINK_SIZE=2 -I@ext_srcdir@/pcrelib)
 PHP_ADD_BUILD_DIR($ext_builddir/pcrelib)
 AC_DEFINE(HAVE_BUNDLED_PCRE, 1, [ ])
 else
@@ -49,7 +49,7 @@ if test "$PHP_PCRE_REGEX" != "no"; then

 AC_DEFINE(HAVE_PCRE, 1, [ ])
 PHP_ADD_INCLUDE($PCRE_INCDIR)
-PHP_NEW_EXTENSION(pcre, php_pcre.c, $ext_shared,,-DSUPPORT_UTF8)
+PHP_NEW_EXTENSION(pcre, php_pcre.c, $ext_shared,,-DSUPPORT_UTF8 -DLINK_SIZE=2)
 fi
 fi
 PHP_SUBST(PCRE_SHARED_LIBADD)
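
Note on -DLINK_SIZE=2: ChangeLog item 32 in this commit says that links inside a compiled pattern are now stored as offsets through macros, 2 bytes wide by default, and that this width is what caps a compiled pattern at 64K. The sketch below illustrates that idea only. The helper names, the most-significant-byte-first order, and the functions themselves are assumptions made for illustration; they are not copied from pcrelib.

```c
#include <assert.h>

/* Hypothetical helper: store a 2-byte link at code[n], high byte first.
   The byte order is an assumption made for this sketch. */
static void put_link(unsigned char *code, int n, unsigned int offset)
{
    assert(offset <= 0xFFFFu);          /* must fit in two bytes */
    code[n]     = (unsigned char)(offset >> 8);
    code[n + 1] = (unsigned char)(offset & 0xFFu);
}

/* Hypothetical helper: read back the 2-byte link stored at code[n]. */
static unsigned int get_link(const unsigned char *code, int n)
{
    return ((unsigned int)code[n] << 8) | (unsigned int)code[n + 1];
}

/* Largest offset a link of the given width (2, 3 or 4 bytes) can hold. */
static unsigned long max_link_offset(int link_size)
{
    assert(link_size >= 2 && link_size <= 4);
    return (unsigned long)((1ULL << (8 * link_size)) - 1ULL);
}
```

With a 2-byte link, max_link_offset(2) is 65535, which lines up with the 64K ceiling the ChangeLog mentions; a 3- or 4-byte link raises that ceiling at the cost of a larger compiled pattern.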

ext/pcre/config0.m4

Lines changed: 2 additions & 2 deletions
@@ -13,7 +13,7 @@ PHP_ARG_WITH(pcre-regex,for PCRE support,

 if test "$PHP_PCRE_REGEX" != "no"; then
 if test "$PHP_PCRE_REGEX" = "yes"; then
-PHP_NEW_EXTENSION(pcre, pcrelib/maketables.c pcrelib/get.c pcrelib/study.c pcrelib/pcre.c php_pcre.c, $ext_shared,,-DSUPPORT_UTF8 -I@ext_srcdir@/pcrelib)
+PHP_NEW_EXTENSION(pcre, pcrelib/maketables.c pcrelib/get.c pcrelib/study.c pcrelib/pcre.c php_pcre.c, $ext_shared,,-DSUPPORT_UTF8 -DLINK_SIZE=2 -I@ext_srcdir@/pcrelib)
 PHP_ADD_BUILD_DIR($ext_builddir/pcrelib)
 AC_DEFINE(HAVE_BUNDLED_PCRE, 1, [ ])
 else
@@ -49,7 +49,7 @@ if test "$PHP_PCRE_REGEX" != "no"; then

 AC_DEFINE(HAVE_PCRE, 1, [ ])
 PHP_ADD_INCLUDE($PCRE_INCDIR)
-PHP_NEW_EXTENSION(pcre, php_pcre.c, $ext_shared,,-DSUPPORT_UTF8)
+PHP_NEW_EXTENSION(pcre, php_pcre.c, $ext_shared,,-DSUPPORT_UTF8 -DLINK_SIZE=2)
 fi
 fi
 PHP_SUBST(PCRE_SHARED_LIBADD)

ext/pcre/pcrelib/ChangeLog

Lines changed: 208 additions & 1 deletion
@@ -1,7 +1,214 @@
 ChangeLog for PCRE
 ------------------

-Version 3.0 02-Jan-02
+Version 4.00 ....
+-----------------
+
+1. If a comment in an extended regex that started immediately after a meta-item
+extended to the end of string, PCRE compiled incorrect data. This could lead to
+all kinds of weird effects. Example: /#/ was bad; /()#/ was bad; /a#/ was not.
+
+2. Moved to autoconf 2.53 and libtool 1.4.2.
+
+3. Perl 5.8 no longer needs "use utf8" for doing UTF-8 things. Consequently,
+the special perltest8 script is no longer needed - all the tests can be run
+from a single perltest script.
+
+4. From 5.004, Perl has not included the VT character (0x0b) in the set defined
+by \s. It has now been removed in PCRE. This means it isn't recognized as
+whitespace in /x regexes too, which is the same as Perl. Note that the POSIX
+class [:space:] *does* include VT, thereby creating a mess.
+
+5. Added the class [:blank:] (a GNU extension from Perl 5.8) to match only
+space and tab.
+
+6. Perl 5.005 was a long time ago. It's time to amalgamate the tests that use
+its new features into the main test script, reducing the number of scripts.
+
+7. Perl 5.8 has changed the meaning of patterns like /a(?i)b/. Earlier
+versions were backward compatible, and made the (?i) apply to the whole
+pattern, as if /i were given. Now it behaves more logically, and applies the
+option setting only to what follows. PCRE has been changed to follow suit.
+However, if it finds options settings right at the start of the pattern, it
+extracts them into the global options, as before. Thus, they show up in the
+info data.
+
+8. Added support for the \Q...\E escape sequence. Characters in between are
+treated as literals. This is slightly different from Perl in that $ and @ are
+also handled as literals inside the quotes. In Perl, they will cause variable
+interpolation. Note the following examples:
+
+    Pattern            PCRE matches      Perl matches
+
+    \Qabc$xyz\E        abc$xyz           abc followed by the contents of $xyz
+    \Qabc\$xyz\E       abc\$xyz          abc\$xyz
+    \Qabc\E\$\Qxyz\E   abc$xyz           abc$xyz
+
+9. Re-organized 3 code statements in pcretest to avoid "overflow in
+floating-point constant arithmetic" warnings from a Microsoft compiler. Added a
+(size_t) cast to one statement in pcretest and one in pcreposix to avoid
+signed/unsigned warnings.
+
+10. SunOS4 doesn't have strtoul(). This was used only for unpicking the -o
+option for pcretest, so I've replaced it by a simple function that does just
+that job.
+
+11. pcregrep was ending with code 0 instead of 2 for the commands "pcregrep" or
+"pcregrep -".
+
+12. Added "possessive quantifiers" ?+, *+, ++, and {,}+ which come from Sun's
+Java package. This provides some syntactic sugar for simple cases of what my
+documentation calls "once-only subpatterns". A pattern such as x*+ is the
+same as (?>x*). In other words, if what is inside (?>...) is just a single
+repeated item, you can use this simplified notation. Note that only makes sense
+with greedy quantifiers. Consequently, the use of the possessive quantifier
+forces greediness, whatever the setting of the PCRE_UNGREEDY option.
+
+13. A change of greediness default within a pattern was not taking effect at
+the current level for patterns like /(b+(?U)a+)/. It did apply to parenthesized
+subpatterns that followed. Patterns like /b+(?U)a+/ worked because the option
+was abstracted outside.
+
+14. PCRE now supports the \G assertion. It is true when the current matching
+position is at the start point of the match. This differs from \A when the
+starting offset is non-zero. Used with the /g option of pcretest (or similar
+code), it works in the same way as it does for Perl's /g option.
+
+15. Some bugs concerning the handling of certain option changes within patterns
+have been fixed. These applied to options other than (?ims). For example,
+"a(?x: b c )d" did not match "XabcdY" but did match "Xa b c dY". It should have
+been the other way round. Some of this was related to change 7 above.
+
+16. PCRE now gives errors for /[.x.]/ and /[=x=]/ as unsupported POSIX
+features, as Perl does. Previously, PCRE gave the warnings only for /[[.x.]]/
+and /[[=x=]]/. PCRE now also gives an error for /[:name:]/ because it supports
+POSIX classes only within a class (e.g. /[[:alpha:]]/).
+
+17. Added support for Perl's \C escape. This matches one byte, even in UTF8
+mode. Unlike ".", it always matches newline, whatever the setting of
+PCRE_DOTALL. However, PCRE does not permit \C to appear in lookbehind
+assertions. (Perl allows it, but it doesn't (in general) work because it can't
+calculate the length of the lookbehind. At least, that's the case for Perl
+5.8.0)
+
+18. Added an error diagnosis for escapes that PCRE does not support: these are
+\L, \l, \N, \P, \p, \U, \u, and \X.
+
+19. Although correctly diagnosing a missing ']' in a character class, PCRE was
+reading past the end of the pattern in cases such as /[abcd/.
+
+20. PCRE was getting more memory than necessary for patterns with classes that
+contained both POSIX named classes and other characters, e.g. /[[:space:]abc/.
+
+21. Added some code, conditional on #ifdef VPCOMPAT, to make life easier for
+compiling PCRE for use with Virtual Pascal.
+
+22. Small fix to the Makefile to make it work properly if the build is done
+outside the source tree.
+
+23. Added a new extension: a condition to go with recursion. If a conditional
+subpattern starts with (?(R) the "true" branch is used if recursion has
+happened, whereas the "false" branch is used only at the top level.
+
+24. When there was a very long string of literal characters (over 255 bytes
+without UTF support, over 250 bytes with UTF support), the computation of how
+much memory was required could be incorrect, leading to segfaults or other
+strange effects.
+
+25. PCRE was incorrectly assuming anchoring (either to start of subject or to
+start of line for a non-DOTALL pattern) when a pattern started with (.*) and
+there was a subsequent back reference to those brackets. This meant that, for
+example, /(.*)\d+\1/ failed to match "abc123bc". Unfortunately, it isn't
+possible to check for precisely this case. All we can do is abandon the
+optimization if .* occurs inside capturing brackets when there are any back
+references whatsoever.
+
+26. The handling of the optimization for finding the first character of a
+non-anchored pattern, and for finding a character that is required later in the
+match were failing in some cases. This didn't break the matching; it just
+failed to optimize when it could. The way this is done has been re-implemented.
+
+27. Fixed typo in error message for invalid (?R item (it said "(?p").
+
+28. Added a new feature that provides some of the functionality that Perl
+provides with (?{...}). The facility is termed a "callout". The way it is done
+in PCRE is for the caller to provide an optional function, by setting
+pcre_callout to its entry point. Like pcre_malloc and pcre_free, this is a
+global variable. By default it is unset, which disables all calling out. To get
+the function called, the regex must include (?C) at appropriate points. This
+is, in fact, equivalent to (?C0), and any number <= 255 may be given with (?C).
+This provides a means of identifying different callout points. When PCRE
+reaches such a point in the regex, if pcre_callout has been set, the external
+function is called. It is provided with data in a structure called
+pcre_callout_block, which is defined in pcre.h. If the function returns 0,
+matching continues; if it returns a non-zero value, the match at the current
+point fails. However, backtracking will occur if possible.
+
+29. pcretest is upgraded to test the callout functionality. It provides a
+callout function that displays information. By default, it shows the start of
+the match and the current position in the text. There are some new data escapes
+to vary what happens:
+
+    \C+         in addition, show current contents of captured substrings
+    \C-         do not supply a callout function
+    \C!n        return 1 when callout number n is reached
+    \C!n!m      return 1 when callout number n is reached for the mth time
+
+30. If pcregrep was called with the -l option and just a single file name, it
+output "<stdin>" if a match was found, instead of the file name.
+
+31. Improve the efficiency of the POSIX API to PCRE. If the number of capturing
+slots is less than POSIX_MALLOC_THRESHOLD, use a block on the stack to pass to
+pcre_exec(). This saves a malloc/free per call. The default value of
+POSIX_MALLOC_THRESHOLD is 5; it can be changed by --with-posix-malloc-threshold
+when configuring.
+
+32. The default maximum size of a compiled pattern is 64K. There have been a
+few cases of people hitting this limit. The code now uses macros to handle the
+storing of links as offsets within the compiled pattern. It defaults to 2-byte
+links, but this can be changed to 3 or 4 bytes by --with-link-size when
+configuring. Tests 2 and 5 work only with 2-byte links because they output
+debugging information about compiled patterns.
+
+33. Internal code re-arrangements:
+
+    (a) Moved the debugging function for printing out a compiled regex into
+        its own source file (printint.c) and used #include to pull it into
+        pcretest.c and, when DEBUG is defined, into pcre.c, instead of having
+        two separate copies.
+
+    (b) Defined the list of op-code names for debugging as a macro in
+        internal.h so that it is next to the definition of the opcodes.
+
+    (c) Defined a table of op-code lengths for simpler skipping along compiled
+        code. This is again a macro in internal.h so that it is next to the
+        definition of the opcodes.
+
+34. Added support for recursive calls to individual subpatterns, along the
+lines of Robin Houston's patch (but implemented somewhat differently).
+
+35. Further mods to the Makefile to help Win32. Also, added code to pcregrep
+to allow it to read and process whole directories in Win32. This code was
+contributed by Lionel Fourquaux; it has not been tested by me.
+
+36. Added support for named subpatterns. The Python syntax (?P<name>...) is
+used to name a group. Names consist of alphanumerics and underscores, and
+must be unique. Back references use the syntax (?P=name) and recursive
+calls use (?P>name) which is a PCRE extension to the Python extension.
+Groups still have numbers. The function pcre_fullinfo() can be used after
+compilation to extract a name/number map. There are three relevant calls:
+
+    PCRE_INFO_NAMEENTRYSIZE        yields the size of each entry in the map
+    PCRE_INFO_NAMECOUNT            yields the number of entries
+    PCRE_INFO_NAMETABLE            yields a pointer to the map.
+
+The map is a vector of fixed-size entries. The size of each entry depends
+on the length of the longest name used. The first two bytes of each entry
+are the group number, most significant byte first. There follows the
+corresponding name, zero terminated. The names are in alphabetical order.
+
+
+Version 3.9 02-Jan-02
 ---------------------

 1. A bit of extraneous text had somehow crept into the pcregrep documentation.
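
Note on item 36: the name/number map it describes (fixed-size entries, a 2-byte group number with the most significant byte first, then the zero-terminated name) can be walked with a few lines of C. The sketch below is only an illustration of that layout. The function name find_group_number and the hand-built sample_table are hypothetical; in real use the table pointer, entry count and entry size would come from pcre_fullinfo() via the three PCRE_INFO_NAME* requests listed in the ChangeLog.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: look up a group number by name in a table laid out
   as ChangeLog item 36 describes. Returns -1 if the name is not present. */
static int find_group_number(const unsigned char *table, int count,
                             int entry_size, const char *name)
{
    int i;
    assert(entry_size >= 3);    /* 2 number bytes + at least a terminator */
    for (i = 0; i < count; i++) {
        const unsigned char *entry = table + (size_t)i * (size_t)entry_size;
        /* The name starts after the two group-number bytes. */
        if (strcmp((const char *)(entry + 2), name) == 0)
            return (entry[0] << 8) | entry[1];
    }
    return -1;
}

/* Hand-built sample for three names; the longest is "month" (5 characters),
   so each entry is 2 + 5 + 1 = 8 bytes. Entries are in alphabetical order. */
static const unsigned char sample_table[3 * 8] = {
    0, 3, 'd', 'a', 'y', 0,   0,   0,
    0, 2, 'm', 'o', 'n', 't', 'h', 0,
    0, 1, 'y', 'e', 'a', 'r', 0,   0
};
```

A linear scan keeps the sketch short. Because the ChangeLog states that names are stored in alphabetical order, a binary search would also be possible for large tables.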

ext/pcre/pcrelib/NON-UNIX-USE

Lines changed: 39 additions & 3 deletions
@@ -41,13 +41,49 @@ Makefile.in to create Makefile, substituting suitable values for the variables
 at the head of the file.

 Some help in building a Win32 DLL of PCRE in GnuWin32 environments was
-contributed by Paul.Sokolovsky@technologist.com. These environments are
-Mingw32 (http://www.xraylith.wisc.edu/~khan/software/gnu-win32/) and
-CygWin (http://sourceware.cygnus.com/cygwin/). Paul comments:
+contributed by Paul Sokolovsky. These environments are Mingw32
+(http://www.xraylith.wisc.edu/~khan/software/gnu-win32/) and CygWin
+(http://sourceware.cygnus.com/cygwin/). Paul comments:

 For CygWin, set CFLAGS=-mno-cygwin, and do 'make dll'. You'll get
 pcre.dll (containing pcreposix also), libpcre.dll.a, and dynamically
 linked pgrep and pcretest. If you have /bin/sh, run RunTest (three
 main test go ok, locale not supported).

+A script for building PCRE using Borland's C++ compiler for use with VPASCAL
+was contributed by Alexander Tokarev. It is called makevp.bat.
+
+These are some further comments about Win32 builds from Mark Evans:
+
+The documentation for Win32 builds is a bit shy. Under MSVC6 I
+followed their instructions to the letter, but there were still
+some things missing.
+
+(1) Must #define STATIC for entire project if linking statically.
+(I see no reason to use DLLs for code this compact.) This of
+course is a project setting in MSVC under Preprocessor.
+
+(2) Missing some #ifdefs relating to the function pointers
+pcre_malloc and pcre_free. See my solution below. (The stubs
+may not be mandatory but they made me feel better.)
+
+=========================
+#ifdef _WIN32
+#include <malloc.h>
+
+void* malloc_stub(size_t N)
+{ return malloc(N); }
+void free_stub(void* p)
+{ free(p); }
+void *(*pcre_malloc)(size_t) = &malloc_stub;
+void (*pcre_free)(void *) = &free_stub;
+
+#else
+
+void *(*pcre_malloc)(size_t) = malloc;
+void (*pcre_free)(void *) = free;
+
+#endif
+=========================
+
 ****
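
Note on the stubs above: they rely on pcre_malloc and pcre_free being ordinary function pointers, so any pair of functions with matching signatures can be installed in their place. The sketch below shows the same pattern with a counting allocator. To keep it compilable without PCRE, alloc_hook and free_hook are hypothetical stand-ins for the real globals, and the counting functions are illustrative only.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins with the same shape as pcre_malloc / pcre_free. */
static void *(*alloc_hook)(size_t) = malloc;
static void (*free_hook)(void *) = free;

/* Number of blocks handed out through the hooks and not yet released. */
static int live_blocks = 0;

static void *counting_malloc(size_t n)
{
    void *p = malloc(n);
    if (p != NULL)
        live_blocks++;
    return p;
}

static void counting_free(void *p)
{
    if (p != NULL)
        live_blocks--;
    free(p);
}

/* Point both hooks at the counting versions. */
static void install_counting_hooks(void)
{
    alloc_hook = counting_malloc;
    free_hook = counting_free;
}
```

After install_counting_hooks(), every allocation made through alloc_hook is tracked, which is one way a host application might check that memory obtained through such hooks is released again.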
