
Commit 19ad138

Merge branch 'PHP-5.5' into PHP-5.6
* PHP-5.5: Upgrade PCRE to 8.36, it fixes some crashes
2 parents: 1e18ffd + 13c32a1, commit 19ad138

Some content is hidden: large commits show only part of the diff by default.

65 files changed, +5107 -4347 lines changed

ext/pcre/pcrelib/AUTHORS

Lines changed: 3 additions & 3 deletions
@@ -8,7 +8,7 @@ Email domain: cam.ac.uk
 University of Cambridge Computing Service,
 Cambridge, England.
 
-Copyright (c) 1997-2013 University of Cambridge
+Copyright (c) 1997-2014 University of Cambridge
 All rights reserved
 
 
@@ -19,7 +19,7 @@ Written by: Zoltan Herczeg
 Email local part: hzmester
 Emain domain: freemail.hu
 
-Copyright(c) 2010-2013 Zoltan Herczeg
+Copyright(c) 2010-2014 Zoltan Herczeg
 All rights reserved.
 
 
@@ -30,7 +30,7 @@ Written by: Zoltan Herczeg
 Email local part: hzmester
 Emain domain: freemail.hu
 
-Copyright(c) 2009-2013 Zoltan Herczeg
+Copyright(c) 2009-2014 Zoltan Herczeg
 All rights reserved.
 
 

ext/pcre/pcrelib/ChangeLog

Lines changed: 219 additions & 1 deletion
@@ -1,6 +1,224 @@
 ChangeLog for PCRE
 ------------------
 
+Version 8.36 26-September-2014
+------------------------------
+
+1. Got rid of some compiler warnings in the C++ modules that were shown up by
+-Wmissing-field-initializers and -Wunused-parameter.
+
+2. The tests for quantifiers being too big (greater than 65535) were being
+applied after reading the number, and stupidly assuming that integer
+overflow would give a negative number. The tests are now applied as the
+numbers are read.
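Item 2 moves the quantifier range check into the digit-reading loop. As a minimal C sketch of that idea (not PCRE's code; `read_quantifier` and `MAX_QUANTIFIER` are hypothetical names), the bound is tested before each multiply-and-add, so the accumulator can never overflow:

```c
#include <ctype.h>

/* Hypothetical bound, taken from the limit quoted in item 2. */
#define MAX_QUANTIFIER 65535

/* Read the decimal digits at *pp into *out.
   Returns 1 on success (and advances *pp past the digits), or 0 if there are
   no digits or the value would exceed MAX_QUANTIFIER. The bound is checked
   before every multiply-and-add, so 'value' never overflows. */
int read_quantifier(const char **pp, int *out) {
    const char *p = *pp;
    int value = 0;

    if (!isdigit((unsigned char)*p)) return 0;
    while (isdigit((unsigned char)*p)) {
        int digit = *p - '0';
        /* value * 10 + digit > MAX_QUANTIFIER  <=>  value > (MAX_QUANTIFIER - digit) / 10 */
        if (value > (MAX_QUANTIFIER - digit) / 10) return 0;
        value = value * 10 + digit;
        p++;
    }
    *out = value;
    *pp = p;
    return 1;
}
```

For a quantifier such as `{70000}` the function rejects the value at the fifth digit instead of relying on what integer overflow happens to produce.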
+
+3. Tidy code in pcre_exec.c where two branches that used to be different are
+now the same.
+
+4. The JIT compiler did not generate match limit checks for certain
+bracketed expressions with quantifiers. This may lead to exponential
+backtracking, instead of returning with PCRE_ERROR_MATCHLIMIT. This
+issue should be resolved now.
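Item 4 concerns the match limit, a cap on backtracking work that makes a runaway match fail with PCRE_ERROR_MATCHLIMIT instead of running for exponential time. The sketch below is a hypothetical toy backtracking matcher (literals, `.`, `*` and a trailing `$` only), not PCRE's interpreter or JIT; it only illustrates the general technique of counting match attempts and returning an error once a limit is exceeded. The result codes are invented for the sketch.

```c
/* Hypothetical result codes for this sketch (not PCRE's constants). */
enum { SKETCH_NOMATCH = 0, SKETCH_MATCH = 1, SKETCH_LIMIT = -1 };

typedef struct {
    unsigned long steps;  /* match attempts made so far */
    unsigned long limit;  /* give up once steps exceeds this */
} match_ctx;

static int match_here(match_ctx *ctx, const char *re, const char *text);

/* c* : consume the longest run of c, then back off one character at a time. */
static int match_star(match_ctx *ctx, int c, const char *re, const char *text) {
    const char *t = text;
    while (*t != '\0' && (c == '.' || *t == c)) t++;
    for (;;) {
        int r = match_here(ctx, re, t);
        if (r != SKETCH_NOMATCH) return r;   /* a match, or the limit error */
        if (t == text) return SKETCH_NOMATCH;
        t--;
    }
}

static int match_here(match_ctx *ctx, const char *re, const char *text) {
    if (++ctx->steps > ctx->limit) return SKETCH_LIMIT;  /* the match-limit check */
    if (re[0] == '\0') return SKETCH_MATCH;
    if (re[1] == '*') return match_star(ctx, re[0], re + 2, text);
    if (re[0] == '$' && re[1] == '\0')
        return *text == '\0' ? SKETCH_MATCH : SKETCH_NOMATCH;
    if (*text != '\0' && (re[0] == '.' || re[0] == *text))
        return match_here(ctx, re + 1, text + 1);
    return SKETCH_NOMATCH;
}

/* Match re against the start of text, making at most 'limit' match attempts. */
int match_limited(const char *re, const char *text, unsigned long limit) {
    match_ctx ctx = { 0, limit };
    return match_here(&ctx, re, text);
}
```

For example, `match_limited("a*a*a*a*a*a*b", "aaaaaaaaaaaaaaaaaaaa", 1000)` gives up with `SKETCH_LIMIT` rather than trying every way of splitting the `a`s between the six `a*` terms.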
+
+5. Fixed an issue, which occures when nested alternatives are optimized
+with table jumps.
+
+6. Inserted two casts and changed some ints to size_t in the light of some
+reported 64-bit compiler warnings (Bugzilla 1477).
+
+7. Fixed a bug concerned with zero-minimum possessive groups that could match
+an empty string, which sometimes were behaving incorrectly in the
+interpreter (though correctly in the JIT matcher). This pcretest input is
+an example:
+
+'\A(?:[^"]++|"(?:[^"]*+|"")*+")++'
+NON QUOTED "QUOT""ED" AFTER "NOT MATCHED
+
+the interpreter was reporting a match of 'NON QUOTED ' only, whereas the
+JIT matcher and Perl both matched 'NON QUOTED "QUOT""ED" AFTER '. The test
+for an empty string was breaking the inner loop and carrying on at a lower
+level, when possessive repeated groups should always return to a higher
+level as they have no backtrack points in them. The empty string test now
+occurs at the outer level.
+
+8. Fixed a bug that was incorrectly auto-possessifying \w+ in the pattern
+^\w+(?>\s*)(?<=\w) which caused it not to match "test test".
+
+9. Give a compile-time error for \o{} (as Perl does) and for \x{} (which Perl
+doesn't).
+
+10. Change 8.34/15 introduced a bug that caused the amount of memory needed
+to hold a pattern to be incorrectly computed (too small) when there were
+named back references to duplicated names. This could cause "internal
+error: code overflow" or "double free or corruption" or other memory
+handling errors.
+
+11. When named subpatterns had the same prefixes, back references could be
+confused. For example, in this pattern:
+
+/(?P<Name>a)?(?P<Name2>b)?(?(<Name>)c|d)*l/
+
+the reference to 'Name' was incorrectly treated as a reference to a
+duplicate name.
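Item 11 shows the symptom typical of a name lookup that compares only the length of the reference, so `Name` also matches the table entry `Name2`. The C sketch below reproduces that symptom with a hypothetical name table and shows the usual fix of also requiring the stored name to end where the reference ends; the ChangeLog does not say how PCRE's own fix was written, so treat this as an illustration only.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical entry in a table of named groups. */
typedef struct {
    const char *name;
    int number;
} named_group;

/* Buggy lookup: compares only the first 'len' bytes, so a reference to
   "Name" (len 4) also matches the entry "Name2". */
static int matches_prefix_only(const named_group *g, const char *ref, size_t len) {
    return strncmp(g->name, ref, len) == 0;
}

/* Fixed lookup: the stored name must also end exactly where the reference ends. */
static int matches_exact(const named_group *g, const char *ref, size_t len) {
    return strncmp(g->name, ref, len) == 0 && g->name[len] == '\0';
}

/* Count the table entries a reference resolves to; a count above one makes
   the reference look like a reference to a duplicated name. */
int count_name_matches(const named_group *table, size_t n,
                       const char *ref, size_t len, int exact) {
    int count = 0;
    for (size_t i = 0; i < n; i++) {
        int hit = exact ? matches_exact(&table[i], ref, len)
                        : matches_prefix_only(&table[i], ref, len);
        if (hit) count++;
    }
    return count;
}
```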
+
+12. A pattern such as /^s?c/mi8 where the optional character has more than
+one "other case" was incorrectly compiled such that it would only try to
+match starting at "c".
+
+13. When a pattern starting with \s was studied, VT was not included in the
+list of possible starting characters; this should have been part of the
+8.34/18 patch.
+
+14. If a character class started [\Qx]... where x is any character, the class
+was incorrectly terminated at the ].
+
+15. If a pattern that started with a caseless match for a character with more
+than one "other case" was studied, PCRE did not set up the starting code
+unit bit map for the list of possible characters. Now it does. This is an
+optimization improvement, not a bug fix.
+
+16. The Unicode data tables have been updated to Unicode 7.0.0.
+
+17. Fixed a number of memory leaks in pcregrep.
+
+18. Avoid a compiler warning (from some compilers) for a function call with
+a cast that removes "const" from an lvalue by using an intermediate
+variable (to which the compiler does not object).
+
+19. Incorrect code was compiled if a group that contained an internal recursive
+back reference was optional (had quantifier with a minimum of zero). This
+example compiled incorrect code: /(((a\2)|(a*)\g<-1>))*/ and other examples
+caused segmentation faults because of stack overflows at compile time.
+
+20. A pattern such as /((?(R)a|(?1)))+/, which contains a recursion within a
+group that is quantified with an indefinite repeat, caused a compile-time
+loop which used up all the system stack and provoked a segmentation fault.
+This was not the same bug as 19 above.
+
+21. Add PCRECPP_EXP_DECL declaration to operator<< in pcre_stringpiece.h.
+Patch by Mike Frysinger.
+
+
+Version 8.35 04-April-2014
+--------------------------
+
+1. A new flag is set, when property checks are present in an XCLASS.
+When this flag is not set, PCRE can perform certain optimizations
+such as studying these XCLASS-es.
+
+2. The auto-possessification of character sets were improved: a normal
+and an extended character set can be compared now. Furthermore
+the JIT compiler optimizes more character set checks.
+
+3. Got rid of some compiler warnings for potentially uninitialized variables
+that show up only when compiled with -O2.
+
+4. A pattern such as (?=ab\K) that uses \K in an assertion can set the start
+of a match later then the end of the match. The pcretest program was not
+handling the case sensibly - it was outputting from the start to the next
+binary zero. It now reports this situation in a message, and outputs the
+text from the end to the start.
+
+5. Fast forward search is improved in JIT. Instead of the first three
+characters, any three characters with fixed position can be searched.
+Search order: first, last, middle.
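Item 5 describes rejecting candidate start positions by checking three characters at fixed positions, in the order first, last, middle. Below is a hedged C sketch of that ordering for a plain literal string; the JIT applies the idea to fixed-position characters of a compiled pattern, which this sketch does not model, and `find_with_three_chars` is a hypothetical name.

```c
#include <stddef.h>
#include <string.h>

/* Return the offset of the first occurrence of lit (length m) in text
   (length n), or -1. Each candidate offset is filtered by comparing three
   fixed positions of the literal (first, last, middle) before the full
   comparison is attempted. */
long find_with_three_chars(const char *text, size_t n, const char *lit, size_t m) {
    if (m == 0) return 0;
    if (m > n) return -1;
    size_t last = m - 1, mid = m / 2;
    for (size_t i = 0; i + m <= n; i++) {
        const char *p = text + i;
        if (p[0] != lit[0]) continue;        /* first */
        if (p[last] != lit[last]) continue;  /* last */
        if (p[mid] != lit[mid]) continue;    /* middle */
        if (memcmp(p, lit, m) == 0) return (long)i;
    }
    return -1;
}
```

The cheap single-character tests discard most offsets, so the full `memcmp` runs only on positions that already agree at three spread-out points.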
+
+6. Improve character range checks in JIT. Characters are read by an inprecise
+function now, which returns with an unknown value if the character code is
+above a certain threshold (e.g: 256). The only limitation is that the value
+must be bigger than the threshold as well. This function is useful when
+the characters above the threshold are handled in the same way.
+
+7. The macros whose names start with RAWUCHAR are placeholders for a future
+mode in which only the bottom 21 bits of 32-bit data items are used. To
+make this more memorable for those maintaining the code, the names have
+been changed to start with UCHAR21, and an extensive comment has been added
+to their definition.
+
+8. Add missing (new) files sljitNativeTILEGX.c and sljitNativeTILEGX-encoder.c
+to the export list in Makefile.am (they were accidentally omitted from the
+8.34 tarball).
+
+9. The informational output from pcretest used the phrase "starting byte set"
+which is inappropriate for the 16-bit and 32-bit libraries. As the output
+for "first char" and "need char" really means "non-UTF-char", I've changed
+"byte" to "char", and slightly reworded the output. The documentation about
+these values has also been (I hope) clarified.
+
+10. Another JIT related optimization: use table jumps for selecting the correct
+backtracking path, when more than four alternatives are present inside a
+bracket.
+
+11. Empty match is not possible, when the minimum length is greater than zero,
+and there is no \K in the pattern. JIT should avoid empty match checks in
+such cases.
+
+12. In a caseless character class with UCP support, when a character with more
+than one alternative case was not the first character of a range, not all
+the alternative cases were added to the class. For example, s and \x{17f}
+are both alternative cases for S: the class [RST] was handled correctly,
+but [R-T] was not.
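For item 12, a caseless range has to add the alternative cases of every character inside it, not only of its first character. The C sketch below uses a deliberately tiny, hypothetical case table (ASCII letters plus the S / s / U+017F set from the example) and a fixed-size membership array; PCRE's real class compiler works from its Unicode tables and is structured differently.

```c
#include <stdbool.h>
#include <stddef.h>

#define CLASS_SIZE 0x180  /* code points 0..0x17F are enough for this sketch */

typedef struct {
    bool member[CLASS_SIZE];
} char_class;

/* Hypothetical case-equivalence sets: each member is an alternative case of
   the others. Only S / s / U+017F (LATIN SMALL LETTER LONG S) is listed. */
static const unsigned case_sets[][3] = {
    { 'S', 's', 0x17F },
};

/* Add c and every alternative case of c to the class. */
static void add_with_other_cases(char_class *cls, unsigned c) {
    cls->member[c] = true;
    if (c >= 'A' && c <= 'Z') cls->member[c + ('a' - 'A')] = true;
    if (c >= 'a' && c <= 'z') cls->member[c - ('a' - 'A')] = true;
    for (size_t i = 0; i < sizeof case_sets / sizeof case_sets[0]; i++)
        for (size_t j = 0; j < 3; j++)
            if (case_sets[i][j] == c)
                for (size_t k = 0; k < 3; k++)
                    cls->member[case_sets[i][k]] = true;
}

/* Add lo..hi caselessly: every character in the range contributes all of its
   alternative cases, not only the first character of the range. */
void add_caseless_range(char_class *cls, unsigned lo, unsigned hi) {
    for (unsigned c = lo; c <= hi && c < CLASS_SIZE; c++)
        add_with_other_cases(cls, c);
}
```

With this approach `[R-T]` ends up containing U+017F through the `S` in the middle of the range, which is exactly the case the ChangeLog says was missed.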
+
+13. The configure.ac file always checked for pthread support when JIT was
+enabled. This is not used in Windows, so I have put this test inside a
+check for the presence of windows.h (which was already tested for).
+
+14. Improve pattern prefix search by a simplified Boyer-Moore algorithm in JIT.
+The algorithm provides a way to skip certain starting offsets, and usually
+faster than linear prefix searches.
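Item 14 mentions a simplified Boyer-Moore algorithm. The best-known simplified member of that family is Boyer-Moore-Horspool; the C sketch below shows how its bad-character table lets a search skip starting offsets. It is not the JIT's code, and this ChangeLog entry does not say which simplification PCRE uses.

```c
#include <stddef.h>
#include <string.h>

/* Boyer-Moore-Horspool search: returns the offset of the first occurrence of
   pat (length m) in text (length n), or -1 if there is none. */
long horspool_search(const unsigned char *text, size_t n,
                     const unsigned char *pat, size_t m) {
    size_t shift[256];
    if (m == 0) return 0;
    if (m > n) return -1;

    /* Bytes absent from pat[0..m-2] allow a shift of the whole pattern length. */
    for (size_t i = 0; i < 256; i++) shift[i] = m;
    /* Otherwise shift by the distance from the byte's last occurrence
       (excluding the final position) to the end of the pattern. */
    for (size_t i = 0; i + 1 < m; i++) shift[pat[i]] = m - 1 - i;

    size_t pos = 0;
    while (pos + m <= n) {
        if (memcmp(text + pos, pat, m) == 0) return (long)pos;
        /* Skip ahead using the text byte aligned with the pattern's last byte. */
        pos += shift[text[pos + m - 1]];
    }
    return -1;
}
```

Every shift is at least 1 and never jumps past a possible occurrence, so the search stays correct while typically examining far fewer offsets than a linear scan.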
+
+15. Change 13 for 8.20 updated RunTest to check for the 'fr' locale as well
+as for 'fr_FR' and 'french'. For some reason, however, it then used the
+Windows-specific input and output files, which have 'french' screwed in.
+So this could never have worked. One of the problems with locales is that
+they aren't always the same. I have now updated RunTest so that it checks
+the output of the locale test (test 3) against three different output
+files, and it allows the test to pass if any one of them matches. With luck
+this should make the test pass on some versions of Solaris where it was
+failing. Because of the uncertainty, the script did not used to stop if
+test 3 failed; it now does. If further versions of a French locale ever
+come to light, they can now easily be added.
+
+16. If --with-pcregrep-bufsize was given a non-integer value such as "50K",
+there was a message during ./configure, but it did not stop. This now
+provokes an error. The invalid example in README has been corrected.
+If a value less than the minimum is given, the minimum value has always
+been used, but now a warning is given.
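Item 16 (and the README change further down) make `--with-pcregrep-bufsize` accept only a plain integer and warn when the value is below the minimum. That check lives in the configure script; the C function below merely illustrates the same validation rules. `parse_bufsize` is hypothetical, and `min_size` is a parameter because the actual minimum is not stated here.

```c
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>

/* Validate a buffer size given as a plain decimal integer (e.g. "51200").
   Returns 0 for anything else, such as "50K". A value below min_size is
   replaced by min_size and *warned is set to 1. */
int parse_bufsize(const char *s, long min_size, long *out, int *warned) {
    char *end;
    long v;

    *warned = 0;
    if (!isdigit((unsigned char)s[0])) return 0;   /* no sign, no leading space */
    errno = 0;
    v = strtol(s, &end, 10);
    if (*end != '\0' || errno == ERANGE) return 0; /* trailing "K", overflow, ... */
    if (v < min_size) {
        v = min_size;
        *warned = 1;
    }
    *out = v;
    return 1;
}
```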
+
+17. If --enable-bsr-anycrlf was set, the special 16/32-bit test failed. This
+was a bug in the test system, which is now fixed. Also, the list of various
+configurations that are tested for each release did not have one with both
+16/32 bits and --enable-bar-anycrlf. It now does.
+
+18. pcretest was missing "-C bsr" for displaying the \R default setting.
+
+19. Little endian PowerPC systems are supported now by the JIT compiler.
+
+20. The fast forward newline mechanism could enter to an infinite loop on
+certain invalid UTF-8 input. Although we don't support these cases
+this issue can be fixed by a performance optimization.
+
+21. Change 33 of 8.34 is not sufficient to ensure stack safety because it does
+not take account if existing stack usage. There is now a new global
+variable called pcre_stack_guard that can be set to point to an external
+function to check stack availability. It is called at the start of
+processing every parenthesized group.
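Item 21 introduces `pcre_stack_guard`, a global hook called at the start of every parenthesized group. This entry does not spell out the hook's signature, so the sketch below assumes a no-argument function that returns nonzero to abort, and every name in it (`demo_stack_guard`, `demo_compile`, `budget_guard`) is hypothetical. The example guard simply allows a fixed number of group openings; a real guard would measure the remaining stack, which is platform-specific.

```c
#include <stddef.h>

/* Hypothetical global hook in the spirit of pcre_stack_guard: if set, it is
   called before each parenthesized group, and a nonzero return aborts. */
int (*demo_stack_guard)(void) = NULL;

enum { DEMO_OK = 0, DEMO_ERR_UNBALANCED = 1, DEMO_ERR_STACK = 2 };

/* Consume one group whose '(' has already been read; return the position just
   after its ')'. Nested groups are handled by recursion, so deep nesting uses
   proportionally more C stack, which is what the guard protects against. */
static const char *compile_group(const char *p, int *err) {
    if (demo_stack_guard != NULL && demo_stack_guard() != 0) {
        *err = DEMO_ERR_STACK;
        return p;
    }
    while (*p != '\0') {
        if (*p == '(') {
            p = compile_group(p + 1, err);
            if (*err != DEMO_OK) return p;
        } else if (*p == ')') {
            return p + 1;
        } else {
            p++;
        }
    }
    *err = DEMO_ERR_UNBALANCED;   /* pattern ended inside a group */
    return p;
}

/* Walk a whole pattern and return one of the DEMO_* codes. */
int demo_compile(const char *pattern) {
    int err = DEMO_OK;
    const char *p = pattern;
    while (*p != '\0' && err == DEMO_OK) {
        if (*p == '(') p = compile_group(p + 1, &err);
        else if (*p == ')') return DEMO_ERR_UNBALANCED;
        else p++;
    }
    return err;
}

/* Example guard: permit a fixed number of group openings. */
int demo_guard_budget = 0;
int budget_guard(void) { return demo_guard_budget-- <= 0; }
```

With `demo_stack_guard = budget_guard` and `demo_guard_budget = 2`, `demo_compile("(((x)))")` returns `DEMO_ERR_STACK` because the third nested group is refused before the recursion goes deeper.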
+
+22. A typo in the code meant that in ungreedy mode the max/min qualifier
+behaved like a min-possessive qualifier, and, for example, /a{1,3}b/U did
+not match "ab".
+
+23. When UTF was disabled, the JIT program reported some incorrect compile
+errors. These messages are silenced now.
+
+24. Experimental support for ARM-64 and MIPS-64 has been added to the JIT
+compiler.
+
+25. Change all the temporary files used in RunGrepTest to be different to those
+used by RunTest so that the tests can be run simultaneously, for example by
+"make -j check".
+
+
 Version 8.34 15-December-2013
 -----------------------------
 
@@ -5311,7 +5529,7 @@ by an auxiliary program - but can then be edited by hand if required. There are
 now no calls to isalnum(), isspace(), isdigit(), isxdigit(), tolower() or
 toupper() in the code.
 
-7. Turn the malloc/free functions variables into pcre_malloc and pcre_free and
+7. Turn the malloc/free funtions variables into pcre_malloc and pcre_free and
 make them global. Abolish the function for setting them, as the caller can now
 set them directly.
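The entry above replaces a setter function with global allocator pointers that callers assign directly. A minimal C sketch of that pattern follows, using hypothetical names (`demo_malloc`, `demo_free`, `counting_malloc`) rather than PCRE's real `pcre_malloc`/`pcre_free` declarations:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical library hooks: global function pointers that default to the C
   allocator and that the caller may reassign directly (no setter function). */
void *(*demo_malloc)(size_t) = malloc;
void (*demo_free)(void *) = free;

/* Library code allocates only through the hooks. */
char *demo_library_strdup(const char *s) {
    size_t len = strlen(s) + 1;
    char *copy = demo_malloc(len);
    if (copy != NULL) memcpy(copy, s, len);
    return copy;
}

/* Example application allocator that tracks the number of live blocks. */
long live_blocks = 0;

void *counting_malloc(size_t n) {
    void *p = malloc(n);
    if (p != NULL) live_blocks++;
    return p;
}

void counting_free(void *p) {
    if (p != NULL) live_blocks--;
    free(p);
}
```

An application would assign `demo_malloc = counting_malloc;` and `demo_free = counting_free;` before calling into the library, so every library allocation goes through its own allocator.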

ext/pcre/pcrelib/LICENCE

Lines changed: 3 additions & 3 deletions
@@ -24,7 +24,7 @@ Email domain: cam.ac.uk
 University of Cambridge Computing Service,
 Cambridge, England.
 
-Copyright (c) 1997-2013 University of Cambridge
+Copyright (c) 1997-2014 University of Cambridge
 All rights reserved.
 
 
@@ -35,7 +35,7 @@ Written by: Zoltan Herczeg
 Email local part: hzmester
 Emain domain: freemail.hu
 
-Copyright(c) 2010-2013 Zoltan Herczeg
+Copyright(c) 2010-2014 Zoltan Herczeg
 All rights reserved.
 
 
@@ -46,7 +46,7 @@ Written by: Zoltan Herczeg
 Email local part: hzmester
 Emain domain: freemail.hu
 
-Copyright(c) 2009-2013 Zoltan Herczeg
+Copyright(c) 2009-2014 Zoltan Herczeg
 All rights reserved.
 
 

ext/pcre/pcrelib/NEWS

Lines changed: 18 additions & 0 deletions
@@ -1,6 +1,24 @@
 News about PCRE releases
 ------------------------
 
+Release 8.36 26-September-2014
+------------------------------
+
+This is primarily a bug-fix release. However, in addition, the Unicode data
+tables have been updated to Unicode 7.0.0.
+
+
+Release 8.35 04-April-2014
+--------------------------
+
+There have been performance improvements for classes containing non-ASCII
+characters and the "auto-possessification" feature has been extended. Other
+minor improvements have been implemented and bugs fixed. There is a new callout
+feature to enable applications to do detailed stack checks at compile time, to
+avoid running out of stack for deeply nested parentheses. The JIT compiler has
+been extended with experimental support for ARM-64, MIPS-64, and PPC-LE.
+
+
 Release 8.34 15-December-2013
 -----------------------------
 

ext/pcre/pcrelib/README

Lines changed: 20 additions & 17 deletions
@@ -45,14 +45,16 @@ the 16-bit library, which processes strings of 16-bit values, and one for the
 32-bit library, which processes strings of 32-bit values. The distribution also
 includes a set of C++ wrapper functions (see the pcrecpp man page for details),
 courtesy of Google Inc., which can be used to call the 8-bit PCRE library from
-C++.
+C++. Other C++ wrappers have been created from time to time. See, for example:
+https://github.com/YasserAsmi/regexp, which aims to be simple and similar in
+style to the C API.
 
-In addition, there is a set of C wrapper functions (again, just for the 8-bit
-library) that are based on the POSIX regular expression API (see the pcreposix
-man page). These end up in the library called libpcreposix. Note that this just
-provides a POSIX calling interface to PCRE; the regular expressions themselves
-still follow Perl syntax and semantics. The POSIX API is restricted, and does
-not give full access to all of PCRE's facilities.
+The distribution also contains a set of C wrapper functions (again, just for
+the 8-bit library) that are based on the POSIX regular expression API (see the
+pcreposix man page). These end up in the library called libpcreposix. Note that
+this just provides a POSIX calling interface to PCRE; the regular expressions
+themselves still follow Perl syntax and semantics. The POSIX API is restricted,
+and does not give full access to all of PCRE's facilities.
 
 The header file for the POSIX-style functions is called pcreposix.h. The
 official POSIX name is regex.h, but I did not want to risk possible problems
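The README describes libpcreposix as a POSIX calling interface. The C sketch below uses only the standard `<regex.h>` calls (`regcomp`, `regexec`, `regfree`); building the same code against PCRE by including pcreposix.h and linking libpcreposix is an assumption based on the README, not something shown in this diff, and flag handling may differ there. With PCRE the pattern would also follow Perl syntax rather than POSIX ERE rules.

```c
#include <regex.h>

/* Return the offset of the first match of pattern in subject, -1 if there is
   no match, or -2 if the pattern does not compile or matching fails. */
long posix_first_match(const char *pattern, const char *subject) {
    regex_t re;
    regmatch_t m[1];
    int rc;

    if (regcomp(&re, pattern, REG_EXTENDED) != 0) return -2;
    rc = regexec(&re, subject, 1, m, 0);
    regfree(&re);               /* release the compiled pattern in all cases */
    if (rc == REG_NOMATCH) return -1;
    if (rc != 0) return -2;
    return (long)m[0].rm_so;
}
```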
@@ -85,11 +87,12 @@ documentation is supplied in two other forms:
 1. There are files called doc/pcre.txt, doc/pcregrep.txt, and
 doc/pcretest.txt in the source distribution. The first of these is a
 concatenation of the text forms of all the section 3 man pages except
-those that summarize individual functions. The other two are the text
-forms of the section 1 man pages for the pcregrep and pcretest commands.
-These text forms are provided for ease of scanning with text editors or
-similar tools. They are installed in <prefix>/share/doc/pcre, where
-<prefix> is the installation prefix (defaulting to /usr/local).
+the listing of pcredemo.c and those that summarize individual functions.
+The other two are the text forms of the section 1 man pages for the
+pcregrep and pcretest commands. These text forms are provided for ease of
+scanning with text editors or similar tools. They are installed in
+<prefix>/share/doc/pcre, where <prefix> is the installation prefix
+(defaulting to /usr/local).
 
 2. A set of files containing all the documentation in HTML form, hyperlinked
 in various ways, and rooted in a file called index.html, is distributed in
@@ -372,12 +375,12 @@ library. They are also documented in the pcrebuild man page.
 
 Of course, the relevant libraries must be installed on your system.
 
-. The default size of internal buffer used by pcregrep can be set by, for
-example:
+. The default size (in bytes) of the internal buffer used by pcregrep can be
+set by, for example:
 
---with-pcregrep-bufsize=50K
+--with-pcregrep-bufsize=51200
 
-The default value is 20K.
+The value must be a plain integer. The default is 20480.
 
 . It is possible to compile pcretest so that it links with the libreadline
 or libedit libraries, by specifying, respectively,
@@ -987,4 +990,4 @@ pcre_xxx, one with the name pcre16_xx, and a third with the name pcre32_xxx.
 Philip Hazel
 Email local part: ph10
 Email domain: cam.ac.uk
-Last updated: 05 November 2013
+Last updated: 24 October 2014
