|
1 | 1 | ChangeLog for PCRE
|
2 | 2 | ------------------
|
3 | 3 |
|
| 4 | +Version 8.36 26-September-2014 |
| 5 | +------------------------------ |
| 6 | + |
| 7 | +1. Got rid of some compiler warnings in the C++ modules that were shown up by |
| 8 | + -Wmissing-field-initializers and -Wunused-parameter. |
| 9 | + |
| 10 | +2. The tests for quantifiers being too big (greater than 65535) were being |
| 11 | + applied after reading the number, and stupidly assuming that integer |
| 12 | + overflow would give a negative number. The tests are now applied as the |
| 13 | + numbers are read. |
| 14 | + |
| 15 | +3. Tidy code in pcre_exec.c where two branches that used to be different are |
| 16 | + now the same. |
| 17 | + |
| 18 | +4. The JIT compiler did not generate match limit checks for certain |
| 19 | + bracketed expressions with quantifiers. This may lead to exponential |
| 20 | + backtracking, instead of returning with PCRE_ERROR_MATCHLIMIT. This |
| 21 | + issue should be resolved now. |
| 22 | + |
| 23 | +5. Fixed an issue that occurs when nested alternatives are optimized |
| 24 | + with table jumps. |
| 25 | + |
| 26 | +6. Inserted two casts and changed some ints to size_t in the light of some |
| 27 | + reported 64-bit compiler warnings (Bugzilla 1477). |
| 28 | + |
| 29 | +7. Fixed a bug concerned with zero-minimum possessive groups that could match |
| 30 | + an empty string, which sometimes were behaving incorrectly in the |
| 31 | + interpreter (though correctly in the JIT matcher). This pcretest input is |
| 32 | + an example: |
| 33 | + |
| 34 | + '\A(?:[^"]++|"(?:[^"]*+|"")*+")++' |
| 35 | + NON QUOTED "QUOT""ED" AFTER "NOT MATCHED |
| 36 | + |
| 37 | + the interpreter was reporting a match of 'NON QUOTED ' only, whereas the |
| 38 | + JIT matcher and Perl both matched 'NON QUOTED "QUOT""ED" AFTER '. The test |
| 39 | + for an empty string was breaking the inner loop and carrying on at a lower |
| 40 | + level, when possessive repeated groups should always return to a higher |
| 41 | + level as they have no backtrack points in them. The empty string test now |
| 42 | + occurs at the outer level. |
| 43 | + |
| 44 | +8. Fixed a bug that was incorrectly auto-possessifying \w+ in the pattern |
| 45 | + ^\w+(?>\s*)(?<=\w) which caused it not to match "test test". |
| 46 | + |
| 47 | +9. Give a compile-time error for \o{} (as Perl does) and for \x{} (which Perl |
| 48 | + doesn't). |
| 49 | + |
| 50 | +10. Change 8.34/15 introduced a bug that caused the amount of memory needed |
| 51 | + to hold a pattern to be incorrectly computed (too small) when there were |
| 52 | + named back references to duplicated names. This could cause "internal |
| 53 | + error: code overflow" or "double free or corruption" or other memory |
| 54 | + handling errors. |
| 55 | + |
| 56 | +11. When named subpatterns had the same prefixes, back references could be |
| 57 | + confused. For example, in this pattern: |
| 58 | + |
| 59 | + /(?P<Name>a)?(?P<Name2>b)?(?(<Name>)c|d)*l/ |
| 60 | + |
| 61 | + the reference to 'Name' was incorrectly treated as a reference to a |
| 62 | + duplicate name. |
| 63 | + |
| 64 | +12. A pattern such as /^s?c/mi8 where the optional character has more than |
| 65 | + one "other case" was incorrectly compiled such that it would only try to |
| 66 | + match starting at "c". |
| 67 | + |
| 68 | +13. When a pattern starting with \s was studied, VT was not included in the |
| 69 | + list of possible starting characters; this should have been part of the |
| 70 | + 8.34/18 patch. |
| 71 | + |
| 72 | +14. If a character class started [\Qx]... where x is any character, the class |
| 73 | + was incorrectly terminated at the ]. |
| 74 | + |
| 75 | +15. If a pattern that started with a caseless match for a character with more |
| 76 | + than one "other case" was studied, PCRE did not set up the starting code |
| 77 | + unit bit map for the list of possible characters. Now it does. This is an |
| 78 | + optimization improvement, not a bug fix. |
| 79 | + |
| 80 | +16. The Unicode data tables have been updated to Unicode 7.0.0. |
| 81 | + |
| 82 | +17. Fixed a number of memory leaks in pcregrep. |
| 83 | + |
| 84 | +18. Avoid a compiler warning (from some compilers) for a function call with |
| 85 | + a cast that removes "const" from an lvalue by using an intermediate |
| 86 | + variable (to which the compiler does not object). |
| 87 | + |
| 88 | +19. Incorrect code was compiled if a group that contained an internal recursive |
| 89 | + back reference was optional (had quantifier with a minimum of zero). This |
| 90 | + example compiled incorrect code: /(((a\2)|(a*)\g<-1>))*/ and other examples |
| 91 | + caused segmentation faults because of stack overflows at compile time. |
| 92 | + |
| 93 | +20. A pattern such as /((?(R)a|(?1)))+/, which contains a recursion within a |
| 94 | + group that is quantified with an indefinite repeat, caused a compile-time |
| 95 | + loop which used up all the system stack and provoked a segmentation fault. |
| 96 | + This was not the same bug as 19 above. |
| 97 | + |
| 98 | +21. Add PCRECPP_EXP_DECL declaration to operator<< in pcre_stringpiece.h. |
| 99 | + Patch by Mike Frysinger. |
| 100 | + |
| 101 | + |
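Change 2 for 8.36 above moves the "quantifier too big" test into the digit-reading loop. A minimal sketch of that idea follows. The function name `read_repeat_count` and the -1 error convention are assumptions made for illustration; this is not the code from pcre_compile.c.

```c
#define MAX_REPEAT 65535  /* limit quoted in the changelog entry */

/* Hypothetical helper, not PCRE's actual code.
   Reads decimal digits starting at *pp. Returns the value, or -1 as soon
   as the running total exceeds MAX_REPEAT. On success, *pp is advanced
   past the digits; on failure, *pp is left unchanged. */
int read_repeat_count(const char **pp)
{
    const char *p = *pp;
    int n = 0;

    while (*p >= '0' && *p <= '9') {
        n = n * 10 + (*p++ - '0');
        /* Checked after every digit, so n never exceeds 655359 and
           cannot overflow an int, however long the digit string is. */
        if (n > MAX_REPEAT) return -1;
    }
    *pp = p;
    return n;
}
```

Checking only after the whole number has been read would rely on overflow producing a negative value, which C does not guarantee for signed integers.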
| 102 | +Version 8.35 04-April-2014 |
| 103 | +-------------------------- |
| 104 | + |
| 105 | +1. A new flag is set, when property checks are present in an XCLASS. |
| 106 | + When this flag is not set, PCRE can perform certain optimizations |
| 107 | + such as studying these XCLASS-es. |
| 108 | + |
| 109 | +2. The auto-possessification of character sets was improved: a normal |
| 110 | + and an extended character set can be compared now. Furthermore |
| 111 | + the JIT compiler optimizes more character set checks. |
| 112 | + |
| 113 | +3. Got rid of some compiler warnings for potentially uninitialized variables |
| 114 | + that show up only when compiled with -O2. |
| 115 | + |
| 116 | +4. A pattern such as (?=ab\K) that uses \K in an assertion can set the start |
| 117 | + of a match later than the end of the match. The pcretest program was not |
| 118 | + handling the case sensibly - it was outputting from the start to the next |
| 119 | + binary zero. It now reports this situation in a message, and outputs the |
| 120 | + text from the end to the start. |
| 121 | + |
| 122 | +5. Fast forward search is improved in JIT. Instead of the first three |
| 123 | + characters, any three characters with fixed position can be searched. |
| 124 | + Search order: first, last, middle. |
| 125 | + |
| 126 | +6. Improve character range checks in JIT. Characters are read by an imprecise |
| 127 | + function now, which returns with an unknown value if the character code is |
| 128 | + above a certain threshold (e.g. 256). The only limitation is that the value |
| 129 | + must be bigger than the threshold as well. This function is useful when |
| 130 | + the characters above the threshold are handled in the same way. |
| 131 | + |
| 132 | +7. The macros whose names start with RAWUCHAR are placeholders for a future |
| 133 | + mode in which only the bottom 21 bits of 32-bit data items are used. To |
| 134 | + make this more memorable for those maintaining the code, the names have |
| 135 | + been changed to start with UCHAR21, and an extensive comment has been added |
| 136 | + to their definition. |
| 137 | + |
| 138 | +8. Add missing (new) files sljitNativeTILEGX.c and sljitNativeTILEGX-encoder.c |
| 139 | + to the export list in Makefile.am (they were accidentally omitted from the |
| 140 | + 8.34 tarball). |
| 141 | + |
| 142 | +9. The informational output from pcretest used the phrase "starting byte set" |
| 143 | + which is inappropriate for the 16-bit and 32-bit libraries. As the output |
| 144 | + for "first char" and "need char" really means "non-UTF-char", I've changed |
| 145 | + "byte" to "char", and slightly reworded the output. The documentation about |
| 146 | + these values has also been (I hope) clarified. |
| 147 | + |
| 148 | +10. Another JIT related optimization: use table jumps for selecting the correct |
| 149 | + backtracking path, when more than four alternatives are present inside a |
| 150 | + bracket. |
| 151 | + |
| 152 | +11. Empty match is not possible, when the minimum length is greater than zero, |
| 153 | + and there is no \K in the pattern. JIT should avoid empty match checks in |
| 154 | + such cases. |
| 155 | + |
| 156 | +12. In a caseless character class with UCP support, when a character with more |
| 157 | + than one alternative case was not the first character of a range, not all |
| 158 | + the alternative cases were added to the class. For example, s and \x{17f} |
| 159 | + are both alternative cases for S: the class [RST] was handled correctly, |
| 160 | + but [R-T] was not. |
| 161 | + |
| 162 | +13. The configure.ac file always checked for pthread support when JIT was |
| 163 | + enabled. This is not used in Windows, so I have put this test inside a |
| 164 | + check for the presence of windows.h (which was already tested for). |
| 165 | + |
| 166 | +14. Improve pattern prefix search by a simplified Boyer-Moore algorithm in JIT. |
| 167 | + The algorithm provides a way to skip certain starting offsets, and is usually |
| 168 | + faster than linear prefix searches. |
| 169 | + |
| 170 | +15. Change 13 for 8.20 updated RunTest to check for the 'fr' locale as well |
| 171 | + as for 'fr_FR' and 'french'. For some reason, however, it then used the |
| 172 | + Windows-specific input and output files, which have 'french' screwed in. |
| 173 | + So this could never have worked. One of the problems with locales is that |
| 174 | + they aren't always the same. I have now updated RunTest so that it checks |
| 175 | + the output of the locale test (test 3) against three different output |
| 176 | + files, and it allows the test to pass if any one of them matches. With luck |
| 177 | + this should make the test pass on some versions of Solaris where it was |
| 178 | + failing. Because of the uncertainty, the script did not previously stop if |
| 179 | + test 3 failed; it now does. If further versions of a French locale ever |
| 180 | + come to light, they can now easily be added. |
| 181 | + |
| 182 | +16. If --with-pcregrep-bufsize was given a non-integer value such as "50K", |
| 183 | + there was a message during ./configure, but it did not stop. This now |
| 184 | + provokes an error. The invalid example in README has been corrected. |
| 185 | + If a value less than the minimum is given, the minimum value has always |
| 186 | + been used, but now a warning is given. |
| 187 | + |
| 188 | +17. If --enable-bsr-anycrlf was set, the special 16/32-bit test failed. This |
| 189 | + was a bug in the test system, which is now fixed. Also, the list of various |
| 190 | + configurations that are tested for each release did not have one with both |
| 191 | + 16/32 bits and --enable-bsr-anycrlf. It now does. |
| 192 | + |
| 193 | +18. pcretest was missing "-C bsr" for displaying the \R default setting. |
| 194 | + |
| 195 | +19. Little endian PowerPC systems are supported now by the JIT compiler. |
| 196 | + |
| 197 | +20. The fast forward newline mechanism could enter an infinite loop on |
| 198 | + certain invalid UTF-8 input. Although we don't support these cases, |
| 199 | + this issue can be fixed by a performance optimization. |
| 200 | + |
| 201 | +21. Change 33 of 8.34 is not sufficient to ensure stack safety because it does |
| 202 | + not take account of existing stack usage. There is now a new global |
| 203 | + variable called pcre_stack_guard that can be set to point to an external |
| 204 | + function to check stack availability. It is called at the start of |
| 205 | + processing every parenthesized group. |
| 206 | + |
| 207 | +22. A typo in the code meant that in ungreedy mode the max/min qualifier |
| 208 | + behaved like a min-possessive qualifier, and, for example, /a{1,3}b/U did |
| 209 | + not match "ab". |
| 210 | + |
| 211 | +23. When UTF was disabled, the JIT program reported some incorrect compile |
| 212 | + errors. These messages are silenced now. |
| 213 | + |
| 214 | +24. Experimental support for ARM-64 and MIPS-64 has been added to the JIT |
| 215 | + compiler. |
| 216 | + |
| 217 | +25. Change all the temporary files used in RunGrepTest to be different to those |
| 218 | + used by RunTest so that the tests can be run simultaneously, for example by |
| 219 | + "make -j check". |
| 220 | + |
| 221 | + |
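Change 21 for 8.35 above introduces a guard function that is called at the start of processing every parenthesized group. The toy model below illustrates that hook pattern only. It does not use or reproduce PCRE's code: `stack_guard` is a hypothetical stand-in for `pcre_stack_guard`, and `budget_guard`, `scan_group` and `check_with_budget` are invented for the sketch. It also assumes that a non-zero return from the guard means "not enough stack, stop".

```c
/* Toy model of a stack-guard hook; NOT PCRE source code. */

/* Hypothetical stand-in for the global pcre_stack_guard pointer. */
int (*stack_guard)(void) = 0;

/* Crude substitute for "stack availability": the number of groups
   that may still be processed. */
static int depth_budget;

/* Illustrative guard: returns non-zero once the budget is used up. */
static int budget_guard(void) { return depth_budget-- <= 0; }

/* Scans one group; s[*pos] must be '('. Returns 0 on success, or -1 if
   the guard refuses or the parentheses are unbalanced. */
int scan_group(const char *s, int *pos)
{
    /* The guard is consulted at the start of every group. */
    if (stack_guard != 0 && stack_guard() != 0) return -1;

    (*pos)++;                                  /* skip '(' */
    while (s[*pos] != '\0' && s[*pos] != ')') {
        if (s[*pos] == '(') {
            if (scan_group(s, pos) != 0) return -1;
        } else {
            (*pos)++;
        }
    }
    if (s[*pos] != ')') return -1;             /* unbalanced */
    (*pos)++;                                  /* skip ')' */
    return 0;
}

/* Installs the illustrative guard and scans a pattern starting with '('. */
int check_with_budget(const char *pattern, int budget)
{
    int pos = 0;
    depth_budget = budget;
    stack_guard = budget_guard;
    return scan_group(pattern, &pos);
}
```

With this model, `check_with_budget("((a)(b))", 3)` succeeds because only three groups are opened, while a budget of 2 makes the third group fail. A real guard would measure actual stack usage rather than count groups.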
4 | 222 | Version 8.34 15-December-2013
|
5 | 223 | -----------------------------
|
6 | 224 |
|
@@ -5311,7 +5529,7 @@ by an auxiliary program - but can then be edited by hand if required. There are
|
5311 | 5529 | now no calls to isalnum(), isspace(), isdigit(), isxdigit(), tolower() or
|
5312 | 5530 | toupper() in the code.
|
5313 | 5531 |
|
5314 |
| -7. Turn the malloc/free functions variables into pcre_malloc and pcre_free and |
| 5532 | +7. Turn the malloc/free funtions variables into pcre_malloc and pcre_free and |
5315 | 5533 | make them global. Abolish the function for setting them, as the caller can now
|
5316 | 5534 | set them directly.
|
5317 | 5535 |
|
|