Skip to content

Commit 57a2d4a

Browse files
committed
Fix performance bug in regexp's citerdissect/creviterdissect.
After detecting a sub-match "dissect" failure (i.e., a backref match failure) in the i'th sub-match of an iteration node, we should proceed by adjusting the attempted length of the i'th submatch. As coded, though, these functions changed the attempted length of the *last* sub-match, and only after exhausting all possibilities for that would they back up to adjust the next-to-last sub-match, and then the second-from-last, etc; all of which is wasted effort, since only changing the start or length of the i'th sub-match can possibly make it succeed. This oversight creates the possibility for exponentially bad performance. Fortunately the problem is masked in most cases by optimizations or constraints applied elsewhere; which explains why we'd not noticed it before. But it is possible to reach the problem with fairly simple, if contrived, regexps. Oversight in my commit 173e29a. That's pretty ancient now, so back-patch to all supported branches. Discussion: https://postgr.es/m/1808998.1629412269@sss.pgh.pa.us
1 parent 92ce7f5 commit 57a2d4a

File tree

1 file changed

+10
-8
lines changed

1 file changed

+10
-8
lines changed

src/backend/regex/regexec.c

Lines changed: 10 additions & 8 deletions
Original file line numberDiff line numberDiff line change
@@ -1133,8 +1133,8 @@ citerdissect(struct vars *v,
11331133
* Our strategy is to first find a set of sub-match endpoints that are
11341134
* valid according to the child node's DFA, and then recursively dissect
11351135
* each sub-match to confirm validity. If any validity check fails,
1136-
* backtrack the last sub-match and try again. And, when we next try for
1137-
* a validity check, we need not recheck any successfully verified
1136+
* backtrack that sub-match and try again. And, when we next try for a
1137+
* validity check, we need not recheck any successfully verified
11381138
* sub-matches that we didn't move the endpoints of. nverified remembers
11391139
* how many sub-matches are currently known okay.
11401140
*/
@@ -1222,12 +1222,13 @@ citerdissect(struct vars *v,
12221222
return REG_OKAY;
12231223
}
12241224

1225-
/* match failed to verify, so backtrack */
1225+
/* i'th match failed to verify, so backtrack it */
1226+
k = i;
12261227

12271228
backtrack:
12281229

12291230
/*
1230-
* Must consider shorter versions of the current sub-match. However,
1231+
* Must consider shorter versions of the k'th sub-match. However,
12311232
* we'll only ask for a zero-length match if necessary.
12321233
*/
12331234
while (k > 0)
@@ -1338,8 +1339,8 @@ creviterdissect(struct vars *v,
13381339
* Our strategy is to first find a set of sub-match endpoints that are
13391340
* valid according to the child node's DFA, and then recursively dissect
13401341
* each sub-match to confirm validity. If any validity check fails,
1341-
* backtrack the last sub-match and try again. And, when we next try for
1342-
* a validity check, we need not recheck any successfully verified
1342+
* backtrack that sub-match and try again. And, when we next try for a
1343+
* validity check, we need not recheck any successfully verified
13431344
* sub-matches that we didn't move the endpoints of. nverified remembers
13441345
* how many sub-matches are currently known okay.
13451346
*/
@@ -1433,12 +1434,13 @@ creviterdissect(struct vars *v,
14331434
return REG_OKAY;
14341435
}
14351436

1436-
/* match failed to verify, so backtrack */
1437+
/* i'th match failed to verify, so backtrack it */
1438+
k = i;
14371439

14381440
backtrack:
14391441

14401442
/*
1441-
* Must consider longer versions of the current sub-match.
1443+
* Must consider longer versions of the k'th sub-match.
14421444
*/
14431445
while (k > 0)
14441446
{

0 commit comments

Comments
 (0)