Commit adbfde3

Fix performance bug in regexp's citerdissect/creviterdissect.

After detecting a sub-match "dissect" failure (i.e., a backref match failure) in the i'th sub-match of an iteration node, we should proceed by adjusting the attempted length of the i'th submatch. As coded, though, these functions changed the attempted length of the *last* sub-match, and only after exhausting all possibilities for that would they back up to adjust the next-to-last sub-match, and then the second-from-last, etc; all of which is wasted effort, since only changing the start or length of the i'th sub-match can possibly make it succeed. This oversight creates the possibility for exponentially bad performance. Fortunately the problem is masked in most cases by optimizations or constraints applied elsewhere; which explains why we'd not noticed it before. But it is possible to reach the problem with fairly simple, if contrived, regexps.

Oversight in my commit 173e29a. That's pretty ancient now, so back-patch to all supported branches.

Discussion: https://postgr.es/m/1808998.1629412269@sss.pgh.pa.us

1 parent 0c13ee1 commit adbfde3

src/backend/regex/regexec.c

Lines changed: 10 additions & 8 deletions
@@ -1098,8 +1098,8 @@ citerdissect(struct vars *v,
      * Our strategy is to first find a set of sub-match endpoints that are
      * valid according to the child node's DFA, and then recursively dissect
      * each sub-match to confirm validity. If any validity check fails,
-     * backtrack the last sub-match and try again. And, when we next try for
-     * a validity check, we need not recheck any successfully verified
+     * backtrack that sub-match and try again. And, when we next try for a
+     * validity check, we need not recheck any successfully verified
      * sub-matches that we didn't move the endpoints of. nverified remembers
      * how many sub-matches are currently known okay.
      */
@@ -1187,12 +1187,13 @@ citerdissect(struct vars *v,
             return REG_OKAY;
         }
 
-        /* match failed to verify, so backtrack */
+        /* i'th match failed to verify, so backtrack it */
+        k = i;
 
 backtrack:
 
         /*
-         * Must consider shorter versions of the current sub-match. However,
+         * Must consider shorter versions of the k'th sub-match. However,
          * we'll only ask for a zero-length match if necessary.
          */
         while (k > 0)
@@ -1299,8 +1300,8 @@ creviterdissect(struct vars *v,
      * Our strategy is to first find a set of sub-match endpoints that are
      * valid according to the child node's DFA, and then recursively dissect
      * each sub-match to confirm validity. If any validity check fails,
-     * backtrack the last sub-match and try again. And, when we next try for
-     * a validity check, we need not recheck any successfully verified
+     * backtrack that sub-match and try again. And, when we next try for a
+     * validity check, we need not recheck any successfully verified
      * sub-matches that we didn't move the endpoints of. nverified remembers
      * how many sub-matches are currently known okay.
      */
@@ -1394,12 +1395,13 @@ creviterdissect(struct vars *v,
             return REG_OKAY;
         }
 
-        /* match failed to verify, so backtrack */
+        /* i'th match failed to verify, so backtrack it */
+        k = i;
 
 backtrack:
 
         /*
-         * Must consider longer versions of the current sub-match.
+         * Must consider longer versions of the k'th sub-match.
          */
         while (k > 0)
         {
