gh-127971: fix off-by-one read beyond the end of a string during search #132574

Merged
merged 7 commits into from
Jul 13, 2025
Changes from 1 commit
Tweak conditional phrasing to match loop terminating criteria, add comment explaining why a guard is not necessary in adaptive_find.
duaneg committed Jun 21, 2025
commit c07c23eb921bdb737cbf9fccd60bbdde32426f7d
19 changes: 15 additions & 4 deletions Objects/stringlib/fastsearch.h
@@ -595,7 +595,7 @@ STRINGLIB(default_find)(const STRINGLIB_CHAR* s, Py_ssize_t n,
continue;
}
/* miss: check if next character is part of pattern */
-if (i < w && !STRINGLIB_BLOOM(mask, ss[i+1])) {
+if (i + 1 <= w && !STRINGLIB_BLOOM(mask, ss[i+1])) {
i = i + m;
}
else {
@@ -604,7 +604,7 @@ STRINGLIB(default_find)(const STRINGLIB_CHAR* s, Py_ssize_t n,
}
else {
/* skip: check if next character is part of pattern */
-if (i < w && !STRINGLIB_BLOOM(mask, ss[i+1])) {
+if (i + 1 <= w && !STRINGLIB_BLOOM(mask, ss[i+1])) {
i = i + m;
}
}
@@ -667,7 +667,16 @@ STRINGLIB(adaptive_find)(const STRINGLIB_CHAR* s, Py_ssize_t n,
return res + count;
}
}
-/* miss: check if next character is part of pattern */
+
+/* Miss: check if next character is part of pattern.
+Note that in contrast to default_find and default_rfind we do
+*not* need to prevent the algorithm from reading one character
+beyond the last character in the input that the pattern could
+start in. I.e. if i == w it is safe to read ss[i + 1] since the
+input and pattern length requirements on when this variant
+algorithm will be called ensure it will always be a valid part
+of the input. In that case it doesn't matter what the character
+read is since the loop will terminate regardless. */
if (!STRINGLIB_BLOOM(mask, ss[i+1])) {
i = i + m;
}
@@ -676,7 +685,9 @@ STRINGLIB(adaptive_find)(const STRINGLIB_CHAR* s, Py_ssize_t n,
}
}
else {
-/* skip: check if next character is part of pattern */
+/* Skip: check if next character is part of pattern.
+See comment above re safety of accessing ss[i+1] when i == w.
+*/
if (!STRINGLIB_BLOOM(mask, ss[i+1])) {
i = i + m;
}
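
The index arithmetic behind the default_find guard can be sketched in plain C. This is only an illustration, not code from the patch. It assumes, as in fastsearch.h but not visible in this diff, that w = n - m and ss = s + m - 1, so that ss[i+1] refers to s[i + m]. The helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: the index into s touched by the lookahead
   ss[i + 1], assuming ss = s + m - 1 (an assumption about the
   surrounding code, not shown in the diff). */
static ptrdiff_t lookahead_index(ptrdiff_t i, ptrdiff_t m)
{
    return (m - 1) + (i + 1);   /* simplifies to i + m */
}

/* The guard as written in the patch. For integers it has the same
   truth value as the previous spelling, i < w. */
static int guard_allows_read(ptrdiff_t i, ptrdiff_t w)
{
    return i + 1 <= w;
}

/* Returns 1 if every lookahead the guard permits stays inside
   s[0 .. n-1], scanning all start positions i in 0 .. w. */
static int guarded_reads_in_bounds(ptrdiff_t n, ptrdiff_t m)
{
    ptrdiff_t w = n - m;        /* last position a match can start at */
    for (ptrdiff_t i = 0; i <= w; i++) {
        if (guard_allows_read(i, w) && lookahead_index(i, m) >= n) {
            return 0;
        }
    }
    return 1;
}
```

With n = 10 and m = 3, w is 7 and an unguarded lookahead at i == w would touch index 10, one past the last valid index 9. The guard rejects exactly that position, which is the read that default_find has to avoid.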