-
Notifications
You must be signed in to change notification settings - Fork 7.8k
Fix bug in mb_get_substr_slow (sometimes outputs wrong number of characters) #12983
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…acters) Thanks to Maurício Fauth for finding and reporting this bug. The bug was introduced in October 2022. It originally only affected text encodings which do not have a fixed byte width per characters and for which mbstring does not have an mblen_table. However, I recently made another change to mbstring, such that mb_substr no longer relies on the mblen_table even if one is available. Because of this change, the bug earlier introduced in October 2022 now affected a greater number of text encodings, including UTF-8.
This makes sense to me. |
@nielsdos Absolutely it is. This change was intended to target PHP-8.3, but I didn't enter the base branch correctly when opening the PR. |
/* Alex messed up when reimplementing mb_substr and, in cases where `from` is non-zero and | ||
* the number of characters to extract is more than 128, miscalculated where to end the substring | ||
* Thanks to Maurício Fauth for finding the issue */ | ||
var_dump(mb_substr('Lorem ipsum dolor sit amet, consectetur adipiscing elit. Vestibulum dapibus feugiat ex non cursus. Pellentesque vestibulum tellus sit lectus.', 19, -1)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not make this a separate test file altogether?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kamil-tekiela Thanks for the suggestion. Personally I prefer to have this particular test here rather than multiplying the number of different test files, but certainly some might prefer to have each test case in a different file. In any case, I do appreciate your bringing out this suggestion.
Makes sense to me. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Thanks very much, all! The next releases of PHP 8.3 and 8.4 will include this fix. |
Fixes #12972.
Thanks to @MauricioFauth for finding and reporting this bug.
The bug was introduced in October 2022. It originally only affected text encodings which do not have a fixed byte width per characters and for which mbstring does not have an mblen_table. However, I recently made another change to mbstring, such that mb_substr no longer relies on the mblen_table even if one is available. Because of this change, the bug earlier introduced in October 2022 now affected a greater number of text encodings, including UTF-8.
@Girgias @cmb69 @youkidearitai @nielsdos @kamil-tekiela