Skip to content

Fix bug in mb_get_substr_slow (sometimes outputs wrong number of characters) #12983

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Dec 21, 2023

Conversation

alexdowad
Copy link
Contributor

Fixes #12972.

Thanks to @MauricioFauth for finding and reporting this bug.

The bug was introduced in October 2022. It originally only affected text encodings which do not have a fixed byte width per characters and for which mbstring does not have an mblen_table. However, I recently made another change to mbstring, such that mb_substr no longer relies on the mblen_table even if one is available. Because of this change, the bug earlier introduced in October 2022 now affected a greater number of text encodings, including UTF-8.

@Girgias @cmb69 @youkidearitai @nielsdos @kamil-tekiela

…acters)

Thanks to Maurício Fauth for finding and reporting this bug.

The bug was introduced in October 2022. It originally only affected
text encodings which do not have a fixed byte width per characters
and for which mbstring does not have an mblen_table. However, I recently
made another change to mbstring, such that mb_substr no longer relies
on the mblen_table even if one is available. Because of this change,
the bug earlier introduced in October 2022 now affected a greater
number of text encodings, including UTF-8.
@nielsdos
Copy link
Member

This makes sense to me.
One question: if this was introduced in 2022, then 8.3 is affected too for other encodings right?

@alexdowad
Copy link
Contributor Author

@nielsdos Absolutely it is.

This change was intended to target PHP-8.3, but I didn't enter the base branch correctly when opening the PR.

Comment on lines +136 to +139
/* Alex messed up when reimplementing mb_substr and, in cases where `from` is non-zero and
* the number of characters to extract is more than 128, miscalculated where to end the substring
* Thanks to Maurício Fauth for finding the issue */
var_dump(mb_substr('Lorem ipsum dolor sit amet, consectetur adipiscing elit. Vestibulum dapibus feugiat ex non cursus. Pellentesque vestibulum tellus sit lectus.', 19, -1));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not make this a separate test file altogether?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kamil-tekiela Thanks for the suggestion. Personally I prefer to have this particular test here rather than multiplying the number of different test files, but certainly some might prefer to have each test case in a different file. In any case, I do appreciate your bringing out this suggestion.

@youkidearitai
Copy link
Contributor

Makes sense to me.

Copy link
Member

@Girgias Girgias left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@alexdowad alexdowad merged commit bb6ceec into php:master Dec 21, 2023
@alexdowad alexdowad deleted the fixbug branch December 21, 2023 07:35
@alexdowad
Copy link
Contributor Author

Thanks very much, all! The next releases of PHP 8.3 and 8.4 will include this fix.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

mb_substr returns a different value in PHP 8.4
6 participants