Improve mb_detect_encoding's recognition of Turkish text #10186

alexdowad · 2022-12-30T06:05:23Z

Add 4 codepoints commonly used to write Turkish text to our table of 'commonly used' Unicode codepoints. These are:

• U+011F LATIN SMALL LETTER G WITH BREVE
• U+0130 LATIN CAPITAL LETTER I WITH DOT ABOVE
• U+0131 LATIN SMALL LETTER DOTLESS I
• U+015F LATIN SMALL LETTER S WITH CEDILLA
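
To illustrate why membership in such a table can matter, here is a minimal sketch in C. It assumes the detector scores a candidate decoding by counting codepoints that fall outside a "commonly used" set, with a lower count meaning a more plausible encoding. The names `turkish_common`, `is_common`, and `count_uncommon` are hypothetical, and the table holds only the four codepoints listed above; the real mbstring table is much larger and may be stored and consulted differently.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical table for illustration only: the four Turkish codepoints
 * named in this PR. This is NOT the actual mbstring data structure. */
static const uint32_t turkish_common[] = {
    0x011F, /* LATIN SMALL LETTER G WITH BREVE */
    0x0130, /* LATIN CAPITAL LETTER I WITH DOT ABOVE */
    0x0131, /* LATIN SMALL LETTER DOTLESS I */
    0x015F, /* LATIN SMALL LETTER S WITH CEDILLA */
};

/* Return 1 if cp is ASCII or appears in the table above, 0 otherwise.
 * Treating all of ASCII as common is an assumption of this sketch. */
static int is_common(uint32_t cp)
{
    if (cp < 0x80) {
        return 1;
    }
    for (size_t i = 0; i < sizeof(turkish_common) / sizeof(turkish_common[0]); i++) {
        if (turkish_common[i] == cp) {
            return 1;
        }
    }
    return 0;
}

/* Count the codepoints in a decoded string that are not considered common.
 * Under the assumed heuristic, a lower count means a more plausible decoding. */
static size_t count_uncommon(const uint32_t *cps, size_t n)
{
    size_t uncommon = 0;
    for (size_t i = 0; i < n; i++) {
        if (!is_common(cps[i])) {
            uncommon++;
        }
    }
    return uncommon;
}
```

With this sketch, the Turkish word "ışık" (U+0131 U+015F U+0131 U+006B) contains no uncommon codepoints, whereas before the four entries were added, three of its four letters would have counted against the correct decoding.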

This addresses the problem with detecting Turkish text that was raised in #8439.

FYA @cmb69 @nikic @Girgias @kamil-tekiela @youkidearitai

cmb69

Thank you! Looks like a nice improvement.

alexdowad · 2022-12-30T12:23:41Z

Merged. Thanks @cmb69 for the review.

github-actions bot added the Extension: mbstring label Dec 30, 2022

cmb69 approved these changes Dec 30, 2022

alexdowad closed this Dec 30, 2022

alexdowad deleted the turkish branch December 30, 2022 12:23
