Skip to content

Improve mb_detect_encoding's recognition of Turkish text #10186

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

alexdowad
Copy link
Contributor

Add 4 codepoints commonly used to write Turkish text to our table of 'commonly used' Unicode codepoints. These are:

• U+011F LATIN SMALL LETTER G WITH BREVE
• U+0130 LATIN CAPITAL LETTER I WITH DOT ABOVE
• U+0131 LATIN SMALL LETTER DOTLESS I
• U+015F LATIN SMALL LETTER S WITH CEDILLA

This addresses the issue about Turkish which was mentioned in this thread: #8439

FYA @cmb69 @nikic @Girgias @kamil-tekiela @youkidearitai

Add 4 codepoints commonly used to write Turkish text to our table
of 'commonly used' Unicode codepoints. These are:

• U+011F LATIN SMALL LETTER G WITH BREVE
• U+0130 LATIN CAPITAL LETTER I WITH DOT ABOVE
• U+0131 LATIN SMALL LETTER DOTLESS I
• U+015F LATIN SMALL LETTER S WITH CEDILLA
Copy link
Member

@cmb69 cmb69 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you! Looks like a nice improvement.

@alexdowad alexdowad closed this Dec 30, 2022
@alexdowad alexdowad deleted the turkish branch December 30, 2022 12:23
@alexdowad
Copy link
Contributor Author

Merged. Thanks @cmb69 for the review.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants