-
Notifications
You must be signed in to change notification settings - Fork 7.8k
Add new function string_is_ascii #353
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
--SKIPIF-- | ||
<?php | ||
|
||
if(!extension_loaded('utf8')) die('skip '); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I originally wrote this function in an extension and forgot to remove the skip condition. Fixed now.
We definitely need an RFC for that. But besides that, it looks good. |
I'm happy to write one. My understanding was for simple functions like this it wasn't necessary. |
about the function name exposed , I think it will be better to be consistent with is_string, is_int etc. so, maybe is_ascii? os is_ascii_string? |
string_is_ascii probably not the best name. We don't have consistent prefixes for string functions, unfortunately, but we have none starting with string_. We have str_something, we have strsomething, we have is_something, but no string_something. I think adding yet another prefix is not a good idea. Otherwise, the code looks good. |
No, the comparison with is_long, is_string, ... Doesn't apply as they compare with types. This function checks the actual value. |
Good point about the name. I've renamed the user facing function to str_is_ascii. I left the internal function untouched because there are a few functions that start with string_ in the same file (e.g. string_natural_compare_function) I gust got RFC karma, I'll try to write it up tonight. |
"mb_check_encoding($str, 'ASCII')" and "ctype_print($str) || ctype_cntrl($str)" are enough solutions. Symfony 2 spefify ctype functions for the requirements. Defining the fallback function for mb_check_encoding is easy since htmlspecialchars has an ability for validating various encodings (https://github.com/php/php-src/blob/master/ext/standard/html.c#L105). Considering PHP's history and existing projects, standardizing mb_check_encoding seems to be a good topic. RFC: Alternative implementation of mbstring using ICU is good stuff for that purpose.
|
What's the state of this? As @masakielastic already pointed out |
Thanks for this PR. However, I think we don't need a specialized function as mb_ can already handle this. Therefore I am closing this PR. |
This pull requests makes both a C and PHP function string_is_ascii available. The C function grapheme_ascii_check in intl is removed and its use replaced by this function.
I didn't find any other similar functions like this in the codebase, if there are some please point them out. I'll fix their usage as well.
The same behavior can be achieved in userland but it's a common process when handling unicode data (for performance reasons and to prevent bugs). For example in Joomlae for very redirect send to Internet Explorer we need to check whether it's in the ASCII range or not to work around a bug.
Benchmark results are here: https://gist.github.com/realityking/5690703