Skip to content

Add new function string_is_ascii #353

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

realityking
Copy link
Contributor

This pull requests makes both a C and PHP function string_is_ascii available. The C function grapheme_ascii_check in intl is removed and its use replaced by this function.

I didn't find any other similar functions like this in the codebase, if there are some please point them out. I'll fix their usage as well.

The same behavior can be achieved in userland but it's a common process when handling unicode data (for performance reasons and to prevent bugs). For example in Joomlae for very redirect send to Internet Explorer we need to check whether it's in the ASCII range or not to work around a bug.

Benchmark results are here: https://gist.github.com/realityking/5690703

--SKIPIF--
<?php

if(!extension_loaded('utf8')) die('skip ');
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I originally wrote this function in an extension and forgot to remove the skip condition. Fixed now.

@lstrojny
Copy link
Contributor

lstrojny commented Jun 1, 2013

We definitely need an RFC for that. But besides that, it looks good.

@realityking
Copy link
Contributor Author

I'm happy to write one. My understanding was for simple functions like this it wasn't necessary.

@laruence
Copy link
Member

laruence commented Jun 2, 2013

about the function name exposed , I think it will be better to be consistent with is_string, is_int etc.
http://cn2.php.net/manual/en/function.is-string.php

so, maybe is_ascii? os is_ascii_string?

@smalyshev
Copy link
Contributor

string_is_ascii probably not the best name. We don't have consistent prefixes for string functions, unfortunately, but we have none starting with string_. We have str_something, we have strsomething, we have is_something, but no string_something. I think adding yet another prefix is not a good idea.

Otherwise, the code looks good.

@johannes
Copy link
Member

johannes commented Jun 2, 2013

No, the comparison with is_long, is_string, ... Doesn't apply as they compare with types. This function checks the actual value.

@realityking
Copy link
Contributor Author

Good point about the name. I've renamed the user facing function to str_is_ascii. I left the internal function untouched because there are a few functions that start with string_ in the same file (e.g. string_natural_compare_function) I gust got RFC karma, I'll try to write it up tonight.

@masakielastic
Copy link
Contributor

"mb_check_encoding($str, 'ASCII')" and "ctype_print($str) || ctype_cntrl($str)" are enough solutions. Symfony 2 spefify ctype functions for the requirements. Defining the fallback function for mb_check_encoding is easy since htmlspecialchars has an ability for validating various encodings (https://github.com/php/php-src/blob/master/ext/standard/html.c#L105). Considering PHP's history and existing projects, standardizing mb_check_encoding seems to be a good topic. RFC: Alternative implementation of mbstring using ICU is good stuff for that purpose.

if(!function_exists('mb_check_encoding')) {

    function mb_check_encoding($str, $encoding = 'UTF-8')
    {
        if ('ASCII' === $encoding) {
            return ctype_print($str) || ctype_cntrl($str);
        }

        $list = ['BIG-5' => 'BIG5', 'CP936' => 'GB2312'];

        if (array_key_exists($encoding, $list)) {

            $encoding = $list[$encoding];   

        }

        $str === htmlspecialchars_decode(htmlspecialchars($str, ENT_COMPAT, $encoding));
    }
}

function is_ascii($str)
{
    return mb_check_encoding($str, 'ASCII');
}

function is_utf8($str)
{
    return mb_check_encoding($str, 'UTF-8');
}

@nikic
Copy link
Member

nikic commented Sep 12, 2013

What's the state of this? As @masakielastic already pointed out str_is_ascii($str) is equivalent to mb_check_encoding($str, 'ASCII'). Does this mean that this PR can be closed or are there additional reasons to support this function?

@dsp
Copy link
Member

dsp commented Sep 16, 2013

Thanks for this PR. However, I think we don't need a specialized function as mb_ can already handle this. Therefore I am closing this PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants