-
-
Notifications
You must be signed in to change notification settings - Fork 9.6k
[WIP][Routing] Added unicode support #3629
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Can you add some test using unicode to avoid regressions ? |
what if UTF-8 support in PCRE is not enabled? What if the project does not use UTF-8 as its primary charset? |
@fabpot - It's pretty rare not to see UTF-8 enabled on a system (because it is a dependency of many packages on enterprise linux OSs). OT: the useage of UTF-8 is growing by leaps and bounds http://w3techs.com/technologies/history_overview/character_encoding |
@fabpot: I could add support for detecting if UTF-8 is enabled or not in PCRE but IMHO this should be a must for any system by now anyway and I can't remember the last time I've encountered such a system without this. I could patch the app/check.php file from symfony-standard if you think that would help out as well. Tho I realize that not everyone might benefit from this patch, I don't think this will slow down the system in any forseeable way or impact the projects otherwise. I could do some performance tests as well to see where we stand with this. @Drak thanks for the link, pretty interesting. |
@stof can you tell me where/what other test should I add for this so that I can implement this and make this PR available for merging? Also, should I add the detection for UTF-8 in PCRE or not? If not I could just add something to the readme so that people know about the requirement. |
Hi, Do I have a go or no go for this PR? Thanks. |
@stof thanks for pointing this out. I'll rebase it once I have some feedback for this. I'll also add some changelog entry that should mention the change and open a PR for the docs but first I need to know if I should invest some time in this or not. |
Hi, I've rebased to the latest master, added changelog info and mentioned this in the docs as well. Best regards. |
It seems that the latest changes to the routing component made the unicode support on by default so I guess this PR original intent is pointless now. It can still be merged as it contains just missing tests for this case and, what I believe to be, a nice to have mention for the feature in the changelog. Let me know if I'm missing something about the support for unicode but as far as I can see both my computer and Travis seem to have the same opinion about the tests. |
ping @fabpot Can you please merge this so in case the unicode support for routes disappears at some point, people will know? Also, I've run the tests in Routing component separately as the latest changes in master regarding Redis/Memcached break the tests both on my laptop and on Travis. Thanks! |
@dlsniper what do you mean by unicode support ? (i.e. I don't think we can claim unicode support without the |
@vicb Then we have an interesting case. I can't check with PHP 5.3.2 but in the Ubuntu 12.04 PHP 5.3.10 I can clearly see the tests passing, regardless of input characters. Also I could do this before, using 5.3.6 (Ubuntu 11.10). Also, I've been running the dev environment against the changes in the routing component for about 2 weeks now and I couldn't see any problems with this. Should I add the |
I think this unicode thing is somewhat tricky. One could also argue the test you're doing are even wrong. Because UTF-8 chars are not allowed in URLs/URIs. So they need to be percent-encoded to be valid. So generating a URL needs to encode all unallowed chars (e.g. segments with unicode chars). |
@Tobion If the tests are not right, then please do tell. I'm very interested in having this the proper way as in fact, like I've said above, I'm using this patch in dev env and I don't have any problems with it and in production I'm using the older Routing component but with the same patch, the So far I haven't had any problems on production related to this but if in fact it's just pure luck, then I'll take any advice I can receive, Also while you might argue that pretty URLs are not something really needed, try explain that to my marketing director, to the SEO guys and so on :) @vicb meanwhile, I've updated the tests and added the missing modifier. |
You don't seem to have understood what I was saying. It has nothing to do with pretty URLs in SEO context.
It shows that only the variables get url encoded but in fact all segments must be url encoded, i.e. static text too. Otherwise the generated URL contains invalid chars and does not conform to the RFC. So it should be
|
Hm there was an issue reported by some Russian guy which had the problem of matching the URI with Cyrillic characters. |
The Matcher and Dumper etc should probably work on the encoded value instead of the decoded one. Then the preg_match would also not need the |
Too bad the concrete test case for #2375 is missing because I don't see how a simple requirement change would solve his problem with Cryillic chars. His problem and the solution does not seem to be related to each other. |
It seems that we are using rawurlencode which does the enconding part: https://github.com/symfony/symfony/blob/master/src/Symfony/Component/Routing/Generator/UrlGenerator.php#L121 I didn't had time to take a look at the matcher but my guess is that it's using rawurldecode :) This is limited unfortunately for default parameters so I'll have a look on how this can be fixed. Also I'll add some more tests, using Greek, Arabic, Armenian, Chinese, Vietnamese and HIndi. I believe that should do it, no? |
Hm, it get's even more tricky. What if the delimiter between two variables is a unicode character? This will probably break everything. Both the current implementation but also my idea with using url encoded value in all comparisions. That's because the default regex exclusion in url encoded form won't work, e.g.: [^%D0] |
So I would reason, that we don't have full unicode support currently. To support all cases requires much research and testing. |
@Tobion yes, I've seen the non-unicode safe functions being used in the Routing component. I've renamed this to WIP as there's indeed much work to be done and it seems that I'm just a lucky case in using this on live as I do have a small use case scenario :) I'm working on adding some tests to this, like mentioned above and see from there how we can work on this. Thanks for your help so far :) |
I was talking about: #2899 |
@mvrhov I believe that the current routing system should take care of the issue now but if it doesn't, let me know. |
The compiler is not unicode aware |
@vicb guess I've missed that because the tests are passing and for some reason the expression starts with Thanks |
I think adding a 'u' would not be enough: you would need to take into account the case where the separator could be a unicode character:
|
I've not read the entire discussion but allowing unicode characters for the separator is probably not needed (I don't see any valid use case for that -- we are already quite generous in the range od characters you can use for separators). Let's keep it simple. |
@fabpot so if you have a unicode char between two variables, e.g. |
@@ -124,7 +124,7 @@ protected function doGenerate($variables, $defaults, $requirements, $tokens, $pa | |||
$optional = false; | |||
} | |||
} elseif ('text' === $token[0]) { | |||
$url = $token[1].$url; | |||
$url = str_replace(array_keys($this->decodedChars), $this->decodedChars, rawurlencode($token[1])).$url; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
strtr
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, will patch in future commit.
@Tobion If unicode is not allowed for separators Some questions:
|
Huh? You mean |
@vicb For PCRE detection I think we can use something like here: http://drupal.org/node/587362#comment-3411540 As for making it optional, I'm not sure, it could be done if we need to make it optional. @Tobion No, currently you won't receive an error if you try something like |
…s (nicolas-grekas) This PR was merged into the 3.2-dev branch. Discussion ---------- [Routing] Add seamless support for unicode requirements | Q | A | ------------- | --- | Branch? | master | Bug fix? | no | New feature? | yes | BC breaks? | no | Deprecations? | yes | Tests pass? | yes | Fixed tickets | #3629, #5236, #19562 | License | MIT | Doc PR | symfony/symfony-docs#6890 This PR adds unicode support to route matching and generation by automatically adding the `u` modifier to regexps that use either unicode characters or unicode enabled character classes (e.g. `\p...` `\x{...}` `\X`). As a side note, if one wants to match a single unicode character (vs a single byte), one should use `\PM` or `\X` instead of `.` *or* set the `unicode` parameter to true. Commits ------- a829d34 [Routing] Add seamless support for unicode requirements
Bug fix: yes
Feature addition: not sure
Backwards compatibility break: no
Symfony2 tests pass:
Fixes the following tickets: #4166
Todo: -
This is PR started when the Routing component didn't had support for unicode routes but the latest changes to it seems to have changed its state.
I've basically just added some missing tests for this feature and noted the change in the changelog as it seems pretty nice to know about.