Symfony language code mangling makes it hard to reuse #2468

Closed
goba opened this issue Oct 25, 2011 · 13 comments

@goba

goba commented Oct 25, 2011

Symfony mangles language codes for its internal use: it (a) uppercases all components but the first and (b) replaces the component-separating dashes with underscores. This may conform to some standard or common practice, but the W3C never suggested that people use language codes like that, and its recent HTML5 recommendations are clearly very far from how Symfony treats language codes.

There is certainly nothing wrong with picking a language code format and converting incoming and outgoing data. We in Drupal try to avoid this by using a standard that is much closer to the W3C specs. We use all-lowercase codes (which is not strictly W3C) and we use dashes (like the W3C). Now that Drupal 8 is adopting Symfony in certain places, we cannot reuse the language handling code at all, since its language codes are so far from what the web expects, and we don't want to convert back and forth if we can work with formats that match, or at least much more closely resemble, web standards.

More information on language codes on the web: http://www.w3.org/International/articles/language-tags/
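
The two transformations described above can be sketched in a few lines. This is only an illustration of the behaviour reported in this issue, not Symfony's actual code; the function name `mangleLanguageCode` is hypothetical.

```php
<?php
// Hypothetical helper: NOT Symfony's real implementation, only a sketch of
// the behaviour described in this issue.
function mangleLanguageCode(string $code): string
{
    // Split on the dash that separates components in web language tags.
    $parts = explode('-', $code);

    // Keep the first component untouched; uppercase every later component.
    $first = array_shift($parts);
    $rest = array_map('strtoupper', $parts);

    // Rejoin with underscores instead of dashes.
    return implode('_', array_merge([$first], $rest));
}

echo mangleLanguageCode('zh-Hant-HK'), "\n"; // zh_HANT_HK
echo mangleLanguageCode('fr-CA'), "\n";      // fr_CA
```

Note that the script component `Hant` loses its title case, which is why the result matches neither the W3C form nor a Unix locale name.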

@lsmith77
Contributor

If we want to maintain BC, we could add an option that determines whether to return Unix- or W3C-formatted language codes.

@goba
Author

goba commented Oct 25, 2011

BTW, it is not really easy to generate W3C-formatted language codes unless you have a database of components. Some examples of language codes as per the W3C standards (directly from the page I linked):

  • en-GB
  • fr-CA
  • zh-Hans (note only first letter is capital in second component)
  • zh-yue (note, no capital)
  • sl-IT-nedis (yeah)

The capitalization depends on which component that part of the language code comes from, and components can be omitted. So you can only generate proper W3C language codes if you have a registry of which string belongs to which component.

Drupal uses all lowercase components but otherwise conforms to the W3C standard.
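
For the casing alone, RFC 5646 (section 2.1.1) describes a convention based only on subtag length and position: everything is lowercase, except that two-letter subtags after the first are uppercased and four-letter subtags after the first are title-cased, unless they follow a singleton such as `x`. A minimal sketch of that rule follows. The name `formatTagCase` is hypothetical, treating everything after the first singleton as lowercase is this sketch's reading of the rule, and a registry would still be needed to check that the subtags are valid.

```php
<?php
// Hypothetical helper applying the casing convention from RFC 5646, section 2.1.1.
// It only adjusts case; it does NOT validate subtags against the registry.
function formatTagCase(string $tag): string
{
    // Accept underscores too, then start from an all-lowercase tag.
    $subtags = explode('-', strtolower(str_replace('_', '-', $tag)));

    $afterSingleton = false;
    foreach ($subtags as $i => $subtag) {
        if ($i > 0 && !$afterSingleton) {
            if (strlen($subtag) === 2) {
                // Two-letter subtag that is not first: region, uppercase.
                $subtags[$i] = strtoupper($subtag);
            } elseif (strlen($subtag) === 4) {
                // Four-letter subtag that is not first: script, title case.
                $subtags[$i] = ucfirst($subtag);
            }
        }
        // Assumption: once a one-character subtag (e.g. "x" or "u") is seen,
        // every following subtag is left lowercase.
        $afterSingleton = $afterSingleton || strlen($subtag) === 1;
    }

    return implode('-', $subtags);
}

echo formatTagCase('zh_HANT_HK'), "\n";  // zh-Hant-HK
echo formatTagCase('sl-it-nedis'), "\n"; // sl-IT-nedis
echo formatTagCase('ZH-YUE'), "\n";      // zh-yue
```

Since case is not significant in these tags, Drupal's all-lowercase form and the output above identify the same languages; the difference is purely presentational.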

@pounard
Contributor

pounard commented Oct 25, 2011

FYI, ZF2 uses Unix codes internally too. That seems natural to me. I agree with lsmith77 that this could be an option.

@goba
Author

goba commented Oct 26, 2011

Some competitive analysis of other systems, mostly focusing on application-level solutions.

Ones using "web language tags":

Using Unix language codes:

Using a different approach:

Does this make the picture any clearer? (I doubt it.)

@stealth35
Contributor

@goba
Author

goba commented Oct 26, 2011

Yeah, looking at the link, it looks like ICU uses a composition similar to the W3C's, but with underscores. E.g., they use zh_Hant_HK, where the W3C would say zh-Hant-HK and Symfony would say zh_HANT_HK (note the case and underscore differences).

@stealth35
Contributor

And ICU doesn't care about separators 👍

<?php
print_r(locale_parse('zh_Hant_HK'));
print_r(locale_parse('zh-Hant-HK'));

Output:

Array
(
    [language] => zh
    [script] => Hant
    [region] => HK
)
Array
(
    [language] => zh
    [script] => Hant
    [region] => HK
)

@goba
Author

goba commented Oct 26, 2011

Two more data points from the two most popular mobile systems.

All versions of iOS (and Mac OS X since Tiger (10.4), released 6 years ago) use BCP 47 language tags too (exactly as the W3C does). http://developer.apple.com/library/ios/#documentation/MacOSX/Conceptual/BPInternational/Articles/LanguageDesignations.html#//apple_ref/doc/uid/20002144-SW3

Android refers to ICU as its locale source; however, it uses Unix language codes (e.g. zh_CN and zh_TW vs. ICU's zh_Hans and zh_Hant): http://developer.android.com/reference/java/util/Locale.html

@goba
Author

goba commented Oct 26, 2011

stealth35: well, it's not just an underscore problem. If zh-Hant comes in, Symfony converts that to zh_HANT, but that is not a Unix language code. It should convert it to zh_TW if it used Unix language codes, right?

(Also looked at the list of languages supported by Ubuntu at https://translations.launchpad.net/ubuntu).

@pierrejoye

I would go with ICU only, as soon as possible. It is a standard used by many applications and won't change except to gain new codes.

intl has been available by default since PHP 5.3.0 and should be used for any internationalization work. (set|get)locale should be banned from any modern application: it is not portable, crashes more than it should, and is per process instead of per resource/object or at least per request.

@stealth35
Contributor

@goba, for ICU, zh_CN is an alias of zh_Hans_CN

@goba
Author

goba commented Oct 26, 2011

stealth35: are you proposing that Symfony include alias resolution as part of the normalization? Currently Symfony would give you zh_CN or zh_HANS_CN depending on the incoming data and would not treat them as equal.

@jakzal
Contributor

jakzal commented Mar 9, 2016

A few clarifications:

  • Symfony doesn't "mangle language codes". It uses a locale, which might be a language code but it doesn't need to be. It is defined as (also here):

    Locale [...] is either the two letter ISO 639-1 language code (e.g. fr), or the language code followed by an underscore (_), then the ISO 3166-1 alpha-2 country code (e.g. fr_FR for French/France).

  • BCP47 is not an accepted standard yet. As far as I know, there's no data or library we could import and use like we do with ICU. As you said, W3C-formatted language codes are hard to generate.

PHP's intl extension uses the ICU data. We've got pretty much built-in support for it, and I don't see why we should change (and break BC).

This is an old issue and I'm gonna close it. I guess Drupal has solved this issue by now in their own way. If anyone's got more input we can always re-start the discussion.
