Symfony language code mangling makes it hard to reuse #2468

goba · 2011-10-25T11:34:32Z

Symfony mangles language codes for its internal use: it (a) uppercases all components but the first and (b) replaces the dashes separating components with underscores. This may conform to some standard or common practice, but the W3C never suggested people use language codes like that, and their recent recommendations with HTML5 are clearly very far from how Symfony treats language codes.

There is certainly nothing wrong with picking a language code format and converting incoming and outgoing data. We in Drupal try to avoid this by using a standard that is much closer to the W3C specs. We use all lowercase codes (which is not strictly W3C) and we use dashes (like the W3C). Now that Drupal 8 is adopting Symfony in certain places, we cannot reuse the language handling code at all, since the language codes are so far from how the web expects them, and we don't want to convert back and forth if we can work with formats that match web standards exactly or at least resemble them much more closely.

More information on language codes on the web: http://www.w3.org/International/articles/language-tags/
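
To make the conversion described above concrete, here is a minimal sketch of it. The function name `mangleLocale` is hypothetical and this is not Symfony's actual code; it only reproduces the two steps (a) and (b) as stated.

```php
<?php
// Hypothetical helper illustrating the behaviour described in this issue.
// It is NOT Symfony's implementation, only a sketch of steps (a) and (b).
function mangleLocale(string $tag): string
{
    // Split on either separator so both input styles are accepted.
    $parts = preg_split('/[-_]/', $tag);

    // (a) Keep the first component untouched, uppercase all the others.
    $first = array_shift($parts);
    $rest = array_map('strtoupper', $parts);

    // (b) Join the components with underscores instead of dashes.
    return implode('_', array_merge([$first], $rest));
}

echo mangleLocale('zh-Hant-HK'), "\n"; // zh_HANT_HK
echo mangleLocale('fr-ca'), "\n";      // fr_CA
```

Note that the original casing of `Hant` is lost, so the W3C form cannot be restored by simply swapping the separator back.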

lsmith77 · 2011-10-25T12:44:43Z

If we want to maintain BC, we could add an option that determines whether to return Unix or W3C "formatted" language codes.

goba · 2011-10-25T12:51:55Z

It is not really easy to generate W3C formatted language codes BTW unless you have a database of components. Some examples of language codes as per the W3C standards (directly from the page I linked):

en-GB
fr-CA
zh-Hans (note only first letter is capital in second component)
zh-yue (note, no capital)
sl-IT-nedis (yeah)

The capitalization depends on which component that language code part is coming from and components can be removed. So you can only generate W3C language codes proper if you have a registry of which string belongs to which component.

Drupal uses all lowercase components but otherwise conforms to the W3C standard.
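
For reference, RFC 5646 (section 2.1.1) recommends a purely length-based casing convention: two-letter subtags after the first are uppercase, four-letter subtags are title case, and everything else is lowercase. That is enough to restore the casing of the examples above, although a registry is still needed to validate which subtags actually exist. A minimal sketch, assuming well-formed input; `formatTag` is a hypothetical name, not an existing API.

```php
<?php
// Hypothetical helper: applies the length-based casing convention from
// RFC 5646 section 2.1.1. It does not validate subtags against a registry.
function formatTag(string $tag): string
{
    $parts = preg_split('/[-_]/', strtolower($tag));
    $afterSingleton = false;

    foreach ($parts as $i => $part) {
        // The first subtag (the language) stays lowercase, as does
        // everything following a one-character singleton such as "x".
        if ($i === 0 || $afterSingleton) {
            continue;
        }
        $length = strlen($part);
        if ($length === 1) {
            $afterSingleton = true;
        } elseif ($length === 2) {
            $parts[$i] = strtoupper($part); // region, e.g. "CA"
        } elseif ($length === 4) {
            $parts[$i] = ucfirst($part);    // script, e.g. "Hans"
        }
    }

    return implode('-', $parts);
}

echo formatTag('zh_HANT_HK'), "\n";  // zh-Hant-HK
echo formatTag('sl-it-nedis'), "\n"; // sl-IT-nedis
echo formatTag('zh-yue'), "\n";      // zh-yue
```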

pounard · 2011-10-25T15:11:44Z

FYI ZF2 uses Unix codes internally too. Seems natural to me. I agree with lsmith77 that this could be an option.

goba · 2011-10-26T07:41:51Z

Some competitive analysis of systems mostly focusing on application level solutions.

Ones using "web language tags":

PHP's Locale class uses RFC 4646 language codes (the predecessor of RFC 5646, which BCP 47 and the W3C language tags currently refer to): http://php.net/manual/en/class.locale.php
Joomla does the same: http://docs.joomla.org/Localisation
Drupal (although all lowercase)

Using Unix language codes:

Moodle: http://docs.moodle.org/dev/Table_of_locales
WordPress: http://codex.wordpress.org/Installing_WordPress_in_Your_Language
TYPO3: http://www.training-typo3.com/2008/12/15/typo3-multi-language/

Using a different approach:

ezPublish uses three letter language codes with hyphens and country codes: http://doc.ez.no/eZ-Publish/Technical-manual/4.4/Features/Multi-language/Configuring-your-site-locale
Plone: just uses two letter language codes, period: http://plone.org/documentation/manual/plone-community-developer-documentation/i18n/language
Zend: seems to just use two letter language codes (?): http://framework.zend.com/manual/en/zend.translate.using.html
CakePHP uses three letter language codes: http://www.sanisoft.com/blog/2007/06/09/multilingual-apps-with-cakephp/

Does this make the picture clearer? (I doubt it.)

stealth35 · 2011-10-26T07:52:01Z

For ICU: http://source.icu-project.org/repos/icu/icu/trunk/source/data/locales/ (used by PHP Intl)

goba · 2011-10-26T09:04:41Z

Yeah, looking at the link, it looks like ICU uses a composition similar to the W3C's but with underscores. E.g. they use zh_Hant_HK, where W3C would say zh-Hant-HK and Symfony would say zh_HANT_HK (note the case and underscore differences).

stealth35 · 2011-10-26T09:18:20Z

And ICU doesn't care about separators 👍

<?php
print_r(locale_parse('zh_Hant_HK'));
print_r(locale_parse('zh-Hant-HK'));

Array
(
    [language] => zh
    [script] => Hant
    [region] => HK
)
Array
(
    [language] => zh
    [script] => Hant
    [region] => HK
)

goba · 2011-10-26T09:23:51Z

Two more data points from the two most popular mobile systems.

All versions of iOS (and Mac OS X from Tiger (10.4) released 6 years ago) use BCP 47 language tags too (exactly as the W3C). http://developer.apple.com/library/ios/#documentation/MacOSX/Conceptual/BPInternational/Articles/LanguageDesignations.html#//apple_ref/doc/uid/20002144-SW3

Android refers to ICU as its locale source, but it uses Unix language codes (e.g. zh_CN and zh_TW vs. ICU's zh_Hans and zh_Hant): http://developer.android.com/reference/java/util/Locale.html

goba · 2011-10-26T10:06:15Z

stealth35: well, it's not just an underscore problem. If zh-Hant is coming in, Symfony converts that to zh_HANT, but that is not a Unix language code. It should convert it to zh_TW if it used Unix language codes, right?

(Also looked at the list of languages supported by Ubuntu at https://translations.launchpad.net/ubuntu).

pierrejoye · 2011-10-26T11:23:33Z

I would go with ICU only, as soon as possible. This is a standard used by many applications and it won't change except to gain new codes.

intl has been bundled with PHP since 5.3.0 and should be used for any work related to internationalization. (set|get)locale should be banned from any modern application: it is not portable, crashes more than it should, and is per process instead of per resource/object or at least per request.

stealth35 · 2011-10-26T13:07:27Z

@goba for ICU, zh_CN is an alias of zh_Hans_CN.

goba · 2011-10-26T13:28:59Z

stealth35: are you proposing that Symfony include alias resolution as part of the normalization? Currently Symfony would give you zh_CN or zh_HANS_CN depending on the incoming data and would not treat them as equal.
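
A rough sketch of what alias resolution during normalization could look like. The alias table and the function names `resolveAlias` and `sameLocale` are hypothetical; a real table would have to come from the ICU data rather than be hard-coded.

```php
<?php
// Illustrative alias table only. The zh_CN entry is the one mentioned in
// this thread; a complete table would need to be derived from ICU data.
const LOCALE_ALIASES = [
    'zh_CN' => 'zh_Hans_CN',
    'zh_TW' => 'zh_Hant_TW',
];

// Hypothetical helper: map an alias to its canonical form, or return the
// locale unchanged when no alias is known.
function resolveAlias(string $locale): string
{
    return LOCALE_ALIASES[$locale] ?? $locale;
}

// Hypothetical helper: two locales are equal if their canonical forms match.
function sameLocale(string $a, string $b): bool
{
    return resolveAlias($a) === resolveAlias($b);
}

var_dump(sameLocale('zh_CN', 'zh_Hans_CN')); // bool(true)
var_dump(sameLocale('zh_CN', 'zh_TW'));      // bool(false)
```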

jakzal · 2016-03-09T17:53:46Z

A few clarifications:

Symfony doesn't "mangle language codes". It uses a locale, which might be a language code but it doesn't need to be. It is defined as (also here):

Locale [...] is either the two letter ISO 639-1 language code (e.g. fr), or the language code followed by an underscore (_), then the ISO 3166-1 alpha-2 country code (e.g. fr_FR for French/France).

BCP47 is not an accepted standard yet. As far as I know, there's no data or libraries we could import and use like we do with ICU. As you said, W3C formatted language codes are hard to generate.

PHP's intl extension uses the ICU data. We've got pretty much built-in support for it, and I don't see why we should change (and break BC).
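
The locale definition quoted above can be expressed as a simple pattern check. This is only a sketch of that definition; `isSimpleLocale` is a hypothetical name and not part of Symfony.

```php
<?php
// Hypothetical helper: checks a string against the quoted definition,
// i.e. a two-letter language code, optionally followed by an underscore
// and a two-letter country code. It does not check that the codes exist
// in ISO 639-1 or ISO 3166-1.
function isSimpleLocale(string $locale): bool
{
    return preg_match('/^[a-z]{2}(_[A-Z]{2})?$/D', $locale) === 1;
}

var_dump(isSimpleLocale('fr'));         // bool(true)
var_dump(isSimpleLocale('fr_FR'));      // bool(true)
var_dump(isSimpleLocale('fr-FR'));      // bool(false)
var_dump(isSimpleLocale('zh_Hant_HK')); // bool(false)
```

Tags with a script component such as zh_Hant_HK fall outside this definition, which is the gap discussed earlier in the thread.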

This is an old issue and I'm gonna close it. I guess Drupal has solved this issue by now in their own way. If anyone's got more input we can always re-start the discussion.

stof added Enhancement and removed Bug labels Aug 18, 2014

cblegare mentioned this issue Sep 18, 2014

Documentation for language form type field is misleading #11950

Closed

antonioribeiro mentioned this issue Jul 19, 2015

Subtagged languages are inconsistent with Symfony jenssegers/date#145

Closed

jakzal closed this as completed Mar 9, 2016

fabpot mentioned this issue Sep 11, 2017

[Intl] Locale::getFallback does not split on dash #24154

Closed

tabacitu mentioned this issue Oct 30, 2018

Changing Brazilian portuguese lang code to follow the international standard Laravel-Backpack/Base#294

Closed
