Skip to content

[2.2] [Translation] Po/Mo file readers do not parse plurization rules #2630

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
ghost opened this issue Nov 12, 2011 · 31 comments
Closed

[2.2] [Translation] Po/Mo file readers do not parse plurization rules #2630

ghost opened this issue Nov 12, 2011 · 31 comments

Comments

@ghost
Copy link

ghost commented Nov 12, 2011

I've been experimenting with the new PO/MO file readers that @stealth35 added recently. There is one very unfortunate thing missing from this however, is that they don't parse the plurization rules. In Gettext, these are part of the header so the framework doesn't need to know hard coded values as is required in Symfony2. In order for these readers to be useful, they really need to parse this information. It will require an extra property in the MessageCatalog class I guess.

@stealth35
Copy link
Contributor

@Drak I just add the Dumpers not Loaders, The loaders has very poors, but I work on it, see #2412

@ghost
Copy link
Author

ghost commented Dec 25, 2011

bump :-)

fabpot added a commit that referenced this issue Jan 4, 2012
Commits
-------

f4890c2 [Translation] Po/MoFileLoader parse plurization rules

Discussion
----------

[Translation] Po/MoFileLoader parse plurization rules

Bug fix: yes
Feature addition: no
Backwards compatibility break: no
Symfony2 tests pass: [![Build Status](https://secure.travis-ci.org/stealth35/symfony.png?branch=fix_2630)](http://travis-ci.org/stealth35/symfony)Fixes the following tickets: #2630
Todo: Not happy with the pluralize style

```
msgid "foo"
msgid_plural "foos"
msgstr[0] "bar"
msgstr[1] "bars"
```
to

```
array (
    'foo' => 'bar',
    'foos' => '{0} bar|{1} bars'
)
```
@ghost
Copy link
Author

ghost commented Jan 5, 2012

@fabpot @stealth35 I am not sure what the associated PR does, but I am not sure it address what I talked about, that is, that .PO and .MO files have a header which gives the formula of the language puralisation. e.g.

https://github.com/zikula-communities/german/blob/master/locale/de/LC_MESSAGES/zikula.po#L14

"Plural-Forms: nplurals=2; plural=(n != 1);\n"

This formula is required to understand whether to use singular or plural rules and also which plural translation to use (since there can be multiple). The formula needs to be parsed somehow, e.g.

https://github.com/zikula/core/blob/master/src/lib/i18n/ZMO.php#L487 although this implementation cannot be used since it's polluted by the GPL.

Parsing the plural formula decouples the Translation component from having to know about pluralisation rules entirely (for gettext).

@mvrhov
Copy link

mvrhov commented Jan 5, 2012

drak, but other translation formats don't support this so afterall this has to be hard coded into the symfony. Hell even the native Delphi gettext mo file reader implementation I'm using in my Delphi apps has this information hard coded.

@ghost
Copy link
Author

ghost commented Jan 5, 2012

@mvrhov For other formats yes, but for the Gettext implementation, it should read the headers because that is part of the Gettext standard - we can't half implement a standard - I don't think being half-baked here does Symfony2 any favours.

There are dozens of pluralization rules and hundred and hundreds of locales - there is simply no way to be complete and .PO and .MO must have a pluralisation header, so it must be used. Symfony2 has a list of pluralisation rules for the other formats because I assume that makes up for the lack of pluralisation standards in those formats, but Gettext is the most mature and the most widely implemented standard of them all so implementing it fully is important.

@stealth35
Copy link
Contributor

@Drak I got you, left open your issue (I waited for you for the #3023 TODO, but @fabpot merge)

@ghost
Copy link
Author

ghost commented Jan 5, 2012

@stealth35 - ah, I see. You need to keep [WIP] in the pull request title to tell @fabpot it's not ready for merge, but no problem, we wait for your next PR, I was just worried :)

Thanks!

@ghost
Copy link
Author

ghost commented Mar 14, 2012

@stealth35 - hey, any update on this PR?

@stealth35
Copy link
Contributor

hi @Drak, it's an hard thing, the easy way it's to use eval, but I see a security hole here, so actually I don't know how to handle this

@ghost
Copy link
Author

ghost commented Mar 15, 2012

@stealth35 I am not sure eval is a security hole since the .mo file should not be writable and frankly if someone has access to the FS, you have more to worry about than injecting a header to a mo file. For this reason I'm not sure we really need to lex the nplurals rule.

@sstok
Copy link
Contributor

sstok commented Mar 25, 2012

Using eval() is only a problem when its done in production (as it may been disabled).
About the security concern, is there a way to-do some validation (like only allowing normal charters for example)?

@stof
Copy link
Member

stof commented Apr 4, 2012

@Drak what is still missing here ?

@ghost
Copy link
Author

ghost commented Apr 4, 2012

@stof - Gettext .mo files have the pluralisation formula in the header. It must be read by the mo reader and used when making plural calculations. If @stealth35 has no time for this, I can take up the task no problem.

@fabpot
Copy link
Member

fabpot commented May 7, 2012

Anyone working on this? That would be better if we can ship that with Symfony 2.1.

@jmccaffrey42
Copy link

CakePHP has this functionality I believe, I think it would be trivial to import it from there. Would anyone object to me doing that and creating a pull request?

@jmccaffrey42
Copy link

It seems the way they implemented it is incredibly simple, yet because the landscape of languages is finite, pretty effective I would immagine.

https://github.com/cakephp/cakephp/blob/2.1/lib/Cake/I18n/I18n.php#L254

@sstok
Copy link
Contributor

sstok commented May 8, 2012

Well CakePHP is released under MIT so thats great news! And it does not use eval() 👍

@ghost
Copy link
Author

ghost commented May 9, 2012

I the cake code is missing a case for nplurals=6 (Arabic).

I don't really like that it's guessing - the plural rule is precise: but if it means a quick solution for this ticket, then OK.

I'd like to reiterate that inline evaluation is perfectly ok in this case and makes plurals 100% accurate - there is no security vector that wouldn't exist anyway if the filesystem was writable and the Plural-Forms: header could and should be validated via regex anyway once at parse-time.

So long as any implementation is clean-room, we are OK because the PHP Gettext authors are extremely difficult people to deal with (may people have tried to get them to change the license to LGPL but they refused time and again).

For reference this is a list of rules that need to be covered: http://translate.sourceforge.net/wiki/l10n/pluralforms - you a see it's pretty complex and without real lexing, inline evaluation is really the only way to accurately do this.

@jmccaffrey42
Copy link

Thanks, Ill go and mess around with the eval path and see what I come up with. That website is great BTW, we can build tests around all those cases to ensure we are supporting the right stuff.

@ghost ghost mentioned this issue May 14, 2012
@clemens-tolboom
Copy link
Contributor

In #4249 I added support for using Gettext Headers as just strings.

In the mentioned #3023 by @stealth35 I don't see anything about parsing headers ... the implemented plural parsing is using interval and not indexed plural. (I hope I understood that correctly).

I fixed plural loading/dumping in #4249.

@blaugueux
Copy link
Contributor

Any news about this PR for 2.1 ?

@fabpot
Copy link
Member

fabpot commented Jul 10, 2012

@Drak any update?

@stof
Copy link
Member

stof commented Jul 15, 2012

@Drak ping

@ghost
Copy link
Author

ghost commented Jul 16, 2012

Work on gettext is scheduled for 2.2

@blaugueux
Copy link
Contributor

@Drak So the PO/MO translation system shall not be used before 2.2 ?

@ghost
Copy link
Author

ghost commented Jul 17, 2012

@blaugueux - Gettext implementation is usable, but understanding that the current implementations do not respect the pluralisation headers or support contexts. Plurals are taken from a hard coded list. I'll be making PRs for 2.2 soon.

@stof
Copy link
Member

stof commented Aug 18, 2014

@Drak do you still plan to work on it sometimes ?

@jiru
Copy link

jiru commented Feb 20, 2015

If that’s any interest to you, I implemented a Gettext pluralization rules parser as part of a PR to CakePHP: cakephp/cakephp#5348. They ended up keeping hardcoded rules though. You can reuse my code freely if you want.

@javiereguiluz
Copy link
Member

@aitboudad do you think that this feature is needed and doable without complicating code much? Or do you think we could close it as "won't fix"? Thanks.

@aitboudad
Copy link
Contributor

@javiereguiluz Well I think this require many changes, the way we handle pluralization is a bit different than this one which need to define the pluralization rules per files so I'm ok to mark it as wont fix

@javiereguiluz
Copy link
Member

Closing it as "won't fix" as per @aitboudad's comments. Thanks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests