
How to refer to CLDR data on plural rules? #538


Closed
eemeli opened this issue Nov 28, 2023 · 3 comments · Fixed by #815
Labels
functions Issue pertains to the default function set Future Deferred for future standardization question Further information is requested

Comments

@eemeli
Collaborator

eemeli commented Nov 28, 2023

As discussed on #534 (comment) and previously, it would be Very Good to be able to make use of the existing CLDR data at least in plurals.xml and ordinals.xml, which effectively describe which locales use which plural categories, and how those are selected. The structure of these files is described in ldmlSupplemental.dtd.

As far as I'm aware, only information for plural and ordinal categories is available with this specific format. For example, the CLDR grammaticalFeatures.xml file has a rather different structure for its presentation of other grammatical features, which may become useful for other formatters and selectors.

As @aphillips notes, "Perhaps we should have a referencing mechanism to CLDR instead of replicating data [for plural matching]."

However, this isn't eminently straightforward, as evidenced by the fact that this hasn't been done yet. In terms of what's theoretically achievable here, we have the following capability levels:

  0. We can just go with the full set of categories with a <match values="zero one two few many other">, which does not require any additional data. We'll need to provide this baseline in any case; everything else filters this to some subset.
  1. If we can make the <plurals type="..."><pluralRules locales="..."><pluralRule count="..."> attribute information available to registry users, they can determine that, given a type (cardinal or ordinal) and a locale code, the count attributes of the set of <pluralRule> elements define the available categories.
  2. If we can parse and process the contents of the <pluralRule> elements, we can further restrict the categories in many cases. For example, we could determine that in English, a numeric selector with minimumFractionDigits=2 will only ever resolve to the other category, or that in Polish an :integer plural selector would only match one, few, or many, and never other.
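As a rough illustration of Level 1 (reading only the count attributes), here is a minimal Python sketch that reads data shaped like plurals.xml and returns the categories for a given type and locale. The XML is an abridged, hand-trimmed excerpt: real CLDR keeps cardinal rules in plurals.xml and ordinal rules in ordinals.xml, lists many more locales per <pluralRules>, and has longer sample lists. The function name categories_for is hypothetical, and locale fallback (e.g. en-US to en) is deliberately not handled; an unknown locale just falls back to the Level 0 baseline.

```python
import xml.etree.ElementTree as ET

# Abridged excerpt shaped like CLDR plurals.xml / ordinals.xml (both types
# combined here for brevity; sample lists trimmed; other locales omitted).
SAMPLE = """<supplementalData>
  <plurals type="cardinal">
    <pluralRules locales="en">
      <pluralRule count="one">i = 1 and v = 0 @integer 1</pluralRule>
      <pluralRule count="other"> @integer 0, 2~16, 100 @decimal 0.0~1.5, 10.0</pluralRule>
    </pluralRules>
    <pluralRules locales="pl">
      <pluralRule count="one">i = 1 and v = 0 @integer 1</pluralRule>
      <pluralRule count="few">v = 0 and i % 10 = 2..4 and i % 100 != 12..14 @integer 2~4, 22~24</pluralRule>
      <pluralRule count="many">v = 0 and i != 1 and i % 10 = 0..1 or v = 0 and i % 10 = 5..9 or v = 0 and i % 100 = 12..14 @integer 0, 5~19, 100</pluralRule>
      <pluralRule count="other"> @decimal 0.0~1.5, 10.0</pluralRule>
    </pluralRules>
  </plurals>
  <plurals type="ordinal">
    <pluralRules locales="en">
      <pluralRule count="one">n % 10 = 1 and n % 100 != 11 @integer 1, 21, 31</pluralRule>
      <pluralRule count="two">n % 10 = 2 and n % 100 != 12 @integer 2, 22, 32</pluralRule>
      <pluralRule count="few">n % 10 = 3 and n % 100 != 13 @integer 3, 23, 33</pluralRule>
      <pluralRule count="other"> @integer 0, 4~18, 100</pluralRule>
    </pluralRules>
  </plurals>
</supplementalData>"""

# Level 0 baseline: the full category set, needing no CLDR data at all.
ALL_CATEGORIES = ["zero", "one", "two", "few", "many", "other"]


def categories_for(xml_text: str, plural_type: str, locale: str) -> list[str]:
    """Level 1 (hypothetical helper): return the count attributes of the
    <pluralRule> elements for the given type ("cardinal"/"ordinal") and locale."""
    root = ET.fromstring(xml_text)
    for plurals in root.iter("plurals"):
        if plurals.get("type") != plural_type:
            continue
        for rules in plurals.iter("pluralRules"):
            # The locales attribute is a space-separated list of locale codes.
            if locale in rules.get("locales", "").split():
                return [rule.get("count") for rule in rules.iter("pluralRule")]
    # No data for this locale (no fallback chain is attempted): use the baseline.
    return list(ALL_CATEGORIES)


print(categories_for(SAMPLE, "cardinal", "pl"))  # ['one', 'few', 'many', 'other']
print(categories_for(SAMPLE, "ordinal", "en"))   # ['one', 'two', 'few', 'other']
```

Note that the rule text itself is ignored here; only the count attributes are used, which is what distinguishes this level from the next one.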

In order for us to go beyond Level 0 in the core registry definition without actually duplicating data, I think we would need something like an XSL Transform specifically for plural and ordinal data, and a referencing style where we could say something like (syntax only indicative):

<matchRef href="path/to/plurals.xml" transform="path/to/plural-match-mapper.xsl"/>
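The <matchRef> syntax above is only indicative, and no plural-match-mapper.xsl exists yet. To make the intended mapping concrete, the following Python sketch emulates what such a transform might output: one <match values="..."> per locale, built from the count attributes. The input is an abridged excerpt shaped like plurals.xml with the rule text omitted (this mapping only reads count attributes), and the function name to_match_elements is hypothetical.

```python
import xml.etree.ElementTree as ET

# Abridged input shaped like CLDR plurals.xml; rule text and samples are
# omitted because only the count attributes matter for this mapping.
PLURALS = """<supplementalData>
  <plurals type="cardinal">
    <pluralRules locales="en de">
      <pluralRule count="one"/>
      <pluralRule count="other"/>
    </pluralRules>
    <pluralRules locales="pl">
      <pluralRule count="one"/>
      <pluralRule count="few"/>
      <pluralRule count="many"/>
      <pluralRule count="other"/>
    </pluralRules>
  </plurals>
</supplementalData>"""


def to_match_elements(xml_text: str, plural_type: str) -> dict[str, str]:
    """Emulates a hypothetical plural-match-mapper.xsl: maps each locale to a
    serialized <match values="..."> element listing its plural categories."""
    root = ET.fromstring(xml_text)
    matches = {}
    for plurals in root.iter("plurals"):
        if plurals.get("type") != plural_type:
            continue
        for rules in plurals.iter("pluralRules"):
            values = " ".join(rule.get("count") for rule in rules.iter("pluralRule"))
            match = ET.Element("match", values=values)
            serialized = ET.tostring(match, encoding="unicode")
            # Every locale sharing this <pluralRules> gets the same <match>.
            for locale in rules.get("locales", "").split():
                matches[locale] = serialized
    return matches


for locale, match in to_match_elements(PLURALS, "cardinal").items():
    print(locale, match)
# en <match values="one other" />
# de <match values="one other" />
# pl <match values="one few many other" />
```

An actual XSLT version would express the same loop as templates over pluralRules; Python is used here only so the sketch can be run directly.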

I'm not sure that there's a reasonable way to extract the @integer / @decimal -ness of the rules with XSLT, to allow for reaching Level 2.
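Outside XSLT, one possible approach is to read the @integer and @decimal sample lists that follow each rule. The Python sketch below does this under a stated assumption: a category is treated as reachable for values without visible fraction digits only if its samples include @integer, and for values with fraction digits only if they include @decimal. Because CLDR samples are illustrative rather than exhaustive, this is a heuristic, not a substitute for evaluating the rule conditions. The excerpt is abridged and reachable_categories is a hypothetical name.

```python
import xml.etree.ElementTree as ET

# Abridged excerpt shaped like CLDR plurals.xml (sample lists trimmed).
PLURALS = """<supplementalData>
  <plurals type="cardinal">
    <pluralRules locales="en">
      <pluralRule count="one">i = 1 and v = 0 @integer 1</pluralRule>
      <pluralRule count="other"> @integer 0, 2~16, 100 @decimal 0.0~1.5, 10.0</pluralRule>
    </pluralRules>
    <pluralRules locales="pl">
      <pluralRule count="one">i = 1 and v = 0 @integer 1</pluralRule>
      <pluralRule count="few">v = 0 and i % 10 = 2..4 and i % 100 != 12..14 @integer 2~4, 22~24</pluralRule>
      <pluralRule count="many">v = 0 and i != 1 and i % 10 = 0..1 or v = 0 and i % 10 = 5..9 or v = 0 and i % 100 = 12..14 @integer 0, 5~19, 100</pluralRule>
      <pluralRule count="other"> @decimal 0.0~1.5, 10.0</pluralRule>
    </pluralRules>
  </plurals>
</supplementalData>"""

ALL_CATEGORIES = ["zero", "one", "two", "few", "many", "other"]


def reachable_categories(xml_text: str, locale: str, fraction_digits: bool) -> list[str]:
    """Heuristic (assumption): a category can match values without visible
    fraction digits only if its samples include @integer, and values with
    fraction digits only if they include @decimal."""
    marker = "@decimal" if fraction_digits else "@integer"
    root = ET.fromstring(xml_text)
    for plurals in root.iter("plurals"):
        if plurals.get("type") != "cardinal":
            continue
        for rules in plurals.iter("pluralRules"):
            if locale in rules.get("locales", "").split():
                # Keep only categories whose rule text carries the marker.
                return [rule.get("count") for rule in rules.iter("pluralRule")
                        if marker in (rule.text or "")]
    # Unknown locale: no narrowing possible, so return the full baseline.
    return list(ALL_CATEGORIES)


print(reachable_categories(PLURALS, "en", fraction_digits=True))   # ['other']
print(reachable_categories(PLURALS, "pl", fraction_digits=False))  # ['one', 'few', 'many']
```

With this excerpt the heuristic reproduces both examples given for Level 2 earlier in this comment: English with fraction digits (e.g. minimumFractionDigits=2) yields only other, and a Polish integer yields one, few, or many.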

Without the transform itself, we could also leave a SHOULD-ish statement in the description of numerical selectors for tool builders to narrow the full set using CLDR data where appropriate.


We should decide how much of this we consider to be within scope of the core registry, and how much we intend to get done for next spring's release.

@eemeli eemeli added question Further information is requested functions Issue pertains to the default function set labels Nov 28, 2023
@aphillips
Member

I think that a minimum set for the spring release would be a list of functions and their options and, in the case of selectors, a general description of available matching keys. For plurals, for example, this would be anyNumber and a list of the CLDR keywords (zero, one, two, few, many, other/*), but not the locale-based tailoring of same.
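The minimum key set described here (a number literal, the CLDR keywords, and the catch-all *) can be checked without any locale data. A minimal Python sketch, assuming that anyNumber means a numeric literal key; the literal grammar below is an illustrative, JSON-like assumption rather than the normative MessageFormat 2 syntax, and is_valid_plural_key is a hypothetical name.

```python
import re

# Baseline keys for a plural selector: the CLDR plural keywords and "*".
CLDR_KEYWORDS = {"zero", "one", "two", "few", "many", "other"}
CATCH_ALL = "*"

# Assumption: "anyNumber" is read as a numeric literal key (e.g. 0, 1, -2, 1.5).
# This JSON-like grammar is illustrative, not the normative MF2 syntax.
NUMBER_LITERAL = re.compile(r"-?(?:0|[1-9][0-9]*)(?:\.[0-9]+)?(?:[eE][-+]?[0-9]+)?")


def is_valid_plural_key(key: str) -> bool:
    """Hypothetical check: accepts a keyword, the catch-all, or a number literal.
    No locale data is consulted, so e.g. "few" is accepted even for English."""
    return (
        key in CLDR_KEYWORDS
        or key == CATCH_ALL
        or NUMBER_LITERAL.fullmatch(key) is not None
    )


for key in ["one", "*", "0", "1.5", "several", "01"]:
    print(key, is_valid_plural_key(key))
```

Flagging keys that can never match in a given locale (such as few in English) is the locale-based tailoring this comment leaves out, and would need the CLDR data discussed in this issue.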

As @eemeli notes, ideally we would establish a link to CLDR data where that's appropriate. If a transform were needed strictly for MF purposes, maybe that could be produced?

@aphillips aphillips added the LDML45 LDML45 Release (Tech Preview) label Jan 8, 2024
@eemeli
Collaborator Author

eemeli commented Jan 26, 2024

A possible solution to the titular question here is presented in #558 (comment), including a proof of concept parametric XSLT transform.

@aphillips
Member

I think this can be moved to Future, as the actual format of the registry will be post-45? I still want to solve this, but I'm looking to manage scope.

@eemeli eemeli added Future Deferred for future standardization and removed LDML45 LDML45 Release (Tech Preview) labels Feb 8, 2024
2 participants