How to refer to CLDR data on plural rules? #538

eemeli · 2023-11-28T20:54:18Z

As discussed on #534 (comment) and previously, it would be Very Good to be able to make use of the existing CLDR data at least in plurals.xml and ordinals.xml, which effectively describe which locales use which plural categories, and how those are selected. The structure of these files is described in ldmlSupplemental.dtd.

As far as I'm aware, only information for plural and ordinal categories is available with this specific format. For example, the CLDR grammaticalFeatures.xml file has a rather different structure for its presentation of other grammatical features, which may become useful for other formatters and selectors.

As @aphillips notes, "Perhaps we should have a referencing mechanism to CLDR instead of replicating data [for plural matching]."

However, this isn't eminently straightforward, as evidenced by the fact that this hasn't been done yet. In terms of what's theoretically achievable here, we have the following capability levels:

0. We can just go with the full set of categories with a <match values="zero one two few many other">, which does not require any additional data. We'll need to provide this baseline in any case; everything else filters this to some subset.
1. If we can make the <plurals type="..."><pluralRules locales="..."><pluralRule count="..."> attribute information available to registry users, they can determine that given a type (cardinal or ordinal) and a locale code, the count attributes of the matching set of <pluralRule> elements define the available categories.
2. If we can parse and process the contents of the <pluralRule> elements, we can further restrict the categories in many cases. For example, we could determine that in English, a numeric selector with minimumFractionDigits=2 will only ever resolve to the other category, or that in Polish an :integer plural selector would only match one, few, or many, and never other.
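As a rough illustration of the middle level (attribute information only), here is a minimal Python sketch. It assumes input shaped like plurals.xml; the embedded sample is hand-written and abbreviated, and the function name plural_categories and the fallback to the full set for unlisted locales are assumptions for illustration, not anything the registry defines.

```python
import xml.etree.ElementTree as ET

# The full baseline set, used when no locale-specific data is found.
ALL_CATEGORIES = ("zero", "one", "two", "few", "many", "other")

# Hand-written, abbreviated sample in the shape of CLDR's plurals.xml.
# The real file lists many more locales and longer sample lists.
SAMPLE = """\
<supplementalData>
  <plurals type="cardinal">
    <pluralRules locales="en de">
      <pluralRule count="one">i = 1 and v = 0 @integer 1</pluralRule>
      <pluralRule count="other"> @integer 0, 2~16 @decimal 0.0~1.5</pluralRule>
    </pluralRules>
    <pluralRules locales="pl">
      <pluralRule count="one">i = 1 and v = 0 @integer 1</pluralRule>
      <pluralRule count="few">v = 0 and i % 10 = 2..4 and i % 100 != 12..14 @integer 2~4, 22~24</pluralRule>
      <pluralRule count="many">v = 0 and i != 1 and i % 10 = 0..1 or v = 0 and i % 10 = 5..9 or v = 0 and i % 100 = 12..14 @integer 0, 5~19</pluralRule>
      <pluralRule count="other"> @decimal 0.0~1.5</pluralRule>
    </pluralRules>
  </plurals>
</supplementalData>
"""


def plural_categories(xml_text, plural_type, locale):
    """Return the count attributes for a plural type and locale code.

    Only attributes are read; the rule text itself is ignored.
    """
    root = ET.fromstring(xml_text)
    for plurals in root.iter("plurals"):
        if plurals.get("type") != plural_type:
            continue
        for rules in plurals.iter("pluralRules"):
            # The locales attribute is a space-separated list of codes.
            if locale in rules.get("locales", "").split():
                return [rule.get("count") for rule in rules.iter("pluralRule")]
    # Assumption: fall back to the unfiltered baseline set.
    return list(ALL_CATEGORIES)


print(plural_categories(SAMPLE, "cardinal", "pl"))
```

Note that this does no locale fallback or inheritance; a real tool would need to decide how to handle codes such as en-GB that are not listed directly.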

In order for us to go beyond Level 0 in the core registry definition without actually duplicating data, I think we would need something like an XSL Transform specifically for plural and ordinal data, and a referencing style where we could say something like (syntax only indicative):

<matchRef href="path/to/plurals.xml" transform="path/to/plural-match-mapper.xsl"/>

I'm not sure that there's a reasonable way to extract the @integer / @decimal -ness of the rules with XSLT, to allow for reaching Level 2.
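Outside of XSLT, the split itself is not hard to sketch. The Python below separates a rule's condition from its @integer and @decimal sample lists, then treats the absence of @integer samples as a signal that integers never select that category. That inference is a heuristic assumption here, not something the thread establishes, and the rule text and helper names (split_rule, integer_categories) are illustrative only.

```python
import re

# Abbreviated rule text for Polish cardinals, keyed by count attribute.
# Sample lists are shortened by hand for illustration.
PL_CARDINAL = {
    "one": "i = 1 and v = 0 @integer 1",
    "few": "v = 0 and i % 10 = 2..4 and i % 100 != 12..14 @integer 2~4, 22~24",
    "many": "v = 0 and i != 1 and i % 10 = 0..1 or v = 0 and i % 10 = 5..9 "
            "or v = 0 and i % 100 = 12..14 @integer 0, 5~19",
    "other": " @decimal 0.0~1.5",
}


def split_rule(rule_text):
    """Split rule text into (condition, {sample kind: sample string})."""
    # With one capture group, re.split keeps the matched kinds in the result:
    # [condition, kind, samples, kind, samples, ...]
    parts = re.split(r"@(integer|decimal)", rule_text)
    condition = parts[0].strip()
    samples = {kind: text.strip() for kind, text in zip(parts[1::2], parts[2::2])}
    return condition, samples


def integer_categories(rules):
    """Categories whose rule text carries @integer samples (heuristic)."""
    return [count for count, text in rules.items()
            if "integer" in split_rule(text)[1]]


print(integer_categories(PL_CARDINAL))
```

A stricter approach would evaluate the condition itself rather than rely on which sample kinds are present.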

Without the transform itself, we could also leave a SHOULD-ish statement in the description of numerical selectors for tool builders to narrow the full set using CLDR data where appropriate.

We should decide how much of this we consider to be within scope of the core registry, and how much we intend to get done for next Spring's release.


aphillips · 2023-11-29T13:36:07Z

I think that a minimum set for the spring release would be a list of functions and their options and, in the case of selectors, a general description of available matching keys. For plural for example, this would be anyNumber and a list of the CLDR keywords (zero, one, two, few, many, other/*), but not the locale-based tailoring of same.
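A minimal sketch of what that general description could mean for a tool checking plural keys, in Python. The keyword list and the * catch-all come from the comment above; the number-literal pattern is only an assumed stand-in for "anyNumber", and is_baseline_plural_key is a hypothetical name.

```python
import re

# CLDR plural keywords as listed in the comment above.
CLDR_KEYWORDS = ("zero", "one", "two", "few", "many", "other")

# Assumption: a simple decimal literal stands in for "anyNumber".
_NUMBER = re.compile(r"-?(0|[1-9][0-9]*)(\.[0-9]+)?")


def is_baseline_plural_key(key):
    """Accept the catch-all *, a CLDR keyword, or a number literal.

    No locale-based tailoring is applied.
    """
    return (
        key == "*"
        or key in CLDR_KEYWORDS
        or _NUMBER.fullmatch(key) is not None
    )


for key in ("few", "1", "*", "several"):
    print(key, is_baseline_plural_key(key))
```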

As @eemeli notes, ideally we would establish a link to CLDR data where that's appropriate. If a transform were needed strictly for MF purposes, maybe that could be produced?

eemeli · 2024-01-26T09:57:24Z

A possible solution to the titular question here is presented in #558 (comment), including a proof of concept parametric XSLT transform.

aphillips · 2024-02-07T23:47:23Z

I think this can be moved to Future as the actual format of the registry will be post-45? I still want to solve this, but looking to manage scope.

eemeli added the question and functions labels Nov 28, 2023

eemeli mentioned this issue Dec 9, 2023

Add <when> to help select the right <match> #558

Closed

aphillips added the LDML45 label Jan 8, 2024

eemeli added the Future label and removed the LDML45 label Feb 8, 2024

eemeli mentioned this issue Jul 8, 2024

Drop machine-readable registry definition from spec #815

Merged

aphillips closed this as completed in #815 Jul 15, 2024
