Skip to content

Add a <matchSignature> for :number, together with :ordinal & :plural aliases #560

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Jan 29, 2024

Conversation

eemeli
Copy link
Collaborator

@eemeli eemeli commented Dec 12, 2023

This implements the number selection design.

Separately from this, we should add a "default functions" section to registry.md.

The only <match> included in this proposal is the fallback one; further changes could follow if/once #558 is accepted.

@eemeli eemeli added the functions Issue pertains to the default function set label Dec 12, 2023
Comment on lines 252 to 257
With `exact` selection, only a variant with a key matching
the JSON representation of the input number (without an exponent) will be selected.
With `plural` and `ordinal` selection, if no such exactly matching key is present,
a variant with the matching CLDR plural category (`zero`/`one`/`two`/`few`/`many`/`other`)
for the corresponding selection type will be selected,
according to the rules of the current locale
Copy link
Member

@aphillips aphillips Dec 12, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're using markdown in the descriptions (in an XML registry)?

Start with the default. Describe each separately. Also, your phrasing suggests binary matching, but, in fact, our selectors rank and filter options. You don't mention the * handling.

Suggested change
With `exact` selection, only a variant with a key matching
the JSON representation of the input number (without an exponent) will be selected.
With `plural` and `ordinal` selection, if no such exactly matching key is present,
a variant with the matching CLDR plural category (`zero`/`one`/`two`/`few`/`many`/`other`)
for the corresponding selection type will be selected,
according to the rules of the current locale
One of three selection models for numeric values, defaulting to "plural".
Each selector type matches numeric values to keys using the "number-literal"
production from the ABNF.
The "plural" and "ordinal" selectors also use plural or ordinal rules from CLDR
to match categories "zero", "one", "two", "few", "many" and "other" with a lower
match quality than exact numeric match.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The choice of markdown is intentional, because:

  1. It's remarkably human-readable even in syntax form.
  2. It's a relatively well-defined formatting language that doesn't require any <> that would conflict with the surrounding XML.
  3. It's rather well supported as a syntax for code comments.
  4. What's the alternative here?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I merged the suggestion here, but kept the thread open to account for the markdown discussion. Also: not mentioning e.g. "match quality" was intentional, as this selector only has two quality levels: exact and category, with the former always pre-empting the latter.

The * is not mentioned here because it is not a concern for a selector function; its behaviour is defined directly in the selection algorithm, and the selector is never consulted about it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have no objection to markdown. I just didn't want to let it pass by unremarked?

The * is not mentioned here because it is not a concern for a selector function; its behaviour is defined directly in the selection algorithm, and the selector is never consulted about it.

I'm not sure that's correct. In the case of plural I would be unsurprised to see the keyword other in CLDR be treated as a synonym for * in MF2 (to prevent having two defaults). If that's not the case, some mention of the relationship is probably called for??

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

other is not special in the syntax the way * is. In Pattern Selection, it gets special treatment in:

  • Resolve Preferences 2.ii.b. (this is where it's left out of the keys sent to the selector)
  • Filter Variants 2.i.b.
  • Sort Variants 5.iii.c.

To make it possible to use e.g. only one and other as English plural variant keys, leaving out the * variant, we would need to:

  1. Remove the syntax validity requirement to always have all-* variant.
  2. Refactor the Missing Fallback Variant error into a formatting error for when we fail to select any variant.
  3. Add a way for the registry <match> element to state that its values attribute exhaustively covers the input space, so that a build-time validator could still verify that the variant set is complete even if it does not include a * variant.

I would enthusiastically support such a set of changes.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not talking about leaving out * but about documenting that * and other are equivalent. What we don't want are tools expanding a variant set from 0/one/* to 0/one/other/*. I wouldn't make the latter set invalid (other is just a string) but it is pointless to write.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think defining that here would require us to define the behaviour of e.g. selection on {horse :plural}. Would that silently match an other variant, or would it emit an error and end up selecting the * variant? And in e.g. Polish, could that match a many variant?

Do we really need to define in the spec such internal behaviour of a selector in error cases?

Copy link
Member

@aphillips aphillips Dec 15, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The documentation of ICU's PluralFormat says:

You always have to define a message text for the default plural case "other" which is contained in every rule set. If you do not specify a message text for a particular plural case, the message text of the plural case "other" gets assigned to this plural case.

This is equivalent functionally to *. Do you disagree that tools should be told about this equivalence? Do you think we should promote developers having to write 0/one/other/* messages? Note that the error being handled here isn't "horse" but rather "missing translation" in which the runtime locale would prefer to match few but the message doesn't provide few (usually because the message hasn't been localized and we're getting e.g. the root locale message).

This says nothing about error handling. Note that we do specify that horse is an error by making :plural and :number's operand anyNumber, but that's wholly separate from describing what the plural cases are.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you disagree that tools should be told about this equivalence?

No, I agree that we should communicate clearly how * works, and that it's very similar to other in MF1. But I don't think this is the right place for that, because the rules about * apply to all selectors, not just :number.

Do you think we should promote developers having to write 0/one/other/* messages?

No, that's unlikely to be helpful in most cases. But we should let them know that it's at least possible, for the rarer cases when distinguishing other from * is important.

This says nothing about error handling. Note that we do specify that horse is an error by making :plural and :number's operand anyNumber, but that's wholly separate from describing what the plural cases are.

This sounds like a detail we ought to talk about some more, later. My position here would be that we can define in the spec a minimum of what must work, but that we should not seek to define behaviour for all possible inputs. Also, we ought to have normative text separately from what's in the <description> saying what a :number ought to be doing.

Co-authored-by: Addison Phillips <addisonI18N@gmail.com>
@mihnita
Copy link
Collaborator

mihnita commented Dec 18, 2023

This PR is strictly about implementing the consensus achieved on number selection.

I don't think there is agreement that the selection function name for plural is :number
I think is it worse than :plural, and I explained why.

Can I live with :number instead? Yes.
But I don't think there was an agreement on it :number.

@eemeli
Copy link
Collaborator Author

eemeli commented Jan 26, 2024

I believe that consensus on this was reached during the call this week.

@eemeli eemeli requested a review from aphillips January 26, 2024 09:53
@aphillips aphillips added LDML45 LDML45 Release (Tech Preview) and removed Agenda+ Requested for upcoming teleconference labels Jan 26, 2024
@aphillips
Copy link
Member

We have a consensus on what functions should appear in the registry. There are open questions about the registry structure (that this PR cannot address). There are also questions about the options for the aliases, which I think can be decided separately. I don't think that merging this will have an impact on what we ultimately choose to do with these details, so I'm giving my approval.

@aphillips
Copy link
Member

Without prejudice in favor of any specific functions, merging to clear.

@aphillips aphillips merged commit 3ed0972 into unicode-org:main Jan 29, 2024
@eemeli eemeli deleted the plural-ordinal branch January 29, 2024 22:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
functions Issue pertains to the default function set LDML45 LDML45 Release (Tech Preview)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants