
(Design) Number Selection #471

Merged
merged 8 commits into from
Dec 11, 2023
200 changes: 200 additions & 0 deletions exploration/number-selection.md
@@ -0,0 +1,200 @@
# Selection on Numerical Values

Status: **Accepted**

<details>
<summary>Metadata</summary>
<dl>
<dt>Contributors</dt>
<dd>@eemeli</dd>
<dt>First proposed</dt>
<dd>2023-09-06</dd>
<dt>Pull Request</dt>
<dd><a href="https://github.com/unicode-org/message-format-wg/pull/471">#471</a></dd>
</dl>
</details>

## Objective

Define how selection on numbers happens.

## Background

As discussed by the working group and explicitly identified in
<a href="https://github.com/unicode-org/message-format-wg/pull/457">discussion of #457</a>,
there is a need to support multiple types of selection on numeric values in MF2.

MF1 supported selection on either cardinal plurals or ordinal numbers,
via the `plural` and `selectordinal` selectors.
It also customized this selection beyond the capabilities of `com.ibm.icu.text.PluralRules`
by allowing for explicit value matching and an `offset` parameter.

As pointed out by <a href="https://github.com/mihnita">@mihnita</a> in particular,
category selection is not always appropriate for selection on a number:
the number may represent some entirely different quantity,
such as a four-digit year or the integer value of an enumerator.

Furthermore, as pointed out by <a href="https://github.com/ryzokuken">@ryzokuken</a>
in <a href="https://github.com/unicode-org/message-format-wg/pull/457#discussion_r1307443288">#457 (comment)</a>,
ordinal selection works similarly to plural selection, but uses a separate set of rules
for each locale.
This is visible in English, for example, where plural rules use only `one` and `other`
but ordinal rules use `one` (_1st_, _21st_, etc.), `two` (_2nd_, _22nd_, etc.),
`few` (_3rd_, _23rd_, etc.), and `other` (_4th_, _5th_, etc.).
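To make the difference concrete, here is a minimal Python sketch of the English cardinal and ordinal rule sets, transcribed by hand from CLDR and limited to non-negative integers. The function names `en_cardinal` and `en_ordinal` are illustrative only; a real implementation would load these rules from CLDR data, for example through ICU `PluralRules` or JavaScript `Intl.PluralRules`.

```python
# Hand-transcribed CLDR rules for English ("en"), non-negative integers only.
# Illustrative sketch: real implementations load these rules from CLDR data.


def en_cardinal(n: int) -> str:
    """English cardinal plural category: only `one` and `other` exist."""
    return "one" if n == 1 else "other"


def en_ordinal(n: int) -> str:
    """English ordinal category: `one`, `two`, `few`, or `other`."""
    last_digit, last_two_digits = n % 10, n % 100
    if last_digit == 1 and last_two_digits != 11:
        return "one"  # 1st, 21st, 101st
    if last_digit == 2 and last_two_digits != 12:
        return "two"  # 2nd, 22nd
    if last_digit == 3 and last_two_digits != 13:
        return "few"  # 3rd, 23rd
    return "other"  # 4th, 11th, 12th, 13th, ...


for n in (1, 2, 3, 4, 11, 21, 22, 23):
    print(n, en_cardinal(n), en_ordinal(n))
```

For example, `21` is `other` as a cardinal but `one` as an ordinal (_21st_), so a message cannot reuse one rule set for the other.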

Additionally,
MF1 provides `ChoiceFormat` selection based on a complex rule set,
which allows determining whether a number falls into a specific range.

Both JS and ICU PluralRules implementations provide for determining the plural category
of a range based on its start and end values.
Range-based selectors are not initially considered here.

## Use-Cases

As a user, I want to write messages that use the correct plural for
my locale and that enable translation to locales that use different rules.

As a user, I want to write messages that use the correct ordinal for
my locale and that enable translation to locales that use different rules.

As a user, I want to write messages in which the pattern used depends on exactly matching
a numeric value.

As a user, I want to write messages that mix exact matching and
either plural or ordinal selection in a single message.
> For example:
>```
>.match {$numRemaining}
>0 {{You have no more chances remaining (exact match)}}
>1 {{You have one more chance remaining (exact match)}}
>one {{You have {$numRemaining} chance remaining (plural)}}
> * {{You have {$numRemaining} chances remaining (plural)}}
>```


## Requirements

- Enable cardinal plural selection.
- Enable ordinal number selection.
- Enable exact match selection.
- Enable simultaneous "exact match" and either plural or ordinal selection.
> For example, allow variants `1` and `one` in the same message.
- Selection must support relevant formatting options, such as `minimumFractionDigits`.
> For example, in English the value `1` matches plural rule `one` while the value `1.0`
> (such as `1` formatted with `minimumFractionDigits=1`) matches the plural rule `other` (`*`).
- Encourage developers to provide the formatting options used in patterns to the selector
so that proper selection can be done.
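The `1` versus `1.0` requirement follows from how CLDR plural rules work: the category is computed from operands of the *formatted* number, such as `i` (the integer digits) and `v` (the count of visible fraction digits), and the English cardinal `one` rule is `i = 1 and v = 0`. The Python sketch below assumes a hypothetical `format_plain` helper that only pads fraction digits (no grouping or locale symbols) and hard-codes the English cardinal rule; it is not how any particular library implements this.

```python
from decimal import Decimal


def format_plain(value: Decimal, minimum_fraction_digits: int = 0) -> str:
    """Hypothetical stand-in for :number formatting: no grouping separators
    or locale symbols, only zero-padding of the fraction part."""
    exponent = value.as_tuple().exponent
    existing_digits = -exponent if isinstance(exponent, int) and exponent < 0 else 0
    places = max(existing_digits, minimum_fraction_digits)
    return f"{value:.{places}f}"


def plural_operands(formatted: str) -> tuple[int, int]:
    """Return the CLDR operands i (integer part) and v (visible fraction digits)."""
    integer_part, _, fraction_part = formatted.lstrip("-").partition(".")
    return int(integer_part), len(fraction_part)


def en_cardinal_category(value: Decimal, minimum_fraction_digits: int = 0) -> str:
    """English cardinal rule: `one` if i = 1 and v = 0, otherwise `other`."""
    i, v = plural_operands(format_plain(value, minimum_fraction_digits))
    return "one" if i == 1 and v == 0 else "other"


print(en_cardinal_category(Decimal(1)))     # formats as "1"   -> one
print(en_cardinal_category(Decimal(1), 1))  # formats as "1.0" -> other
```

Because the category depends on the formatted string, a selector that does not see the same `minimumFractionDigits` as the placeholder can pick a variant that disagrees with the displayed number.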

## Constraints

ICU MF1 messages using `plural` and `selectordinal` should be representable in MF2.

## Proposed Design

Given that we already have a `:number`,
it makes sense to add a `<matchSignature>` to it with an option:

```xml
<option name="select" values="plural ordinal exact" default="plural" />
```

> **Review thread on the line "Given that we already have a `:number`,"**
>
> **@mihnita** (Collaborator, Dec 9, 2023):
>
> > Given that we already have a `:number`,
> > it makes sense to add a `<matchSignature>` to it with an option
>
> Sorry, but it is not that obvious.
> `:number` is not a question, so one can't decide on it.
>
> You have to ask a question about the number.
>
> "Hey, `:number` 43?"
> Is that true or false? If you answer correctly you win $1000.
>
> That means nothing. It is not a good question.
>
> "Hey, `:number` 43 is even?" is a question you can answer.
>
> "Hey, `:number` 43 is prime?" is a question you can answer.
>
> "Hey, `:number` 43 is plural?" is a question you can answer.
>
> **@aphillips** (Member, Dec 10, 2023):
>
> There are two questions here.
>
> The first is: should the `:number` function used for formatting also do selection? @eemeli has already pointed out that we want users to use the same configuration for selection (e.g. with `:plural`) as will ultimately be used to format the number. This message produces grammatically inferior results:
>
> ```
> .local $tenths = {$num :number minFractionDigits=1}
> .match {$num :plural}
> one {{You have {$tenths} value}}  // not desirable
> * {{You have {$tenths} values}}
> ```
>
> The second question is: if `:number` is a selector, what sort of selector is it by default? @eemeli answers that `:plural` is the most common usage. Alternatives would be `:ordinal` and whatever we call exact match. I agree with @eemeli that `:plural` is far and away the most common case and thus should be the default.
>
> So... @mihnita, are you objecting to the phrasing here or the conclusion? If the phrasing, do you want to suggest alternate text?
>
> **Collaborator, PR author:**
>
> @aphillips Your "won't be seen" comment is a bit misleading, as the problem with that message is that the `one` variant may indeed be seen, formatted as "You have 1.0 value".
>
> **Member:**
>
> You're right; that comment goes against what I'm trying to say, so I fixed it.
>
> **Collaborator:**
>
> Since we already defined `$tenths` and use it in the body of the options, I expect `:plural` on it:
>
> ```
> .local $tenths = {$num :number minFractionDigits=1}
> .match {$tenths :plural}
> one {{You have {$tenths} value}}  // not desirable
> * {{You have {$tenths} values}}
> ```
>
> And not doing that is easy to detect at lint time.

The default `plural` value is presumed to be the most common use case,
and it affords the least bad fallback when used incorrectly:
using `plural` where `exact` was intended still selects exactly matching cases,
whereas using `exact` where `plural` was intended will not select LDML category matches.
This might not be noticeable in the source language,
but can cause problems in target locales that the original developer is not considering.

> For example, a naive developer might use a special message for the value `1` without
> considering other locales' need for a `one` plural variant,
> which is needed by languages such as Polish or Russian:
>
> ```
> .match {$var}
> 1 {{You have one last chance}}
> one {{You have {$var} chance remaining}}
> * {{You have {$var} chances remaining}}
> ```
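The fallback argument can be checked with a small model of single-selector variant selection. This Python sketch is an illustration under stated assumptions, not the specified algorithm: `select_variant` is a hypothetical name, only English rules are hard-coded, operands are integers, exact keys are compared numerically, and an exact match is assumed to be preferred over a category match, which in turn is preferred over `*`.

```python
from decimal import Decimal, InvalidOperation


def en_cardinal(n: int) -> str:
    # CLDR English cardinal rule for integers: `one` only for 1.
    return "one" if n == 1 else "other"


def en_ordinal(n: int) -> str:
    # CLDR English ordinal rules: 1st, 2nd, 3rd, but 11th, 12th, 13th.
    if n % 10 == 1 and n % 100 != 11:
        return "one"
    if n % 10 == 2 and n % 100 != 12:
        return "two"
    if n % 10 == 3 and n % 100 != 13:
        return "few"
    return "other"


CATEGORY_RULES = {"plural": en_cardinal, "ordinal": en_ordinal}


def is_exact_match(key: str, value: int) -> bool:
    """True if the key is a numeric literal equal to the value."""
    try:
        return Decimal(key) == value
    except InvalidOperation:  # keys such as `one` or `*` are not numbers
        return False


def select_variant(value: int, keys: list[str], select: str = "plural") -> str:
    """Pick a variant key; assumes `keys` always contains the catch-all `*`."""
    if select not in ("plural", "ordinal", "exact"):
        raise ValueError(f"unsupported select value: {select!r}")
    # 1. Exact numeric keys are matched in every mode.
    for key in keys:
        if is_exact_match(key, value):
            return key
    # 2. Only `plural` and `ordinal` also consult the LDML category.
    if select != "exact":
        category = CATEGORY_RULES[select](value)
        if category in keys:
            return category
    # 3. Otherwise fall back to the catch-all variant.
    return "*"


print(select_variant(1, ["1", "one", "*"]))              # "1": the exact match wins
print(select_variant(1, ["one", "*"], select="plural"))  # "one"
print(select_variant(1, ["one", "*"], select="exact"))   # "*": category keys are ignored
```

The last two calls show the asymmetry described above: `plural` still honours exact keys, while `exact` silently skips the `one` variant that a translator may have added.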

Additional options such as `minimumFractionDigits` and others already supported by `:number`
should also be supported.

If PR [#532](https://github.com/unicode-org/message-format-wg/pull/532) is accepted,
also add the following `<alias>` definitions to `<function name="number">`:

```xml
<alias name="plural" supports="match">
<setOption name="select" value="plural"/>
</alias>
<alias name="ordinal" supports="match">
<setOption name="select" value="ordinal"/>
</alias>
```
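One way to read these aliases is as fixed option presets over `:number`. The Python sketch below models that reading under stated assumptions: the dictionary layout and `resolve_call` are hypothetical, options set by an alias are assumed to override any given in the expression, and using an alias for an operation it does not declare in `supports` is assumed to be an error. None of these details are taken from PR #532 itself.

```python
# Hypothetical in-memory model of the <alias> definitions above.
NUMBER_DEFAULTS = {"select": "plural"}

ALIASES = {
    "plural": {"supports": "match", "set_options": {"select": "plural"}},
    "ordinal": {"supports": "match", "set_options": {"select": "ordinal"}},
}


def resolve_call(name: str, options: dict[str, str], usage: str) -> tuple[str, dict[str, str]]:
    """Map a call such as `:ordinal` to the base `:number` function plus its
    effective options. `usage` is either "format" or "match"."""
    if name == "number":
        return "number", {**NUMBER_DEFAULTS, **options}
    alias = ALIASES.get(name)
    if alias is None:
        raise KeyError(f"unknown function :{name}")
    if usage != alias["supports"]:
        raise ValueError(f":{name} does not support {usage}")
    # Later entries win: defaults < expression options < alias-set options.
    return "number", {**NUMBER_DEFAULTS, **options, **alias["set_options"]}


print(resolve_call("ordinal", {"minimumFractionDigits": "1"}, "match"))
```

Under this reading, `{$n :ordinal minimumFractionDigits=1}` behaves like `{$n :number select=ordinal minimumFractionDigits=1}`, so the other `:number` options carry over unchanged.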
> **Review thread on lines +125 to +132**
>
> **Member:**
>
> Is this list exhaustive? #532 includes `:integer`. Presumably `:integer` is an integer plural selector by default (in addition to being a formatter).
>
> Other candidates for shorthands might be `:percent`, `:currency` and (maybe??) `:scientific`. The first two are shorthands in ICU4J MF1. `scientific` is a numeric formatting option (but not exposed by MF1).
>
> There is also the potentially lamentable `:spellout` from ICU4J MF1.
>
> I'm guessing that what we want to promote is single-annotation messages, e.g.:
>
> ```
> .input {$var :spellout}
> .match {$var}                               // it's already a plural
> [0]   {{You have no more bone dragons}}
> [one] {{You have {$var} more bone dragon}}  // "You have one more..."
> [*]   {{You have {$var} more bone dragons}}
> ```
>
> **Collaborator, PR author:**
>
> I think the ones you mention are all formatters? The list here is purely about `.match` selectors, and for that I think just `:plural` and `:ordinal` could be enough. If you need exact matching, then the relatively rare `:number select=exact` is pretty good?
>
> **Collaborator, PR author:**
>
> To clarify a bit, a proposed alias like `:integer` or `:spellout` still ends up calling the `:number` function, and so if the expression doesn't have an explicit `select` attribute and the alias does not set `supports="format"`, it'll end up inheriting the default `select="plural"` for selection, together with the rest of the expression and alias options.
>
> **Member:**
>
> They are, but your assertion is that formatters and selectors are the same thing. There's some convenience to configuring the formatter and then using it as the selector also. None of the ones I mention would use exact or ordinal as the default selector.
>
> We could just require that users use `type=XXX` for formatting options. That would make the example I gave:
>
> ```
> .input {$var :number type=spellout}
> .match {$var}
> [one] {{You have {$var} more bone dragon}}
> [*] {{...}}
> ```
>
> Maybe more interesting examples would be:
>
> ```
> .input {$savingsPercent :number type=percent maximumFractionDigits=0}
> .input {$savingsAmount :number type=currency currency=$savingsCurrency}
> .match {$savingsPercent} {$savingsAmount}
> [one one] {{You saved {$savingsAmount} ({$savingsPercent})}}
> [* *] {{...etc...}}
> ```
>
> vs.
>
> ```
> .input {$savingsPercent :percent maximumFractionDigits=0}
> .input {$savingsAmount :currency currency=$savingsCurrency}
> .match {$savingsPercent} {$savingsAmount}
> [one one] {{You saved {$savingsAmount} ({$savingsPercent})}}
> [* *] {{...etc...}}
> ```
>
> and avoiding...
>
> ```
> .match {$savingsPercent :plural} {$savingsAmount :plural}
> [one one] {{You saved {$savingsAmount :number type=currency currency=$savingsCurrency}
>     ({$savingsPercent :number type=percent maximumFractionDigits=0})}}
> ```
>
> **Collaborator, PR author:**
>
> Here's how I would classify the different aliases that have been mentioned so far:
>
> - `:integer`: The most obvious one, and therefore included already in #532.
> - `:plural` and `:ordinal`: Functions for selection only, with `supports="match"`. Included in this design doc.
> - `:percent`, `:currency`, `:scientific`, `:spellout`: Possible aliases for later/separate consideration. As with `:integer`, they might be used also for selection, but I don't see how any of them would specify a `<setOption name="select">`. So I don't think they belong in a design doc on Number Selection.
>
> Once #532 lands, would a PR adding that third set be the right focal point for discussion of their merits, rather than this PR?
>
> **Member:**
>
> `:ordinal` is a formatter too.
>
> **Collaborator, PR author:**
>
> I agree that `:ordinal` could also be a formatter, but I am not here proposing that it be one of the core formatters that we expect all MF2 implementations to provide. That could be considered as a separate further step, should this design on number selection first introduce it as a selector.
>
> **Member:**
>
> Do we need to? Can we finish up numbers and be done with them? 😁
>
> **Collaborator, PR author:**
>
> I would very strongly prefer taking this smaller step first, and not expanding the scope of this design document.


## Alternatives Considered

### Completely Separate Functions

An alternative approach to this problem could be to leave the `:number` `<matchSignature>` undefined,
and to define three further functions, each with a `<matchSignature>`:

- `:plural`
- `:ordinal`
- `:exact` (actual name TBD, pending the resolution of [#433](https://github.com/unicode-org/message-format-wg/issues/433))

which would each need the same set of options as `:number`, except for `type`.

This approach would also mostly work, but it introduces new failure modes:

- If a `:number` is used directly as a selector, this should produce a runtime error.
- If a `:plural`/`:ordinal`/`:exact` is used as a formatter, this should produce a runtime error.

  > **Review comment (Member):** As noted elsewhere, `:ordinal` is also a formatter or formatting option, at least in MF1. The names for the selector (`selectordinal`) and the formatter (`ordinal`) are different because MF1's parser needed separate keywords (the syntax doesn't separate selectors and formatters the way ours does), but we don't have that problem.

- Developers are less encouraged to use the same formatting and selection options.

To expand on the last of these,
consider this message:

```
.match {$count :plural minimumFractionDigits=1}
0 {{You have no apples}}
1 {{You have exactly one apple}}
* {{You have {$count :number minimumFractionDigits=1} apples}}
```

Here, because selection on `:number` is not allowed and _some_ annotation is required on the selector,
the formatting options easily end up duplicated between the selector and the placeholder.
It would also be easy to leave out the `minimumFractionDigits=1` option from the selector,
as it's not required for the English source.

With the proposed design, this message would much more naturally be written as:

```
.input {$count :number minimumFractionDigits=1}
.match {$count}
0 {{You have no apples}}
1 {{You have exactly one apple}}
one {{You have {$count} apple}}
* {{You have {$count} apples}}
```
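The benefit is that a single resolved value carries one set of options that both formatting and selection read. The Python sketch below illustrates this with a hypothetical `NumberValue` class (English cardinal rule only, minimal formatting). It assumes that exact keys such as `0` and `1` are compared against the numeric operand rather than the formatted string; that is an assumption made for this sketch, not something this document specifies.

```python
from dataclasses import dataclass
from decimal import Decimal, InvalidOperation


@dataclass
class NumberValue:
    """Hypothetical resolved value of `{$count :number minimumFractionDigits=...}`.

    The options are stored once and read by both format() and select().
    """

    value: Decimal
    minimum_fraction_digits: int = 0

    def format(self) -> str:
        # Minimal formatting: no grouping or locale symbols, just fraction padding.
        exponent = self.value.as_tuple().exponent
        existing = -exponent if isinstance(exponent, int) and exponent < 0 else 0
        places = max(existing, self.minimum_fraction_digits)
        return f"{self.value:.{places}f}"

    def select(self, keys: list[str]) -> str:
        # Assumption: exact keys are compared numerically with the operand.
        for key in keys:
            try:
                if Decimal(key) == self.value:
                    return key
            except InvalidOperation:  # `one`, `*`, ...
                continue
        # The plural category is computed from the *formatted* value
        # (English cardinal rule: `one` iff i = 1 and v = 0).
        integer_part, _, fraction_part = self.format().lstrip("-").partition(".")
        category = "one" if int(integer_part) == 1 and not fraction_part else "other"
        return category if category in keys else "*"


# Variants of the message above; {count} marks the placeholder.
VARIANTS = {
    "0": "You have no apples",
    "1": "You have exactly one apple",
    "one": "You have {count} apple",
    "*": "You have {count} apples",
}


def render(count: NumberValue) -> str:
    pattern = VARIANTS[count.select(list(VARIANTS))]
    return pattern.format(count=count.format())


print(render(NumberValue(Decimal(3), minimum_fraction_digits=1)))  # You have 3.0 apples
```

In this sketch, with `minimumFractionDigits=1` the English `one` category is never chosen, because every formatted value has a visible fraction digit; selection and formatting stay consistent without the option being repeated.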

#### Pros

- None?

#### Cons

- Naïve selection on `:number` will fail, leading to user confusion and annoyance.
- No encouragement to use the same options for selection and formatting.
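The first of these cons can be illustrated by contrasting two hypothetical function registries, one per design. In this Python sketch, `check_usage`, `UnsupportedUsage`, and both registry dictionaries are invented for illustration; they only record which operations each function supports, and the `:plural`/`:ordinal` entries of the proposed design assume the aliases above are adopted.

```python
# Which operations each function supports, under each design (hypothetical model).
SEPARATE_FUNCTIONS = {
    "number": {"format"},
    "plural": {"match"},
    "ordinal": {"match"},
    "exact": {"match"},
}
PROPOSED = {
    "number": {"format", "match"},  # select=plural|ordinal|exact, default plural
    "plural": {"match"},            # alias for :number select=plural
    "ordinal": {"match"},           # alias for :number select=ordinal
}


class UnsupportedUsage(Exception):
    """Raised where an implementation would report a runtime error."""


def check_usage(registry: dict[str, set[str]], name: str, usage: str) -> None:
    """Raise if function `name` cannot be used for `usage` ("format" or "match")."""
    if usage not in registry.get(name, set()):
        raise UnsupportedUsage(f":{name} cannot be used for {usage}")


# `.match {$count :number}` is fine under the proposal...
check_usage(PROPOSED, "number", "match")
# ...but fails under the separate-functions alternative.
try:
    check_usage(SEPARATE_FUNCTIONS, "number", "match")
except UnsupportedUsage as err:
    print(err)
```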

### Do Not Standardize Number Selection

We could leave number selection undefined in the spec, making it an implementation concern.
Each implementation could then provide its own selectors,
and they _might_ converge on some overlap that users could safely use across platforms.

#### Pros

- The spec is a little bit lighter, as we've left out this part.

#### Cons

- No guarantees about interoperability for a relatively core feature.