Skip to content

Implement the default registry in the spec #659

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 28 commits into from
Feb 22, 2024

Conversation

aphillips
Copy link
Member

This PR copies the number and date/time default functionality from design to the specification.

The copy was done verbatim, except:

  • I moved some parts of number to make the structure logically consistent
  • I added an example to the style=percent note
  • I turned the information about the default selector into an "important" note
  • I added the surrounding document structure

This PR copies the number and date/time default functionality from design to the specification.

The copy was done verbatim, except:

- I moved some parts of number to make the structure logically consistent
- I added an example to the `style=percent` note
- I turned the information about the default selector into an "important" note
@aphillips aphillips added functions Issue pertains to the default function set fast-track Editorial change permitted to use fast-track merge rules normative Issue affects normative text in the specification LDML45 LDML45 Release (Tech Preview) labels Feb 16, 2024
aphillips and others added 2 commits February 16, 2024 09:00
Co-authored-by: Eemeli Aro <eemeli@mozilla.com>
@aphillips aphillips requested a review from eemeli February 16, 2024 17:04
Copy link
Collaborator

@eemeli eemeli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Noticed a couple more incorrect title levels, and a few of my previous line comments are still open. Most significantly, the style=percent multiplication instead of division.

aphillips and others added 4 commits February 16, 2024 10:22
Co-authored-by: Eemeli Aro <eemeli@mozilla.com>
Co-authored-by: Eemeli Aro <eemeli@mozilla.com>
Co-authored-by: Eemeli Aro <eemeli@mozilla.com>
@aphillips aphillips requested a review from eemeli February 16, 2024 21:40
Copy link
Collaborator

@eemeli eemeli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This still needs to be addressed: #659 (comment).

spec/registry.md Outdated

The _operand_ of a number function is either an implementation-defined type or
a literal that matches the `number-literal` production in the [ABNF](/spec/message.abnf).
All other values produce a _Resolution Error_.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have a place to record conformance tests? This requires rejection of inputs that naïve implementations might accept, e.g.

  • | 1|
  • |.1|
  • |1.|
  • |01|
  • |+1|
  • |0x1|

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we do! See the folder 'test' in our repo. I have filed an issue to capture this comment.

spec/registry.md Outdated
Comment on lines 311 to 314
### Options

The following options and their values are required in the default registry to be available on the
function `:number`:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This arrangement seems odd to me; I would expect options for :number to be defined in the same section as the function itself rather than in a section introducing both :number and :integer.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your suggested organization of the sections is superior to what I have here. Once all other comments are cleared, I will reorganize that way.

spec/registry.md Outdated
Comment on lines 370 to 382
> [!NOTE]
> The following options do not have default values because they are only to be used
> as overrides for an existing locale-and-value dependent implementation-defined
> default

- `minimumFractionDigits`
- (non-negative integer)
- `maximumFractionDigits`
- (non-negative integer)
- `minimumSignificantDigits`
- (non-negative integer)
- `maximumSignificantDigits`
- (non-negative integer)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not convinced that this note is helpful; the same dynamic default behavior is also true of numberingSystem.

Suggested change
> [!NOTE]
> The following options do not have default values because they are only to be used
> as overrides for an existing locale-and-value dependent implementation-defined
> default
- `minimumFractionDigits`
- (non-negative integer)
- `maximumFractionDigits`
- (non-negative integer)
- `minimumSignificantDigits`
- (non-negative integer)
- `maximumSignificantDigits`
- (non-negative integer)
- `minimumFractionDigits`
- (non-negative integer) (default is implementation-dependent and may depend upon locale and/or value and/or other options)
- `maximumFractionDigits`
- (non-negative integer) (default is implementation-dependent and may depend upon locale and/or value and/or other options)
- `minimumSignificantDigits`
- (non-negative integer) (default is implementation-dependent and may depend upon locale and/or value and/or other options)
- `maximumSignificantDigits`
- (non-negative integer) (default is implementation-dependent and may depend upon locale and/or value and/or other options)

Copy link
Member Author

@aphillips aphillips Feb 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we instead put something above all of the options in a non-NOTE style that says something like:

Some options do not have default values defined in this specification.
The defaults for these options are implementation-dependent.
In general, these types of defaults depend on the locale, the value of other options,
or both.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I implemented this. See what you think.

aphillips and others added 2 commits February 17, 2024 08:07
Co-authored-by: Richard Gibson <richard.gibson@gmail.com>
Also clarifies that the "data model" in this section is for the registry, not the other data model deliverable.
Comment on lines +649 to +653
A **_<dfn>date/time literal value</dfn>_** is a non-empty string consisting of
one of the following:
- an XMLSchema 1.1 [dateTime](https://www.w3.org/TR/xmlschema11-2/#dateTime)
- an XMLSchema 1.1 [time](https://www.w3.org/TR/xmlschema11-2/#time)
- an XMLSchema 1.1 [date](https://www.w3.org/TR/xmlschema11-2/#date)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should clarify which functions take which of these values. All can't be valid for all, as e.g. an XML time is insufficient input for :date.

Also, are any timezones in literal inputs used in the formatted outputs? For example, when formatting

{12:34+06:00 :time}

is the resulting formatted time in the system timezone, or in +06?

How about UTC, what timezone does this render in?

{12:34Z :time}

As we've left out timeZone from the available options, also leaving out timezone parsing would be one way to resolve this.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh boy, where do I start? 😈

The values can parse into various kinds of date/time objects. Many (most?) of these can be formatted by any of the functions given here, although the results may be not be very useful. Specifically, a time formatted as a date or a date formatted as a time is probably not very interesting.

XMLSchema date/time values do not have timezones. They only support zone offsets. The only zone offset that can be 100% certainly made into a time zone is Z because that one means "UTC".

The time zone is very important in rendering date and time values. I argued at length in the previous PR (and tried to make the point in the call) that we should not excommunicate timezone as an option to "future". Without that option, there are only the following options for managing time zone in formatting:

  • the implementation uses the system default
  • the implementation provides an API for setting the time zone of an entire message
    (this is not a behavior we specify because it is beyond our remit, although probably
    we should point it out)
  • the implementation assumes UTC

In addition, date/time literals with no offset should be handled as floating time values (even if the implementation uses some specific time zone, the value should be formatted so that the field values don't change--often this is done using UTC as the time zone, ignoring local settings)

Note well that "time zone parsing" doesn't exist here. The zone offset (usually include Z) parses into the underlying incremental ("instant") value. That is, all of the following are the same time value:

  • 2007-01-01T01:00:00.000-01:00
  • 2007-01-01T01:00:00.000Z
  • 2007-01-01T01:00:00.000+01:00
  • 2006-12-31T23:00:00.000-01:00

Literals missing one part should be permitted to treat that part as 0. Some examples:

Literal Pattern Output Comment
2024-02-17 {$d :datetime dateStyle=medium} Feb 17, 2024 00:00 a.m. No time; treat as floating
12:34:56.000 {$d :datetime dateStyle=medium}` Jan 1, 1970 12:34 p.m. No date; treat as floating
2024-02-17T12:34:56.000Z {$d :dateTime dateStyle=medium} Feb 17, 2024 4:34:56 a.m. Pacific Surprised?
2024-02-17T12:34:56.000-08:00 {$d :dateTime dateStyle=medium} Feb 17, 2024 12:34 p.m. Pacific

If we didn't do this, users used to working with partial values would have messages fail instead of doing what their code usually does. The group will need to decide if we bring timeZone back in Tech Preview. We should also discuss whether to specify any of this behavior.

Some of what I express above might be controversial, in that we could make different decisions about forcing floating time behavior or permitting partial values.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note to @eemeli... see disconnected conversation below ⬇️

spec/registry.md Outdated

### Operands

The _operand_ is any literal or an implementation-defined set of string or
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
The _operand_ is any literal or an implementation-defined set of string or
The _operand_ is any string generated by the
`*(quoted-char / quoted-escape)`, `name`, or `number-literal` productions in the grammar;
or an implementation-defined set of string or

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see what you're doing!

The name or number-literal productions are narrower than the first description (which is the guts of a quoted literal). That narrower production defines what's allowed in an unquoted literal, which has no real bearing here (it's a detail of the syntax).

The verb "generated" also bothers me, as one doesn't generate the operand. I kind of want to say the "resolved value of a literal", since one of the things that should happen is that the quoted-escape becomes unescaped in the data-model or in-memory representation. It's only escaped in the syntax to avoid conflicting with the grammar.

Perhaps:

Suggested change
The _operand_ is any literal or an implementation-defined set of string or
The _operand_ is any _literal_'s value less any quotes and with any `quoted-escape` sequences
converted to their escaped characters
or an implementation-defined set of string or

We should add an example too.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I rewrote and added an example. Thanks!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the typical processing model is:

  1. Given the string form of the literal embedded in the message.
  2. We fix any features that were used to make the literal fit the syntax: quotes, or quoted-escape sequences, so that we have it in a clean string.
  3. We convert into the internal datatype (eg BigDecimal, or Datetime) expected by the implementation's version of the function (eg :number or :datetime).
  4. We convert the accumulated options for the function also into the internal datatypes expected by the function.
  5. We pass everything to the function, plus an enum value specifying whether we want a string or a data-model part
  6. We get the result and do something interesting with it (insert it into a string buffer that we are building, or into a data model

The problem with "resolved value" is that some people might think of it as #2, and some people might think of it as #3. Maybe we can have a special term for just #2?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@macchiati That specific "typical" model_does not work for MessageFormat. Specifically, MF cannot convert the value to any kind of datatype. That is the responsibility of the function. This is because MF itself is typeless.

This does not mean that typed values passed in as arguments "lose their shape". Implementations can (and are expected) to do reflection or type checking upon them. But only string literal values exist inside MF2 syntax (you can use a variable to refer to a typed argument value).

This is why this specific PR talks about the literal form of the operand and then various "implementation-defined types". The expression {|2024-02-17| :datetime} is converted to a date inside the :datetime function. This is important because it allows types that MessageFormat does not know about itself to be supported via plug-in functions.

The term for 2 is "the resolved value of the literal". (More elsewhere on the problem of 'resolved value').

Note that "resolved" deals with transitive assignment cases:

.local $b = {$a}
.local $c = {$b}

... to make the "resolved value" of $c be the value contained inside of $a (plus whatever happened along the way, e.g. if the first line were .local $b = {$a :transform operation=uppercase} you might expect $c to be uppercase.).

In this specific case, we are talking about the :string function and we're talking about passing it a string. What concrete change is needed?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was confused as to what you meant by a 'resolved value', and could see two possible interpretations. My comment was specific to :string (because as you say the results are always string comparisons anyway).

The spec always has to work in terms of strings, because it knows nothing about datatypes, and can't require particular datatypes because they vary by environment and implementation capabilities.

However, I would expect many MF2 implementations to actually operate with inputs that are not strings, but rather datatypes: both primitive and non-primitive. We can't have a spec that precludes implementations of MF2 from doing that. And those implementations will want to keep the values in datatypes where feasible, instead of always stringifying.

So, for example, it should be possible for an implementation to compile a MF2 into a data structure whereby when it knows that $input is a long, it can convert a literal for matching also into a long in that data structure, and just do long comparison.

Similarly, for plural matching, the compiled data structure could contain an enum value `few', instead of converting the an internal value for the plural category into a string and then doing a string comparison.

Copy link
Collaborator

@catamorphism catamorphism left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still approving, just added more elaboration on the "literal" terminology

spec/registry.md Outdated

### Operands

The _operand_ is any literal or an implementation-defined set of string or
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the typical processing model is:

  1. Given the string form of the literal embedded in the message.
  2. We fix any features that were used to make the literal fit the syntax: quotes, or quoted-escape sequences, so that we have it in a clean string.
  3. We convert into the internal datatype (eg BigDecimal, or Datetime) expected by the implementation's version of the function (eg :number or :datetime).
  4. We convert the accumulated options for the function also into the internal datatypes expected by the function.
  5. We pass everything to the function, plus an enum value specifying whether we want a string or a data-model part
  6. We get the result and do something interesting with it (insert it into a string buffer that we are building, or into a data model

The problem with "resolved value" is that some people might think of it as #2, and some people might think of it as #3. Maybe we can have a special term for just #2?

spec/registry.md Outdated


> [!NOTE]
> This should probably include individual character types, such as `char`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure what this means; wouldn't we just treat a char as a string with a single code point?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's exactly what I mean: don't overlook char and java.lang.Character

spec/registry.md Outdated
- Let `keys` be a list of strings containing keys to match.
(Hint: this list is an argument to `MatchSelectorKeys`)
- For each string `key` in `keys`:
- If the value of `key` is equal to the string value of `operand`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not clear. If the key is |1| and the operand is "1", they wouldn't match. Is there some place that specifies that at this point in the algorithm the key has had its quotes removed and been unescaped?

> in each string.
> As a result, variations in how text can be encoded can affect the performance of matching.
> The function `:string` does not perform case folding or Unicode Normalization of string values.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is dangerous. The message is not in NFC and the operand is; do we really want that to fail?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think at a minimum we should recommend that messages be in NFC.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://www.w3.org/TR/charmod-norm/

Requiring implementations to do normalization can be a big ask (I spent a long time at W3C asking...) and there are counterarguments. We should certainly recommend that users stick to normalized sequences. That is what this note is trying to do. We can mention NFC here if you think that would help.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think that would be useful. (I wasn't proposing requiring it.)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would also appreciate a recommendation for NFC normalization specifically for this comparison.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would also appreciate a recommendation for NFC normalization specifically for this comparison.

If you mean requiring or suggesting that implementations normalize the key and value when performing the comparison, it is important that implementations NOT do that. That would make implementations work differently depending on whether they normalize or not.

spec/registry.md Outdated
- Else, if the value of `key` is a keyword:
- Let `keyword` be a string which is the result of [rule selection](#rule-selection).
- If `keyword` equals `key`, then `key` matches the selector.
Append `key` to the end of the `return_value` list.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Disagree: 'onto' does not convey 'at the end of', and the goal is to prioritize these below the literal matches.

spec/registry.md Outdated

Number literals in the MessageFormat 2 syntax use the
[format defined for a JSON number](https://www.rfc-editor.org/rfc/rfc8259#section-6).
The resolved value of an `operand` exactly matches a numeric literal `key`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't find anywhere where we define what "resolved value" means.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The term "resolved value" lacks a formal definition, but the various sorts of resolved values are defined here (in the formatting spec). This may be a gap.

I will do a PR with a formal definition because you're not the first to ask.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good.

Copy link
Member

@macchiati macchiati left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My comments should not stand in the way of approval.

> Unquoted string literals in a _variant_ do not include spaces.
> If users wish to match strings that include whitespace
> (including U+3000 `IDEOGRAPHIC SPACE`)
> to a key, the `key` needs to be quoted.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nitpick:
s / needs to be quoted / MUST be quoted /
?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a non-normative note, so using MUST would be out of place.

Copy link
Collaborator

@eemeli eemeli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The inline comments marked as Blocking will need to be resolved; the others can be considered nitpicks.

@eemeli
Copy link
Collaborator

eemeli commented Feb 22, 2024

I'm removing the fast-track label from this PR, as it's not applicable for a normative spec change. Yes, we did discuss the earlier changes to the design doc with the intent to land them directly in the spec, but from the depth of discussion here it's obvious that we didn't manage to do so.

Just to be clear, I'm still fine with this landing without further discussion in a live call as long as the issues above are resolved; I just don't think it's appropriate to consider this "fast-tracked".

@eemeli eemeli removed the fast-track Editorial change permitted to use fast-track merge rules label Feb 22, 2024
aphillips and others added 9 commits February 22, 2024 06:33
Co-authored-by: Eemeli Aro <eemeli@mozilla.com>
Co-authored-by: Eemeli Aro <eemeli@mozilla.com>
Co-authored-by: Eemeli Aro <eemeli@mozilla.com>
Co-authored-by: Eemeli Aro <eemeli@mozilla.com>
Co-authored-by: Eemeli Aro <eemeli@mozilla.com>
Co-authored-by: Eemeli Aro <eemeli@mozilla.com>
Co-authored-by: Eemeli Aro <eemeli@mozilla.com>
@aphillips
Copy link
Member Author

Just to be clear, I'm still fine with this landing without further discussion in a live call as long as the issues above are resolved; I just don't think it's appropriate to consider this "fast-tracked".

Removing the label technically means holding this for telecon. I'm not going to do that. Note that, as promised in email, this will be followed "immediately" (in some hours, after I have prepared it) with an editorial-only PR (which will also have a fast-track label).

@aphillips
Copy link
Member Author

Merging this per my email on this topic.

@aphillips aphillips merged commit e5ba81e into main Feb 22, 2024
@aphillips aphillips deleted the aphillips-implement-default-registry branch February 22, 2024 16:38
Comment on lines +363 to +364
> .local $example = {|-1234.567| :number}
> {{{$num :number} == {$example}}}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm? Doesn't $example contain the locale-dependent string -1,234.567 which doesn't conform to the grammar?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, because $example is a resolved rather than a formatted value. So it's not a formatted string, but some implementation-defined value which when formatted to a string would format as e.g. -1,234.567, but it also contains the numerical value 1234.567.

> * {{You have {$var} chances remaining}}
>```

- `compactDisplay` (this option only has meaning when combined with the option `notation=compact`)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm still not super comfortable with the decision of MF2 to name options without defining how they work, but the compromise I'm okay with is that things that have semantic meaning should be well specified, and things that are formatting hints can be left implementation-defined. We discussed this point at length in the extended MF meeting last Wednesday and decided to cut number formatting options that carry semantic meaning, such as currency and unit, until we can discuss their design further.

@aphillips
Copy link
Member Author

@sffc wrote (in part):

... and decided to cut number formatting options that carry semantic meaning, such as currency and unit, until we can discuss their design further.

Please notice #664

eemeli added a commit to messageformat/messageformat that referenced this pull request Feb 23, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
functions Issue pertains to the default function set LDML45 LDML45 Release (Tech Preview) normative Issue affects normative text in the specification
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants