Implement the default registry in the spec #659

aphillips · 2024-02-16T16:02:41Z

This PR copies the number and date/time default functionality from design to the specification.

The copy was done verbatim, except:

I moved some parts of number to make the structure logically consistent
I added an example to the style=percent note
I turned the information about the default selector into an "important" note
I added the surrounding document structure

This PR copies the number and date/time default functionality from design to the specification. The copy was done verbatim, except: - I moved some parts of number to make the structure logically consistent - I added an example to the `style=percent` note - I turned the information about the default selector into an "important" note

spec/registry.md

Co-authored-by: Eemeli Aro <eemeli@mozilla.com>

eemeli

Noticed a couple more incorrect title levels, and a few of my previous line comments are still open. Most significantly, the style=percent multiplication instead of division.

spec/registry.md

Co-authored-by: Eemeli Aro <eemeli@mozilla.com>

eemeli

This still needs to be addressed: #659 (comment).

spec/registry.md

gibson042 · 2024-02-16T22:28:52Z

spec/registry.md

+
+The _operand_ of a number function is either an implementation-defined type or
+a literal that matches the `number-literal` production in the [ABNF](/spec/message.abnf).
+All other values produce a _Resolution Error_.


Do we have a place to record conformance tests? This requires rejection of inputs that naïve implementations might accept, e.g.

| 1|

|.1|

|1.|

|01|

|+1|

|0x1|

Yes, we do! See the folder 'test' in our repo. I have filed an issue to capture this comment.

spec/registry.md

gibson042 · 2024-02-16T22:34:46Z

spec/registry.md

+### Options
+
+The following options and their values are required in the default registry to be available on the 
+function `:number`:


This arrangement seems odd to me; I would expect options for :number to be defined in the same section as the function itself rather than in a section introducing both :number and :integer.

Your suggested organization of the sections is superior to what I have here. Once all other comments are cleared, I will reorganize that way.

spec/registry.md

gibson042 · 2024-02-16T23:11:42Z

spec/registry.md

+> [!NOTE]
+> The following options do not have default values because they are only to be used
+> as overrides for an existing locale-and-value dependent implementation-defined
+> default
+
+- `minimumFractionDigits`
+  - (non-negative integer)
+- `maximumFractionDigits`
+  - (non-negative integer)
+- `minimumSignificantDigits`
+  - (non-negative integer)
+- `maximumSignificantDigits`
+  - (non-negative integer)


I'm not convinced that this note is helpful; the same dynamic default behavior is also true of numberingSystem.

Suggested change

> [!NOTE]

> The following options do not have default values because they are only to be used

> as overrides for an existing locale-and-value dependent implementation-defined

> default

- `minimumFractionDigits`

- (non-negative integer)

- `maximumFractionDigits`

- (non-negative integer)

- `minimumSignificantDigits`

- (non-negative integer)

- `maximumSignificantDigits`

- (non-negative integer)

- `minimumFractionDigits`

- (non-negative integer) (default is implementation-dependent and may depend upon locale and/or value and/or other options)

- `maximumFractionDigits`

- (non-negative integer) (default is implementation-dependent and may depend upon locale and/or value and/or other options)

- `minimumSignificantDigits`

- (non-negative integer) (default is implementation-dependent and may depend upon locale and/or value and/or other options)

- `maximumSignificantDigits`

- (non-negative integer) (default is implementation-dependent and may depend upon locale and/or value and/or other options)

Could we instead put something above all of the options in a non-NOTE style that says something like:

Some options do not have default values defined in this specification.
The defaults for these options are implementation-dependent.
In general, these types of defaults depend on the locale, the value of other options,
or both.

I implemented this. See what you think.

spec/registry.md

Co-authored-by: Richard Gibson <richard.gibson@gmail.com>

Also clarifies that the "data model" in this section is for the registry, not the other data model deliverable.

eemeli · 2024-02-17T18:32:53Z

spec/registry.md

+A **_<dfn>date/time literal value</dfn>_** is a non-empty string consisting of 
+one of the following:
+- an XMLSchema 1.1 [dateTime](https://www.w3.org/TR/xmlschema11-2/#dateTime)
+- an XMLSchema 1.1 [time](https://www.w3.org/TR/xmlschema11-2/#time)
+- an XMLSchema 1.1 [date](https://www.w3.org/TR/xmlschema11-2/#date)


We should clarify which functions take which of these values. All can't be valid for all, as e.g. an XML time is insufficient input for :date.

Also, are any timezones in literal inputs used in the formatted outputs? For example, when formatting

{12:34+06:00 :time}

is the resulting formatted time in the system timezone, or in +06?

How about UTC, what timezone does this render in?

{12:34Z :time}

As we've left out timeZone from the available options, also leaving out timezone parsing would be one way to resolve this.

Oh boy, where do I start? 😈

The values can parse into various kinds of date/time objects. Many (most?) of these can be formatted by any of the functions given here, although the results may be not be very useful. Specifically, a time formatted as a date or a date formatted as a time is probably not very interesting.

XMLSchema date/time values do not have timezones. They only support zone offsets. The only zone offset that can be 100% certainly made into a time zone is Z because that one means "UTC".

The time zone is very important in rendering date and time values. I argued at length in the previous PR (and tried to make the point in the call) that we should not excommunicate timezone as an option to "future". Without that option, there are only the following options for managing time zone in formatting:

the implementation uses the system default

the implementation provides an API for setting the time zone of an entire message
(this is not a behavior we specify because it is beyond our remit, although probably
we should point it out)

the implementation assumes UTC

In addition, date/time literals with no offset should be handled as floating time values (even if the implementation uses some specific time zone, the value should be formatted so that the field values don't change--often this is done using UTC as the time zone, ignoring local settings)

Note well that "time zone parsing" doesn't exist here. The zone offset (usually include Z) parses into the underlying incremental ("instant") value. That is, all of the following are the same time value:

2007-01-01T01:00:00.000-01:00

2007-01-01T01:00:00.000Z

2007-01-01T01:00:00.000+01:00

2006-12-31T23:00:00.000-01:00

Literals missing one part should be permitted to treat that part as 0. Some examples:

Literal Pattern Output Comment

2024-02-17 {$d :datetime dateStyle=medium} Feb 17, 2024 00:00 a.m. No time; treat as floating

12:34:56.000 {$d :datetime dateStyle=medium}` Jan 1, 1970 12:34 p.m. No date; treat as floating

2024-02-17T12:34:56.000Z {$d :dateTime dateStyle=medium} Feb 17, 2024 4:34:56 a.m. Pacific Surprised?

2024-02-17T12:34:56.000-08:00 {$d :dateTime dateStyle=medium} Feb 17, 2024 12:34 p.m. Pacific

If we didn't do this, users used to working with partial values would have messages fail instead of doing what their code usually does. The group will need to decide if we bring timeZone back in Tech Preview. We should also discuss whether to specify any of this behavior.

Some of what I express above might be controversial, in that we could make different decisions about forcing floating time behavior or permitting partial values.

Note to @eemeli... see disconnected conversation below ⬇️

catamorphism · 2024-02-21T22:30:58Z

spec/registry.md

+
+### Operands
+
+The _operand_ is any literal or an implementation-defined set of string or


Suggested change

The _operand_ is any literal or an implementation-defined set of string or

The _operand_ is any string generated by the

`*(quoted-char / quoted-escape)`, `name`, or `number-literal` productions in the grammar;

or an implementation-defined set of string or

I see what you're doing!

The name or number-literal productions are narrower than the first description (which is the guts of a quoted literal). That narrower production defines what's allowed in an unquoted literal, which has no real bearing here (it's a detail of the syntax).

The verb "generated" also bothers me, as one doesn't generate the operand. I kind of want to say the "resolved value of a literal", since one of the things that should happen is that the quoted-escape becomes unescaped in the data-model or in-memory representation. It's only escaped in the syntax to avoid conflicting with the grammar.

Perhaps:

Suggested change

The _operand_ is any literal or an implementation-defined set of string or

The _operand_ is any _literal_'s value less any quotes and with any `quoted-escape` sequences

converted to their escaped characters

or an implementation-defined set of string or

We should add an example too.

I rewrote and added an example. Thanks!

I think the typical processing model is:

Given the string form of the literal embedded in the message.

We fix any features that were used to make the literal fit the syntax: quotes, or quoted-escape sequences, so that we have it in a clean string.

We convert into the internal datatype (eg BigDecimal, or Datetime) expected by the implementation's version of the function (eg :number or :datetime).

We convert the accumulated options for the function also into the internal datatypes expected by the function.

We pass everything to the function, plus an enum value specifying whether we want a string or a data-model part

We get the result and do something interesting with it (insert it into a string buffer that we are building, or into a data model

The problem with "resolved value" is that some people might think of it as #2, and some people might think of it as #3. Maybe we can have a special term for just #2?

@macchiati That specific "typical" model_does not work for MessageFormat. Specifically, MF cannot convert the value to any kind of datatype. That is the responsibility of the function. This is because MF itself is typeless.

This does not mean that typed values passed in as arguments "lose their shape". Implementations can (and are expected) to do reflection or type checking upon them. But only string literal values exist inside MF2 syntax (you can use a variable to refer to a typed argument value).

This is why this specific PR talks about the literal form of the operand and then various "implementation-defined types". The expression {|2024-02-17| :datetime} is converted to a date inside the :datetime function. This is important because it allows types that MessageFormat does not know about itself to be supported via plug-in functions.

The term for 2 is "the resolved value of the literal". (More elsewhere on the problem of 'resolved value').

Note that "resolved" deals with transitive assignment cases:

.local $b = {$a} .local $c = {$b}

... to make the "resolved value" of $c be the value contained inside of $a (plus whatever happened along the way, e.g. if the first line were .local $b = {$a :transform operation=uppercase} you might expect $c to be uppercase.).

In this specific case, we are talking about the :string function and we're talking about passing it a string. What concrete change is needed?

I was confused as to what you meant by a 'resolved value', and could see two possible interpretations. My comment was specific to :string (because as you say the results are always string comparisons anyway).

The spec always has to work in terms of strings, because it knows nothing about datatypes, and can't require particular datatypes because they vary by environment and implementation capabilities.

However, I would expect many MF2 implementations to actually operate with inputs that are not strings, but rather datatypes: both primitive and non-primitive. We can't have a spec that precludes implementations of MF2 from doing that. And those implementations will want to keep the values in datatypes where feasible, instead of always stringifying.

So, for example, it should be possible for an implementation to compile a MF2 into a data structure whereby when it knows that $input is a long, it can convert a literal for matching also into a long in that data structure, and just do long comparison.

Similarly, for plural matching, the compiled data structure could contain an enum value `few', instead of converting the an internal value for the plural category into a string and then doing a string comparison.

catamorphism

Still approving, just added more elaboration on the "literal" terminology

macchiati · 2024-02-21T23:01:12Z

spec/registry.md

+
+### Operands
+
+The _operand_ is any literal or an implementation-defined set of string or


I think the typical processing model is:

Given the string form of the literal embedded in the message.

We fix any features that were used to make the literal fit the syntax: quotes, or quoted-escape sequences, so that we have it in a clean string.

We convert into the internal datatype (eg BigDecimal, or Datetime) expected by the implementation's version of the function (eg :number or :datetime).

We convert the accumulated options for the function also into the internal datatypes expected by the function.

We pass everything to the function, plus an enum value specifying whether we want a string or a data-model part

We get the result and do something interesting with it (insert it into a string buffer that we are building, or into a data model

The problem with "resolved value" is that some people might think of it as #2, and some people might think of it as #3. Maybe we can have a special term for just #2?

macchiati · 2024-02-21T23:44:01Z

spec/registry.md

+
+
+> [!NOTE]
+> This should probably include individual character types, such as `char`.


Not sure what this means; wouldn't we just treat a char as a string with a single code point?

That's exactly what I mean: don't overlook char and java.lang.Character

macchiati · 2024-02-21T23:51:15Z

spec/registry.md

+- Let `keys` be a list of strings containing keys to match.
+  (Hint: this list is an argument to `MatchSelectorKeys`)
+- For each string `key` in `keys`:
+   - If the value of `key` is equal to the string value of `operand`


This is not clear. If the key is |1| and the operand is "1", they wouldn't match. Is there some place that specifies that at this point in the algorithm the key has had its quotes removed and been unescaped?

macchiati · 2024-02-21T23:53:29Z

spec/registry.md

+> in each string.
+> As a result, variations in how text can be encoded can affect the performance of matching.
+> The function `:string` does not perform case folding or Unicode Normalization of string values.
+


This is dangerous. The message is not in NFC and the operand is; do we really want that to fail?

I think at a minimum we should recommend that messages be in NFC.

https://www.w3.org/TR/charmod-norm/

Requiring implementations to do normalization can be a big ask (I spent a long time at W3C asking...) and there are counterarguments. We should certainly recommend that users stick to normalized sequences. That is what this note is trying to do. We can mention NFC here if you think that would help.

Yes, I think that would be useful. (I wasn't proposing requiring it.)

I would also appreciate a recommendation for NFC normalization specifically for this comparison.

I would also appreciate a recommendation for NFC normalization specifically for this comparison.

If you mean requiring or suggesting that implementations normalize the key and value when performing the comparison, it is important that implementations NOT do that. That would make implementations work differently depending on whether they normalize or not.

macchiati · 2024-02-21T23:57:34Z

spec/registry.md

+   - Else, if the value of `key` is a keyword:
+      - Let `keyword` be a string which is the result of [rule selection](#rule-selection).
+      - If `keyword` equals `key`, then `key` matches the selector.
+        Append `key` to the end of the `return_value` list.


Disagree: 'onto' does not convey 'at the end of', and the goal is to prioritize these below the literal matches.

macchiati · 2024-02-22T00:01:29Z

spec/registry.md

+
+Number literals in the MessageFormat 2 syntax use the 
+[format defined for a JSON number](https://www.rfc-editor.org/rfc/rfc8259#section-6).
+The resolved value of an `operand` exactly matches a numeric literal `key`


I can't find anywhere where we define what "resolved value" means.

The term "resolved value" lacks a formal definition, but the various sorts of resolved values are defined here (in the formatting spec). This may be a gap.

I will do a PR with a formal definition because you're not the first to ask.

Sounds good.

macchiati

My comments should not stand in the way of approval.

mihnita · 2024-02-22T04:18:14Z

spec/registry.md

+> Unquoted string literals in a _variant_ do not include spaces.
+> If users wish to match strings that include whitespace
+> (including U+3000 `IDEOGRAPHIC SPACE`)
+> to a key, the `key` needs to be quoted.


Nitpick:
s / needs to be quoted / MUST be quoted /
?

This is a non-normative note, so using MUST would be out of place.

eemeli

The inline comments marked as Blocking will need to be resolved; the others can be considered nitpicks.

spec/registry.md

eemeli · 2024-02-22T12:55:56Z

I'm removing the fast-track label from this PR, as it's not applicable for a normative spec change. Yes, we did discuss the earlier changes to the design doc with the intent to land them directly in the spec, but from the depth of discussion here it's obvious that we didn't manage to do so.

Just to be clear, I'm still fine with this landing without further discussion in a live call as long as the issues above are resolved; I just don't think it's appropriate to consider this "fast-tracked".

Co-authored-by: Eemeli Aro <eemeli@mozilla.com>

aphillips · 2024-02-22T16:37:14Z

Just to be clear, I'm still fine with this landing without further discussion in a live call as long as the issues above are resolved; I just don't think it's appropriate to consider this "fast-tracked".

Removing the label technically means holding this for telecon. I'm not going to do that. Note that, as promised in email, this will be followed "immediately" (in some hours, after I have prepared it) with an editorial-only PR (which will also have a fast-track label).

aphillips · 2024-02-22T16:38:26Z

Merging this per my email on this topic.

sffc · 2024-02-22T15:54:09Z

spec/registry.md

+> .local $example = {|-1234.567| :number}
+> {{{$num :number} == {$example}}}


Hm? Doesn't $example contain the locale-dependent string -1,234.567 which doesn't conform to the grammar?

No, because $example is a resolved rather than a formatted value. So it's not a formatted string, but some implementation-defined value which when formatted to a string would format as e.g. -1,234.567, but it also contains the numerical value 1234.567.

sffc · 2024-02-22T15:59:03Z

spec/registry.md

+> *   {{You have {$var} chances remaining}}
+>```
+
+- `compactDisplay` (this option only has meaning when combined with the option `notation=compact`)


I'm still not super comfortable with the decision of MF2 to name options without defining how they work, but the compromise I'm okay with is that things that have semantic meaning should be well specified, and things that are formatting hints can be left implementation-defined. We discussed this point at length in the extended MF meeting last Wednesday and decided to cut number formatting options that carry semantic meaning, such as currency and unit, until we can discuss their design further.

aphillips · 2024-02-22T18:45:33Z

@sffc wrote (in part):

... and decided to cut number formatting options that carry semantic meaning, such as currency and unit, until we can discuss their design further.

Please notice #664

aphillips added functions Issue pertains to the default function set fast-track Editorial change permitted to use fast-track merge rules normative Issue affects normative text in the specification LDML45 LDML45 Release (Tech Preview) labels Feb 16, 2024

aphillips requested review from stasm, catamorphism, eemeli, sffc, gibson042, mihnita and macchiati February 16, 2024 16:02

eemeli requested changes Feb 16, 2024

View reviewed changes

aphillips and others added 2 commits February 16, 2024 09:00

Apply suggestions from code review

5eb39b1

Co-authored-by: Eemeli Aro <eemeli@mozilla.com>

Update spec/registry.md

8dcea1b

aphillips requested a review from eemeli February 16, 2024 17:04

eemeli requested changes Feb 16, 2024

View reviewed changes

spec/registry.md Outdated Show resolved Hide resolved

spec/registry.md Outdated Show resolved Hide resolved

aphillips and others added 4 commits February 16, 2024 10:22

Update spec/registry.md

41dcf0a

Co-authored-by: Eemeli Aro <eemeli@mozilla.com>

Update spec/registry.md

4ed4bef

Co-authored-by: Eemeli Aro <eemeli@mozilla.com>

Update spec/registry.md

7d29fc2

Co-authored-by: Eemeli Aro <eemeli@mozilla.com>

Fix heading levels

33c2f59

aphillips requested a review from eemeli February 16, 2024 21:40

eemeli requested changes Feb 16, 2024

View reviewed changes

gibson042 reviewed Feb 16, 2024

View reviewed changes

aphillips and others added 2 commits February 17, 2024 08:07

Apply suggestions from code review

bae3d60

Co-authored-by: Richard Gibson <richard.gibson@gmail.com>

Address @gibson042 comments

69fba37

aphillips requested review from eemeli and gibson042 February 17, 2024 16:15

Add :string

1856353

Also clarifies that the "data model" in this section is for the registry, not the other data model deliverable.

eemeli requested changes Feb 17, 2024

View reviewed changes

catamorphism reviewed Feb 21, 2024

View reviewed changes

catamorphism approved these changes Feb 21, 2024

View reviewed changes

Address comments about literals

4faea71

macchiati mentioned this pull request Feb 21, 2024

Other issues in the Registry section. #677

Closed

macchiati reviewed Feb 22, 2024

View reviewed changes

Mention/require (SHOULD) users to use NFC

b17e183

macchiati approved these changes Feb 22, 2024

View reviewed changes

mihnita approved these changes Feb 22, 2024

View reviewed changes

eemeli requested changes Feb 22, 2024

View reviewed changes

eemeli removed the fast-track Editorial change permitted to use fast-track merge rules label Feb 22, 2024

aphillips and others added 9 commits February 22, 2024 06:33

Update spec/registry.md

669679c

Co-authored-by: Eemeli Aro <eemeli@mozilla.com>

Update spec/registry.md

1dd0a48

Co-authored-by: Eemeli Aro <eemeli@mozilla.com>

Update spec/registry.md

22176c7

Update spec/registry.md

23f42db

Co-authored-by: Eemeli Aro <eemeli@mozilla.com>

Update spec/registry.md

8bc08f2

Co-authored-by: Eemeli Aro <eemeli@mozilla.com>

Apply suggestions from code review

59f7b99

Co-authored-by: Eemeli Aro <eemeli@mozilla.com>

Fix my mistake about _style options_

6464d70

Update spec/registry.md

e4133b4

Co-authored-by: Eemeli Aro <eemeli@mozilla.com>

Update spec/registry.md

42a39e7

Co-authored-by: Eemeli Aro <eemeli@mozilla.com>

eemeli approved these changes Feb 22, 2024

View reviewed changes

Fix the Java example

26d8f58

aphillips merged commit e5ba81e into main Feb 22, 2024

aphillips deleted the aphillips-implement-default-registry branch February 22, 2024 16:38

sffc reviewed Feb 22, 2024

View reviewed changes

aphillips mentioned this pull request Feb 22, 2024

Reformat default registry #685

Merged

eemeli added a commit to messageformat/messageformat that referenced this pull request Feb 23, 2024

feat(mf2): Add :date and :time (unicode-org/message-format-wg#659)

2f3f6a5

Literal	Pattern	Output	Comment
`2024-02-17`	`{$d :datetime dateStyle=medium}`	Feb 17, 2024 00:00 a.m.	No time; treat as floating
`12:34:56.000`	{$d :datetime dateStyle=medium}`	Jan 1, 1970 12:34 p.m.	No date; treat as floating
`2024-02-17T12:34:56.000Z`	{$d :dateTime dateStyle=medium}	Feb 17, 2024 4:34:56 a.m. Pacific	Surprised?
`2024-02-17T12:34:56.000-08:00`	{$d :dateTime dateStyle=medium}	Feb 17, 2024 12:34 p.m. Pacific


		### Operands

		The _operand_ is any literal or an implementation-defined set of string or



		> [!NOTE]
		> This should probably include individual character types, such as `char`.

		> .local $example = {\|-1234.567\| :number}
		> {{{$num :number} == {$example}}}

Uh oh!

Implement the default registry in the spec #659

Implement the default registry in the spec #659

Uh oh!

Conversation

aphillips commented Feb 16, 2024

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

eemeli left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

eemeli left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

aphillips Feb 17, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

catamorphism left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

eemeli left a comment •

edited

Loading

aphillips Feb 17, 2024 •

edited

Loading