-
-
Notifications
You must be signed in to change notification settings - Fork 36
Implement the default registry in the spec #659
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This PR copies the number and date/time default functionality from design to the specification. The copy was done verbatim, except: - I moved some parts of number to make the structure logically consistent - I added an example to the `style=percent` note - I turned the information about the default selector into an "important" note
Co-authored-by: Eemeli Aro <eemeli@mozilla.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Noticed a couple more incorrect title levels, and a few of my previous line comments are still open. Most significantly, the style=percent
multiplication instead of division.
Co-authored-by: Eemeli Aro <eemeli@mozilla.com>
Co-authored-by: Eemeli Aro <eemeli@mozilla.com>
Co-authored-by: Eemeli Aro <eemeli@mozilla.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This still needs to be addressed: #659 (comment).
spec/registry.md
Outdated
|
||
The _operand_ of a number function is either an implementation-defined type or | ||
a literal that matches the `number-literal` production in the [ABNF](/spec/message.abnf). | ||
All other values produce a _Resolution Error_. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we have a place to record conformance tests? This requires rejection of inputs that naïve implementations might accept, e.g.
| 1|
|.1|
|1.|
|01|
|+1|
|0x1|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we do! See the folder 'test' in our repo. I have filed an issue to capture this comment.
spec/registry.md
Outdated
### Options | ||
|
||
The following options and their values are required in the default registry to be available on the | ||
function `:number`: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This arrangement seems odd to me; I would expect options for :number
to be defined in the same section as the function itself rather than in a section introducing both :number
and :integer
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Your suggested organization of the sections is superior to what I have here. Once all other comments are cleared, I will reorganize that way.
spec/registry.md
Outdated
> [!NOTE] | ||
> The following options do not have default values because they are only to be used | ||
> as overrides for an existing locale-and-value dependent implementation-defined | ||
> default | ||
|
||
- `minimumFractionDigits` | ||
- (non-negative integer) | ||
- `maximumFractionDigits` | ||
- (non-negative integer) | ||
- `minimumSignificantDigits` | ||
- (non-negative integer) | ||
- `maximumSignificantDigits` | ||
- (non-negative integer) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not convinced that this note is helpful; the same dynamic default behavior is also true of numberingSystem
.
> [!NOTE] | |
> The following options do not have default values because they are only to be used | |
> as overrides for an existing locale-and-value dependent implementation-defined | |
> default | |
- `minimumFractionDigits` | |
- (non-negative integer) | |
- `maximumFractionDigits` | |
- (non-negative integer) | |
- `minimumSignificantDigits` | |
- (non-negative integer) | |
- `maximumSignificantDigits` | |
- (non-negative integer) | |
- `minimumFractionDigits` | |
- (non-negative integer) (default is implementation-dependent and may depend upon locale and/or value and/or other options) | |
- `maximumFractionDigits` | |
- (non-negative integer) (default is implementation-dependent and may depend upon locale and/or value and/or other options) | |
- `minimumSignificantDigits` | |
- (non-negative integer) (default is implementation-dependent and may depend upon locale and/or value and/or other options) | |
- `maximumSignificantDigits` | |
- (non-negative integer) (default is implementation-dependent and may depend upon locale and/or value and/or other options) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we instead put something above all of the options in a non-NOTE style that says something like:
Some options do not have default values defined in this specification.
The defaults for these options are implementation-dependent.
In general, these types of defaults depend on the locale, the value of other options,
or both.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I implemented this. See what you think.
Co-authored-by: Richard Gibson <richard.gibson@gmail.com>
Also clarifies that the "data model" in this section is for the registry, not the other data model deliverable.
A **_<dfn>date/time literal value</dfn>_** is a non-empty string consisting of | ||
one of the following: | ||
- an XMLSchema 1.1 [dateTime](https://www.w3.org/TR/xmlschema11-2/#dateTime) | ||
- an XMLSchema 1.1 [time](https://www.w3.org/TR/xmlschema11-2/#time) | ||
- an XMLSchema 1.1 [date](https://www.w3.org/TR/xmlschema11-2/#date) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should clarify which functions take which of these values. All can't be valid for all, as e.g. an XML time is insufficient input for :date
.
Also, are any timezones in literal inputs used in the formatted outputs? For example, when formatting
{12:34+06:00 :time}
is the resulting formatted time in the system timezone, or in +06?
How about UTC, what timezone does this render in?
{12:34Z :time}
As we've left out timeZone
from the available options, also leaving out timezone parsing would be one way to resolve this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh boy, where do I start? 😈
The values can parse into various kinds of date/time objects. Many (most?) of these can be formatted by any of the functions given here, although the results may be not be very useful. Specifically, a time
formatted as a date or a date
formatted as a time is probably not very interesting.
XMLSchema date/time values do not have timezones. They only support zone offsets. The only zone offset that can be 100% certainly made into a time zone is Z
because that one means "UTC".
The time zone is very important in rendering date and time values. I argued at length in the previous PR (and tried to make the point in the call) that we should not excommunicate timezone
as an option to "future". Without that option, there are only the following options for managing time zone in formatting:
- the implementation uses the system default
- the implementation provides an API for setting the time zone of an entire message
(this is not a behavior we specify because it is beyond our remit, although probably
we should point it out) - the implementation assumes UTC
In addition, date/time literals with no offset should be handled as floating time values (even if the implementation uses some specific time zone, the value should be formatted so that the field values don't change--often this is done using UTC as the time zone, ignoring local settings)
Note well that "time zone parsing" doesn't exist here. The zone offset (usually include Z
) parses into the underlying incremental ("instant") value. That is, all of the following are the same time value:
2007-01-01T01:00:00.000-01:00
2007-01-01T01:00:00.000Z
2007-01-01T01:00:00.000+01:00
2006-12-31T23:00:00.000-01:00
Literals missing one part should be permitted to treat that part as 0
. Some examples:
Literal | Pattern | Output | Comment |
---|---|---|---|
2024-02-17 |
{$d :datetime dateStyle=medium} |
Feb 17, 2024 00:00 a.m. | No time; treat as floating |
12:34:56.000 |
{$d :datetime dateStyle=medium}` | Jan 1, 1970 12:34 p.m. | No date; treat as floating |
2024-02-17T12:34:56.000Z |
{$d :dateTime dateStyle=medium} | Feb 17, 2024 4:34:56 a.m. Pacific | Surprised? |
2024-02-17T12:34:56.000-08:00 |
{$d :dateTime dateStyle=medium} | Feb 17, 2024 12:34 p.m. Pacific |
If we didn't do this, users used to working with partial values would have messages fail instead of doing what their code usually does. The group will need to decide if we bring timeZone
back in Tech Preview. We should also discuss whether to specify any of this behavior.
Some of what I express above might be controversial, in that we could make different decisions about forcing floating time behavior or permitting partial values.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note to @eemeli... see disconnected conversation below ⬇️
spec/registry.md
Outdated
|
||
### Operands | ||
|
||
The _operand_ is any literal or an implementation-defined set of string or |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The _operand_ is any literal or an implementation-defined set of string or | |
The _operand_ is any string generated by the | |
`*(quoted-char / quoted-escape)`, `name`, or `number-literal` productions in the grammar; | |
or an implementation-defined set of string or |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see what you're doing!
The name
or number-literal
productions are narrower than the first description (which is the guts of a quoted literal). That narrower production defines what's allowed in an unquoted literal, which has no real bearing here (it's a detail of the syntax).
The verb "generated" also bothers me, as one doesn't generate the operand. I kind of want to say the "resolved value of a literal", since one of the things that should happen is that the quoted-escape
becomes unescaped in the data-model or in-memory representation. It's only escaped in the syntax to avoid conflicting with the grammar.
Perhaps:
The _operand_ is any literal or an implementation-defined set of string or | |
The _operand_ is any _literal_'s value less any quotes and with any `quoted-escape` sequences | |
converted to their escaped characters | |
or an implementation-defined set of string or |
We should add an example too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I rewrote and added an example. Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the typical processing model is:
- Given the string form of the literal embedded in the message.
- We fix any features that were used to make the literal fit the syntax: quotes, or quoted-escape sequences, so that we have it in a clean string.
- We convert into the internal datatype (eg BigDecimal, or Datetime) expected by the implementation's version of the function (eg :number or :datetime).
- We convert the accumulated options for the function also into the internal datatypes expected by the function.
- We pass everything to the function, plus an enum value specifying whether we want a string or a data-model part
- We get the result and do something interesting with it (insert it into a string buffer that we are building, or into a data model
The problem with "resolved value" is that some people might think of it as #2, and some people might think of it as #3. Maybe we can have a special term for just #2?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@macchiati That specific "typical" model_does not work for MessageFormat. Specifically, MF cannot convert the value to any kind of datatype. That is the responsibility of the function. This is because MF itself is typeless.
This does not mean that typed values passed in as arguments "lose their shape". Implementations can (and are expected) to do reflection or type checking upon them. But only string literal values exist inside MF2 syntax (you can use a variable to refer to a typed argument value).
This is why this specific PR talks about the literal form of the operand and then various "implementation-defined types". The expression {|2024-02-17| :datetime}
is converted to a date inside the :datetime
function. This is important because it allows types that MessageFormat does not know about itself to be supported via plug-in functions.
The term for 2 is "the resolved value of the literal". (More elsewhere on the problem of 'resolved value').
Note that "resolved" deals with transitive assignment cases:
.local $b = {$a}
.local $c = {$b}
... to make the "resolved value" of $c
be the value contained inside of $a
(plus whatever happened along the way, e.g. if the first line were .local $b = {$a :transform operation=uppercase}
you might expect $c
to be uppercase.).
In this specific case, we are talking about the :string
function and we're talking about passing it a string. What concrete change is needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was confused as to what you meant by a 'resolved value', and could see two possible interpretations. My comment was specific to :string (because as you say the results are always string comparisons anyway).
The spec always has to work in terms of strings, because it knows nothing about datatypes, and can't require particular datatypes because they vary by environment and implementation capabilities.
However, I would expect many MF2 implementations to actually operate with inputs that are not strings, but rather datatypes: both primitive and non-primitive. We can't have a spec that precludes implementations of MF2 from doing that. And those implementations will want to keep the values in datatypes where feasible, instead of always stringifying.
So, for example, it should be possible for an implementation to compile a MF2 into a data structure whereby when it knows that $input is a long, it can convert a literal for matching also into a long in that data structure, and just do long comparison.
Similarly, for plural matching, the compiled data structure could contain an enum value `few', instead of converting the an internal value for the plural category into a string and then doing a string comparison.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still approving, just added more elaboration on the "literal" terminology
spec/registry.md
Outdated
|
||
### Operands | ||
|
||
The _operand_ is any literal or an implementation-defined set of string or |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the typical processing model is:
- Given the string form of the literal embedded in the message.
- We fix any features that were used to make the literal fit the syntax: quotes, or quoted-escape sequences, so that we have it in a clean string.
- We convert into the internal datatype (eg BigDecimal, or Datetime) expected by the implementation's version of the function (eg :number or :datetime).
- We convert the accumulated options for the function also into the internal datatypes expected by the function.
- We pass everything to the function, plus an enum value specifying whether we want a string or a data-model part
- We get the result and do something interesting with it (insert it into a string buffer that we are building, or into a data model
The problem with "resolved value" is that some people might think of it as #2, and some people might think of it as #3. Maybe we can have a special term for just #2?
spec/registry.md
Outdated
|
||
|
||
> [!NOTE] | ||
> This should probably include individual character types, such as `char`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure what this means; wouldn't we just treat a char
as a string with a single code point?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's exactly what I mean: don't overlook char
and java.lang.Character
spec/registry.md
Outdated
- Let `keys` be a list of strings containing keys to match. | ||
(Hint: this list is an argument to `MatchSelectorKeys`) | ||
- For each string `key` in `keys`: | ||
- If the value of `key` is equal to the string value of `operand` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not clear. If the key is |1| and the operand is "1", they wouldn't match. Is there some place that specifies that at this point in the algorithm the key has had its quotes removed and been unescaped?
> in each string. | ||
> As a result, variations in how text can be encoded can affect the performance of matching. | ||
> The function `:string` does not perform case folding or Unicode Normalization of string values. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is dangerous. The message is not in NFC and the operand is; do we really want that to fail?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think at a minimum we should recommend that messages be in NFC.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
https://www.w3.org/TR/charmod-norm/
Requiring implementations to do normalization can be a big ask (I spent a long time at W3C asking...) and there are counterarguments. We should certainly recommend that users stick to normalized sequences. That is what this note is trying to do. We can mention NFC here if you think that would help.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I think that would be useful. (I wasn't proposing requiring it.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would also appreciate a recommendation for NFC normalization specifically for this comparison.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would also appreciate a recommendation for NFC normalization specifically for this comparison.
If you mean requiring or suggesting that implementations normalize the key and value when performing the comparison, it is important that implementations NOT do that. That would make implementations work differently depending on whether they normalize or not.
spec/registry.md
Outdated
- Else, if the value of `key` is a keyword: | ||
- Let `keyword` be a string which is the result of [rule selection](#rule-selection). | ||
- If `keyword` equals `key`, then `key` matches the selector. | ||
Append `key` to the end of the `return_value` list. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Disagree: 'onto' does not convey 'at the end of', and the goal is to prioritize these below the literal matches.
spec/registry.md
Outdated
|
||
Number literals in the MessageFormat 2 syntax use the | ||
[format defined for a JSON number](https://www.rfc-editor.org/rfc/rfc8259#section-6). | ||
The resolved value of an `operand` exactly matches a numeric literal `key` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't find anywhere where we define what "resolved value" means.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The term "resolved value" lacks a formal definition, but the various sorts of resolved values are defined here (in the formatting spec). This may be a gap.
I will do a PR with a formal definition because you're not the first to ask.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My comments should not stand in the way of approval.
> Unquoted string literals in a _variant_ do not include spaces. | ||
> If users wish to match strings that include whitespace | ||
> (including U+3000 `IDEOGRAPHIC SPACE`) | ||
> to a key, the `key` needs to be quoted. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nitpick:
s / needs to be quoted / MUST be quoted /
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a non-normative note, so using MUST would be out of place.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The inline comments marked as Blocking will need to be resolved; the others can be considered nitpicks.
I'm removing the Just to be clear, I'm still fine with this landing without further discussion in a live call as long as the issues above are resolved; I just don't think it's appropriate to consider this "fast-tracked". |
Co-authored-by: Eemeli Aro <eemeli@mozilla.com>
Co-authored-by: Eemeli Aro <eemeli@mozilla.com>
Co-authored-by: Eemeli Aro <eemeli@mozilla.com>
Co-authored-by: Eemeli Aro <eemeli@mozilla.com>
Co-authored-by: Eemeli Aro <eemeli@mozilla.com>
Co-authored-by: Eemeli Aro <eemeli@mozilla.com>
Co-authored-by: Eemeli Aro <eemeli@mozilla.com>
Removing the label technically means holding this for telecon. I'm not going to do that. Note that, as promised in email, this will be followed "immediately" (in some hours, after I have prepared it) with an editorial-only PR (which will also have a |
Merging this per my email on this topic. |
> .local $example = {|-1234.567| :number} | ||
> {{{$num :number} == {$example}}} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm? Doesn't $example
contain the locale-dependent string -1,234.567
which doesn't conform to the grammar?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, because $example
is a resolved rather than a formatted value. So it's not a formatted string, but some implementation-defined value which when formatted to a string would format as e.g. -1,234.567
, but it also contains the numerical value 1234.567.
> * {{You have {$var} chances remaining}} | ||
>``` | ||
|
||
- `compactDisplay` (this option only has meaning when combined with the option `notation=compact`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm still not super comfortable with the decision of MF2 to name options without defining how they work, but the compromise I'm okay with is that things that have semantic meaning should be well specified, and things that are formatting hints can be left implementation-defined. We discussed this point at length in the extended MF meeting last Wednesday and decided to cut number formatting options that carry semantic meaning, such as currency
and unit
, until we can discuss their design further.
This PR copies the number and date/time default functionality from design to the specification.
The copy was done verbatim, except:
style=percent
note