-
-
Notifications
You must be signed in to change notification settings - Fork 35
Implement the default registry in the spec #659
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This PR copies the number and date/time default functionality from design to the specification. The copy was done verbatim, except: - I moved some parts of number to make the structure logically consistent - I added an example to the `style=percent` note - I turned the information about the default selector into an "important" note
Co-authored-by: Eemeli Aro <eemeli@mozilla.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Noticed a couple more incorrect title levels, and a few of my previous line comments are still open. Most significantly, the style=percent
multiplication instead of division.
Co-authored-by: Eemeli Aro <eemeli@mozilla.com>
Co-authored-by: Eemeli Aro <eemeli@mozilla.com>
Co-authored-by: Eemeli Aro <eemeli@mozilla.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This still needs to be addressed: #659 (comment).
spec/registry.md
Outdated
|
||
The _operand_ of a number function is either an implementation-defined type or | ||
a literal that matches the `number-literal` production in the [ABNF](/spec/message.abnf). | ||
All other values produce a _Resolution Error_. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we have a place to record conformance tests? This requires rejection of inputs that naïve implementations might accept, e.g.
| 1|
|.1|
|1.|
|01|
|+1|
|0x1|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we do! See the folder 'test' in our repo. I have filed an issue to capture this comment.
spec/registry.md
Outdated
### Options | ||
|
||
The following options and their values are required in the default registry to be available on the | ||
function `:number`: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This arrangement seems odd to me; I would expect options for :number
to be defined in the same section as the function itself rather than in a section introducing both :number
and :integer
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Your suggested organization of the sections is superior to what I have here. Once all other comments are cleared, I will reorganize that way.
spec/registry.md
Outdated
> [!NOTE] | ||
> The following options do not have default values because they are only to be used | ||
> as overrides for an existing locale-and-value dependent implementation-defined | ||
> default | ||
|
||
- `minimumFractionDigits` | ||
- (non-negative integer) | ||
- `maximumFractionDigits` | ||
- (non-negative integer) | ||
- `minimumSignificantDigits` | ||
- (non-negative integer) | ||
- `maximumSignificantDigits` | ||
- (non-negative integer) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not convinced that this note is helpful; the same dynamic default behavior is also true of numberingSystem
.
> [!NOTE] | |
> The following options do not have default values because they are only to be used | |
> as overrides for an existing locale-and-value dependent implementation-defined | |
> default | |
- `minimumFractionDigits` | |
- (non-negative integer) | |
- `maximumFractionDigits` | |
- (non-negative integer) | |
- `minimumSignificantDigits` | |
- (non-negative integer) | |
- `maximumSignificantDigits` | |
- (non-negative integer) | |
- `minimumFractionDigits` | |
- (non-negative integer) (default is implementation-dependent and may depend upon locale and/or value and/or other options) | |
- `maximumFractionDigits` | |
- (non-negative integer) (default is implementation-dependent and may depend upon locale and/or value and/or other options) | |
- `minimumSignificantDigits` | |
- (non-negative integer) (default is implementation-dependent and may depend upon locale and/or value and/or other options) | |
- `maximumSignificantDigits` | |
- (non-negative integer) (default is implementation-dependent and may depend upon locale and/or value and/or other options) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we instead put something above all of the options in a non-NOTE style that says something like:
Some options do not have default values defined in this specification.
The defaults for these options are implementation-dependent.
In general, these types of defaults depend on the locale, the value of other options,
or both.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I implemented this. See what you think.
Co-authored-by: Richard Gibson <richard.gibson@gmail.com>
Also clarifies that the "data model" in this section is for the registry, not the other data model deliverable.
A **_<dfn>date/time literal value</dfn>_** is a non-empty string consisting of | ||
one of the following: | ||
- an XMLSchema 1.1 [dateTime](https://www.w3.org/TR/xmlschema11-2/#dateTime) | ||
- an XMLSchema 1.1 [time](https://www.w3.org/TR/xmlschema11-2/#time) | ||
- an XMLSchema 1.1 [date](https://www.w3.org/TR/xmlschema11-2/#date) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should clarify which functions take which of these values. All can't be valid for all, as e.g. an XML time is insufficient input for :date
.
Also, are any timezones in literal inputs used in the formatted outputs? For example, when formatting
{12:34+06:00 :time}
is the resulting formatted time in the system timezone, or in +06?
How about UTC, what timezone does this render in?
{12:34Z :time}
As we've left out timeZone
from the available options, also leaving out timezone parsing would be one way to resolve this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh boy, where do I start? 😈
The values can parse into various kinds of date/time objects. Many (most?) of these can be formatted by any of the functions given here, although the results may be not be very useful. Specifically, a time
formatted as a date or a date
formatted as a time is probably not very interesting.
XMLSchema date/time values do not have timezones. They only support zone offsets. The only zone offset that can be 100% certainly made into a time zone is Z
because that one means "UTC".
The time zone is very important in rendering date and time values. I argued at length in the previous PR (and tried to make the point in the call) that we should not excommunicate timezone
as an option to "future". Without that option, there are only the following options for managing time zone in formatting:
- the implementation uses the system default
- the implementation provides an API for setting the time zone of an entire message
(this is not a behavior we specify because it is beyond our remit, although probably
we should point it out) - the implementation assumes UTC
In addition, date/time literals with no offset should be handled as floating time values (even if the implementation uses some specific time zone, the value should be formatted so that the field values don't change--often this is done using UTC as the time zone, ignoring local settings)
Note well that "time zone parsing" doesn't exist here. The zone offset (usually include Z
) parses into the underlying incremental ("instant") value. That is, all of the following are the same time value:
2007-01-01T01:00:00.000-01:00
2007-01-01T01:00:00.000Z
2007-01-01T01:00:00.000+01:00
2006-12-31T23:00:00.000-01:00
Literals missing one part should be permitted to treat that part as 0
. Some examples:
Literal | Pattern | Output | Comment |
---|---|---|---|
2024-02-17 |
{$d :datetime dateStyle=medium} |
Feb 17, 2024 00:00 a.m. | No time; treat as floating |
12:34:56.000 |
{$d :datetime dateStyle=medium}` | Jan 1, 1970 12:34 p.m. | No date; treat as floating |
2024-02-17T12:34:56.000Z |
{$d :dateTime dateStyle=medium} | Feb 17, 2024 4:34:56 a.m. Pacific | Surprised? |
2024-02-17T12:34:56.000-08:00 |
{$d :dateTime dateStyle=medium} | Feb 17, 2024 12:34 p.m. Pacific |
If we didn't do this, users used to working with partial values would have messages fail instead of doing what their code usually does. The group will need to decide if we bring timeZone
back in Tech Preview. We should also discuss whether to specify any of this behavior.
Some of what I express above might be controversial, in that we could make different decisions about forcing floating time behavior or permitting partial values.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note to @eemeli... see disconnected conversation below ⬇️
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still approving, just added more elaboration on the "literal" terminology
spec/registry.md
Outdated
|
||
### Operands | ||
|
||
The _operand_ is any literal or an implementation-defined set of string or |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the typical processing model is:
- Given the string form of the literal embedded in the message.
- We fix any features that were used to make the literal fit the syntax: quotes, or quoted-escape sequences, so that we have it in a clean string.
- We convert into the internal datatype (eg BigDecimal, or Datetime) expected by the implementation's version of the function (eg :number or :datetime).
- We convert the accumulated options for the function also into the internal datatypes expected by the function.
- We pass everything to the function, plus an enum value specifying whether we want a string or a data-model part
- We get the result and do something interesting with it (insert it into a string buffer that we are building, or into a data model
The problem with "resolved value" is that some people might think of it as #2, and some people might think of it as #3. Maybe we can have a special term for just #2?
spec/registry.md
Outdated
|
||
|
||
> [!NOTE] | ||
> This should probably include individual character types, such as `char`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure what this means; wouldn't we just treat a char
as a string with a single code point?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's exactly what I mean: don't overlook char
and java.lang.Character
spec/registry.md
Outdated
- Let `keys` be a list of strings containing keys to match. | ||
(Hint: this list is an argument to `MatchSelectorKeys`) | ||
- For each string `key` in `keys`: | ||
- If the value of `key` is equal to the string value of `operand` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not clear. If the key is |1| and the operand is "1", they wouldn't match. Is there some place that specifies that at this point in the algorithm the key has had its quotes removed and been unescaped?
> in each string. | ||
> As a result, variations in how text can be encoded can affect the performance of matching. | ||
> The function `:string` does not perform case folding or Unicode Normalization of string values. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is dangerous. The message is not in NFC and the operand is; do we really want that to fail?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think at a minimum we should recommend that messages be in NFC.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
https://www.w3.org/TR/charmod-norm/
Requiring implementations to do normalization can be a big ask (I spent a long time at W3C asking...) and there are counterarguments. We should certainly recommend that users stick to normalized sequences. That is what this note is trying to do. We can mention NFC here if you think that would help.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I think that would be useful. (I wasn't proposing requiring it.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would also appreciate a recommendation for NFC normalization specifically for this comparison.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would also appreciate a recommendation for NFC normalization specifically for this comparison.
If you mean requiring or suggesting that implementations normalize the key and value when performing the comparison, it is important that implementations NOT do that. That would make implementations work differently depending on whether they normalize or not.
spec/registry.md
Outdated
- Else, if the value of `key` is a keyword: | ||
- Let `keyword` be a string which is the result of [rule selection](#rule-selection). | ||
- If `keyword` equals `key`, then `key` matches the selector. | ||
Append `key` to the end of the `return_value` list. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Disagree: 'onto' does not convey 'at the end of', and the goal is to prioritize these below the literal matches.
spec/registry.md
Outdated
|
||
Number literals in the MessageFormat 2 syntax use the | ||
[format defined for a JSON number](https://www.rfc-editor.org/rfc/rfc8259#section-6). | ||
The resolved value of an `operand` exactly matches a numeric literal `key` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't find anywhere where we define what "resolved value" means.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The term "resolved value" lacks a formal definition, but the various sorts of resolved values are defined here (in the formatting spec). This may be a gap.
I will do a PR with a formal definition because you're not the first to ask.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My comments should not stand in the way of approval.
> Unquoted string literals in a _variant_ do not include spaces. | ||
> If users wish to match strings that include whitespace | ||
> (including U+3000 `IDEOGRAPHIC SPACE`) | ||
> to a key, the `key` needs to be quoted. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nitpick:
s / needs to be quoted / MUST be quoted /
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a non-normative note, so using MUST would be out of place.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The inline comments marked as Blocking will need to be resolved; the others can be considered nitpicks.
I'm removing the Just to be clear, I'm still fine with this landing without further discussion in a live call as long as the issues above are resolved; I just don't think it's appropriate to consider this "fast-tracked". |
Co-authored-by: Eemeli Aro <eemeli@mozilla.com>
Co-authored-by: Eemeli Aro <eemeli@mozilla.com>
Co-authored-by: Eemeli Aro <eemeli@mozilla.com>
Co-authored-by: Eemeli Aro <eemeli@mozilla.com>
Co-authored-by: Eemeli Aro <eemeli@mozilla.com>
Co-authored-by: Eemeli Aro <eemeli@mozilla.com>
Co-authored-by: Eemeli Aro <eemeli@mozilla.com>
Removing the label technically means holding this for telecon. I'm not going to do that. Note that, as promised in email, this will be followed "immediately" (in some hours, after I have prepared it) with an editorial-only PR (which will also have a |
Merging this per my email on this topic. |
> .local $example = {|-1234.567| :number} | ||
> {{{$num :number} == {$example}}} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm? Doesn't $example
contain the locale-dependent string -1,234.567
which doesn't conform to the grammar?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, because $example
is a resolved rather than a formatted value. So it's not a formatted string, but some implementation-defined value which when formatted to a string would format as e.g. -1,234.567
, but it also contains the numerical value 1234.567.
> * {{You have {$var} chances remaining}} | ||
>``` | ||
|
||
- `compactDisplay` (this option only has meaning when combined with the option `notation=compact`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm still not super comfortable with the decision of MF2 to name options without defining how they work, but the compromise I'm okay with is that things that have semantic meaning should be well specified, and things that are formatting hints can be left implementation-defined. We discussed this point at length in the extended MF meeting last Wednesday and decided to cut number formatting options that carry semantic meaning, such as currency
and unit
, until we can discuss their design further.
This PR copies the number and date/time default functionality from design to the specification.
The copy was done verbatim, except:
style=percent
note