Add Literal Resolution section to formatting.md #382
@@ -8,6 +8,15 @@ when formatting a message for display in a user interface, or for some later pro
The document is part of the MessageFormat 2.0 specification,
the successor to ICU MessageFormat, henceforth called ICU MessageFormat 1.0.

## Literal Resolution

The resolved value of _text_, _literal_ and _nmtoken_ tokens
This will need to include `unquoted` if/once #364 is accepted.
The word "value" seems to be getting used in multiple ways in this paragraph.
The first sentence refers to "the resolved value of text, literal and nmtoken tokens"; if resolution is the relation that maps character strings in the language defined by the ABNF onto values, then I understand "value" as being used semantically here. The spec doesn't (yet?) define criteria for membership in this set of semantic values, not in the precise way that the ABNF defines membership in the set of syntactically valid messages.
However, the next sentence refers to an "option value", which I take as being a syntactic concept: the token that appears on the right-hand side of the '=' in the `option` nonterminal in the ABNF.
Defining "value" and "resolution" before these terms are used, and replacing "or option value" with "on the right-hand side of an option", might help clarify things. (This could be done in the glossary, which uses "value" many times without defining it (possibly not always with the same meaning), and doesn't define "resolution", and cross-referenced here; could be in a future PR.)
The intended meaning of "resolved value" here is the value that will ultimately get formatted. So for an unquoted literal `42`, it would be the string `'42'`, while for a quoted literal `|foo\|bar|`, it would be the string `'foo|bar'`. For a variable reference `$foo`, it would be the value of the variable, which could really be anything.
My intent would be to explain this term as a part of the bigger formatting PR I'm now working on.
Renaming "option value" does sound like a good idea.
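The mapping described in this comment can be sketched in a few lines of Python. This is only an illustration, not spec text: the function names `resolve_literal` and `resolve_operand` are hypothetical, and the sketch assumes that a quoted literal is delimited by `|` and that an escape sequence is a backslash followed by the character it stands for, as in the `|foo\|bar|` example.

```python
from typing import Any, Dict


def resolve_literal(source: str) -> str:
    """Resolve an unquoted or |quoted| literal to its string value (hypothetical helper)."""
    if len(source) >= 2 and source.startswith("|") and source.endswith("|"):
        body = source[1:-1]
        parts = []
        i = 0
        while i < len(body):
            if body[i] == "\\" and i + 1 < len(body):
                # Assumed escape form: backslash + escaped character.
                parts.append(body[i + 1])
                i += 2
            else:
                parts.append(body[i])
                i += 1
        # The resolved value is the concatenation of the parts.
        return "".join(parts)
    # An unquoted literal resolves to its own characters.
    return source


def resolve_operand(source: str, variables: Dict[str, Any]) -> Any:
    """Resolve a literal or a $variable reference (hypothetical helper)."""
    if source.startswith("$"):
        # A variable resolves to whatever value it is bound to.
        return variables[source[1:]]
    return resolve_literal(source)
```

With these assumptions, `resolve_literal("42")` and `resolve_literal("|42|")` both give the string `'42'`, while a variable reference can resolve to a value of any type.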
As long as "resolved"/"resolution" are defined in the bigger PR, I'm fine with leaving those terms undefined in this one.
spec/formatting.md
Outdated
The resolved value of _text_, _literal_ and _nmtoken_ tokens
is always a string concatenation of its parts,
with escape sequences resolving to their escaped characters.
When a literal value is used as a formatting function argument or option value,
Because this is a normative requirement, I would suggest making this its own paragraph.
I think the only non-normative part here is the "... such that e.g. ..." example. Would it be better if that were separated into its own paragraph?
I'm saying something different, which is: make each normative requirement start its own paragraph (unless there is a good reason not to). That makes it easier to find each requirement and check that, for example, it has tests or that one's implementation complies with it.
spec/formatting.md
Outdated
the formatting function MUST treat option values the same independently of their presentation,
such that e.g. the options `foo=42` and `foo=|42|` have the same effect.
I think this might not be correct or that it has the potential to be incorrect?
The reason we define `nmtoken` separate from `literal` (or `unquoted`) is because it fulfills a separate role. For example, in a `plural` selector, the `key` values can include `nmtoken` values from an array of keywords (`zero`, `one`, `two`, `few`, `many`, `other`/`*`) and separately certain literal values. A formatter such as the number formatter might accept a number of `nmtoken` keyword arguments (`integer`, `percent`) but might also allow literals in the same argument.
Admittedly I can't think of a use case like this currently and it would be a poor formatter design that depended solely on the difference between `integer` and `|integer|` to know if the value were meant to be a token or a string. In fact, the more I look at this, the more I tend to think that `nmtoken` might need to go (reserving literal values as keywords in `key` and `option` could be done in the registry)? It would certainly simplify the ABNF.
The downside of removing `nmtoken` is that the `nmtoken` production allows `key` values that start with `-` and `:`, such as `when -42` or `when :foo`, without quoting the values (`|-42|` and `|:foo|`). I don't care about `:` (in fact, I think it's confusing to allow it), but the minus sign feels important for operating with numbers.
> In fact, the more I look at this, the more I tend to think that `nmtoken` might need to go (reserving literal values as keywords in `key` and `option` could be done in the registry)? It would certainly simplify the ABNF.
Could you clarify whether you're suggesting that the registry ought to influence what's valid syntax? That would be rather problematic.
I am not suggesting that the registry would influence what is valid syntax at the ABNF level. What I am suggesting is that the registry might determine what options are valid for a given formatter or selector, e.g. `when few` is valid (or perhaps "valid") for `:plural` but `when fex` is not, even though both are syntactically valid. (In fact, according to formatting.md, it is a selector error).
Another way of saying what I'm saying above is that the `key` of a `when` statement is always a literal, whose interpretation is selector-specific.
Agreed. That's included in the current registry PR #368 as e.g. `<match values="one other"/>`.
message-format-wg/spec/registry.dtd
Lines 34 to 36 in 39f13a9
<!ELEMENT match EMPTY>
<!ATTLIST match values NMTOKENS #IMPLIED>
<!ATTLIST match pattern NMTOKEN #IMPLIED>
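The split discussed above, where the ABNF decides what is syntactically valid and the registry decides which keys a given selector accepts, might look roughly like the following. This is a sketch under stated assumptions: the table `REGISTRY_MATCH_VALUES`, the `SelectorError` class and `check_key` are hypothetical names, and the keyword list is taken from the plural keywords mentioned earlier in this thread, not from the actual registry contents. A real plural selector would presumably also accept certain other literal keys, which this sketch omits.

```python
# Hypothetical in-memory stand-in for registry data such as
# <match values="..."/>; not the contents of registry.dtd.
REGISTRY_MATCH_VALUES = {
    "plural": {"zero", "one", "two", "few", "many", "other"},
}


class SelectorError(ValueError):
    """Raised when a syntactically valid key is not accepted by a selector."""


def check_key(selector: str, key: str) -> str:
    """Return the key if the selector accepts it, else raise SelectorError."""
    allowed = REGISTRY_MATCH_VALUES.get(selector)
    if allowed is not None and key not in allowed:
        # The key parsed fine; it is rejected only at the selector level.
        raise SelectorError(f"{key!r} is not a valid key for :{selector}")
    return key
```

Under these assumptions `check_key("plural", "few")` passes, while `check_key("plural", "fex")` raises a `SelectorError` even though `fex` is syntactically valid.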
spec/formatting.md
Outdated
is always a string concatenation of its parts,
with escape sequences resolving to their escaped characters.
When a literal value is used as a formatting function argument or option value,
the formatting function MUST treat option values the same independently of their presentation,
Another way of saying this -- if a definition of the term "value" (that is, the range of the resolution function whose domain is the formal language defined by the ABNF) is added -- is that the formatting function is defined on semantic values rather than syntactically. In other words, the resolution relation maps both `42` and `|42|` to the same semantic value; if semantic values are the domain of the formatting function, then it's impossible by construction for the formatting function to distinguish the two. (Unless the value domain is defined so that `42` and `|42|` are distinguishable, but even in that case, writing down the meta-language that describes the value domain would make it easier to define where the formatting function should treat different values as equivalent to each other.)
Going a bit further, I sensed on our call yesterday that we have a consensus on formatting functions not being able to determine if an option has been set from a literal or variable. As in, a formatting function receiving a value `'42'` for the `foo` option would not know if the message had set `foo=42`, `foo=|42|`, or `foo=$bar` where `$bar` had a value of `'42'`.
Then, given the consensus, I think it would make sense to rephrase this as a requirement on whatever calls the functions (the formatter?), not on the functions themselves. It's true that the formatting function must treat option values the same independently of their presentation, but it's also impossible for it to do otherwise!
A way to phrase it might be that the domain of the formatting function is resolved values, and then if you wanted to add examples, one example could be that `42`, `|42|`, and `$bar` all map to the same resolved value (assuming an environment in which `bar` is bound to 42), specifically `'42'`.
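One way to picture "the domain of the formatting function is resolved values" is a caller that resolves every option before the function is invoked. The sketch below is illustrative only: `resolve`, `call_function` and `show_options` are hypothetical names, escape handling is left out for brevity, and `bar` is assumed to be bound to the string `'42'`.

```python
from typing import Any, Callable, Dict


def resolve(source: str, env: Dict[str, Any]) -> Any:
    """Map option source text to a resolved value (escapes omitted for brevity)."""
    if source.startswith("$"):
        return env[source[1:]]
    if len(source) >= 2 and source[0] == "|" and source[-1] == "|":
        return source[1:-1]
    return source


def call_function(
    fn: Callable[[Dict[str, Any]], str],
    option_sources: Dict[str, str],
    env: Dict[str, Any],
) -> str:
    """Resolve all options first; the function only ever sees resolved values."""
    resolved = {name: resolve(src, env) for name, src in option_sources.items()}
    return fn(resolved)


def show_options(options: Dict[str, Any]) -> str:
    """Stand-in formatting function that just reports what it received."""
    return repr(sorted(options.items()))


env = {"bar": "42"}  # assumed binding for $bar
results = {
    call_function(show_options, {"foo": src}, env)
    for src in ("42", "|42|", "$bar")
}
print(results)
```

Because the function receives only the resolved dictionary, the three spellings collapse to a single result here. An implementation that also passes source or AST context to the function would not get this guarantee by construction.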
spec/formatting.md
Outdated
The resolved value of _text_, _literal_ and _nmtoken_ tokens
is always a string concatenation of its parts,
with escape sequences resolving to their escaped characters.
When a literal value is used as a formatting function argument or option value,
I'm also not sure whether the word "value" is meant to be syntactic or semantic in "a literal value". It might be clearer (if more verbose) to write:
"When a text, literal, or nmtoken token is used as a formatting function argument or option value..."
The current wording, if read literally, implies that the requirement to not distinguish between different syntactic constructs with the same meaning only applies to literals. Which would be confusing, since I don't think it's possible to write two lexically distinct literals with the same meaning.
If #364 is accepted, this should clear up a bit as the ABNF literal becomes the only thing this applies to. But yes, here too we should avoid the dreaded term "value".
@aphillips @catamorphism I've updated the PR following your suggestions; could you re-review and potentially close any/all of the above discussions, or clarify if there are further changes you'd like to see?
There's just one change I would still request (left as a line comment just now). Everything else looks good now! I don't think that the GitHub UI gives me the ability to close the previous discussions because I submitted them as "comments" rather than "request changes", but if you have that ability, it's fine to close them.
with escape sequences resolving to their escaped characters.
When a _literal_ or _nmtoken_ is used as an _expression_ argument
or on the right-hand side of an _option_,
the formatting function MUST treat their resolved values the same independently of their presentation,
I think it's worth re-wording this to clarify that the contract between the caller of the formatting function, and the callee (formatting function), makes it impossible to do otherwise. (See #382 (comment) ).
It's not necessarily impossible in all implementations. A formatting function needs to be passed some amount of contextual information, such as the current locale, and it's possible to consider an implementation that also includes in that context something like an AST of the current expression. This might make sense for instance in order to enable errors in specific options to be positioned exactly in terms of source offsets.
This statement is specifying that even in such a hypothetical situation, a valid formatting function is not allowed to vary its behaviour based on the quoting style of the literal value.
Do errors count as "behavior"? It sounds like you're saying the error might be different based on the AST of the current expression, which suggests not treating resolved values the same independently of their presentation (to me, a different error is different behavior).
I meant that an implementation may exist which, for good reasons, provides a pathway by which a formatting function could determine whether an option value was originally quoted or not.
For errors, I think the current spec shape of specifying the type of error is appropriate.
## Literal Resolution

The resolved value of _text_, _literal_ and _nmtoken_ tokens
is always a string concatenation of its parts,
What are the "parts" here? I would think these items (text, literal and nmtoken) are part-less?
This is meant to refer to the `*-char` and `*-escape` parts of _text_ and _literal_, and `name-char` for _nmtoken_, as hinted by the rest of this sentence.
This is in part a follow-up from this conversation with @aphillips: #364 (comment)
The intent here is to be clear and explicit about the meaning of literal values. While putting this together, I started to think that we might need yet another section discussing other treatment of message parts than formatting. For instance, I understand us to be aligned with expecting literal values to by default be presented as non-translatable. It would be good to note this somewhere, but "formatting" isn't really the right place for it.