Add Literal Resolution section to formatting.md #382

eemeli · 2023-05-17T14:51:54Z

This is in part a follow-up from this conversation with @aphillips: #364 (comment)

The intent here is to be clear and explicit about the meaning of literal values. While putting this together, I started to think that we might need yet another section discussing treatment of message parts other than formatting. For instance, I understand us to be aligned in expecting literal values to be presented as non-translatable by default. It would be good to note this somewhere, but "formatting" isn't really the right place for it.

eemeli · 2023-05-17T14:52:23Z

spec/formatting.md

@@ -8,6 +8,15 @@ when formatting a message for display in a user interface, or for some later pro
 The document is part of the MessageFormat 2.0 specification,
 the successor to ICU MessageFormat, henceforth called ICU MessageFormat 1.0.

+## Literal Resolution
+
+The resolved value of _text_, _literal_ and _nmtoken_ tokens


This will need to include unquoted if/once #364 is accepted.

The word "value" seems to be getting used in multiple ways in this paragraph.

The first sentence refers to "the resolved value of text, literal and nmtoken tokens"; if resolution is the relation that maps character strings in the language defined by the ABNF onto values, then I understand "value" as being used semantically here. The spec doesn't (yet?) define criteria for membership in this set of semantic values, not in the precise way that the ABNF defines membership in the set of syntactically valid messages.

However, the next sentence refers to an "option value", which I take as being a syntactic concept: the token that appears on the right-hand side of the '=' in the option nonterminal of the ABNF.

Defining "value" and "resolution" before these terms are used, and replacing "or option value" with "on the right-hand side of an option", might help clarify things. (This could be done in the glossary and cross-referenced here, possibly in a future PR. The glossary currently uses "value" many times without defining it, possibly not always with the same meaning, and doesn't define "resolution".)

The intended meaning of "resolved value" here is the value that will ultimately get formatted. So for an unquoted literal 42, it would be the string '42', while for a quoted literal |foo\|bar|, it would be the string 'foo|bar'. For a variable reference $foo, it would be the value of the variable, which could really be anything.
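The mapping described above can be sketched in a few lines of Python. This is only an illustration, not spec text: the function name `resolve_literal` is hypothetical, and the sketch assumes that a quoted literal is delimited by `|` and that a backslash escapes the character that follows it. It accepts any escaped character, which is probably looser than the actual grammar.

```python
def resolve_literal(source: str) -> str:
    """Return the resolved string value of a literal as written in a message.

    Assumptions of this sketch (not taken from the spec text):
    - a quoted literal starts and ends with '|';
    - inside quotes, a backslash escapes the next character;
    - anything not starting with '|' is an unquoted literal, used as-is.
    """
    if not source.startswith("|"):
        return source
    if len(source) < 2 or not source.endswith("|"):
        raise ValueError(f"unterminated quoted literal: {source!r}")

    body = source[1:-1]
    parts = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch == "\\":
            if i + 1 >= len(body):
                raise ValueError(f"dangling escape in literal: {source!r}")
            parts.append(body[i + 1])  # an escape resolves to its escaped character
            i += 2
        elif ch == "|":
            raise ValueError(f"unescaped '|' inside literal: {source!r}")
        else:
            parts.append(ch)
            i += 1
    # The resolved value is the concatenation of the resolved parts.
    return "".join(parts)
```

With this sketch, the unquoted `42` and the quoted `|42|` both resolve to the string `'42'`, and `|foo\|bar|` resolves to `'foo|bar'`.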

My intent would be to explain this term as a part of the bigger formatting PR I'm now working on.

Renaming "option value" does sound like a good idea.

As long as "resolved"/"resolution" are defined in the bigger PR, I'm fine with leaving those terms undefined in this one.

aphillips · 2023-05-18T06:01:41Z

spec/formatting.md

+The resolved value of _text_, _literal_ and _nmtoken_ tokens
+is always a string concatenation of its parts,
+with escape sequences resolving to their escaped characters.
+When a literal value is used as a formatting function argument or option value,


Because this is a normative requirement, I would suggest making this its own paragraph.

I think the only non-normative part here is the "... such that e.g. ..." example. Would it be better if that were separated into its own paragraph?

I'm saying something different, which is: make each normative requirement start its own paragraph (unless there is a good reason not to). That makes it easier to find each requirement and check that, for example, it has tests or is complied with by one's implementation.

aphillips · 2023-05-18T06:29:27Z

spec/formatting.md

+the formatting function MUST treat option values the same independently of their presentation,
+such that e.g. the options `foo=42` and `foo=|42|` have the same effect.


I think this might not be correct or that it has the potential to be incorrect?

The reason we define nmtoken separately from literal (or unquoted) is that it fulfills a separate role. For example, in a plural selector, the key values can include nmtoken values from an array of keywords (zero, one, two, few, many, other/*) and separately certain literal values. A formatter such as the number formatter might accept a number of nmtoken keyword arguments (integer, percent) but might also allow literals in the same argument.

Admittedly I can't think of a use case like this currently and it would be a poor formatter design that depended solely on the difference between integer and |integer| to know if the value were meant to be a token or a string. In fact, the more I look at this, the more I tend to think that nmtoken might need to go (reserving literal values as keywords in key and option could be done in the registry)? It would certainly simplify the ABNF.

The downside of removing nmtoken is that the nmtoken production allows key values that start with - and :, such as when -42 or when :foo, without quoting the values (|-42| and |:foo|). I don't care about : (in fact, I think it's confusing to allow it), but the minus sign feels important for operating with numbers.

> In fact, the more I look at this, the more I tend to think that nmtoken might need to go (reserving literal values as keywords in key and option could be done in the registry)? It would certainly simplify the ABNF.

Could you clarify whether you're suggesting that the registry ought to influence what's valid syntax? That would be rather problematic.

I am not suggesting that the registry would influence what is valid syntax at the ABNF level. What I am suggesting is that the registry might determine what options are valid for a given formatter or selector, e.g. when few is valid (or perhaps "valid") for :plural but when fex is not, even though both are syntactically valid. (In fact, according to formatting.md, it is a selector error).

Another way of saying what I'm saying above is that the key of a when statement is always a literal, whose interpretation is selector-specific.

Agreed. That's included in the current registry PR #368 as e.g. <match values="one other"/>.

message-format-wg/spec/registry.dtd

Lines 34 to 36 in 39f13a9

<!ELEMENT match EMPTY>

<!ATTLIST match values NMTOKENS #IMPLIED>

<!ATTLIST match pattern NMTOKEN #IMPLIED>
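The distinction drawn in this thread, between a key being syntactically valid and being valid for a particular selector, can be sketched as follows. Everything here is hypothetical: `REGISTRY_MATCH_VALUES` is a stand-in for whatever the registry ends up providing, `check_key` and `SelectorError` are illustrative names, and exact numeric keys are left out for brevity.

```python
# Hypothetical stand-in for registry data: the keys each selector accepts.
# The real registry format is still under discussion in #368.
REGISTRY_MATCH_VALUES = {
    "plural": {"zero", "one", "two", "few", "many", "other"},
}


class SelectorError(Exception):
    """Raised when a key is well-formed but not meaningful to its selector."""


def check_key(selector: str, key: str) -> str:
    """Validate an already-parsed key against the selector's allowed values.

    The key has passed the syntax (ABNF) check before reaching this point;
    this step only decides whether the selector can interpret it.
    """
    allowed = REGISTRY_MATCH_VALUES.get(selector)
    if allowed is None:
        raise SelectorError(f"unknown selector :{selector}")
    if key not in allowed:
        raise SelectorError(f"key {key!r} is not valid for :{selector}")
    return key
```

Under these assumptions, `check_key("plural", "few")` succeeds, while `check_key("plural", "fex")` raises `SelectorError` even though `fex` is syntactically fine.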

catamorphism · 2023-05-22T20:45:07Z

spec/formatting.md

+is always a string concatenation of its parts,
+with escape sequences resolving to their escaped characters.
+When a literal value is used as a formatting function argument or option value,
+the formatting function MUST treat option values the same independently of their presentation,


Another way of saying this -- if a definition of the term "value" (that is, the range of the resolution function whose domain is the formal language defined by the ABNF) is added -- is that the formatting function is defined on semantic values rather than syntactically. In other words, the resolution relation maps both 42 and |42| to the same semantic value; if semantic values are the domain of the formatting function, then it's impossible by construction for the formatting function to distinguish the two. (Unless the value domain is defined so that 42 and |42| are distinguishable, but even in that case, writing down the meta-language that describes the value domain would make it easier to define where the formatting function should treat different values as equivalent to each other.)

Going a bit further, I sensed on our call yesterday that we have consensus that formatting functions are not able to determine whether an option has been set from a literal or a variable. As in, a formatting function receiving a value '42' for the foo option would not know if the message had set foo=42, foo=|42|, or foo=$bar where $bar had a value of '42'.

Then, given the consensus, I think it would make sense to rephrase this as a requirement on whatever calls the functions (the formatter?), not on the functions themselves. It's true that the formatting function must treat option values the same independently of their presentation, but it's also impossible for it to do otherwise!

A way to phrase it might be that the domain of the formatting function is resolved values, and then if you wanted to add examples, one example could be that 42, |42|, and $bar all map to the same resolved value (assuming an environment in which bar is bound to 42), specifically '42'.
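One way to picture this phrasing is a caller that resolves every option before invoking the formatting function, so that the function only ever sees resolved values. The following is a minimal sketch under stated assumptions: `resolve_operand`, `call_function` and `show_foo` are hypothetical names, variables are written as `$name`, quoted literals as `|...|`, and escape handling is omitted to keep the example short.

```python
from typing import Any, Callable, Mapping


def resolve_operand(source: str, env: Mapping[str, Any]) -> Any:
    """Map the source form of an operand to its resolved value.

    Simplified for illustration: escapes inside quoted literals are ignored.
    """
    if source.startswith("$"):
        return env[source[1:]]  # variable reference: look up its value
    if len(source) >= 2 and source.startswith("|") and source.endswith("|"):
        return source[1:-1]     # quoted literal: strip the quotes
    return source               # unquoted literal: the string itself


def call_function(
    func: Callable[[Mapping[str, Any]], str],
    option_sources: Mapping[str, str],
    env: Mapping[str, Any],
) -> str:
    """Resolve all options, then call the function with resolved values only."""
    resolved = {name: resolve_operand(src, env) for name, src in option_sources.items()}
    return func(resolved)


def show_foo(options: Mapping[str, Any]) -> str:
    """A toy formatting function; it has no access to the source syntax."""
    return f"foo is {options['foo']!r}"
```

Assuming an environment `{"bar": "42"}`, calling `call_function(show_foo, {"foo": src}, env)` with `src` set to `42`, `|42|` or `$bar` produces the same result in all three cases, because `show_foo` receives the same resolved value `'42'` each time.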

catamorphism · 2023-05-22T20:52:48Z

spec/formatting.md

+The resolved value of _text_, _literal_ and _nmtoken_ tokens
+is always a string concatenation of its parts,
+with escape sequences resolving to their escaped characters.
+When a literal value is used as a formatting function argument or option value,


I'm also not sure whether the word "value" is meant to be syntactic or semantic in "a literal value". It might be clearer (if more verbose) to write:

"When a text, literal, or nmtoken token is used as a formatting function argument or option value..."

The current wording, if read literally, implies that the requirement to not distinguish between different syntactic constructs with the same meaning only applies to literals. Which would be confusing, since I don't think it's possible to write two lexically distinct literals with the same meaning.

If #364 is accepted, this should clear up a bit as the ABNF literal becomes the only thing this applies to. But yes, we should also avoid the dread term "value" here.

eemeli · 2023-05-23T12:46:25Z

@aphillips @catamorphism I've updated the PR following your suggestions; could you re-review and potentially close any/all of the above discussions, or clarify if there are further changes you'd like to see?

catamorphism

There's just one change I would still request (left as a line comment just now). Everything else looks good now! I don't think that the Github UI gives me the ability to close the previous discussions because I submitted them as "comments" rather than "request changes", but if you have that ability, it's fine to close them.

catamorphism · 2023-05-23T22:09:11Z

spec/formatting.md

+with escape sequences resolving to their escaped characters.
+When a _literal_ or _nmtoken_ is used as an _expression_ argument
+or on the right-hand side of an _option_,
+the formatting function MUST treat their resolved values the same independently of their presentation,


I think it's worth re-wording this to clarify that the contract between the caller of the formatting function, and the callee (formatting function), makes it impossible to do otherwise. (See #382 (comment) ).

It's not necessarily impossible in all implementations. A formatting function needs to be passed some amount of contextual information, such as the current locale, and it's possible to consider an implementation that also includes in that context something like an AST of the current expression. This might make sense for instance in order to enable errors in specific options to be positioned exactly in terms of source offsets.

This statement is specifying that even in such a hypothetical situation, a valid formatting function is not allowed to vary its behaviour based on the quoting style of the literal value.

Do errors count as "behavior"? It sounds like you're saying the error might be different based on the AST of the current expression, which suggests not treating resolved values the same independently of their presentation (to me, a different error is different behavior).

I meant that an implementation may exist which, for good reasons, provides a pathway by which a formatting function could determine whether an option value was originally quoted or not.

For errors, I think the current spec shape of specifying the type of error is appropriate.

mihnita · 2023-06-05T16:14:35Z

spec/formatting.md

+## Literal Resolution
+
+The resolved value of _text_, _literal_ and _nmtoken_ tokens
+is always a string concatenation of its parts,


What are the "parts" here? I would think these items (text, literal and nmtoken) are part-less?

This is meant to refer to the *-char and *-escape parts of text and literal, and name-char for nmtoken, as hinted by the rest of this sentence.

Add Literal Resolution section to formatting.md

c481396

eemeli added the spec-text and Agenda+ (Requested for upcoming teleconference) labels May 17, 2023

eemeli requested review from aphillips, stasm, zbraniecki, echeran and mihnita May 17, 2023 14:51

eemeli commented May 17, 2023

aphillips reviewed May 18, 2023

eemeli mentioned this pull request May 19, 2023

Replace nmtoken with unquoted #364

Merged

catamorphism reviewed May 22, 2023

Apply suggestions from code review

4e02803

eemeli requested review from aphillips and catamorphism May 23, 2023 12:46

eemeli mentioned this pull request May 23, 2023

How are variant keys decoded? #384

Closed

catamorphism requested changes May 23, 2023

catamorphism mentioned this pull request May 23, 2023

Clarifications to pattern selection #385

Merged

mihnita reviewed Jun 5, 2023

aphillips merged commit 4e33d64 into unicode-org:main Jun 5, 2023

eemeli deleted the resolve-literals branch June 5, 2023 19:44
