Address #703: make syntax/data model error fallback clear #704

aphillips · 2024-03-05T14:46:59Z

Fixes #703

This fix moves the syntax/data model error resolution text verbatim from the bottom of the pattern selection section to the formatting intro (except to replace the words 'pattern selection' with 'formatting', making it more general).

In addition, this changes uses US vs. UK spelling in the first sentence of the intro (an editorial nit) and removes the existing similar instruction about fallbacks. This is a normative change because the previous text had "MAY" for the fallback.

I also rephrased the "To start..." paragraph to be less chatty by using an imperative (this is an editorial change).

Fixes #703 This fix moves the syntax/data model error resolution text verbatim from the bottom of the pattern selection section to the formatting intro (except to replace the words 'pattern selection' with 'formatting', making it more general). In addition, this changes uses US vs. UK spelling in the first sentence of the intro (an editorial nit) and removes the existing similar instruction about fallbacks. This is a normative change because the previous text had "MAY" for the fallback. I also rephrased the "To start..." paragraph to be less chatty by using an imperative (this is an editorial change).

aphillips · 2024-03-05T14:50:31Z

We don't have a teleconference for two weeks, so this is likely to be the first change committed after issuing the tech preview. While this marks a normative change, I believe that it reflects WG consensus.

eemeli · 2024-03-05T15:26:41Z

spec/formatting.md

-an appropriate error MUST be emitted and a _fallback value_ MAY be used as the formatting result.
+To start, parse the _message_ from its syntax or create it from a data model description.
+
+If the message being formatted has any _Syntax Errors_ or _Data Model Errors_,


Should keep the bit about emitting the error. I'm also not really sure about the "message has errors" language here; I think the previous language on this was less opinionated about error handling.

Suggested change

If the message being formatted has any _Syntax Errors_ or _Data Model Errors_,

If the message being formatted has any _Syntax Errors_ or _Data Model Errors_,

an appropriate error MUST be emitted and

I'm not sure what else we would really say here?

We could also do:

Suggested change

If the message being formatted has any _Syntax Errors_ or _Data Model Errors_,

If the _message_ being formatted contains a _Syntax Error_ or _Data Model Error_

emit an appropriate error.

The result of formatting a _message_ containing such an error is a single _fallback value_

that uses the _message_'s fallback string defined in the _formatting context_

or, if this is not available or empty, the U+FFFD REPLACEMENT CHARACTER `�`.

Co-authored-by: Eemeli Aro <eemeli@mozilla.com>

catamorphism

I think the PR is fine if we all want to require option 1 from #703 (comment) . I personally don't have a strong opinion.

catamorphism · 2024-03-05T18:43:27Z

spec/formatting.md

+
+If the message being formatted has any _Syntax Errors_ or _Data Model Errors_,
+an appropriate error MUST be emitted and
+the result of formatting the _message_ MUST be a pattern resolving to a single _fallback value_


This implies that no syntax errors are recoverable, right? For example, that:

.local $bar {|foo|} {{{$bar}}}

formats to � and not {$bar}.

It would be an improvement to disambiguate the spec in that way, but I'm not sure if that's the behavior everyone wants?

I'm having a hard time figuring out a situation where it would be overall beneficial to allow for messages with syntax errors to format to anything other than {�} or something like {msg-id}, in particular if that behaviour is different between implementations.

What is the process generating such a broken message that we want to support?

catamorphism · 2024-03-05T18:47:29Z

spec/formatting.md

+To start, parse the _message_ from its syntax or create it from a data model description.
+
+If the message being formatted has any _Syntax Errors_ or _Data Model Errors_,
+an appropriate error MUST be emitted and


This seems like it's requiring eager evaluation. It's reasonable for a lazy implementation to have to report syntax errors even in expressions that are never used. But when it comes to data model errors, I'm not sure. For example, duplicate option name errors: do we want to require an implementation to error out on:

.local $foo = {horse :func one=1 two=2 one=1} {{I don't use any variables!}}

even though $foo is never referenced? Maybe this is a weird message and should always be an error. I don't have a strong opinion, I just want to make sure it's considered.

One of the things I did in my mock is to discard variables that were never used. And since they were never used, I never even tried to check them further. I don't think we need to force errors to be omitted in those cases, although it could be useful to have a 'strict' mode that emitted warnings, etc.

BTW — and this is not in any way suggesting a change in v45 — I have a somewhat different breakdown of errors, based information available at a given point. Categories 1-4 are all errors that can be handled through static analysis, before any runtime use.

Errors that can be detected syntactically, without reference to any function registry.

That includes any BNF breakages

But other failures that don't require building a model or having any access to information about functions and their options: a mismatch between number of keys in a variant and the number of selectors in the .match, no fallback variant, duplicate declarations or option names, ...

Errors that can be detected with reference to reference to the registry, but without having any input parameters

This includes illegal options, illegal option values, some illegal literals (that the function can't handle), .match with functions that are not selector functions, placeholders with functions that aren't formatting functions, etc.

Errors that require calling functions, but need no access to input parameters

Literals that can't be handled, but regex in the registry is not powerful enough to limit

Mismatches like .input {$person :personName ..} .local $date {$person :datetime ...), if the registry is not detailed enough to handle.

Errors that can be detected at call sites (in strongly-typed programming languages, depending on APIs)

There are not enough input parameters, or the types are not accepted by the functions that will be called on them.

Runtime errors, that cannot be assessed except with particular input parameters

A function can't handle a particular input datatype, etc.

@catamorphism notes:

This seems like it's requiring eager evaluation.

We have a very strong statement that we never require eager evaluation. As you note, syntax errors are not what is in question here. They are always errors and they have nothing to do with expression evaluation.

Of the data model errors, most are not related to eager vs. lazy evaluation--except duplicate option name and maybe missing selector annotation. That seems like something that should not be included in the MUST I suggest above. The others are structural errors with the message that have to be evaluated in order for pattern selection to run. This is in part why the text here was originally in the pattern selection section.

Probably we should split syntax from data model error and push the DM errors back into pattern selection where they started. (Expression evaluation already has the one for duplicate option names)

@macchiati

You're welcome to discard unused variables--including never evaluating the expressions assigned to them--in your implementation (we say this explicitly). However, of the data model errors (which are not related to anything in the function registry), as noted, only the duplication of options is really "looking inside" the expression. The missing selector annotation has to be checked because it is on the .match statement. Any message with a .match has to evaluate the selectors.

I have more thoughts, but not time tonight to delve into them.

aphillips · 2024-03-06T15:21:54Z

Looking closely this morning, I see that data model errors are tricky.

Variant Key Mismatch has specific handling in the pattern selection section. I put some error handling text there in 10481be.
Missing Fallback Variant is something we're asking for feedback on in Tech Preview. That error would be emitted in pattern selection if we keep it an error. We should permit lazy evaluation of this error.
Missing Selector Annotation has no home. We decided (after arguing at great length) to require that the annotation be visible in the message (and not to allow inferring the selector via reflection, for example). But we don't check for it anywhere to enforce it. Neither the message parsing nor pattern selection processes care.
Duplicate Declaration might be ignored if the variable in question is never used in selection or inside the resulting pattern. There should be a home for it. Currently the syntax spec says that the error is produced, but not when or what the fallback behavior is:

Variables, once declared, MUST NOT be redeclared. A message that does any of the following is not valid
and will produce a Duplicate Declaration error during processing:
Duplicate Option Name is handled in the "option resolution" section of formatting. The error is optional and non-fatal.

Note that the text at the top of formatting talks about creating a message from a data model. Parsing the data model can result in any of these errors and some of them probably can't be ignored by the implementation, so perhaps some text explicitly about creating the message from a DM representation is called for.

mihnita · 2024-03-06T18:43:45Z

spec/formatting.md

+If _Syntax Errors_ are encountered, emit those errors.
+The result of formatting a _message_ with a _Syntax Error_ is a single _fallback value_
+that uses the _message_'s fallback string defined in the _formatting context_
+or, if this is not available or empty, the U+FFFD REPLACEMENT CHARACTER `�`.


Nitpick:

Suggested change

or, if this is not available or empty, the U+FFFD REPLACEMENT CHARACTER `�`.

or, if this is not available or empty, the `U+FFFD` `REPLACEMENT CHARACTER` `�`.

The suggestion doesn't match the style we otherwise use for character names in the spec.

mihnita

Thank you!
M.

eemeli

I think differentiating the treatment of syntax and data model errors is a significant departure from our practice so far. Quoting @aphillips from the closing comment of #367 (comment) a few months ago:

Note that the categorization [as syntax or data model errors] in the spec doesn't matter (is not normative in any meaningful way).

If the intent here is to introduce such a separation, then this PR should include more explicit language stating exactly when data model errors are checked. But I don't think it should, and data model errors should continue to be treated the same as syntax errors.

Regarding Duplicate Option Name, its mention was correctly removed from formatting.md in #577, but it was accidentally re-added during conflict resolution for #593. As discussed in the former PR, it should not be emitted during formatting.

Re-reading this discussion and the spec, I think one part that the current text (written by me...) does not communicate clearly enough is that "Pattern Selection" is expected to apply to all messages, even if they only have one syntax pattern. So its current last paragraph applies in all cases, leading to the selection of a � fallback pattern for any data model errors. Changing that to only apply for Variant Key Mismatch would be a mistake.

aphillips · 2024-03-08T19:15:02Z

@eemeli I think the challenge for me is:

Syntax errors relate solely to the ABNF and parsing errors generated from that.

Data model errors are aspects of MF2 that cannot be represented strictly with the syntax. If one is constructing a message from a data model representation, data model errors will be visible as immediate failures, much like syntax errors.

However, when we parse a message from a string, we allow implementations to be lazy about evaluating expressions (including declarations). Most implementations will probably not be so lazy that this comes into play, but we go out of our way to permit it. At least some of the DM errors can avoid detection when evaluated lazily (particularly duplicate option name). The tension between eager and lazy evaluation is why a lot of my comments above exist.

In that light, I would probably go back towards an earlier version of this PR and just call out the eager/lazy problem. Along the lines of:

To start, create the message by parsing a string or creating it from a data model.
Any syntax or data model errors result in the fallback pattern.

[!NOTE]
Some types of data model error might not be detected during the
construction of the message, if the implementation chooses not to
evaluate expressions eagerly.

I remember our discussing pattern selection as a step (sometimes a quite trivial one). We should definitely make that clearer.

eemeli · 2024-03-08T23:36:25Z

However, when we parse a message from a string, we allow implementations to be lazy about evaluating expressions (including declarations). Most implementations will probably not be so lazy that this comes into play, but we go out of our way to permit it. At least some of the DM errors can avoid detection when evaluated lazily (particularly duplicate option name). The tension between eager and lazy evaluation is why a lot of my comments above exist.

As I also mention in #710 (comment), it's important to note that we only allow laziness in the evaluation of expressions, and not in their representation. Data model errors are not included in the ones that we allow to be ignored:

message-format-wg/spec/errors.md

Lines 19 to 21 in e761964

    
           _Resolution Errors_ and _Formatting Errors_ in _expressions_ that are not used 
        
           in _pattern selection_ or _formatting_ MAY be ignored, 
        
           as they do not affect the output of the formatter.

aphillips · 2024-03-09T00:40:24Z

@catamorphism @mihnita Do you check data model errors (I do not mean "do you create a data model" but rather "do you check for the specific message errors listed under that heading in errors.md") early?

catamorphism · 2024-03-09T01:06:30Z

@catamorphism @mihnita Do you check data model errors (I do not mean "do you create a data model" but rather "do you check for the specific message errors listed under that heading in errors.md") early?

I have a separate checking pass that runs after parsing but before formatting that checks for all of the data model errors except "Duplicate Option Name" (which is easiest to check during parsing).

In my implementation, the presence of data model errors doesn't stop formatting from running. This was based on an earlier version of the tests that specified expected output for some messages that had data model errors. It was quite hard to implement this behavior and it would simplify the code if both syntax errors and data model errors were non-recoverable ("non-recoverable" = the whole message gets replaced with a single fallback, rather than trying to present partial input).

Sorry, didn't have time to follow the rest of the discussion since I last commented!

catamorphism · 2024-03-09T01:11:11Z

By the way, my implementation is lazy but it does statically check data model errors, because those errors are easiest to check in a separate pass. I don't see those two things as in conflict with each other, which I think is in line with what @eemeli said.

mihnita · 2024-03-09T17:57:35Z

discard variables that were never used

Might be a good idea, or not.
It depends when it is done.

For example:

.input {$firstName :string}
.input {$lastName :string}
{{Hello {$firstName}!}}

This can be interpreted as a "contract" from the developer that in the arguments passed to MF2 there will be a lastName argument, so a Japanese translator can translate as {{Hello {$lastName}!}}

They can probably be dropped at compile time, or runtime.

I think that specifying exactly WHEN (at what stage) validation happens should not be in the spec.

mihnita · 2024-03-09T18:18:37Z

I currently validate data model post-parse, but before formatting.
When the user gets the data model it is already validated.
From the outside it looks like it happens at parse time, although it is not.
I guess this is similar to what Tim does.

If we change the data model from an array of declarations to a map of declarations (as proposed by #718) that would make (some) validation happen earlier.
When we build the model, not after.

This tells me that we should try to specify what kind of errors are emitted, but not when.

catamorphism · 2024-03-11T18:19:49Z

Just a note that if this PR lands, then the text that was deleted in #710 should be replaced with something like "Assert that the option's identifier does not already exist in the resolved mapping of options." (That could be in this PR or a subsequent one.)

catamorphism

Doesn't fully address data model errors, as I noted, but I think what's here is consistent and it would be fine to address that in a future issue.

catamorphism · 2024-03-11T18:38:49Z

spec/formatting.md

-an appropriate error MUST be emitted and a _fallback value_ MAY be used as the formatting result.
+To start, parse the _message_ from its syntax or create it from a data model description.
+
+If _Syntax Errors_ are encountered, emit those errors.


It seems like the deletion of "data model errors" makes it implementation-dependent whether data model errors are recoverable, or whether they should be treated exactly like syntax errors.

That's fine, but I would support opening a separate issue to discuss whether that's what we want, or whether the spec should require data model errors to be non-recoverable, as there seem to be different opinions.

catamorphism · 2024-03-11T18:40:11Z

spec/formatting.md

@@ -466,6 +468,10 @@ according to their _key_ values and selecting the first one.
 > because no matching _variant_ is available.

 The number of _keys_ in each _variant_ MUST equal the number of _selectors_.
+If it does not, emit a _Variant Key Mismatch_ error


This text also seems to written to allow the implementation to choose whether to recover from data model errors (an implementation that checks for them statically would never reach this point and would just use the fallback value for the result of formatting, and then this would be an assertion rather than a dynamic error).

Which is also fine, but reinforces my suggestion to open a separate issue to discuss that.

I created #735 so that, if this PR lands, we don't forget to discuss data model errors.

aphillips · 2024-05-12T16:16:44Z

In the 2024-05-06 call we decided to close this without prejudice.

aphillips added normative Issue affects normative text in the specification formatting Issue pertains to the formatting section of the spec labels Mar 5, 2024

aphillips requested review from stasm, catamorphism, eemeli, echeran, gibson042, mihnita and macchiati March 5, 2024 14:46

eemeli reviewed Mar 5, 2024

View reviewed changes

Update spec/formatting.md

078018e

Co-authored-by: Eemeli Aro <eemeli@mozilla.com>

catamorphism reviewed Mar 5, 2024

View reviewed changes

Move data model errors back into pattern selection at al

10481be

aphillips requested review from eemeli and catamorphism March 6, 2024 15:24

aphillips added the LDML46 LDML46 Release (Tech Preview - October 2024) label Mar 6, 2024

aphillips mentioned this pull request Mar 6, 2024

We need a list of possible error codes #706

Closed

mihnita reviewed Mar 6, 2024

View reviewed changes

mihnita approved these changes Mar 6, 2024

View reviewed changes

eemeli requested changes Mar 7, 2024

View reviewed changes

eemeli mentioned this pull request Mar 7, 2024

Clarify (again) that Duplicate Option Name is a data model error #710

Merged

catamorphism approved these changes Mar 11, 2024

View reviewed changes

eemeli mentioned this pull request May 8, 2024

Refactor errors, adding section for Message Function Errors #774

Merged

aphillips closed this May 12, 2024

aphillips deleted the aphillips-syntax-fallback branch May 15, 2024 00:08

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Address #703: make syntax/data model error fallback clear #704

Address #703: make syntax/data model error fallback clear #704

aphillips commented Mar 5, 2024

aphillips commented Mar 5, 2024

eemeli Mar 5, 2024

aphillips Mar 5, 2024

catamorphism left a comment

catamorphism Mar 5, 2024

eemeli Mar 6, 2024

catamorphism Mar 5, 2024

macchiati Mar 5, 2024

aphillips Mar 6, 2024

aphillips commented Mar 6, 2024 •

edited

Loading

mihnita Mar 6, 2024

eemeli Mar 7, 2024

mihnita left a comment

eemeli left a comment

aphillips commented Mar 8, 2024

eemeli commented Mar 8, 2024

aphillips commented Mar 9, 2024

catamorphism commented Mar 9, 2024

catamorphism commented Mar 9, 2024

mihnita commented Mar 9, 2024 •

edited

Loading

mihnita commented Mar 9, 2024 •

edited

Loading

catamorphism commented Mar 11, 2024

catamorphism left a comment

catamorphism Mar 11, 2024

catamorphism Mar 11, 2024

catamorphism Mar 19, 2024

aphillips commented May 12, 2024

	If the message being formatted has any _Syntax Errors_ or _Data Model Errors_,
	If the message being formatted has any _Syntax Errors_ or _Data Model Errors_,
	an appropriate error MUST be emitted and

	or, if this is not available or empty, the U+FFFD REPLACEMENT CHARACTER `�`.
	or, if this is not available or empty, the `U+FFFD` `REPLACEMENT CHARACTER` `�`.

Address #703: make syntax/data model error fallback clear #704

Address #703: make syntax/data model error fallback clear #704

Conversation

aphillips commented Mar 5, 2024

aphillips commented Mar 5, 2024

Choose a reason for hiding this comment

Choose a reason for hiding this comment

catamorphism left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

aphillips commented Mar 6, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

mihnita left a comment

Choose a reason for hiding this comment

eemeli left a comment

Choose a reason for hiding this comment

aphillips commented Mar 8, 2024

eemeli commented Mar 8, 2024

aphillips commented Mar 9, 2024

catamorphism commented Mar 9, 2024

catamorphism commented Mar 9, 2024

mihnita commented Mar 9, 2024 • edited Loading

mihnita commented Mar 9, 2024 • edited Loading

catamorphism commented Mar 11, 2024

catamorphism left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

aphillips commented May 12, 2024

aphillips commented Mar 6, 2024 •

edited

Loading

mihnita commented Mar 9, 2024 •

edited

Loading

mihnita commented Mar 9, 2024 •

edited

Loading