-
-
Notifications
You must be signed in to change notification settings - Fork 36
Address #703: make syntax/data model error fallback clear #704
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Fixes #703 This fix moves the syntax/data model error resolution text verbatim from the bottom of the pattern selection section to the formatting intro (except to replace the words 'pattern selection' with 'formatting', making it more general). In addition, this changes uses US vs. UK spelling in the first sentence of the intro (an editorial nit) and removes the existing similar instruction about fallbacks. This is a normative change because the previous text had "MAY" for the fallback. I also rephrased the "To start..." paragraph to be less chatty by using an imperative (this is an editorial change).
We don't have a teleconference for two weeks, so this is likely to be the first change committed after issuing the tech preview. While this marks a normative change, I believe that it reflects WG consensus. |
spec/formatting.md
Outdated
an appropriate error MUST be emitted and a _fallback value_ MAY be used as the formatting result. | ||
To start, parse the _message_ from its syntax or create it from a data model description. | ||
|
||
If the message being formatted has any _Syntax Errors_ or _Data Model Errors_, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should keep the bit about emitting the error. I'm also not really sure about the "message has errors" language here; I think the previous language on this was less opinionated about error handling.
If the message being formatted has any _Syntax Errors_ or _Data Model Errors_, | |
If the message being formatted has any _Syntax Errors_ or _Data Model Errors_, | |
an appropriate error MUST be emitted and |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure what else we would really say here?
We could also do:
If the message being formatted has any _Syntax Errors_ or _Data Model Errors_, | |
If the _message_ being formatted contains a _Syntax Error_ or _Data Model Error_ | |
emit an appropriate error. | |
The result of formatting a _message_ containing such an error is a single _fallback value_ | |
that uses the _message_'s fallback string defined in the _formatting context_ | |
or, if this is not available or empty, the U+FFFD REPLACEMENT CHARACTER `�`. |
Co-authored-by: Eemeli Aro <eemeli@mozilla.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the PR is fine if we all want to require option 1 from #703 (comment) . I personally don't have a strong opinion.
spec/formatting.md
Outdated
|
||
If the message being formatted has any _Syntax Errors_ or _Data Model Errors_, | ||
an appropriate error MUST be emitted and | ||
the result of formatting the _message_ MUST be a pattern resolving to a single _fallback value_ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This implies that no syntax errors are recoverable, right? For example, that:
.local $bar {|foo|} {{{$bar}}}
formats to �
and not {$bar}
.
It would be an improvement to disambiguate the spec in that way, but I'm not sure if that's the behavior everyone wants?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm having a hard time figuring out a situation where it would be overall beneficial to allow for messages with syntax errors to format to anything other than {�}
or something like {msg-id}
, in particular if that behaviour is different between implementations.
What is the process generating such a broken message that we want to support?
spec/formatting.md
Outdated
To start, parse the _message_ from its syntax or create it from a data model description. | ||
|
||
If the message being formatted has any _Syntax Errors_ or _Data Model Errors_, | ||
an appropriate error MUST be emitted and |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems like it's requiring eager evaluation. It's reasonable for a lazy implementation to have to report syntax errors even in expressions that are never used. But when it comes to data model errors, I'm not sure. For example, duplicate option name errors: do we want to require an implementation to error out on:
.local $foo = {horse :func one=1 two=2 one=1}
{{I don't use any variables!}}
even though $foo
is never referenced? Maybe this is a weird message and should always be an error. I don't have a strong opinion, I just want to make sure it's considered.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One of the things I did in my mock is to discard variables that were never used. And since they were never used, I never even tried to check them further. I don't think we need to force errors to be omitted in those cases, although it could be useful to have a 'strict' mode that emitted warnings, etc.
BTW — and this is not in any way suggesting a change in v45 — I have a somewhat different breakdown of errors, based information available at a given point. Categories 1-4 are all errors that can be handled through static analysis, before any runtime use.
- Errors that can be detected syntactically, without reference to any function registry.
- That includes any BNF breakages
- But other failures that don't require building a model or having any access to information about functions and their options: a mismatch between number of keys in a variant and the number of selectors in the .match, no fallback variant, duplicate declarations or option names, ...
- Errors that can be detected with reference to reference to the registry, but without having any input parameters
- This includes illegal options, illegal option values, some illegal literals (that the function can't handle), .match with functions that are not selector functions, placeholders with functions that aren't formatting functions, etc.
- Errors that require calling functions, but need no access to input parameters
- Literals that can't be handled, but regex in the registry is not powerful enough to limit
- Mismatches like .input {$person :personName ..} .local $date {$person :datetime ...), if the registry is not detailed enough to handle.
- Errors that can be detected at call sites (in strongly-typed programming languages, depending on APIs)
- There are not enough input parameters, or the types are not accepted by the functions that will be called on them.
- Runtime errors, that cannot be assessed except with particular input parameters
- A function can't handle a particular input datatype, etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@catamorphism notes:
This seems like it's requiring eager evaluation.
We have a very strong statement that we never require eager evaluation. As you note, syntax errors are not what is in question here. They are always errors and they have nothing to do with expression evaluation.
Of the data model errors, most are not related to eager vs. lazy evaluation--except duplicate option name and maybe missing selector annotation. That seems like something that should not be included in the MUST I suggest above. The others are structural errors with the message that have to be evaluated in order for pattern selection to run. This is in part why the text here was originally in the pattern selection section.
Probably we should split syntax from data model error and push the DM errors back into pattern selection where they started. (Expression evaluation already has the one for duplicate option names)
You're welcome to discard unused variables--including never evaluating the expressions assigned to them--in your implementation (we say this explicitly). However, of the data model errors (which are not related to anything in the function registry), as noted, only the duplication of options is really "looking inside" the expression. The missing selector annotation has to be checked because it is on the .match
statement. Any message with a .match
has to evaluate the selectors.
I have more thoughts, but not time tonight to delve into them.
Looking closely this morning, I see that data model errors are tricky.
Note that the text at the top of formatting talks about creating a message from a data model. Parsing the data model can result in any of these errors and some of them probably can't be ignored by the implementation, so perhaps some text explicitly about creating the message from a DM representation is called for. |
If _Syntax Errors_ are encountered, emit those errors. | ||
The result of formatting a _message_ with a _Syntax Error_ is a single _fallback value_ | ||
that uses the _message_'s fallback string defined in the _formatting context_ | ||
or, if this is not available or empty, the U+FFFD REPLACEMENT CHARACTER `�`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nitpick:
or, if this is not available or empty, the U+FFFD REPLACEMENT CHARACTER `�`. | |
or, if this is not available or empty, the `U+FFFD` `REPLACEMENT CHARACTER` `�`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The suggestion doesn't match the style we otherwise use for character names in the spec.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you!
M.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think differentiating the treatment of syntax and data model errors is a significant departure from our practice so far. Quoting @aphillips from the closing comment of #367 (comment) a few months ago:
Note that the categorization [as syntax or data model errors] in the spec doesn't matter (is not normative in any meaningful way).
If the intent here is to introduce such a separation, then this PR should include more explicit language stating exactly when data model errors are checked. But I don't think it should, and data model errors should continue to be treated the same as syntax errors.
Regarding Duplicate Option Name, its mention was correctly removed from formatting.md
in #577, but it was accidentally re-added during conflict resolution for #593. As discussed in the former PR, it should not be emitted during formatting.
Re-reading this discussion and the spec, I think one part that the current text (written by me...) does not communicate clearly enough is that "Pattern Selection" is expected to apply to all messages, even if they only have one syntax pattern. So its current last paragraph applies in all cases, leading to the selection of a �
fallback pattern for any data model errors. Changing that to only apply for Variant Key Mismatch would be a mistake.
@eemeli I think the challenge for me is: Syntax errors relate solely to the ABNF and parsing errors generated from that. Data model errors are aspects of MF2 that cannot be represented strictly with the syntax. If one is constructing a message from a data model representation, data model errors will be visible as immediate failures, much like syntax errors. However, when we parse a message from a string, we allow implementations to be lazy about evaluating expressions (including declarations). Most implementations will probably not be so lazy that this comes into play, but we go out of our way to permit it. At least some of the DM errors can avoid detection when evaluated lazily (particularly duplicate option name). The tension between eager and lazy evaluation is why a lot of my comments above exist. In that light, I would probably go back towards an earlier version of this PR and just call out the eager/lazy problem. Along the lines of:
I remember our discussing pattern selection as a step (sometimes a quite trivial one). We should definitely make that clearer. |
As I also mention in #710 (comment), it's important to note that we only allow laziness in the evaluation of expressions, and not in their representation. Data model errors are not included in the ones that we allow to be ignored: message-format-wg/spec/errors.md Lines 19 to 21 in e761964
|
@catamorphism @mihnita Do you check data model errors (I do not mean "do you create a data model" but rather "do you check for the specific message errors listed under that heading in errors.md") early? |
I have a separate checking pass that runs after parsing but before formatting that checks for all of the data model errors except "Duplicate Option Name" (which is easiest to check during parsing). In my implementation, the presence of data model errors doesn't stop formatting from running. This was based on an earlier version of the tests that specified expected output for some messages that had data model errors. It was quite hard to implement this behavior and it would simplify the code if both syntax errors and data model errors were non-recoverable ("non-recoverable" = the whole message gets replaced with a single fallback, rather than trying to present partial input). Sorry, didn't have time to follow the rest of the discussion since I last commented! |
By the way, my implementation is lazy but it does statically check data model errors, because those errors are easiest to check in a separate pass. I don't see those two things as in conflict with each other, which I think is in line with what @eemeli said. |
Might be a good idea, or not. For example:
This can be interpreted as a "contract" from the developer that in the arguments passed to MF2 there will be a They can probably be dropped at compile time, or runtime. I think that specifying exactly WHEN (at what stage) validation happens should not be in the spec. |
I currently validate data model post-parse, but before formatting. If we change the data model from an array of declarations to a map of declarations (as proposed by #718) that would make (some) validation happen earlier. This tells me that we should try to specify what kind of errors are emitted, but not when. |
Just a note that if this PR lands, then the text that was deleted in #710 should be replaced with something like "Assert that the option's identifier does not already exist in the resolved mapping of options." (That could be in this PR or a subsequent one.) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't fully address data model errors, as I noted, but I think what's here is consistent and it would be fine to address that in a future issue.
an appropriate error MUST be emitted and a _fallback value_ MAY be used as the formatting result. | ||
To start, parse the _message_ from its syntax or create it from a data model description. | ||
|
||
If _Syntax Errors_ are encountered, emit those errors. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like the deletion of "data model errors" makes it implementation-dependent whether data model errors are recoverable, or whether they should be treated exactly like syntax errors.
That's fine, but I would support opening a separate issue to discuss whether that's what we want, or whether the spec should require data model errors to be non-recoverable, as there seem to be different opinions.
@@ -466,6 +468,10 @@ according to their _key_ values and selecting the first one. | |||
> because no matching _variant_ is available. | |||
|
|||
The number of _keys_ in each _variant_ MUST equal the number of _selectors_. | |||
If it does not, emit a _Variant Key Mismatch_ error |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This text also seems to written to allow the implementation to choose whether to recover from data model errors (an implementation that checks for them statically would never reach this point and would just use the fallback value for the result of formatting, and then this would be an assertion rather than a dynamic error).
Which is also fine, but reinforces my suggestion to open a separate issue to discuss that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I created #735 so that, if this PR lands, we don't forget to discuss data model errors.
In the 2024-05-06 call we decided to close this without prejudice. |
Fixes #703
This fix moves the syntax/data model error resolution text verbatim from the bottom of the pattern selection section to the formatting intro (except to replace the words 'pattern selection' with 'formatting', making it more general).
In addition, this changes uses US vs. UK spelling in the first sentence of the intro (an editorial nit) and removes the existing similar instruction about fallbacks. This is a normative change because the previous text had "MAY" for the fallback.
I also rephrased the "To start..." paragraph to be less chatty by using an imperative (this is an editorial change).