Skip to content

Address #703: make syntax/data model error fallback clear #704

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 3 commits into from

Conversation

aphillips
Copy link
Member

Fixes #703

This fix moves the syntax/data model error resolution text verbatim from the bottom of the pattern selection section to the formatting intro (except to replace the words 'pattern selection' with 'formatting', making it more general).

In addition, this changes uses US vs. UK spelling in the first sentence of the intro (an editorial nit) and removes the existing similar instruction about fallbacks. This is a normative change because the previous text had "MAY" for the fallback.

I also rephrased the "To start..." paragraph to be less chatty by using an imperative (this is an editorial change).

Fixes #703 

This fix moves the syntax/data model error resolution text verbatim from the bottom of the pattern selection section to the formatting intro (except to replace the words 'pattern selection' with 'formatting', making it more general).

In addition, this changes uses US vs. UK spelling in the first sentence of the intro (an editorial nit) and removes the existing similar instruction about fallbacks. This is a normative change because the previous text had "MAY" for the fallback.

I also rephrased the "To start..." paragraph to be less chatty by using an imperative (this is an editorial change).
@aphillips aphillips added normative Issue affects normative text in the specification formatting Issue pertains to the formatting section of the spec labels Mar 5, 2024
@aphillips
Copy link
Member Author

We don't have a teleconference for two weeks, so this is likely to be the first change committed after issuing the tech preview. While this marks a normative change, I believe that it reflects WG consensus.

an appropriate error MUST be emitted and a _fallback value_ MAY be used as the formatting result.
To start, parse the _message_ from its syntax or create it from a data model description.

If the message being formatted has any _Syntax Errors_ or _Data Model Errors_,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should keep the bit about emitting the error. I'm also not really sure about the "message has errors" language here; I think the previous language on this was less opinionated about error handling.

Suggested change
If the message being formatted has any _Syntax Errors_ or _Data Model Errors_,
If the message being formatted has any _Syntax Errors_ or _Data Model Errors_,
an appropriate error MUST be emitted and

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what else we would really say here?

We could also do:

Suggested change
If the message being formatted has any _Syntax Errors_ or _Data Model Errors_,
If the _message_ being formatted contains a _Syntax Error_ or _Data Model Error_
emit an appropriate error.
The result of formatting a _message_ containing such an error is a single _fallback value_
that uses the _message_'s fallback string defined in the _formatting context_
or, if this is not available or empty, the U+FFFD REPLACEMENT CHARACTER ``.

Co-authored-by: Eemeli Aro <eemeli@mozilla.com>
Copy link
Collaborator

@catamorphism catamorphism left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the PR is fine if we all want to require option 1 from #703 (comment) . I personally don't have a strong opinion.


If the message being formatted has any _Syntax Errors_ or _Data Model Errors_,
an appropriate error MUST be emitted and
the result of formatting the _message_ MUST be a pattern resolving to a single _fallback value_
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This implies that no syntax errors are recoverable, right? For example, that:

.local $bar {|foo|} {{{$bar}}}

formats to and not {$bar}.

It would be an improvement to disambiguate the spec in that way, but I'm not sure if that's the behavior everyone wants?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm having a hard time figuring out a situation where it would be overall beneficial to allow for messages with syntax errors to format to anything other than {�} or something like {msg-id}, in particular if that behaviour is different between implementations.

What is the process generating such a broken message that we want to support?

To start, parse the _message_ from its syntax or create it from a data model description.

If the message being formatted has any _Syntax Errors_ or _Data Model Errors_,
an appropriate error MUST be emitted and
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like it's requiring eager evaluation. It's reasonable for a lazy implementation to have to report syntax errors even in expressions that are never used. But when it comes to data model errors, I'm not sure. For example, duplicate option name errors: do we want to require an implementation to error out on:

.local $foo = {horse :func one=1 two=2 one=1}
{{I don't use any variables!}}

even though $foo is never referenced? Maybe this is a weird message and should always be an error. I don't have a strong opinion, I just want to make sure it's considered.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One of the things I did in my mock is to discard variables that were never used. And since they were never used, I never even tried to check them further. I don't think we need to force errors to be omitted in those cases, although it could be useful to have a 'strict' mode that emitted warnings, etc.

BTW — and this is not in any way suggesting a change in v45 — I have a somewhat different breakdown of errors, based information available at a given point. Categories 1-4 are all errors that can be handled through static analysis, before any runtime use.

  1. Errors that can be detected syntactically, without reference to any function registry.
  • That includes any BNF breakages
  • But other failures that don't require building a model or having any access to information about functions and their options: a mismatch between number of keys in a variant and the number of selectors in the .match, no fallback variant, duplicate declarations or option names, ...
  1. Errors that can be detected with reference to reference to the registry, but without having any input parameters
  • This includes illegal options, illegal option values, some illegal literals (that the function can't handle), .match with functions that are not selector functions, placeholders with functions that aren't formatting functions, etc.
  1. Errors that require calling functions, but need no access to input parameters
  • Literals that can't be handled, but regex in the registry is not powerful enough to limit
  • Mismatches like .input {$person :personName ..} .local $date {$person :datetime ...), if the registry is not detailed enough to handle.
  1. Errors that can be detected at call sites (in strongly-typed programming languages, depending on APIs)
    • There are not enough input parameters, or the types are not accepted by the functions that will be called on them.
  2. Runtime errors, that cannot be assessed except with particular input parameters
  • A function can't handle a particular input datatype, etc.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@catamorphism notes:

This seems like it's requiring eager evaluation.

We have a very strong statement that we never require eager evaluation. As you note, syntax errors are not what is in question here. They are always errors and they have nothing to do with expression evaluation.

Of the data model errors, most are not related to eager vs. lazy evaluation--except duplicate option name and maybe missing selector annotation. That seems like something that should not be included in the MUST I suggest above. The others are structural errors with the message that have to be evaluated in order for pattern selection to run. This is in part why the text here was originally in the pattern selection section.

Probably we should split syntax from data model error and push the DM errors back into pattern selection where they started. (Expression evaluation already has the one for duplicate option names)

@macchiati

You're welcome to discard unused variables--including never evaluating the expressions assigned to them--in your implementation (we say this explicitly). However, of the data model errors (which are not related to anything in the function registry), as noted, only the duplication of options is really "looking inside" the expression. The missing selector annotation has to be checked because it is on the .match statement. Any message with a .match has to evaluate the selectors.

I have more thoughts, but not time tonight to delve into them.

@aphillips
Copy link
Member Author

aphillips commented Mar 6, 2024

Looking closely this morning, I see that data model errors are tricky.

  • Variant Key Mismatch has specific handling in the pattern selection section. I put some error handling text there in 10481be.
  • Missing Fallback Variant is something we're asking for feedback on in Tech Preview. That error would be emitted in pattern selection if we keep it an error. We should permit lazy evaluation of this error.
  • Missing Selector Annotation has no home. We decided (after arguing at great length) to require that the annotation be visible in the message (and not to allow inferring the selector via reflection, for example). But we don't check for it anywhere to enforce it. Neither the message parsing nor pattern selection processes care.
  • Duplicate Declaration might be ignored if the variable in question is never used in selection or inside the resulting pattern. There should be a home for it. Currently the syntax spec says that the error is produced, but not when or what the fallback behavior is:

    Variables, once declared, MUST NOT be redeclared. A message that does any of the following is not valid
    and will produce a Duplicate Declaration error during processing:

  • Duplicate Option Name is handled in the "option resolution" section of formatting. The error is optional and non-fatal.

Note that the text at the top of formatting talks about creating a message from a data model. Parsing the data model can result in any of these errors and some of them probably can't be ignored by the implementation, so perhaps some text explicitly about creating the message from a DM representation is called for.

@aphillips aphillips requested review from eemeli and catamorphism March 6, 2024 15:24
@aphillips aphillips added the LDML46 LDML46 Release (Tech Preview - October 2024) label Mar 6, 2024
If _Syntax Errors_ are encountered, emit those errors.
The result of formatting a _message_ with a _Syntax Error_ is a single _fallback value_
that uses the _message_'s fallback string defined in the _formatting context_
or, if this is not available or empty, the U+FFFD REPLACEMENT CHARACTER `�`.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nitpick:

Suggested change
or, if this is not available or empty, the U+FFFD REPLACEMENT CHARACTER ``.
or, if this is not available or empty, the `U+FFFD` `REPLACEMENT CHARACTER` ``.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The suggestion doesn't match the style we otherwise use for character names in the spec.

Copy link
Collaborator

@mihnita mihnita left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!
M.

Copy link
Collaborator

@eemeli eemeli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think differentiating the treatment of syntax and data model errors is a significant departure from our practice so far. Quoting @aphillips from the closing comment of #367 (comment) a few months ago:

Note that the categorization [as syntax or data model errors] in the spec doesn't matter (is not normative in any meaningful way).

If the intent here is to introduce such a separation, then this PR should include more explicit language stating exactly when data model errors are checked. But I don't think it should, and data model errors should continue to be treated the same as syntax errors.

Regarding Duplicate Option Name, its mention was correctly removed from formatting.md in #577, but it was accidentally re-added during conflict resolution for #593. As discussed in the former PR, it should not be emitted during formatting.

Re-reading this discussion and the spec, I think one part that the current text (written by me...) does not communicate clearly enough is that "Pattern Selection" is expected to apply to all messages, even if they only have one syntax pattern. So its current last paragraph applies in all cases, leading to the selection of a fallback pattern for any data model errors. Changing that to only apply for Variant Key Mismatch would be a mistake.

@aphillips
Copy link
Member Author

@eemeli I think the challenge for me is:

Syntax errors relate solely to the ABNF and parsing errors generated from that.

Data model errors are aspects of MF2 that cannot be represented strictly with the syntax. If one is constructing a message from a data model representation, data model errors will be visible as immediate failures, much like syntax errors.

However, when we parse a message from a string, we allow implementations to be lazy about evaluating expressions (including declarations). Most implementations will probably not be so lazy that this comes into play, but we go out of our way to permit it. At least some of the DM errors can avoid detection when evaluated lazily (particularly duplicate option name). The tension between eager and lazy evaluation is why a lot of my comments above exist.

In that light, I would probably go back towards an earlier version of this PR and just call out the eager/lazy problem. Along the lines of:

To start, create the message by parsing a string or creating it from a data model.
Any syntax or data model errors result in the fallback pattern.

[!NOTE]
Some types of data model error might not be detected during the
construction of the message, if the implementation chooses not to
evaluate expressions eagerly.

I remember our discussing pattern selection as a step (sometimes a quite trivial one). We should definitely make that clearer.

@eemeli
Copy link
Collaborator

eemeli commented Mar 8, 2024

However, when we parse a message from a string, we allow implementations to be lazy about evaluating expressions (including declarations). Most implementations will probably not be so lazy that this comes into play, but we go out of our way to permit it. At least some of the DM errors can avoid detection when evaluated lazily (particularly duplicate option name). The tension between eager and lazy evaluation is why a lot of my comments above exist.

As I also mention in #710 (comment), it's important to note that we only allow laziness in the evaluation of expressions, and not in their representation. Data model errors are not included in the ones that we allow to be ignored:

_Resolution Errors_ and _Formatting Errors_ in _expressions_ that are not used
in _pattern selection_ or _formatting_ MAY be ignored,
as they do not affect the output of the formatter.

@aphillips
Copy link
Member Author

@catamorphism @mihnita Do you check data model errors (I do not mean "do you create a data model" but rather "do you check for the specific message errors listed under that heading in errors.md") early?

@catamorphism
Copy link
Collaborator

@catamorphism @mihnita Do you check data model errors (I do not mean "do you create a data model" but rather "do you check for the specific message errors listed under that heading in errors.md") early?

I have a separate checking pass that runs after parsing but before formatting that checks for all of the data model errors except "Duplicate Option Name" (which is easiest to check during parsing).

In my implementation, the presence of data model errors doesn't stop formatting from running. This was based on an earlier version of the tests that specified expected output for some messages that had data model errors. It was quite hard to implement this behavior and it would simplify the code if both syntax errors and data model errors were non-recoverable ("non-recoverable" = the whole message gets replaced with a single fallback, rather than trying to present partial input).

Sorry, didn't have time to follow the rest of the discussion since I last commented!

@catamorphism
Copy link
Collaborator

By the way, my implementation is lazy but it does statically check data model errors, because those errors are easiest to check in a separate pass. I don't see those two things as in conflict with each other, which I think is in line with what @eemeli said.

@mihnita
Copy link
Collaborator

mihnita commented Mar 9, 2024

discard variables that were never used

Might be a good idea, or not.
It depends when it is done.

For example:

.input {$firstName :string}
.input {$lastName :string}
{{Hello {$firstName}!}}

This can be interpreted as a "contract" from the developer that in the arguments passed to MF2 there will be a lastName argument, so a Japanese translator can translate as {{Hello {$lastName}!}}

They can probably be dropped at compile time, or runtime.

I think that specifying exactly WHEN (at what stage) validation happens should not be in the spec.

@mihnita
Copy link
Collaborator

mihnita commented Mar 9, 2024

I currently validate data model post-parse, but before formatting.
When the user gets the data model it is already validated.
From the outside it looks like it happens at parse time, although it is not.
I guess this is similar to what Tim does.


If we change the data model from an array of declarations to a map of declarations (as proposed by #718) that would make (some) validation happen earlier.
When we build the model, not after.

This tells me that we should try to specify what kind of errors are emitted, but not when.

@catamorphism
Copy link
Collaborator

Just a note that if this PR lands, then the text that was deleted in #710 should be replaced with something like "Assert that the option's identifier does not already exist in the resolved mapping of options." (That could be in this PR or a subsequent one.)

Copy link
Collaborator

@catamorphism catamorphism left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't fully address data model errors, as I noted, but I think what's here is consistent and it would be fine to address that in a future issue.

an appropriate error MUST be emitted and a _fallback value_ MAY be used as the formatting result.
To start, parse the _message_ from its syntax or create it from a data model description.

If _Syntax Errors_ are encountered, emit those errors.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like the deletion of "data model errors" makes it implementation-dependent whether data model errors are recoverable, or whether they should be treated exactly like syntax errors.

That's fine, but I would support opening a separate issue to discuss whether that's what we want, or whether the spec should require data model errors to be non-recoverable, as there seem to be different opinions.

@@ -466,6 +468,10 @@ according to their _key_ values and selecting the first one.
> because no matching _variant_ is available.

The number of _keys_ in each _variant_ MUST equal the number of _selectors_.
If it does not, emit a _Variant Key Mismatch_ error
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This text also seems to written to allow the implementation to choose whether to recover from data model errors (an implementation that checks for them statically would never reach this point and would just use the fallback value for the result of formatting, and then this would be an assertion rather than a dynamic error).

Which is also fine, but reinforces my suggestion to open a separate issue to discuss that.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I created #735 so that, if this PR lands, we don't forget to discuss data model errors.

@aphillips
Copy link
Member Author

In the 2024-05-06 call we decided to close this without prejudice.

@aphillips aphillips closed this May 12, 2024
@aphillips aphillips deleted the aphillips-syntax-fallback branch May 15, 2024 00:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
formatting Issue pertains to the formatting section of the spec LDML46 LDML46 Release (Tech Preview - October 2024) normative Issue affects normative text in the specification
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Fallbacks for variables in syntactically invalid declarations
5 participants