Add error handling #320

Merged
merged 11 commits into from
Feb 1, 2023
245 changes: 245 additions & 0 deletions spec/formatting.md
@@ -18,3 +18,248 @@ the local variable takes precedence.

It is an error for a local variable definition to
refer to a local variable that's defined after it in the message.

## Error Handling

Errors in messages and their formatting may occur and be detected
at multiple different stages of their processing.
Where available,
the use of validation tools is recommended,
as early detection of errors makes their correction easier.

During the formatting of a message,
various errors may be encountered.
These are divided into the following categories:
Collaborator

Nitpick / editorial:
How do you feel about having a small list with titles only, almost like a TOC, followed by the details, with examples?

It would help with the "big picture", vs something spread over many paragraphs.

These are divided into the following categories:
* Syntax errors
* Data Model errors
    * Variant Key Mismatch errors
    * Missing Fallback Variant errors
* Resolution errors
    * Unresolved Variable errors
    * Unknown Function errors
* Selection errors
    * Selector errors
* Formatting errors

Collaborator Author

That's how this started out, before the examples got added...

Let's revisit this when we've more of the spec gathered up? The solution for this section is likely to look very similar to other sections as well.


- **Syntax errors** occur when the syntax representation of a message is not well-formed.

Example invalid messages resulting in a Syntax error:

```
{Missing end brace
```

```
{Unknown {#placeholder#}}
```

```
let $var = {(no message body)}
Member

can you clarify this one? Is it illegal to assign the empty string or null to a variable? I think it might be a feature in some situations...

Collaborator Author

No, the let line is fine; it's there just to indicate that the source isn't completely empty. The failure here is from there being nothing after the let, i.e. no Pattern or Selector+Variants.

Member

The empty pattern is not an error, though, right? It's not a very useful message, but it's not an error, is it?

Collaborator Author

Yes, it's an error because it's missing the {} wrappers. Something like this would be a valid empty message:

let $var = {(no message body)}
{}

Member

So you're saying that the empty string is not a valid pattern and results in an error? That is, the minimum a pattern string can ever be is {}? That makes me uncomfortable....

Collaborator Author

That is what the current EBNF states.

```

- **Data Model errors** occur when a message is invalid due to
violating one of the semantic requirements on its structure:

- **Variant Key Mismatch errors** occur when the number of keys on a Variant
Collaborator

Nitpick: VariantKey

Part of the editorial cleanup I'm proposing, that would add back-ticks around keywords, and would make sure this is consistent with the spec.

Collaborator Author

I'd prefer leaving this as "Variant Key" for now, as it's rather likely for this to leak out as a user-visible string in the error.

does not equal the number of Selectors.

Example invalid messages resulting in a Variant Key Mismatch error:

```
match {$one}
when 1 2 {Too many}
Member

Having too few or too many keys (compared to the match expression) is clearly a syntax error, and should be in that category. It's like calling a function in C or Java with too few or too many parameters; not a semantic or data model error, but rather one that can be detected purely because of malformed syntax.

Collaborator Author

The intent in differentiating this as a "data model" rather than "syntax" error is that it allows us to apply the same error check independently of the syntax used by the message. Here's what I said about this earlier in #320 (comment):

Let's say that there's a FooFormat9000 that's exactly like MF1, except that it doesn't require an "other" case for its selectors and defaults to an empty string instead. Now, if a FooFormat9000 message is parsed into an MF2 data model for use with an MF2 runtime, the code doing so might not add the expected default variant to the data model. When the MF2 runtime discovers this, it needs to emit a [Data Model] Error rather than a Syntax Error, because it doesn't know anything about FooFormat9000 where this syntax is valid.

An MF2 syntax parser would be expected to emit the data model errors during its work, as this is included in the spec (line 176):

Syntax and Data Model errors must be emitted as soon as possible.

This separation also matches the differentiation between "well-formed" and "valid" messages in the syntax spec.

when * {Otherwise}
```

```
match {$one} {$two}
when 1 2 {Two keys}
when * {Missing a key}
when * * {Otherwise}
```
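As the discussion above notes, the key-count rule can be checked on the data model alone, independently of the syntax a message was parsed from. A minimal Python sketch of such a check follows; the `Variant` record and the function name are hypothetical illustrations, not part of the spec.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Variant:
    """Hypothetical data-model record for one `when` line of a match message."""
    keys: List[str]
    pattern: str


def variant_key_mismatches(selector_count: int, variants: List[Variant]) -> List[Variant]:
    """Return every variant whose number of keys differs from the number of selectors."""
    return [v for v in variants if len(v.keys) != selector_count]


# The second example above: two selectors, one variant with only one key.
variants = [
    Variant(["1", "2"], "Two keys"),
    Variant(["*"], "Missing a key"),
    Variant(["*", "*"], "Otherwise"),
]
mismatched = variant_key_mismatches(2, variants)
```

Each variant in `mismatched` would correspond to one Variant Key Mismatch error; how an implementation reports them is left open here.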

- **Missing Fallback Variant errors** occur when the message
does not include a Variant with only catch-all keys.

Example invalid messages resulting in a Missing Fallback Variant error:

```
match {$one}
when 1 {Value is one}
when 2 {Value is two}
```
Member

Perhaps add text to say:

Missing when * case:

(and on line 82 add "Missing when * * case:")

Collaborator Author

I don't see the necessity for these specifiers. Are these specific examples more difficult to respond about than the others, or should we specify explicitly what's wrong in each of them?

Regarding the examples in general, I prefer requiring the reader to pause at least momentarily on some of them to figure out for themselves what's going on, rather than making sure that no thought is required.


```
match {$one} {$two}
when 1 * {First is one}
when * 1 {Second is one}
```
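The fallback-variant rule is likewise a pure data-model check. The sketch below assumes each variant is represented only by its list of keys, which is a simplification chosen for this illustration.

```python
from typing import List


def has_fallback_variant(variant_keys: List[List[str]]) -> bool:
    """True if at least one variant consists only of catch-all `*` keys."""
    return any(
        len(keys) > 0 and all(key == "*" for key in keys)
        for keys in variant_keys
    )


# Keys of the two invalid examples above.
one_selector = [["1"], ["2"]]
two_selectors = [["1", "*"], ["*", "1"]]
```

Both `one_selector` and `two_selectors` fail the check: the second contains `*` keys, but no single variant is made up of catch-all keys only.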

- **Resolution errors** occur when the runtime value of a part of a message
Collaborator

It is unclear what this means.

Can we say "function resolution"?
There are very few things that can go wrong:

  • local variable definitions => similar to placeholders
  • plain text parts
  • placeholders => have a variable / literal part + function name + bag of options
  • selectors => have again variable(s) / literal part + function name + bag of options

We already have unresolved as a class of errors a bit below.
So, what else can go wrong in the (already parsed) parts above?
I think only function names?

So how about we call this section "Function resolution"?
Instead of "resolution errors", where "resolution" is not explained in the spec, and we don't even agree it is needed. We might agree that it is an implementation detail. So there is no need to mention it in the spec.

Collaborator Author

The difference between the "resolution" and "formatting" errors that's proposed here is perhaps best seen by considering how a Placeholder containing an Expression is handled. Let's say that we have something like

{(foo) :message}

or

{(user_age) :global}

where :message and/or :global is a custom Function that uses its argument to look up a value from elsewhere, and that this value is then formatted as a part of the final message.

During the formatting of this, the custom code could then emit two different sorts of errors:

  1. Resolution Error if there's a failure in getting the value that is to be formatted, e.g. if no foo message is available or if the user_age global is not set.
  2. Formatting Error if the found value can't be formatted, e.g. because the foo message includes a variable reference that can't be resolved, or the user_age value turns out to have some unexpected shape.

Would you agree that these are different error categories, and that this categorical split could lead to different error handling in user code?

Member

If user_age isn't set, isn't that an unresolved variable error?

If no foo message is available, that would be an internal error of the format function message, right? Does that mean "resolution error" is really "function internal error"?

In any case, I think I would move this item below some of the others here (perhaps to the bottom of the list), since I find myself thinking that this could also mean unresolved or formatting error when in fact this error would probably only occur later (only when the pattern is syntactically correct and all of the variables and functions have been resolved but there is still a problem).

Or am I still not understanding?

Collaborator Author

Both foo and user_age are literal values here, so the errors are coming from the :message and :global custom functions; from the PoV of the core implementation, there are no variables here to resolve.

The intent here is to allow for a custom formatting function to emit two different kinds of errors: Resolution and Formatting. This is meant to enable something like :message or :global to work from an end-user PoV as much as possible like core features such as $var.

Collaborator

Nitpick: I don't think what I am saying here affects the spec, only the answer above.

I think it is contradictory:

  • Resolution ... failure in getting the value that is to be formatted
  • Formatting ... includes a variable reference that can't be resolved ...

It is unclear what is the difference between the two: "get the value" and "resolve variable reference"
The "publicly visible" operation is probably "I have a variable named foo, I want to get the value"
There is no "reference" except deep in the implementation.

But that implementation detail should not "leak" in the kind of error I am getting.

Collaborator Author

Please note that the syntax example for the above discussion is this placeholder:

{(foo) :message}

The foo here is not a variable reference, it's a literal value that a custom :message function is interpreting as a message identifier.

cannot be determined.

- **Unresolved Variable errors** occur when a variable reference cannot be resolved.

For example, attempting to format either of the following messages
must result in an Unresolved Variable error if done within a context that
does not provide for the variable reference `$var` to be successfully resolved:

```
{The value is {$var}.}
```

```
match {$var}
when 1 {The value is one.}
when * {The value is not one.}
```

- **Unknown Function errors** occur when an Expression includes
a reference to a function which cannot be resolved.

For example, attempting to format either of the following messages
must result in an Unknown Function error if done within a context that
does not provide for the function `:func` to be successfully resolved:

```
{The value is {(horse) :func}.}
```

```
match {(horse) :func}
when 1 {The value is one.}
when * {The value is not one.}
```
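One way to picture the two Resolution error subtypes is as checks against the formatting context. In this Python sketch the class names, the string encoding of operands (`$name` for variables, `(value)` for literals) and the two lookup dictionaries are all assumptions made for illustration.

```python
from typing import Dict, List, Optional


class ResolutionError(Exception):
    """Hypothetical base class for the Resolution error category."""


class UnresolvedVariableError(ResolutionError):
    pass


class UnknownFunctionError(ResolutionError):
    pass


def resolution_errors(
    operand: Optional[str],
    function: Optional[str],
    variables: Dict[str, object],
    functions: Dict[str, object],
) -> List[ResolutionError]:
    """Collect the Resolution errors for one Expression in the given context."""
    errors: List[ResolutionError] = []
    # A variable operand must be present in the context.
    if operand is not None and operand.startswith("$") and operand[1:] not in variables:
        errors.append(UnresolvedVariableError(operand))
    # A function reference must be known to the context.
    if function is not None and function not in functions:
        errors.append(UnknownFunctionError(":" + function))
    return errors


# `{$var}` and `{(horse) :func}` in an empty context.
variable_errors = resolution_errors("$var", None, {}, {})
function_errors = resolution_errors("(horse)", "func", {}, {})
```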

- **Selection errors** occur when message selection fails.
Collaborator

This is one bullet (Selection errors) with only one sub-bullet (Selector errors).
Is there a difference? Are there any other kinds of "Selection errors" other than "Selector errors"?

Otherwise feels redundant, like having:

* Syntax errors
   * Invalid syntax errors

Collaborator Author

Let's leave the selection + selector error categories as is for now, but adjust them later after #322 is resolved.


- **Selector errors** are failures in the matching of a key to a specific selector.

For example, attempting to format either of the following messages
might result in a Selector error if done within a context that
uses a `:plural` selector function which requires its input to be numeric:

```
match {(horse) :plural}
when 1 {The value is one.}
when * {The value is not one.}
```

```
let $sel = {(horse) :plural}
match {$sel}
when 1 {The value is one.}
when * {The value is not one.}
```

- **Formatting errors** occur during the formatting of a resolved value,
for example when encountering a value with an unsupported type
or an internally inconsistent set of options.

For example, attempting to format any of the following messages
might result in a Formatting error if done within a context that

1. provides for the variable reference `$user` to resolve to
an object `{ name: 'Kat', id: 1234 }`,
2. provides for the variable reference `$field` to resolve to
a string `'address'`, and
3. uses a `:get` formatting function which requires its argument to be an object and
an option `field` to be provided with a string value,

```
{Hello, {(horse) :get field=name}!}
```

```
{Hello, {$user :get}!}
```

```
let $id = {$user :get field=id}
{Hello, {$id :get field=name}!}
```

```
{Your {$field} is {$id :get field=$field}}
```
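To make the `:get` scenario concrete, here is a hedged Python sketch of a formatting function with the two requirements listed above. The function body, the `FormattingError` class and the `try_get` helper are hypothetical; treating a missing field as a Formatting error is an additional assumption not stated in the text.

```python
class FormattingError(Exception):
    """Hypothetical exception type for the Formatting error category."""


def get(argument, options):
    """Hypothetical `:get`: requires an object argument and a string `field` option."""
    if not isinstance(argument, dict):
        raise FormattingError("argument must be an object")
    field = options.get("field")
    if not isinstance(field, str):
        raise FormattingError("option 'field' must be a string")
    if field not in argument:
        # Assumption: a missing field is also reported as a Formatting error.
        raise FormattingError("no such field: " + field)
    return argument[field]


def try_get(argument, options):
    """Return the value, or the FormattingError that was raised."""
    try:
        return get(argument, options)
    except FormattingError as error:
        return error


user = {"name": "Kat", "id": 1234}

literal_operand = try_get("horse", {"field": "name"})  # {(horse) :get field=name}
missing_option = try_get(user, {})                     # {$user :get}
user_id = get(user, {"field": "id"})                   # let $id = {$user :get field=id}
non_object = try_get(user_id, {"field": "name"})       # {$id :get field=name}
```

In this sketch the first three example messages each end in a `FormattingError`, for three different reasons: a string argument, a missing option, and a numeric argument.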

Syntax and Data Model errors must be emitted as soon as possible.
Collaborator

Proposal: add a heading above it?
Probably ### level.

Because we go from the list of error types (what we produce), with examples for each, to a section explaining how / when to produce said errors.

It probably made sense when the list of errors was a real list, visible at a glance.
But now it gets lost after the more detailed form.

If you think it helps also add a ### level heading before error types.
Proposal:

## Error Handling
### Error types
  ...
### How and when to produce errors (or a better title if you have one)
 ...


During selection, an Expression handler must only emit Resolution and Selection errors.
During formatting, an Expression handler must only emit Resolution and Formatting errors.
Collaborator

Same question as above: This phrasing doesn't explicitly mandate that errors should chain (or nest, depending how you look at it). I.e. a resolution error can lead to a formatting error. Implementations should emit both to enable higher-level abstractions to react differently to these different failures.


In all cases, when encountering an error,
a message formatter must provide some representation of the message.
An informative error or errors must also be separately provided.
When a message contains more than one error,
or contains some error which leads to further errors,
an implementation which does not emit all of the errors
should prioritise Syntax and Data Model errors over others.
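The prioritisation rule could be implemented in many ways. The Python sketch below shows one possibility for an implementation that reports only a single error; the `MessageError` record and the category labels are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MessageError:
    """Hypothetical error record; `category` uses labels invented for this sketch."""
    category: str
    detail: str


# Categories that should win when not all errors are emitted.
PRIORITISED = {"syntax", "data-model"}


def most_important(errors: List[MessageError]) -> MessageError:
    """Pick one error, preferring Syntax and Data Model errors over the rest."""
    # min() returns the first item with the smallest key, so ties keep input order.
    return min(errors, key=lambda e: 0 if e.category in PRIORITISED else 1)


collected = [
    MessageError("resolution", "unresolved variable $var"),
    MessageError("data-model", "missing fallback variant"),
    MessageError("formatting", "unsupported value type"),
]
reported = most_important(collected)
```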

When an error occurs in the resolution of an Expression or Markup Option,
the Expression or Markup in question is processed as if the option were not defined.
Comment on lines +189 to +190
Member

Would this lead to unresolved variable errors later that would otherwise be avoided? See my comment about allowing the empty variable above. Wouldn't it be better to have just the one error for:

let $var = {(horse) :unknown_function}
{You have a {$var}}

Collaborator Author

That's tricky, because determining whether accessing {$var} here causes a second error would require us to define what the value of {(horse) :unknown_function} is. But maybe we do need to define that a little bit, because we should make it clear whether this message ought to get formatted as

You have a {$var}

or as

You have a {(horse)}

I'll add a statement clarifying that it should be the latter.

Member

... because the first error (:unknown_function) assigned the fallback string to $var and the second line just replaces {$var} with that, right?

Collaborator Author

Sort of, but not quite. It's probably easiest to think of the value of $var here as an implementation-defined blob that is identifiable as a fallback value and which stringifies as (horse). Then, when it's formatted in the placeholder {$var}, it comes out as the string {(horse)}.

This indirection allows for us to consider the fallback value and the target of the formatting separately from each other, and more easily support non-string formatting targets.

This may allow for the fallback handling described below to be avoided,
though an error must still be emitted.

When an error occurs within a Selector,
the selector must not match any VariantKey other than the catch-all `*`
and a Resolution or Selector error is emitted.
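The selector rule above can be sketched for the single-selector case as follows. The `plural_key` function is a hypothetical stand-in for `:plural` that accepts only digit strings and matches keys exactly; real plural selection involves locale-specific categories that this sketch deliberately ignores.

```python
from typing import List, Optional, Tuple


class SelectorError(Exception):
    """Hypothetical exception type for Selector errors."""


def plural_key(value: str) -> str:
    """Hypothetical stand-in for `:plural`: numeric input only, exact matching."""
    if not value.isdigit():
        raise SelectorError("not numeric: " + value)
    return value


def select_pattern(operand: str, variants: List[Tuple[str, str]], errors: List[Exception]) -> str:
    """Return the pattern of the first matching variant, recording any Selector error."""
    key: Optional[str]
    try:
        key = plural_key(operand)
    except SelectorError as error:
        errors.append(error)  # the error is still emitted
        key = None            # a failed selector may only match the catch-all
    for variant_key, pattern in variants:
        if variant_key == "*" or variant_key == key:
            return pattern
    # Unreachable for valid messages: a missing catch-all is a Data Model error.
    raise ValueError("no fallback variant")


variants = [("1", "The value is one."), ("*", "The value is not one.")]
errors: List[Exception] = []
selected = select_pattern("horse", variants, errors)
```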

## Fallback String Representations

The formatted string representation of a message with a Syntax or Data Model error
Member

Would it be better for each error to define the fallback representation rather than trying to collect them here? For example "The fallback string representation of an unknown function error is ..." in the section on that type of error?

Collaborator Author

I would prefer defining the fallback representation according to the Placeholder for which it applies, rather than the error type. This makes them more predictable and intentionally restricts implementations from trying to be too smart about the behaviour.

Consider for instance something like {$user :get key=name opt=$var}, which is intended to get the user name with some opt qualifier. If the $var may be controlled by an attacker, it may cause the :get formatting to fail, which means that a fallback representation must be used. If an implementation could define some custom error for it, a "smart" solution could be to try and stringify $user, which could lead to some sensitive information leaking out in an unexpected manner. Instead, with the explicit definition here it must use the string {$user}, which is sufficiently opaque to avoid this failure mode.

Member

I think I meant something different? I meant that instead of defining fallbacks in a separate section of the spec which then refers back to syntax and data model error types, it would potentially be better to define error behavior inline with each error type.

I am not suggesting that we encourage/permit different implementations to make up their own fallback behavior or to define new error types with (suspect) fallback behavior.

One reason for my thinking is the example in the comment above:

let $var = {(horse) :unknown_function}
{You have a {$var}}

If I'm writing an implementation, I probably have an assignment operator bit of code for the let (x) = {expr}. One likely way to implement that would be to make the argument list a Map keyed with strings, into which the key var is put. When I do this, I'm not evaluating the rest of the pattern--to know how or even if the value of var is used. But I do need to assign var a value in the let expression. Defining the output of the "unknown function" error gives me clear guidance. Does that make sense? (This also helps answer @stasm's question about multiple errors)

is the concatenation of U+007B LEFT CURLY BRACKET `{`,
a fallback string,
and U+007D RIGHT CURLY BRACKET `}`.
If a fallback string is not defined,
the U+FFFD REPLACEMENT CHARACTER `�` character is used,
resulting in the string `{�}`.
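A minimal Python sketch of the whole-message fallback described above. The function name is hypothetical, and how a fallback string gets defined for a message is not specified here, so it is passed in as an optional argument.

```python
from typing import Optional

REPLACEMENT_CHARACTER = "\uFFFD"


def message_fallback(fallback_string: Optional[str] = None) -> str:
    """Fallback representation of a message with a Syntax or Data Model error."""
    inner = REPLACEMENT_CHARACTER if fallback_string is None else fallback_string
    return "{" + inner + "}"


without_fallback = message_fallback()
with_fallback = message_fallback("greeting")  # "greeting" is an invented fallback string
```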

When an error occurs in a Placeholder that is being formatted,
the fallback string representation of the Placeholder
always starts with U+007B LEFT CURLY BRACKET `{`
and ends with U+007D RIGHT CURLY BRACKET `}`.
Comment on lines +210 to +211
Member

we might have a bidi consideration here? If the string being formatted is in Arabic, we might emit an FSI before the { and a PDI after the }. Format patterns use lots of neutrals (such as $ and :) and look like gibberish in a bidi context.

Collaborator Author

I think this should be covered by the text in PR #315? The intent there is to define a prefix and a suffix to a part's formatted string representation, which here would be something like {$foo}.

Between the brackets, the following contents are used:

- Expression with Literal Operand: U+0028 LEFT PARENTHESIS `(`
followed by the value of the Literal,
and then by U+0029 RIGHT PARENTHESIS `)`

Examples: `{(horse)}`, `{(42)}`
Collaborator

Can this be an error? Ever?
It is a string literal, to use "as is"

I agree that it can make sense if a function is specified: {(42) :number}

Collaborator Author

The examples show the fallback string representations, so e.g. {(42)} would be the one used if {(42) :number} failed.


- Expression with Variable Operand: U+0024 DOLLAR SIGN `$`
followed by the Variable Name of the Operand

Example: `{$user}`

- Expression with no Operand: U+003A COLON `:` followed by the Expression Name

Example: `{:platform}`

- Markup start: U+002B PLUS SIGN `+` followed by the MarkupStart Name

Example: `{+tag}`

- Markup end: U+002D HYPHEN-MINUS `-` followed by the MarkupEnd Name

Example: `{-tag}`

- Otherwise: The U+FFFD REPLACEMENT CHARACTER `�` character
Collaborator

Proposal: change this to "the string between { and }, as is".
I would assume we get here if none of the cases before was encountered.

So probably a syntax error (bad sigil, for example, or bad format for parameters):
"Encountered {?count} penguins" is more useful than "Encountered {�} penguins"
Or
"Encountered {$count :number invalid options ?? format} penguins"

Collaborator Author

Yes, this is catching anything not covered by the above. I think that's currently an empty set, as syntax errors use {�} or {fallback string} for the whole message. So this should never happen, but let's still include it to make sure that we don't end up with undefined behaviour.

I would strongly prefer not including any explicit reference to syntax source here, as that would require keeping a reference to it in all implementations, and might not apply at all if the source is not an MF2 syntax message.


Example: `{�}`

Option names and values are not included in the fallback string representations.
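The rules listed above can be collected into a single lookup, sketched here in Python. The `kind` labels are invented for this illustration; options are simply never passed in, which mirrors their exclusion from the fallback string.

```python
from typing import Optional

# Sigil prepended to the name for each non-literal placeholder kind.
PREFIXES = {
    "variable": "$",      # Expression with Variable Operand
    "function": ":",      # Expression with no Operand
    "markup-start": "+",
    "markup-end": "-",
}


def placeholder_fallback(kind: str, text: Optional[str] = None) -> str:
    """Fallback string for a Placeholder; `kind` uses labels made up for this sketch."""
    if kind == "literal" and text is not None:
        inner = "(" + text + ")"
    elif kind in PREFIXES and text is not None:
        inner = PREFIXES[kind] + text
    else:
        inner = "\uFFFD"  # the "Otherwise" case
    return "{" + inner + "}"


examples = [
    placeholder_fallback("literal", "horse"),
    placeholder_fallback("variable", "user"),
    placeholder_fallback("function", "platform"),
    placeholder_fallback("markup-start", "tag"),
    placeholder_fallback("markup-end", "tag"),
    placeholder_fallback("unknown"),
]
```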

When an error occurs in an Expression with a Variable Operand
and the Variable refers to a local variable Declaration,
the fallback string is formatted based on the Expression of the Declaration,
rather than the Expression of the Placeholder.

For example, attempting to format either of the following messages within a context that
does not provide for the function `:func` to be successfully resolved:

```
let $var = {(horse) :func}
{The value is {$var}.}
```

```
let $var = {(horse)}
Collaborator

Nitpick: I don't think the second example should be valid

let $var = {(horse)}
{The value is {$var :func}.}

Because the whole point of local variables would be reuse, mostly for consistency.
If you change the function in the message body, the whole consistency goes out the window.

Yes, not difficult to implement (just a bit worse performance).
But kind of no good use case, with potential of misuse.

So my suggestion would be to remove the second example.

Collaborator Author

This is meant as a minimal example showing that the identifiers of local variables should never end up being included in the formatted output. It's not really meant to be a useful message, but it's technically valid.

{The value is {$var :func}.}
```

would in both cases result in this formatted string representation:

```
The value is {(horse)}.
```
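The local-variable rule amounts to following declarations back to the operand they were declared with. In this Python sketch, operands are encoded as strings (`$name` or `(value)`) and `declarations` maps each local variable name to the operand of its declaring Expression; both encodings are assumptions made for illustration.

```python
from typing import Dict, Set


def fallback_for_operand(operand: str, declarations: Dict[str, str]) -> str:
    """Fallback string for an operand, following local variable Declarations."""
    seen: Set[str] = set()
    # Replace a local variable by the operand of its Declaration, repeatedly.
    while operand.startswith("$") and operand[1:] in declarations and operand not in seen:
        seen.add(operand)  # guard against accidental cycles
        operand = declarations[operand[1:]]
    return "{" + operand + "}"


# Both examples above declare $var with the literal operand (horse).
declarations = {"var": "(horse)"}
formatted = "The value is " + fallback_for_operand("$var", declarations) + "."
```

Because the lookup replaces `$var` before the braces are added, the local variable's identifier never reaches the formatted output.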