Skip to content

Add error handling #320

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 11 commits into from
Feb 1, 2023
Merged

Add error handling #320

merged 11 commits into from
Feb 1, 2023

Conversation

eemeli
Copy link
Collaborator

@eemeli eemeli commented Dec 19, 2022

Closes #45

As discussed in the issue and WG calls, it's important for the spec to provide a solid description of its error handling and ensure that the potential of message formatting causing fatal errors is minimised. It's also important for us to strike an appropriate balance with respect to implementation requirements and complexity, and not through this make it significantly more difficult or impossible to put together a compliant implementation.

This error handling spec does not enforce the shape of the errors or the manner of their emission, as long as it does not preclude that some representation of the formatted message is always provided. The text here is rather explicit about the fallback shape of errors when formatting to a string target; that should guide but not restrict the behaviour of implementations that provide other targets, for which different fallback representations may be more appropriate.

@eemeli
Copy link
Collaborator Author

eemeli commented Dec 22, 2022

Updated the PR to answer comments raised above, added a bit of an intro to the description, and marking this as ready for review.

The biggest change is the addition of selection errors. One part of that is notable:

When selection fails to match any Variant, an empty string is used as the formatted string representation of the message and a Missing Fallback error is emitted.

Would it be more appropriate to use a whole-message fallback {???} in this case as well? The rationale with the current text is that not matching any variant means that there's no message content that should be shown.

@eemeli eemeli marked this pull request as ready for review December 22, 2022 13:36
Comment on lines +65 to +66
always starts with U+007B LEFT CURLY BRACKET `{`
and ends with U+007D RIGHT CURLY BRACKET `}`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we might have a bidi consideration here? If the string being formatted is in Arabic, we might emit an FSI before the { and a PDI after the }. Format patterns use lots of neutrals (such as $ and :) and look like gibberish in a bidi context.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should be covered by the text in PR #315? The intent there is to define a prefix and a suffix to a part's formatted string representation, which here would be something like {$foo}.

Copy link
Member

@aphillips aphillips left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some minor consistency/formatting comments, plus one question/material comment.

@macchiati

This comment was marked as resolved.

@eemeli

This comment was marked as resolved.

@aphillips

This comment was marked as off-topic.

@macchiati

This comment was marked as off-topic.

@eemeli

This comment was marked as off-topic.

@aphillips

This comment was marked as off-topic.

@macchiati

This comment was marked as off-topic.

@eemeli
Copy link
Collaborator Author

eemeli commented Jan 4, 2023

I've created discussion #322 to continue the other vs * discussion, and have copied there the relevant comments above from @aphillips, @macchiati and myself, which are now hidden here.

These are divided into the following categories:

- **Syntax errors** occur when the syntax representation of a message is invalid.
- **Resolution errors** occur when the runtime value of a part of a message
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is unclear what this means.

Can we say "function resolution"?
There are very few things that can go wrong:

  • local variable definitions => similar to placeholders
  • plain text parts
  • placeholders => have a variable / literal part + function name + bag of options
  • selectors => have again variable(s) / literal part + function name + bag of options

We already have unresolved as a class of errors a bit below.
So, what else can go wrong in the (already parsed) parts above?
I think only function names?

So what about we call this section "Function resolution"?
Instead of "resolution errors", where "resolution" is not explained in the spec, and we don't even agree it is needed. We might agree that it is an implementation detail. So there is no need to mention it in the spec.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The difference between the "resolution" and "formatting" errors that's proposed here is perhaps best seen by considering how a Placeholder containing an Expression is handled. Let's say that we have something like

{(foo) :message}

or

{(user_age) :global}

where :message and/or :global is a custom Function that uses its argument to look up a value from elsewhere, and that this value is then formatted as a part of the final message.

During the formatting of this, the custom code could then emit two different sorts of errors:

  1. Resolution Error if there's a failure in getting the value that is to be formatted, e.g. if no foo message is available or if the user_age global is not set.
  2. Formatting Error if the found value can't be formatted, e.g. because the foo message includes a variable reference that can't be resolved, or the user_age value turns out to have some unexpected shape.

Would you agree that these are different error categories, and that this categorical split could lead to different error handling in user code?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If user_age isn't set, isn't that an unresolved variable error?

If no foo message is available, that would be an internal error of the format function message, right? Does that mean "resolution error" is really "function internal error"?

In any case, I think I would move this item below some of the others here (perhaps to the bottom of the list), since I find myself thinking that this could also mean unresolved or formatting error when in fact this error would probably only occur later (only when the pattern is syntactically correct and all of the variables and functions have been resolved but there is still a problem).

Or am I still not understanding?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both foo and user_age are literal values here, so the errors are coming from the :message and :global custom functions; from the PoV of the core implementation, the are no variables here to resolve.

The intent here is to allow for a custom formatting function to emit two different kinds of errors: Resolution and Formatting. This is meant to enable something like :message or :global to work from an end-user PoV as much as possible like core features such as $var.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nitpick: I don't think what I am saying here affects the spec, but the answer above.

I think it is contradicting:

  • Resolution ... failure in getting the value that is to be formatted
  • Formatting ... includes a variable reference that can't be resolved ...

It is unclear what is the difference between the two: "get the value" and "resolve variable reference"
The "publicly visible" operation is probably "I have a variable named foo, I want to get the value"
There is no "reference" except deep in the implementation.

But that implementation detail should not "leak" in the kind of error I am getting.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please note that the syntax example for the above discussion is this placeholder:

{(foo) :message}

The foo here is not a variable reference, it's a literal value that a custom :message function is interpreting as a message identifier.

An informative error or errors must also be separately provided.

When an error occurs in the syntax or resolution of an Expression or MarkupStart Option,
the Expression or MarkupStart in question is processed as if the option were not defined.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should not mention MarkupStart, as there is no agreement on markup.

Copy link
Collaborator Author

@eemeli eemeli Jan 12, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you objecting here out of principle, or do you specifically think that failures in Expression and Markup options should be handled differently?

@eemeli

This comment was marked as resolved.

@aphillips
Copy link
Member

@eemeli

Regarding syntax errors, we currently say this:

A message satisfying all rules of the grammar is considered _well-formed_.
Furthermore, a well-formed message can is considered _valid_
if it meets additional semantic requirements about its structure, defined below.

I see some English grammatical issues with line 221 :-)

Could we separate these so that we'd consider a message that is not well-formed to have a Syntax Error, and a message which is not valid to have a Data Model Error?

Yes and we should formally define these (and other) error types. The set of errors should be as small as possible (but no smaller).

Separating these into different categories would make it easier to also define different error handling for them, in particular with respect to the Data Model Errors, which are in many cases recoverable while the former less so:

A well-formed message is considered valid if the following requirements are satisfied:
- The number of keys on each variant must be equal to the number of selectors.
- At least one variant's keys must all be equal to the catch-all key (`*`).

This would also allow for Missing Fallback to be considered a Data Model Error.

The question here is whether the fallback is (as currently defined) an error at parse time or only triggers when none of the options are matched.

@eemeli
Copy link
Collaborator Author

eemeli commented Jan 17, 2023

Updated to add data model errors as a category, along with a note:

Syntax and Data Model errors must be emitted as soon as possible.

Also, references to message identifiers are dropped and syntax errors in options get no special treatment.

@mihnita

This comment was marked as resolved.


During the formatting of a message,
various errors may be encountered.
These are divided into the following categories:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nitpick / editorial:
How do you feel about having a small list with titles only, almost like a TOC
Followed by the details, with examples.

It would help with the "big picture", vs something spread over many paragraphs.

These are divided into the following categories:
* Syntax errors
* Data Model errors
    * Variant Key Mismatch errors
    * Missing Fallback Variant errors
* Resolution errors
    * Unresolved Variable errors
    * Unknown Function errors
* Selection errors
    * Selector errors
* Formatting errors

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's how this started out, before the examples got added...

Let's revisit this when we've more of the spec gathered up? The solution for this section is likely to look very similar to other sections as well.

Copy link
Collaborator

@mihnita mihnita left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left quite a few comments, more than I hoped.
But none of the are blockers for the merge.

Many (most?) are nitpicks, which (in my own convention) means:

  • If you like the idea, do it
  • If you don't like it, feel free to ignore it, and there is no need to defend that decision
  • Implementing it or not does not affect my approval

I hope GitHub does not block this because of "pending comments"
If it does, resolve. If that invalidates the approval, I will approve it again.
(I still find the GitHub review pretty clunky)

- **Data Model errors** occur when a message is invalid due to
violating one of the following semantic requirements on its structure:

- **Variant Key Mismatch errors** occur when the number of keys on a Variant
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nitpick: VariantKey

Part of the editorial cleanup I'm proposing, that would add back-ticks around keywords, and would make sure this is consistent with the spec.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer leaving this as "Variant Key" for now, as it's rather likely for this to leak out as a user-visible string in the error.

when * {The value is not one.}
```

- **Selection errors** occur when message selection fails.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is one bullet (Selection errors) with only one sub-bullet (Selector errors)
Is there a difference? Are there any other kind of "Selection errors" other then "Selector errors"

Otherwise feels redundant, like having:

* Syntax errors
   * Invalid syntax errors

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's leave the selection + selector error categories as is for now, but adjust them later after #322 is resolved.

{Hello, {$id :get field=name}!}
```

Syntax and Data Model errors must be emitted as soon as possible.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Proposal: add a heading above it?
Probably ### level.

Because we go from the list of error types (what we produce), with examples for each, to a section explaining how / when to produce said errors.

It probably made sense when the list errors was a real list, visible at a glance.
But now it gets lost after the more detailed form.

If you think it helps also add a ### level heading before error types.
Proposal:

## Error Handling
### Error types
  ...
### How and when to produce errors (or a better title if you have one)
 ...

followed by the value of the Literal,
and then by U+0029 RIGHT PARENTHESIS `)`

Examples: `{(horse)}`, `{(42)}`
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this be an error? Ever?
It is a string literal, to use "as is"

I agree that it can make sense if a function is specified: {(42) :number}

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The examples are showing examples of the fallback string representations, so .e.g {(42)} would be the one used if {(42) :number} failed.


Example: `{-tag}`

- Otherwise: The U+FFFD REPLACEMENT CHARACTER `�` character
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Proposal: change this to "the string between { and }, as is.
I would assume we get here is none of the cases before was encountered.

So probably a syntax error (bad sigil, for example, or bad format for parameters):
"Encountered {?count} penguins" is more useful than "Encountered {�} penguins"
Or
"Encountered {$count :number invalid options ?? format} penguins"

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is catching anything not covered by the above. I think that's currently an empty set, as syntax errors use {�} or {fallback string} for the whole message. So this should never happen, but let's still include it to make sure that we don't end up with undefined behaviour.

I would strongly prefer not including any explicit reference to syntax source here, as that would require keeping a reference to it in all implementations, and might not apply at all if the source is not an MF2 syntax message.

```

```
let $var = {(horse)}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nitpick: I don't think the second example should be valid

let $var = {(horse)}
{The value is {$var :func}.}

Because the whole point of local variables would be reuse, mostly for consistency.
If change the function in the message body, the whole consistency goes out the windows.

Yes, not difficult to implement (just a bit worse performance).
But kind of no good use case, with potential of misuse.

So my suggestion would be to remove the second example.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is meant as a minimal example showing that the identifiers of local variables should never end up being included in the formatted output. It's not really meant to be a useful message, but it's technically valid.

@mihnita
Copy link
Collaborator

mihnita commented Jan 28, 2023

Thank you very much for all the work and the patience!
The split between syntax / model / runtime (select format) feels clean, LGTM.

@macchiati
Copy link
Member

This document covers a lot of the types of errors that could occur, which is great.

It is very much written from the prospective of an "untyped" programming language environment. In many environments (such as templating systems), the message can be analyzed more completely, and many more errors can be "compile-time" errors or "lint" errors instead of "runtime" errors. So it would be good to note that.

For example, the syntax errors and variant key mismatch errors (which are really syntax errors) can easily be detected in such circumstances. When the system has access to the specified Selectors and Functions that are invoked in the course of processing the message, and knows the types of the arguments supplied, then that can also be checked.

If a system has access to the plural selector, then the following is detectable at compile time.
match {(horse) :plural}
when 1 {The value is one.}
...

Same for this one, if there is access to the get formatter.

{Hello, {(horse) :get field=name}!}

Upshot:

  1. It would be useful to have some examples of errors that can't be detected by a system that has access to the types of the parameters and to the definitions of the Selection and Format functions that would be in play at runtime. (I don't see any currently in the document).
  2. If we can't find such examples, then the document should make it clear that such systems do not have to have a Fallback String representation.

@macchiati
Copy link
Member

macchiati commented Jan 28, 2023 via email

@eemeli
Copy link
Collaborator Author

eemeli commented Jan 30, 2023

@macchiati:
[...] In many environments (such as templating systems), the message can be analyzed more completely, and many more errors can be "compile-time" errors or "lint" errors instead of "runtime" errors. So it would be good to note that.

I've now added a first paragraph noting this:

Errors in messages and their formatting may occur and be detected at multiple different stages of their processing. Where available, the use of validation tools is recommended, as early detection of errors makes their correction easier.

@macchiati:
1. It would be useful to have some examples of errors that can't be detected by a system that has access to the types of the parameters and to the definitions of the Selection and Format functions that would be in play at runtime. (I don't see any currently in the document).

I've added a fourth example under Formatting errors which I believe demonstrates such an error. I'm not claiming that what it presents is good practice, or that it couldn't be avoided by e.g. a stricter typing of the value of $field, but it's at least technically valid MF2.

@stasm Do you think we could/should include in the function registry some (optional) gating on when/whether a function could fail? It might be useful to be able to determine the complete set of conditions that's required and/or sufficient for formatting a message without any failure. This would be beyond the scope of this PR, but might allow for earlier/easier detection of errors.

Copy link
Member

@aphillips aphillips left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My remaining comments are editorial.

```

```
let $var = {(no message body)}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So you're saying that the empty string is not a valid pattern and results in an error? That is, the minimum a pattern string can ever be is {}? That makes me uncomfortable....

match {$one}
when 1 {Value is one}
when 2 {Value is two}
```
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps add text to say:

Missing when * case:

(and on line 82 add "Missing when * * case:")

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see the necessity for these specifiers. Are these specific examples more difficult to respond about than the others, or should we specify explicitly what's wrong in each of them?

Regarding the examples in general, I at least prefer requiring the reader to at least momentarily pause on some of them to figure out for themselves what's going on, rather than making sure that no thought is required.


```
match {$one}
when 1 2 {Too many}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having too few or too many keys (compared to the match expression) is clearly a syntax error, and should be in that category. It's like calling a function in C or Java with too few or too many parameters; not a semantic or data model error, but rather one that can be detected purely because of malformed syntax.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The intent in differentiating this as a "data model" rather than "syntax" error is that it allows us to apply the same error check independently of the syntax used by the message. Here's what I said about this earlier in #320 (comment):

Let's say that there's a FooFormat9000 that's exactly like MF1, except that it doesn't require an "other" case for its selectors and defaults to an empty string instead. Now, if a FooFormat9000 message is parsed into an MF2 data model for use with an MF2 runtime, the code doing so might not add the expected default variant to the data model. When the MF2 runtime discovers this, it needs to emit a [Data Model] Error rather than a Syntax Error, because it doesn't know anything about FooFormat9000 where this syntax is valid.

An MF2 syntax parser would be expected to emit the data model errors during its work, as this is included in the spec (line 176):

Syntax and Data Model errors must be emitted as soon as possible.

This separation also matches the differentiation between "well-formed" and "valid" messages in the syntax spec.

@macchiati
Copy link
Member

macchiati commented Jan 30, 2023 via email

@macchiati
Copy link
Member

Errors in messages and their formatting may occur and be detected at multiple different stages of their processing. Where available, the use of validation tools is recommended, as early detection of errors makes their correction easier.

This isn't quite what I meant, which is something more like:

Where (a) the Selector and Format definitions are accessible, and (b) the number and types of the input parameters are known, proper use of validation tooling can ensure that there will be no runtime errors emitted.

(I don't see your "fourth example under Formatting errors" yet)

@eemeli
Copy link
Collaborator Author

eemeli commented Jan 31, 2023

@macchiati:
By "fourth example under Formatting errors" do you mean a 4th example under iii ? I was looking for an iv but didn't find it.

The added example is on line 173:

{Your {$field} is {$id :get field=$field}}

The error here relies on line 154, also added in the same commit:

  1. provides for the variable reference $field to resolve to a string 'address'

@macchiati:
Where (a) the Selector and Format definitions are accessible, and (b) the number and types of the input parameters are known, proper use of validation tooling can ensure that there will be no runtime errors emitted.

I don't think we can or should guarantee "no runtime errors", because we can't control what sorts of custom functions provided by third parties may be used. For example, there could be an :env function available that finds an environment variable matching its string argument. While we can ensure with that said argument is a string, we can't really ensure before running the code in its environment that every environment variable is available or has an expected shape.

@eemeli
Copy link
Collaborator Author

eemeli commented Feb 1, 2023

As discussed at the last meeting, let's merge this now that it has general approval and (as far as I can tell) blocker issues have been resolved. We can (and should!) continue to improve this section and the whole spec, but let's do so within the scope of further issues and PRs.

@eemeli eemeli merged commit 7ee489b into unicode-org:main Feb 1, 2023
@eemeli eemeli deleted the errors branch February 1, 2023 06:25
@aphillips
Copy link
Member

@eemeli Thank you for merging this: I had planned to ask you to do it today, but you beat me to it 🎉

@macchiati
Copy link
Member

I don't think we can or should guarantee "no runtime errors", because we can't control what sorts of custom functions provided by third parties may be used. For example, there could be an :env function available that finds an environment variable matching its string argument. While we can ensure with that said argument is a string, we can't really ensure before running the code in its environment that every environment variable is available or has an expected shape.

Let me expand on this a bit. Of course, when you reference a Selector or Formatter that throws an error, you can't do anything about that. And that can certainly be the case for custom Selectors and Functions.

The following is still pretty rough, but it is what I'm trying to get at.

A Selector is well-behaved when all of the following conditions are met:

  1. It declares a domain of variant keys, (eg "high", "medium", "low"; or integers).
  2. It declares a set of modifiers.
  3. For any set of modifiers it will return true for at least one element in that domain, and will return true or false for all others (without error).
  4. It will always return true for *, with any set of modifiers.

A Function is well-behaved when all of the following conditions are met:

  1. It declares a domain of variant keys (eg "high", "medium", "low"; or integers)
  2. It declares a set of modifiers.
  3. For any set of modifiers it will return a string for each element in that domain. (without error)

Note: implementations of all standard registered Selectors and Functions must be well-behaved.

A message is well-behaved all of the following conditions are met:

  1. It is syntactically correct.
  2. All of its Selectors and Functions are well-behaved.

Inputs to a message are complete when there is a value for each one of them that is used by a Selector or Function in the message, and that value is in the domain of those Selector or Function.

Well-behaved messages will not produce errors when their inputs are complete. This does not, of course, include extraordinary conditions such as out-of-memory.

Back to your example, an :env function would not be well-behaved, and thus any message that used it might throw errors.

@zbraniecki
Copy link
Member

It declares a domain of variant keys, (eg "high", "medium", "low"; or integers).

I believe we should leave syntactic and semantic space for open bound selectors.

An example of such selector:

match {$cityName}
  when "New York" {Nowy Jork}
  when "Paris" {Paryż}
  when * {$cityName}

@eemeli
Copy link
Collaborator Author

eemeli commented Feb 2, 2023

@macchiati It sounds like we should include an attribute of some sort in the function registry for declaring whether a function is "well-behaved" or not, so that this could be used as input for e.g. validators. Would you be ok with filing this as a separate issue, so that we don't lose track of this conversation at the tail end of a closed PR?

@macchiati
Copy link
Member

macchiati commented Feb 2, 2023 via email

@macchiati
Copy link
Member

macchiati commented Feb 2, 2023 via email

@macchiati
Copy link
Member

I said I would do this a while back; just added the issue as #377

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Specify error fallbacking
8 participants