-
-
Notifications
You must be signed in to change notification settings - Fork 36
Add error handling #320
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add error handling #320
Conversation
Updated the PR to answer comments raised above, added a bit of an intro to the description, and marking this as ready for review. The biggest change is the addition of selection errors. One part of that is notable:
Would it be more appropriate to use a whole-message fallback |
always starts with U+007B LEFT CURLY BRACKET `{` | ||
and ends with U+007D RIGHT CURLY BRACKET `}`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we might have a bidi consideration here? If the string being formatted is in Arabic, we might emit an FSI before the {
and a PDI after the }
. Format patterns use lots of neutrals (such as $
and :
) and look like gibberish in a bidi context.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should be covered by the text in PR #315? The intent there is to define a prefix and a suffix to a part's formatted string representation, which here would be something like {$foo}
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some minor consistency/formatting comments, plus one question/material comment.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as off-topic.
This comment was marked as off-topic.
This comment was marked as off-topic.
This comment was marked as off-topic.
This comment was marked as off-topic.
This comment was marked as off-topic.
This comment was marked as off-topic.
This comment was marked as off-topic.
This comment was marked as off-topic.
This comment was marked as off-topic.
I've created discussion #322 to continue the |
These are divided into the following categories: | ||
|
||
- **Syntax errors** occur when the syntax representation of a message is invalid. | ||
- **Resolution errors** occur when the runtime value of a part of a message |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is unclear what this means.
Can we say "function resolution"?
There are very few things that can go wrong:
- local variable definitions => similar to placeholders
- plain text parts
- placeholders => have a variable / literal part + function name + bag of options
- selectors => have again variable(s) / literal part + function name + bag of options
We already have unresolved as a class of errors a bit below.
So, what else can go wrong in the (already parsed) parts above?
I think only function names?
So what about we call this section "Function resolution"?
Instead of "resolution errors", where "resolution" is not explained in the spec, and we don't even agree it is needed. We might agree that it is an implementation detail. So there is no need to mention it in the spec.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The difference between the "resolution" and "formatting" errors that's proposed here is perhaps best seen by considering how a Placeholder containing an Expression is handled. Let's say that we have something like
{(foo) :message}
or
{(user_age) :global}
where :message
and/or :global
is a custom Function that uses its argument to look up a value from elsewhere, and that this value is then formatted as a part of the final message.
During the formatting of this, the custom code could then emit two different sorts of errors:
- Resolution Error if there's a failure in getting the value that is to be formatted, e.g. if no
foo
message is available or if theuser_age
global is not set. - Formatting Error if the found value can't be formatted, e.g. because the
foo
message includes a variable reference that can't be resolved, or theuser_age
value turns out to have some unexpected shape.
Would you agree that these are different error categories, and that this categorical split could lead to different error handling in user code?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If user_age
isn't set, isn't that an unresolved variable error?
If no foo
message is available, that would be an internal error of the format function message
, right? Does that mean "resolution error" is really "function internal error"?
In any case, I think I would move this item below some of the others here (perhaps to the bottom of the list), since I find myself thinking that this could also mean unresolved or formatting error when in fact this error would probably only occur later (only when the pattern is syntactically correct and all of the variables and functions have been resolved but there is still a problem).
Or am I still not understanding?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Both foo
and user_age
are literal values here, so the errors are coming from the :message
and :global
custom functions; from the PoV of the core implementation, the are no variables here to resolve.
The intent here is to allow for a custom formatting function to emit two different kinds of errors: Resolution and Formatting. This is meant to enable something like :message
or :global
to work from an end-user PoV as much as possible like core features such as $var
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nitpick: I don't think what I am saying here affects the spec, but the answer above.
I think it is contradicting:
- Resolution ... failure in getting the value that is to be formatted
- Formatting ... includes a variable reference that can't be resolved ...
It is unclear what is the difference between the two: "get the value" and "resolve variable reference"
The "publicly visible" operation is probably "I have a variable named foo
, I want to get the value"
There is no "reference" except deep in the implementation.
But that implementation detail should not "leak" in the kind of error I am getting.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please note that the syntax example for the above discussion is this placeholder:
{(foo) :message}
The foo
here is not a variable reference, it's a literal value that a custom :message
function is interpreting as a message identifier.
spec/formatting.md
Outdated
An informative error or errors must also be separately provided. | ||
|
||
When an error occurs in the syntax or resolution of an Expression or MarkupStart Option, | ||
the Expression or MarkupStart in question is processed as if the option were not defined. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should not mention MarkupStart, as there is no agreement on markup.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you objecting here out of principle, or do you specifically think that failures in Expression and Markup options should be handled differently?
This comment was marked as resolved.
This comment was marked as resolved.
I see some English grammatical issues with line 221 :-)
Yes and we should formally define these (and other) error types. The set of errors should be as small as possible (but no smaller).
The question here is whether the fallback is (as currently defined) an error at parse time or only triggers when none of the options are matched. |
Updated to add data model errors as a category, along with a note:
Also, references to message identifiers are dropped and syntax errors in options get no special treatment. |
This comment was marked as resolved.
This comment was marked as resolved.
|
||
During the formatting of a message, | ||
various errors may be encountered. | ||
These are divided into the following categories: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nitpick / editorial:
How do you feel about having a small list with titles only, almost like a TOC
Followed by the details, with examples.
It would help with the "big picture", vs something spread over many paragraphs.
These are divided into the following categories:
* Syntax errors
* Data Model errors
* Variant Key Mismatch errors
* Missing Fallback Variant errors
* Resolution errors
* Unresolved Variable errors
* Unknown Function errors
* Selection errors
* Selector errors
* Formatting errors
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's how this started out, before the examples got added...
Let's revisit this when we've more of the spec gathered up? The solution for this section is likely to look very similar to other sections as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left quite a few comments, more than I hoped.
But none of the are blockers for the merge.
Many (most?) are nitpicks, which (in my own convention) means:
- If you like the idea, do it
- If you don't like it, feel free to ignore it, and there is no need to defend that decision
- Implementing it or not does not affect my approval
I hope GitHub does not block this because of "pending comments"
If it does, resolve. If that invalidates the approval, I will approve it again.
(I still find the GitHub review pretty clunky)
- **Data Model errors** occur when a message is invalid due to | ||
violating one of the following semantic requirements on its structure: | ||
|
||
- **Variant Key Mismatch errors** occur when the number of keys on a Variant |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nitpick: VariantKey
Part of the editorial cleanup I'm proposing, that would add back-ticks around keywords, and would make sure this is consistent with the spec.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd prefer leaving this as "Variant Key" for now, as it's rather likely for this to leak out as a user-visible string in the error.
when * {The value is not one.} | ||
``` | ||
|
||
- **Selection errors** occur when message selection fails. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is one bullet (Selection errors) with only one sub-bullet (Selector errors)
Is there a difference? Are there any other kind of "Selection errors" other then "Selector errors"
Otherwise feels redundant, like having:
* Syntax errors
* Invalid syntax errors
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's leave the selection + selector error categories as is for now, but adjust them later after #322 is resolved.
{Hello, {$id :get field=name}!} | ||
``` | ||
|
||
Syntax and Data Model errors must be emitted as soon as possible. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Proposal: add a heading above it?
Probably ### level.
Because we go from the list of error types (what we produce), with examples for each, to a section explaining how / when to produce said errors.
It probably made sense when the list errors was a real list, visible at a glance.
But now it gets lost after the more detailed form.
If you think it helps also add a ### level heading before error types.
Proposal:
## Error Handling
### Error types
...
### How and when to produce errors (or a better title if you have one)
...
followed by the value of the Literal, | ||
and then by U+0029 RIGHT PARENTHESIS `)` | ||
|
||
Examples: `{(horse)}`, `{(42)}` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this be an error? Ever?
It is a string literal, to use "as is"
I agree that it can make sense if a function is specified: {(42) :number}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The examples are showing examples of the fallback string representations, so .e.g {(42)}
would be the one used if {(42) :number}
failed.
|
||
Example: `{-tag}` | ||
|
||
- Otherwise: The U+FFFD REPLACEMENT CHARACTER `�` character |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Proposal: change this to "the string between {
and }
, as is.
I would assume we get here is none of the cases before was encountered.
So probably a syntax error (bad sigil, for example, or bad format for parameters):
"Encountered {?count} penguins"
is more useful than "Encountered {�} penguins"
Or
"Encountered {$count :number invalid options ?? format} penguins"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this is catching anything not covered by the above. I think that's currently an empty set, as syntax errors use {�}
or {fallback string}
for the whole message. So this should never happen, but let's still include it to make sure that we don't end up with undefined behaviour.
I would strongly prefer not including any explicit reference to syntax source here, as that would require keeping a reference to it in all implementations, and might not apply at all if the source is not an MF2 syntax message.
``` | ||
|
||
``` | ||
let $var = {(horse)} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nitpick: I don't think the second example should be valid
let $var = {(horse)}
{The value is {$var :func}.}
Because the whole point of local variables would be reuse, mostly for consistency.
If change the function in the message body, the whole consistency goes out the windows.
Yes, not difficult to implement (just a bit worse performance).
But kind of no good use case, with potential of misuse.
So my suggestion would be to remove the second example.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is meant as a minimal example showing that the identifiers of local variables should never end up being included in the formatted output. It's not really meant to be a useful message, but it's technically valid.
Thank you very much for all the work and the patience! |
This document covers a lot of the types of errors that could occur, which is great. It is very much written from the prospective of an "untyped" programming language environment. In many environments (such as templating systems), the message can be analyzed more completely, and many more errors can be "compile-time" errors or "lint" errors instead of "runtime" errors. So it would be good to note that. For example, the syntax errors and variant key mismatch errors (which are really syntax errors) can easily be detected in such circumstances. When the system has access to the specified Selectors and Functions that are invoked in the course of processing the message, and knows the types of the arguments supplied, then that can also be checked. If a system has access to the plural selector, then the following is detectable at compile time. Same for this one, if there is access to the get formatter. {Hello, {(horse) :get field=name}!} Upshot:
|
Quick note
as if the option were not defined.
The "were" is correct (subjunctive in a counterfactual)
…On Fri, Jan 27, 2023, 17:36 Mihai Nita ***@***.***> wrote:
Thank you very much for all the work and the patience!
The split between syntax / model / runtime (select format) feels clean,
LGTM.
—
Reply to this email directly, view it on GitHub
<#320 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/ACJLEMAWMDWBCOAZ7X6NFBLWURZYHANCNFSM6AAAAAATDTAQ6A>
.
You are receiving this because you were mentioned.Message ID:
***@***.***>
|
I've now added a first paragraph noting this:
I've added a fourth example under Formatting errors which I believe demonstrates such an error. I'm not claiming that what it presents is good practice, or that it couldn't be avoided by e.g. a stricter typing of the value of @stasm Do you think we could/should include in the function registry some (optional) gating on when/whether a function could fail? It might be useful to be able to determine the complete set of conditions that's required and/or sufficient for formatting a message without any failure. This would be beyond the scope of this PR, but might allow for earlier/easier detection of errors. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My remaining comments are editorial.
``` | ||
|
||
``` | ||
let $var = {(no message body)} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So you're saying that the empty string is not a valid pattern and results in an error? That is, the minimum a pattern string can ever be is {}
? That makes me uncomfortable....
match {$one} | ||
when 1 {Value is one} | ||
when 2 {Value is two} | ||
``` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps add text to say:
Missing when * case:
(and on line 82 add "Missing when * *
case:")
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see the necessity for these specifiers. Are these specific examples more difficult to respond about than the others, or should we specify explicitly what's wrong in each of them?
Regarding the examples in general, I at least prefer requiring the reader to at least momentarily pause on some of them to figure out for themselves what's going on, rather than making sure that no thought is required.
|
||
``` | ||
match {$one} | ||
when 1 2 {Too many} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Having too few or too many keys (compared to the match expression) is clearly a syntax error, and should be in that category. It's like calling a function in C or Java with too few or too many parameters; not a semantic or data model error, but rather one that can be detected purely because of malformed syntax.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The intent in differentiating this as a "data model" rather than "syntax" error is that it allows us to apply the same error check independently of the syntax used by the message. Here's what I said about this earlier in #320 (comment):
Let's say that there's a FooFormat9000 that's exactly like MF1, except that it doesn't require an "other" case for its selectors and defaults to an empty string instead. Now, if a FooFormat9000 message is parsed into an MF2 data model for use with an MF2 runtime, the code doing so might not add the expected default variant to the data model. When the MF2 runtime discovers this, it needs to emit a [Data Model] Error rather than a Syntax Error, because it doesn't know anything about FooFormat9000 where this syntax is valid.
An MF2 syntax parser would be expected to emit the data model errors during its work, as this is included in the spec (line 176):
Syntax and Data Model errors must be emitted as soon as possible.
This separation also matches the differentiation between "well-formed" and "valid" messages in the syntax spec.
By "fourth example under Formatting errors" do you mean a 4th example under
iii ? I was looking for an iv but didn't find it.
…On Mon, Jan 30, 2023 at 1:42 AM Eemeli Aro ***@***.***> wrote:
@macchiati <https://github.com/macchiati>:
[...] In many environments (such as templating systems), the message can
be analyzed more completely, and many more errors can be "compile-time"
errors or "lint" errors instead of "runtime" errors. So it would be good to
note that.
I've now added a first paragraph noting this:
Errors in messages and their formatting may occur and be detected at
multiple different stages of their processing. Where available, the use of
validation tools is recommended, as early detection of errors makes their
correction easier.
@macchiati <https://github.com/macchiati>:
1. It would be useful to have some examples of errors that can't be
detected by a system that has access to the types of the parameters and to
the definitions of the Selection and Format functions that would be in play
at runtime. (I don't see any currently in the document).
I've added a fourth example under Formatting errors which I believe
demonstrates such an error. I'm not claiming that what it presents is good
practice, or that it couldn't be avoided by e.g. a stricter typing of the
value of $field, but it's at least technically valid MF2.
@stasm <https://github.com/stasm> Do you think we could/should include in
the function registry some (optional) gating on when/whether a function
could fail? It might be useful to be able to determine the complete set of
conditions that's required and/or sufficient for formatting a message
without any failure. This would be beyond the scope of this PR, but might
allow for earlier/easier detection of errors.
—
Reply to this email directly, view it on GitHub
<#320 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/ACJLEMDLV2IOYFGIRNHV7GLWU6EHNANCNFSM6AAAAAATDTAQ6A>
.
You are receiving this because you were mentioned.Message ID:
***@***.***>
|
This isn't quite what I meant, which is something more like: Where (a) the Selector and Format definitions are accessible, and (b) the number and types of the input parameters are known, proper use of validation tooling can ensure that there will be no runtime errors emitted. (I don't see your "fourth example under Formatting errors" yet) |
The added example is on line 173:
The error here relies on line 154, also added in the same commit:
I don't think we can or should guarantee "no runtime errors", because we can't control what sorts of custom functions provided by third parties may be used. For example, there could be an |
As discussed at the last meeting, let's merge this now that it has general approval and (as far as I can tell) blocker issues have been resolved. We can (and should!) continue to improve this section and the whole spec, but let's do so within the scope of further issues and PRs. |
@eemeli Thank you for merging this: I had planned to ask you to do it today, but you beat me to it 🎉 |
Let me expand on this a bit. Of course, when you reference a Selector or Formatter that throws an error, you can't do anything about that. And that can certainly be the case for custom Selectors and Functions. The following is still pretty rough, but it is what I'm trying to get at. A Selector is well-behaved when all of the following conditions are met:
A Function is well-behaved when all of the following conditions are met:
Note: implementations of all standard registered Selectors and Functions must be well-behaved. A message is well-behaved all of the following conditions are met:
Inputs to a message are complete when there is a value for each one of them that is used by a Selector or Function in the message, and that value is in the domain of those Selector or Function. Well-behaved messages will not produce errors when their inputs are complete. This does not, of course, include extraordinary conditions such as out-of-memory. Back to your example, an |
I believe we should leave syntactic and semantic space for open bound selectors. An example of such selector:
|
@macchiati It sounds like we should include an attribute of some sort in the function registry for declaring whether a function is "well-behaved" or not, so that this could be used as input for e.g. validators. Would you be ok with filing this as a separate issue, so that we don't lose track of this conversation at the tail end of a closed PR? |
I don't quite see the problem. The bar is pretty low for being
well-behaved.
If $cityName has the domain "any string", and doesn't throw exceptions if
it hits an arbitrary string "$!$#@", and returns true for at least one
string in that domain "New York", then it is well-behaved.
Moreover, having a class of well-behaved selectors and functions puts no
restriction on use of non-well-behaved selectors in messages. It just means
we have a vocabulary for singling out selectors, functions, and messages
that have particularly important characteristics for particular
implementations.
…On Wed, Feb 1, 2023 at 1:32 PM Zibi Braniecki ***@***.***> wrote:
It declares a domain of variant keys, (eg "high", "medium", "low"; or
integers).
I believe we should leave syntactic and semantic space for open bound
selectors.
An example of such selector:
match {$cityName}
when "New York" {Nowy Jork}
when "Paris" {Paryż}
when * {$cityName}
—
Reply to this email directly, view it on GitHub
<#320 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/ACJLEMC265ZNGWBEN6XO743WVLJABANCNFSM6AAAAAATDTAQ6A>
.
You are receiving this because you were mentioned.Message ID:
***@***.***>
|
Will do, thanks.
…On Wed, Feb 1, 2023 at 11:07 PM Eemeli Aro ***@***.***> wrote:
@macchiati <https://github.com/macchiati> It sounds like we should
include an attribute of some sort in the function registry for declaring
whether a function is "well-behaved" or not, so that this could be used as
input for e.g. validators. Would you be ok with filing this as a separate
issue, so that we don't lose track of this conversation at the tail end of
a closed PR?
—
Reply to this email directly, view it on GitHub
<#320 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/ACJLEMGPCAPMRE3G25GGHOTWVNMJ5ANCNFSM6AAAAAATDTAQ6A>
.
You are receiving this because you were mentioned.Message ID:
***@***.***>
|
I said I would do this a while back; just added the issue as #377 |
Closes #45
As discussed in the issue and WG calls, it's important for the spec to provide a solid description of its error handling and ensure that the potential of message formatting causing fatal errors is minimised. It's also important for us to strike an appropriate balance with respect to implementation requirements and complexity, and not through this make it significantly more difficult or impossible to put together a compliant implementation.
This error handling spec does not enforce the shape of the errors or the manner of their emission, as long as it does not preclude that some representation of the formatted message is always provided. The text here is rather explicit about the fallback shape of errors when formatting to a string target; that should guide but not restrict the behaviour of implementations that provide other targets, for which different fallback representations may be more appropriate.