-
-
Notifications
You must be signed in to change notification settings - Fork 36
Add interchange data model description + JSON Schema definition #393
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
In the case of a central message provider dispatching messages to N-type-clients, perhaps in some systems:
...depending on data availability in either context. Being able to distribute the parsing could be beneficial (as clearly evidenced by this PR)! |
That's a good point; the formatting spec should really mention that it's getting a parsed message structure from somewhere to start with. We should not require parsing syntax into the data model specified here before formatting, but the synonymity between the data model and the syntax should allow for messages from either source (syntax or data model) to be equivalently formatted.
That is a possibility, but tbh it's somewhat unlikely. For I don't think there is sufficient value in harmonising the output format of MF2 to overcome the costs that would be incurred by implementations needing to fit its mold.
Parsing something like MessageResource should be comparable to parsing JSON, and formatting a message should be a rather fast operation, so I rather hope that we'll be able to not need such optimizations. But yes, there is value in portable representations of messages. I see the following as probably the greatest benefits provided by this data model:
|
spec/data-model.md
Outdated
While this document uses TypeScript syntax for their definition, | ||
the canonical and authoritative source is the `message.json` | ||
[JSON Schema](https://json-schema.org/) definition. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a little worried about maintenance of this doc (at least while the proposal continues to change) and keeping the TypeScript and JSON Schema definitions consistent with each other. Though I can see why you want to have the TypeScript interfaces for exposition and the JSON Schema definition for reference. Maybe that's a non-concern given that you say that the JSON Schema one is authoritative.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we'll be changing the structure much any more, and I'd say that the experience with the ABNF would indicate that the need to update two different places when applying changes is not too onerous. And yes, the explicit reference to the JSON Schema as authoritative is intended to disambiguate any divergences, should such form.
type: 'select' | ||
declarations: Declaration[] | ||
selectors: Expression[] | ||
variants: Variant[] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently, the ICU4J Mf2DataModel
class has this method to get the message's variants:
public OrderedMap<SelectorKeys, Pattern> getVariants();
This is less convenient to implement in ICU4C than it is to return a list of Variant
s, but I'm trying to do it anyway for the sake of parity with ICU4J.
I like what you have here (Variant[]
) better because it's easier to implement in C++, where ICU4C doesn't enjoy the benefits of Java's polymorphic OrderedMap
class. But also because while the list and map representations are isomorphic, I think it's more appealing to have an API that returns a list and let users build their own on top of it that does some kind of optimization for more efficient pattern-matching than it is to return a map when maybe efficient lookup isn't always necessary.
Whether it ends up being a list or a map, mostly I just wanted to highlight that the ICU4J and ICU4C implementations should match what's defined here.
spec/message.json
Outdated
"properties": { | ||
"type": { "const": "message" }, | ||
"declarations": { "$ref": "#/$defs/declarations" }, | ||
"pattern": { "$ref": "#/$defs/pattern" } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like it might be simpler to have a single message
, which has a body
attribute that's one of pattern
or select
, and then select
would just have the selectors
and variants
fields? (In other words, declarations
would be factored out.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's also the approach @stasm has suggested, i.e. expressing single-pattern messages as having an empty list for selectors, and a single entry in variants with a corrspondingly empty list of keys.
As I mentioned on the call, I'd much prefer landing something first and then iterating on it, rather than sorting out the details all in one go.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I filed #437 to continue discussing this.
Co-authored-by: Tim Chevalier <tjc@igalia.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. I agree with your intention to land something and iterate it on it more later, so in that spirit, I think the only one of my suggestions that needs to be addressed is the clarification about what the fields of a Reserved
are.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As agreed in the yesterday's meeting, we'll land this without the JSOM schema for now.
|
||
interface Expression { | ||
type: 'expression' | ||
body: Literal | VariableRef | FunctionRef | Reserved |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A FunctionRef
can also have a Literal
or a VariableRef
as an argument, and in fact, due to how our syntax is designed, I'd argue that the function's argument is more important than the function. (E.g. it comes first in the syntax.)
I'd like to suggest an alternative way to structure our expressions, to more closely map to our syntax:
expression = "{" [s] ((operand [s annotation]) / annotation) [s] "}"
Let's special-case argument-less functions rather than function-less operands.
Instead of Literal | VariableRef | FunctionRef
here and operand?: Literal | VariableRef
inside FunctionRef
, we can do:
type Expression = OperandExpr | FunctionExpr;
interface OperandExpr {
operand: Literal | VariableRef;
annotation?: FunctionExpr;
}
interface FunctionExpr {
name: string;
options: Map<string, Literal | VariableRef>;
}
FWIW, this is how I implemented expressions in stasm/message2:
https://github.com/stasm/message2/blob/4abf43f2023b6e20d8ee1d462684d0741ece791b/syntax/ast.ts#L44-L70
(Not blocking this PR on this, but I'd like to discuss this change as a follow-up.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will be happy to discuss this further in a follow-on issue or PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Filed #436 to continue this.
@aphillips In the interests of getting this landed, I went ahead and applied the language change myself, dropping references to the JSON schema. As this already has a ✅ from @stasm, I believe that we can merge this without waiting for the next meeting if you yourself are ok with the text. |
To make sure that it's usable for future work after this lands, the JSON schema is still available here: |
This proposes a way for us to fulfill our deliverable of:
This is not a data model that implementations MUST use, but one which they MAY support. It's intended to work as a formalization of the meaning of our syntax as well as an interchange format for messages in other syntaxes, such as MF1. Effectively, it's synonymous with our syntax, but expressed as a parsed JSON structure.
It is intended to be a format capable of representing messages in all current syntaxes. As in, you should be able to parse anything into this structure with no data loss, and then use a compatible MF2 library with your messages. Full roundtripping of formats that support inline selectors like MF1 and Fluent might not retain their original structure, but will be semantically equivalent.
The format is included inline in the markdown doc using TypeScript interfaces, but its canonical form is provided as a JSON Schema definition.