Interchange Data Model

This section defines a data model representation of Unicode MessageFormat messages.

Implementations are not required to use this data model for their internal representation of messages. Neither are they required to provide an interface that accepts or produces representations of this data model.

The major reason this specification provides a data model is to allow interchange of the logical representation of a message between different implementations. This includes mapping legacy formatting syntaxes (such as ICU MessageFormat) to a Unicode MessageFormat implementation. Another use would be in converting to or from translation formats without the need to continually parse and serialize all or part of a message.

Implementations that expose APIs supporting the production, consumption, or transformation of a message as a data structure are encouraged to use this data model.

This data model provides these capabilities:

  • any Unicode MessageFormat message can be parsed into this representation
  • this data model representation can be serialized as a well-formed Unicode MessageFormat message
  • parsing a Unicode MessageFormat message into a data model representation and then serializing it results in an equivalently functional message

This data model might also be used to:

  • parse messages written in syntaxes other than Unicode MessageFormat into a data model (and therefore re-serialize them as Unicode MessageFormat). Note that this depends on compatibility between the two syntaxes.
  • re-serialize a Unicode MessageFormat message into some other format including (but not limited to) other formatting syntaxes or translation formats.

To ensure compatibility across all platforms, this interchange data model is defined here using TypeScript notation. An equivalent JSON Schema definition message.json is also provided, for use with message data encoded as JSON or compatible formats, such as YAML.

Note that while the data model description below is the canonical one, the JSON Schema definition is intended for interchange between systems and processors. To that end, it relaxes some aspects of the data model, such as allowing declarations, options, and attributes to be optional rather than required properties.
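
As a rough sketch of what that relaxation means for a consumer, the following fills in a missing declarations array after decoding JSON. The JsonPatternMessage and CanonicalPatternMessage types and the withDefaults helper are hypothetical names used only for illustration; they are not defined by this specification or by message.json.

```typescript
// Hypothetical shape of a JSON-decoded pattern message in which
// `declarations` was omitted, as the relaxed schema permits.
interface JsonPatternMessage {
  type: "message";
  declarations?: unknown[];
  pattern: unknown[];
}

// The same message in the canonical shape, where `declarations` is required.
interface CanonicalPatternMessage {
  type: "message";
  declarations: unknown[];
  pattern: unknown[];
}

// Hypothetical helper: supply an empty list when `declarations` is absent.
function withDefaults(msg: JsonPatternMessage): CanonicalPatternMessage {
  return {
    type: "message",
    declarations: msg.declarations !== undefined ? msg.declarations : [],
    pattern: msg.pattern,
  };
}

const decoded = JSON.parse(
  '{"type":"message","pattern":["Hello"]}'
) as JsonPatternMessage;
const canonical = withDefaults(decoded);
```

Options and attributes could be defaulted in the same way. This sketch leaves them out because it does not assume how they are encoded in JSON.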

Important

The data model uses the field name name to denote various interface identifiers. In the Unicode MessageFormat syntax, the source for these name fields sometimes uses the production identifier. This happens when the named item, such as a function, supports namespacing.

Message Model

A SelectMessage corresponds to a syntax message that includes selectors. A message without selectors and with a single pattern is represented by a PatternMessage.

In the syntax, a PatternMessage may be represented either as a simple message or as a complex message, depending on whether it has declarations and whether its pattern is allowed in a simple message.

type Message = PatternMessage | SelectMessage;

interface PatternMessage {
  type: "message";
  declarations: Declaration[];
  pattern: Pattern;
}

interface SelectMessage {
  type: "select";
  declarations: Declaration[];
  selectors: VariableRef[];
  variants: Variant[];
}
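
As an illustration, the simple message `Hello, {$name}!` might be represented as the PatternMessage below. The interfaces are trimmed to what this example needs, and the isSelectMessage helper is a hypothetical convenience, not part of the data model.

```typescript
// Trimmed-down copies of the data model interfaces, enough for this sketch.
interface VariableRef { type: "variable"; name: string }
interface VariableExpression {
  type: "expression";
  arg: VariableRef;
  attributes: Map<string, unknown>;
}
type Pattern = Array<string | VariableExpression>;

interface PatternMessage {
  type: "message";
  declarations: unknown[];
  pattern: Pattern;
}
interface SelectMessage {
  type: "select";
  declarations: unknown[];
  selectors: VariableRef[];
  variants: unknown[];
}
type Message = PatternMessage | SelectMessage;

// Data model for the simple message: Hello, {$name}!
const hello: PatternMessage = {
  type: "message",
  declarations: [],
  pattern: [
    "Hello, ",
    {
      type: "expression",
      arg: { type: "variable", name: "name" }, // no leading $ in the name
      attributes: new Map(),
    },
    "!",
  ],
};

// Hypothetical helper: the `type` field discriminates the two message shapes.
function isSelectMessage(msg: Message): msg is SelectMessage {
  return msg.type === "select";
}
```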

Each message declaration is represented by a Declaration, which connects the name of a variable with its expression value. The name does not include the initial $ of the variable.

The name of an InputDeclaration MUST be the same as the name in the VariableRef of its VariableExpression value.

type Declaration = InputDeclaration | LocalDeclaration;

interface InputDeclaration {
  type: "input";
  name: string;
  value: VariableExpression;
}

interface LocalDeclaration {
  type: "local";
  name: string;
  value: Expression;
}
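
The name-matching requirement for an InputDeclaration can be checked mechanically. The sketch below shows one way to do so; inputNameMatches is a hypothetical helper, and the interfaces are trimmed to the fields the check needs.

```typescript
// Trimmed-down copies of the data model interfaces, enough for this sketch.
interface VariableRef { type: "variable"; name: string }
interface VariableExpression {
  type: "expression";
  arg: VariableRef;
  attributes: Map<string, unknown>;
}
interface InputDeclaration {
  type: "input";
  name: string;
  value: VariableExpression;
}

// Data model for the declaration: .input {$count}
const wellFormed: InputDeclaration = {
  type: "input",
  name: "count",
  value: {
    type: "expression",
    arg: { type: "variable", name: "count" },
    attributes: new Map(),
  },
};

// A deliberately inconsistent declaration: the names differ.
const mismatched: InputDeclaration = {
  type: "input",
  name: "count",
  value: {
    type: "expression",
    arg: { type: "variable", name: "total" },
    attributes: new Map(),
  },
};

// Hypothetical check: the declaration name must equal the name in the
// VariableRef of its value.
function inputNameMatches(decl: InputDeclaration): boolean {
  return decl.name === decl.value.arg.name;
}
```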

In a SelectMessage, each variant is represented by a Variant, which holds the variant's keys and its pattern value. For the CatchallKey, a string value may be provided to retain an identifier. This is always '*' in the Unicode MessageFormat syntax, but may vary in other formats.

interface Variant {
  keys: Array<Literal | CatchallKey>;
  value: Pattern;
}

interface CatchallKey {
  type: "*";
  value?: string;
}
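
A short sketch of how variants and the catch-all key might look as data follows. The Pattern type is simplified to an array of strings, and isCatchall and findFallback are hypothetical helpers introduced only for this example.

```typescript
interface Literal { type: "literal"; value: string }
interface CatchallKey { type: "*"; value?: string }
interface Variant {
  keys: Array<Literal | CatchallKey>;
  value: string[]; // simplified: a real Pattern may also hold expressions and markup
}

// Variants for a message along the lines of:
//   .input {$count :number}
//   .match $count
//   one {{You have one item}}
//   *   {{You have many items}}
const variants: Variant[] = [
  { keys: [{ type: "literal", value: "one" }], value: ["You have one item"] },
  { keys: [{ type: "*" }], value: ["You have many items"] },
];

// Hypothetical helper: distinguish the catch-all key from literal keys.
function isCatchall(key: Literal | CatchallKey): key is CatchallKey {
  return key.type === "*";
}

// Hypothetical helper: the first variant whose keys are all catch-all keys.
function findFallback(all: Variant[]): Variant | undefined {
  return all.filter((variant) => variant.keys.every(isCatchall))[0];
}

const fallback = findFallback(variants);
```

Note that the key "one" is stored as a Literal with a string value, and the catch-all key carries no value here because the source syntax always spells it as `*`.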

Pattern Model

Each Pattern contains a linear sequence of text and placeholders corresponding to potential output of a message.

Each element of the Pattern MUST be a non-empty string, an Expression, or a Markup object. String values represent literal text, with all processing of the underlying text values, including escape sequence processing, already applied. Expression wraps each of the potential expression shapes. Markup wraps each of the potential markup shapes.

Implementations MUST NOT rely on the set of Expression and Markup interfaces defined in this document being exhaustive. Future versions of this specification might define additional expressions or markup.

type Pattern = Array<string | Expression | Markup>;

type Expression =
  | LiteralExpression
  | VariableExpression
  | FunctionExpression;

interface LiteralExpression {
  type: "expression";
  arg: Literal;
  function?: FunctionRef;
  attributes: Attributes;
}

interface VariableExpression {
  type: "expression";
  arg: VariableRef;
  function?: FunctionRef;
  attributes: Attributes;
}

interface FunctionExpression {
  type: "expression";
  arg?: never;
  function: FunctionRef;
  attributes: Attributes;
}
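
Because the set of pattern element shapes is not guaranteed to be exhaustive, code that walks a Pattern benefits from an explicit fallback branch. The following is a minimal sketch of that idea; PatternPart and describeParts are hypothetical names, and the "future-thing" type is invented purely to exercise the fallback.

```typescript
// Deliberately loose element type: any object with a `type` tag is accepted,
// so shapes unknown to this code can still pass through it.
type PatternPart = string | { type: string };

// Hypothetical helper: classify each element, without assuming that
// "expression" and "markup" are the only possible object shapes.
function describeParts(pattern: PatternPart[]): string[] {
  return pattern.map((part) => {
    if (typeof part === "string") return "text";
    switch (part.type) {
      case "expression":
        return "expression";
      case "markup":
        return "markup";
      default:
        // Unknown shape, for example one defined by a later specification version.
        return "unknown:" + part.type;
    }
  });
}

const described = describeParts([
  "Hi ",
  { type: "expression" },
  { type: "markup" },
  { type: "future-thing" },
]);
```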

Expression Model

The Literal and VariableRef correspond to the literal and variable syntax rules. When they are used as the body of an Expression, they represent expression values with no function.

Literal represents all literal values, both quoted literal and unquoted literal. The presence or absence of quotes is not preserved by the data model. The value of Literal is the "cooked" value (i.e. escape sequences are processed).

In a VariableRef, the name does not include the initial $ of the variable.

interface Literal {
  type: "literal";
  value: string;
}

interface VariableRef {
  type: "variable";
  name: string;
}
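
To illustrate the "cooked" value, the sketch below builds Literal objects from raw source text. It assumes that an escape sequence in the syntax is a backslash followed by one of `\`, `{`, `|`, or `}`; both helper functions are hypothetical and are not part of this specification.

```typescript
interface Literal { type: "literal"; value: string }

// Hypothetical helper: convert the raw text between the | delimiters of a
// quoted literal into a Literal, processing escape sequences.
// Assumption: an escape is a backslash followed by one of \ { | }.
function literalFromQuoted(rawBody: string): Literal {
  const cooked = rawBody.replace(/\\([\\{|}])/g, "$1");
  return { type: "literal", value: cooked };
}

// Hypothetical helper: an unquoted literal contains no escapes, so its text
// is used unchanged.
function literalFromUnquoted(raw: string): Literal {
  return { type: "literal", value: raw };
}

const quoted = literalFromQuoted("42"); // from the placeholder {|42|}
const unquoted = literalFromUnquoted("42"); // from the placeholder {42}
const escaped = literalFromQuoted("a\\|b"); // from the placeholder {|a\|b|}
```

Here `quoted` and `unquoted` are indistinguishable in the data model, which is what it means for quoting not to be preserved.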

A FunctionRef represents a function. The name does not include the : starting sigil.

Options is a key-value mapping containing options, and is used to represent the function and markup options.

interface FunctionRef {
  type: "function";
  name: string;
  options: Options;
}

type Options = Map<string, Literal | VariableRef>;
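
As an example, the function annotation in a placeholder such as `{$count :number minimumFractionDigits=2 maximumFractionDigits=$max}` might be represented as shown below. The optionNames helper is hypothetical and exists only to show how the mapping can be read.

```typescript
interface Literal { type: "literal"; value: string }
interface VariableRef { type: "variable"; name: string }
type Options = Map<string, Literal | VariableRef>;
interface FunctionRef {
  type: "function";
  name: string;
  options: Options;
}

// Option values are either literals (always strings) or variable references.
const options: Options = new Map();
options.set("minimumFractionDigits", { type: "literal", value: "2" });
options.set("maximumFractionDigits", { type: "variable", name: "max" });

// The function name omits the ":" sigil, just as variable names omit "$".
const numberFunction: FunctionRef = {
  type: "function",
  name: "number",
  options,
};

// Hypothetical helper: list option names in insertion order.
function optionNames(fn: FunctionRef): string[] {
  const names: string[] = [];
  fn.options.forEach((_value, key) => {
    names.push(key);
  });
  return names;
}

const names = optionNames(numberFunction);
```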

Markup Model

A Markup object has a kind of "open", "standalone", or "close", corresponding to open, standalone, and close markup respectively. The name does not include the starting sigil (# or /) or the ending sigil / of standalone markup. Markup options use the same key-value mapping as FunctionRef.

interface Markup {
  type: "markup";
  kind: "open" | "standalone" | "close";
  name: string;
  options: Options;
  attributes: Attributes;
}
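
The sketch below shows how the three markup kinds relate to the sigils in the syntax. The markup and markupToSyntax helpers are hypothetical, the tag names are illustrative, and the serialization ignores options, attributes, and the escaping of literal text.

```typescript
interface Markup {
  type: "markup";
  kind: "open" | "standalone" | "close";
  name: string;
  options: Map<string, unknown>; // simplified stand-in for Options
  attributes: Map<string, unknown>; // simplified stand-in for Attributes
}

// Hypothetical constructor for markup without options or attributes.
function markup(kind: Markup["kind"], name: string): Markup {
  return { type: "markup", kind, name, options: new Map(), attributes: new Map() };
}

// Pattern for: Click {#link}here{/link}{#br /}
const pattern: Array<string | Markup> = [
  "Click ",
  markup("open", "link"),
  "here",
  markup("close", "link"),
  markup("standalone", "br"),
];

// Hypothetical helper: re-attach the sigils that the data model leaves out.
function markupToSyntax(m: Markup): string {
  if (m.kind === "open") return "{#" + m.name + "}";
  if (m.kind === "close") return "{/" + m.name + "}";
  return "{#" + m.name + " /}"; // standalone
}

const source = pattern
  .map((part) => (typeof part === "string" ? part : markupToSyntax(part)))
  .join("");
```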

Attribute Model

Attributes is a key-value mapping used to represent the expression and markup attributes.

Attributes with no value are represented by true here.

type Attributes = Map<string, Literal | true>;
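
For instance, the attributes of a placeholder such as `{$name @translate=no @locked}` might be represented as follows. The attribute names are chosen purely for illustration, and attributeValue is a hypothetical helper.

```typescript
interface Literal { type: "literal"; value: string }
type Attributes = Map<string, Literal | true>;

const attributes: Attributes = new Map();
attributes.set("translate", { type: "literal", value: "no" }); // @translate=no
attributes.set("locked", true); // @locked has no value

// Hypothetical helper: returns the literal's string value, `true` for an
// attribute without a value, or `undefined` when the attribute is absent.
function attributeValue(attrs: Attributes, name: string): string | true | undefined {
  const value = attrs.get(name);
  if (value === undefined || value === true) return value;
  return value.value;
}
```

Keeping `true` distinct from any string lets a consumer tell a valueless attribute apart from one whose literal value is empty.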

Model Extensions

Implementations MAY extend this data model with additional interfaces, as well as adding new fields to existing interfaces. When encountering an unfamiliar field, an implementation MUST ignore it. For example, an implementation could include a span field on all interfaces encoding the corresponding start and end positions in its source syntax.
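
One way a consumer might honor the rule about unfamiliar fields is to copy only the fields it knows. The sketch below does this for a Literal carrying the span extension mentioned above; readLiteral is a hypothetical helper and the span layout is an assumption.

```typescript
interface Literal { type: "literal"; value: string }

// A literal as produced by some other implementation, extended with a
// hypothetical `span` field that this consumer does not know about.
const extended = {
  type: "literal",
  value: "42",
  span: { start: 7, end: 9 },
};

// Hypothetical helper: validate the known fields and copy only those,
// so that unfamiliar fields are ignored rather than rejected.
function readLiteral(input: { type: string; value: string }): Literal {
  if (input.type !== "literal") {
    throw new Error("not a literal: " + input.type);
  }
  return { type: "literal", value: input.value };
}

const literal = readLiteral(extended);
```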

In general, implementations MUST NOT extend the sets of values for any defined field or type when representing a valid message. However, when using this data model to represent an invalid message, an implementation MAY do so. This is intended to allow for the representation of "junk" or invalid content within messages.