Draft of the registry specification #368
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,34 @@ | ||
<!ELEMENT registry (function|pattern)*> | ||
|
||
<!ELEMENT function (description,(formatSignature|matchSignature)+)> | ||
<!ATTLIST function name NMTOKEN #REQUIRED> | ||
|
||
<!ELEMENT description (#PCDATA)> | ||
|
||
<!ELEMENT pattern EMPTY> | ||
<!ATTLIST pattern id ID #REQUIRED> | ||
<!ATTLIST pattern regex CDATA #REQUIRED> | ||
|
||
<!ELEMENT formatSignature (input?,option*)> | ||
<!ATTLIST formatSignature position (open|close|standalone) "standalone"> | ||
<!ATTLIST formatSignature locales NMTOKENS #IMPLIED> | ||
|
||
<!ELEMENT matchSignature (input?,option*,match*)> | ||
<!ATTLIST matchSignature locales NMTOKENS #IMPLIED> | ||
|
||
<!ELEMENT input EMPTY> | ||
<!ATTLIST input values NMTOKENS #IMPLIED> | ||
Review comment: If we end up allowing for some
Review comment: I don't think #364 is related, actually. My intent here was to allow specifying either an enumeration of nmtokens or a regex to validate arguments that are MF2 literals. This seems orthogonal to whether these literals are quoted or not?
Review comment: I meant that if #364 lands, then nearly all
Review comment: I don't think we can enforce such restriction in the DTD alone. LDML does it by extending DTD with annotations: https://unicode.org/reports/tr35/#57-dtd-annotations.
||
<!ATTLIST input pattern NMTOKEN #IMPLIED> | ||
<!ATTLIST input readonly (true|false) "false"> | ||
|
||
<!ELEMENT option EMPTY> | ||
<!ATTLIST option name NMTOKEN #REQUIRED> | ||
<!ATTLIST option values NMTOKENS #IMPLIED> | ||
<!ATTLIST option default NMTOKEN #IMPLIED> | ||
<!ATTLIST option pattern IDREF #IMPLIED> | ||
<!ATTLIST option required (true|false) "false"> | ||
<!ATTLIST option readonly (true|false) "false"> | ||
|
||
<!ELEMENT match EMPTY> | ||
<!ATTLIST match values NMTOKENS #IMPLIED> | ||
<!ATTLIST match pattern NMTOKEN #IMPLIED> |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,180 @@ | ||
# WIP DRAFT MessageFormat 2.0 Registry | ||
|
||
_This document is non-normative._ | ||
|
||
Implementations and tooling can greatly benefit from a structured definition of the formatting and matching functions available to messages at runtime. | ||
|
||
The _registry_ is a mechanism for storing such declarations in a portable manner. | ||
|
||
## Goals | ||
|
||
The registry provides a machine-readable description of MessageFormat extensions (custom functions), | ||
in order to support the following goals and use-cases: | ||
|
||
* Validate semantic properties of messages. For example: | ||
* Type-check values passed into functions. | ||
* Validate that matching functions are only called in selectors. | ||
* Validate that formatting functions are only called in placeholders. | ||
* Verify the exhaustiveness of variant keys given a selector. | ||
* Support the localization roundtrip. For example: | ||
* Generate variant keys for a given locale during XLIFF extraction. | ||
* Improve the authoring experience. For example: | ||
* Forbid edits to certain function options (e.g. currency options). | ||
* Autocomplete function and option names. | ||
* Display on-hover tooltips for function signatures with documentation. | ||
* Display/edit known message metadata. | ||
* Restrict input in GUI by providing a dropdown with all viable option values. | ||
|
||
## Data Model | ||
|
||
The registry contains descriptions of function signatures. | ||
[`registry.dtd`](./registry.dtd) describes its data model. | ||
|
||
The main building block of the registry is the `<function>` element. | ||
It represents an implementation of a custom function available to translation at runtime. | ||
A function defines a human-readable _description_ of its behavior | ||
and one or more machine-readable _signatures_ of how to call it. | ||
Named `<pattern>` elements can optionally define regex validation rules for literals, option values, and variant keys. | ||
|
||
MessageFormat functions can be invoked in two contexts: | ||
* inside placeholders, to produce a part of the message's formatted output; | ||
for example, a raw value of `|1.5|` may be formatted to `1,5` in a language which uses commas as decimal separators, | ||
* inside selectors, to contribute to selecting the appropriate variant among all given variants. | ||
|
||
A single _function name_ may be used in both contexts, | ||
regardless of whether it's implemented as one or multiple functions. | ||
|
||
A _signature_ defines one particular set of at most one argument and any number of named options that can be used together in a single call to the function. | ||
`<formatSignature>` corresponds to a function call inside a placeholder inside translatable text. | ||
`<matchSignature>` corresponds to a function call inside a selector. | ||
Signatures with a non-empty `locales` attribute are locale-specific and only available in translations in the given languages. | ||
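
As an illustration of the `locales` rule above, a tool could filter the signatures of a function by locale roughly as follows. This is a minimal sketch, not part of the draft: the function named `example`, the helper `signatures_for_locale`, and the exact string comparison of locale codes (no fallback from `en-US` to `en`, for instance) are all assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical function with one locale-independent signature
# and two locale-specific ones.
FUNCTION_XML = """
<function name="example">
  <formatSignature/>
  <formatSignature locales="en"/>
  <matchSignature locales="pl de"/>
</function>
"""


def signatures_for_locale(function, locale):
    """Return signatures with no `locales` attribute or with `locale` listed in it."""
    selected = []
    for child in function:
        if child.tag not in ("formatSignature", "matchSignature"):
            continue
        # NMTOKENS is a whitespace-separated list; a missing attribute means "all locales".
        locales = (child.get("locales") or "").split()
        if not locales or locale in locales:
            selected.append(child)
    return selected


function = ET.fromstring(FUNCTION_XML.strip())
for code in ("en", "pl", "fr"):
    print(code, [s.tag for s in signatures_for_locale(function, code)])
```

Whether locale matching should be exact or follow some fallback chain is not stated in the draft; the sketch picks exact matching only because it is the simplest option.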
|
||
A signature may define the positional argument of the function with the `<input>` element. | ||
|
||
If the `<input>` element is not present, the function is defined as a nullary function. | ||
A signature may also define one or more `<option>` elements representing _named options_ to the function. | ||
An option can be omitted in a call to the function, | ||
unless its `required` attribute is set to `true`. | ||
Review comment: Would it be an error for an option to have both a
||
Options accept either a finite enumeration of values (the `values` attribute) | ||
Review comment: Since a finite enumeration can be expressed as a regex, I'm wondering if it would be simpler to only allow a regex? I can imagine a tool providing finite enumerations as syntactic sugar.
Review comment: Hmm, interesting. I'm not opposed to removing
Do you feel strongly about this?
Review comment: Generating autocompletion values for an editor is easy from an explicit list in
||
or validate their input with a regular expression (the `pattern` attribute). | ||
Read-only options (the `readonly` attribute) can be displayed to translators in CAT tools, but may not be edited. | ||
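
The option rules above could be checked by a tool along these lines. This is only a sketch under several assumptions that the draft does not settle: the helper name `check_options` is made up, an option that has both `values` and `pattern` is checked against `values` only, options not declared in the signature are reported as problems, and `readonly` is ignored because it concerns editing rather than validity. Note that `xml.etree.ElementTree` does not apply DTD attribute defaults, so a missing `required` attribute is treated as `false`.

```python
import re
import xml.etree.ElementTree as ET

REGISTRY_XML = r"""
<registry>
  <pattern id="positiveInteger" regex="[0-9]+"/>
  <pattern id="currencyCode" regex="[A-Z]{3}"/>
  <function name="number">
    <formatSignature locales="en">
      <input/>
      <option name="minimumFractionDigits" pattern="positiveInteger"/>
      <option name="style" readonly="true" values="decimal currency percent unit" default="decimal"/>
      <option name="currency" readonly="true" pattern="currencyCode"/>
    </formatSignature>
  </function>
</registry>
"""


def check_options(registry, signature, call_options):
    """Return a list of problems found in `call_options` for one signature element."""
    patterns = {p.get("id"): p.get("regex") for p in registry.iter("pattern")}
    declared = {o.get("name"): o for o in signature.findall("option")}
    problems = []
    # Required options must be present in the call.
    for name, option in declared.items():
        if option.get("required") == "true" and name not in call_options:
            problems.append(f"missing required option: {name}")
    for name, value in call_options.items():
        option = declared.get(name)
        if option is None:
            # Assumption: undeclared options are reported.
            problems.append(f"unknown option: {name}")
            continue
        values = option.get("values")
        regex = patterns.get(option.get("pattern"))
        if values is not None:
            valid = value in values.split()
        elif regex is not None:
            valid = re.fullmatch(regex, value) is not None
        else:
            valid = True  # no constraint declared
        if not valid:
            problems.append(f"invalid value for {name}: {value}")
    return problems


registry = ET.fromstring(REGISTRY_XML.strip())
signature = registry.find("function/formatSignature")
print(check_options(registry, signature, {"style": "percent", "currency": "usd"}))
```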
|
||
Matching-function signatures additionally include one or more `<match>` elements | ||
Review comment: Is it "one or more" or "zero or more"? The DTD implies that there can be zero
||
to define the keys against which they can match when used as selectors. | ||
|
||
|
||
## Example | ||
|
||
The following `registry.xml` is an example of a registry file | ||
which may be provided by an implementation to describe its built-in functions. | ||
For the sake of brevity, only `locales="en"` is considered. | ||
|
||
```xml | ||
<?xml version="1.0" encoding="UTF-8" ?> | ||
<!DOCTYPE registry SYSTEM "./registry.dtd"> | ||
|
||
<registry> | ||
<function name="platform"> | ||
<description>Match the current OS.</description> | ||
<matchSignature> | ||
<match values="windows linux macos android ios"/> | ||
</matchSignature> | ||
</function> | ||
|
||
<pattern id="anyNumber" regex="-?[0-9]+(\.[0-9]+)?"/> | ||
<pattern id="positiveInteger" regex="[0-9]+"/> | ||
<pattern id="currencyCode" regex="[A-Z]{3}"/> | ||
|
||
<function name="number"> | ||
<description> | ||
Format a number. | ||
Match a numerical value against CLDR plural categories or against a number literal. | ||
</description> | ||
|
||
<matchSignature locales="en"> | ||
Review comment: I'm very curious about the
Review comment: We should have some way to define at least locale-dependent option and match values. For example, the registry should allow for a way to note that while the whole set of CLDR categories is One alternative could be for
Review comment: For completeness of the example, my plan was to allow the overrides directly on the level of the
Review comment: I am not a fan of this. I think that a plural selector should recognize all of the keywords and not depend on the registry to "allow" or "disallow" the enumerated values for a given locale (note that I think it is reasonable that (for example) the In the case of plurals, CLDR provides the data about which keywords apply to which locale. In the case of some other formatter or selector that is not supplied by CLDR, the implementation should know what applies to each locale. If we want the registry to describe that relationship (for example to support exploding the matrix of keys in localization tools), I think I would prefer that it be separate from the signature, e.g.:
<function name="customPluralLikeSelector">
  <matchSignature>
    <input pattern="anyNumber"/>
    <option name="type" value="foo bar"/>
    <match value="zero one two few many other"/>
    <match pattern="anyNumber"/>
  </matchSignature>
  <validate comment="the naming is terrible and we wouldn't structure it exactly like this">
    <match type="values">
      <value lang="">one other</value>
      <value lang="pl">one few many other</value>
      <value lang="ja">other</value>
      <!-- etc. -->
    </match>
  </validate>
Review comment: Oh, I think I like it!
Review comment: Filed #410 to discuss this further.
||
<input pattern="anyNumber"/> | ||
<option name="type" values="cardinal ordinal"/> | ||
<option name="minimumIntegerDigits" pattern="positiveInteger"/> | ||
<option name="minimumFractionDigits" pattern="positiveInteger"/> | ||
<option name="maximumFractionDigits" pattern="positiveInteger"/> | ||
<option name="minimumSignificantDigits" pattern="positiveInteger"/> | ||
<option name="maximumSignificantDigits" pattern="positiveInteger"/> | ||
<match values="one other"/> | ||
<match pattern="anyNumber"/> | ||
</matchSignature> | ||
|
||
<formatSignature locales="en"> | ||
<input pattern="anyNumber"/> | ||
<option name="minimumIntegerDigits" pattern="positiveInteger"/> | ||
<option name="minimumFractionDigits" pattern="positiveInteger"/> | ||
<option name="maximumFractionDigits" pattern="positiveInteger"/> | ||
<option name="minimumSignificantDigits" pattern="positiveInteger"/> | ||
<option name="maximumSignificantDigits" pattern="positiveInteger"/> | ||
<option name="style" readonly="true" values="decimal currency percent unit" default="decimal"/> | ||
<option name="currency" readonly="true" pattern="currencyCode"/> | ||
</formatSignature> | ||
</function> | ||
</registry> | ||
``` | ||
|
||
Given the above description, the `:number` function is defined to work both in a selector and a placeholder: | ||
|
||
match {$count :number} | ||
when 1 {One new message} | ||
when other {{$count :number} new messages} | ||
|
||
Furthermore, | ||
`:number`'s `<matchSignature>` contains two `<match>` elements | ||
which make it possible to validate the variant keys. | ||
If at least one `<match>` validation rule passes, | ||
a variant key is considered valid. | ||
|
||
* `<match pattern="anyNumber"/>` can be used to validate the `when 1` variant | ||
by testing the `1` key against the `anyNumber` regular expression defined in the registry file. | ||
* `<match values="one other"/>` can be used to validate the `when other` variant | ||
by verifying that the `other` key is present in the list of enumerated values: `one other`. | ||
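
The "at least one `<match>` rule passes" check described above might look like this in a validation tool. It is a rough sketch rather than a reference implementation: the helper name `key_is_valid` is invented, the registry is trimmed to the parts needed, and the `anyNumber` regex used here makes the fractional part optional so that integer keys such as `1` match, which is an assumption of this example.

```python
import re
import xml.etree.ElementTree as ET

# Trimmed-down registry; the optional fractional part in `anyNumber`
# is an assumption of this sketch.
REGISTRY_XML = r"""
<registry>
  <pattern id="anyNumber" regex="-?[0-9]+(\.[0-9]+)?"/>
  <function name="number">
    <matchSignature locales="en">
      <input pattern="anyNumber"/>
      <match values="one other"/>
      <match pattern="anyNumber"/>
    </matchSignature>
  </function>
</registry>
"""


def key_is_valid(registry, function_name, key):
    """Return True if at least one <match> rule of the named function accepts `key`."""
    patterns = {p.get("id"): p.get("regex") for p in registry.iter("pattern")}
    for function in registry.iter("function"):
        if function.get("name") != function_name:
            continue
        for match in function.iter("match"):
            # Rule 1: the key is one of the enumerated values.
            if key in (match.get("values") or "").split():
                return True
            # Rule 2: the whole key matches the referenced named pattern.
            regex = patterns.get(match.get("pattern"))
            if regex is not None and re.fullmatch(regex, key):
                return True
    return False


registry = ET.fromstring(REGISTRY_XML.strip())
for key in ("1", "other", "few"):
    print(key, key_is_valid(registry, "number", key))
```

`re.fullmatch` is used instead of `re.match` so that a key such as `1x` is not accepted on the strength of its numeric prefix; whether registry patterns are meant to be anchored is not specified in the draft.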
|
||
---- | ||
|
||
A localization engineer can then extend the registry by defining the following `customRegistry.xml` file. | ||
|
||
```xml | ||
<?xml version="1.0" encoding="UTF-8" ?> | ||
<!DOCTYPE registry SYSTEM "./registry.dtd"> | ||
|
||
<registry> | ||
<function name="noun"> | ||
<description>Handle the grammar of a noun.</description> | ||
<formatSignature locales="en"> | ||
<input/> | ||
<option name="article" values="definite indefinite"/> | ||
<option name="plural" values="one other"/> | ||
<option name="case" values="nominative genitive" default="nominative"/> | ||
</formatSignature> | ||
</function> | ||
|
||
<function name="adjective"> | ||
<description>Handle the grammar of an adjective.</description> | ||
<formatSignature locales="en"> | ||
<input/> | ||
<option name="article" values="definite indefinite"/> | ||
<option name="plural" values="one other"/> | ||
<option name="case" values="nominative genitive" default="nominative"/> | ||
Review comment: I'm a little confused about why there's a
Review comment: English doesn't make it easy to build meaningful examples of grammatical features :) I'll try to come up with something better.
Review comment: Yeah, I know :) I was just thinking the example could be from a different language, maybe.
||
</formatSignature> | ||
<formatSignature locales="en"> | ||
<input/> | ||
<option name="article" values="definite indefinite"/> | ||
<option name="accord"/> | ||
Review comment: The
Review comment: Right, this is still WIP and underspec'ed. I'd like to continue the discussion about validating runtime values outside of this PR.
||
</formatSignature> | ||
</function> | ||
</registry> | ||
``` | ||
|
||
Messages can now use the `:noun` and the `:adjective` functions. | ||
The following message references the first signature of `:adjective`, | ||
which expects the `plural` and `case` options: | ||
|
||
{You see {$color :adjective article=indefinite plural=one case=nominative} {$object :noun case=nominative}!} | ||
Review comment: What happens if you write
||
|
||
The following message references the second signature of `:adjective`, | ||
which only expects the `accord` option: | ||
|
||
let $obj = {$object :noun case=nominative} | ||
{You see {$color :adjective article=indefinite accord=$obj} {$obj}!} |
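
The draft does not say how a call is associated with one of several signatures. One possible reading, sketched below purely as an assumption, is that a signature applies when it declares every option name used in the call. The helper `matching_signatures` is hypothetical and returns 1-based positions so that they line up with "first" and "second" in the prose.

```python
import xml.etree.ElementTree as ET

REGISTRY_XML = """
<registry>
  <function name="adjective">
    <formatSignature locales="en">
      <input/>
      <option name="article" values="definite indefinite"/>
      <option name="plural" values="one other"/>
      <option name="case" values="nominative genitive" default="nominative"/>
    </formatSignature>
    <formatSignature locales="en">
      <input/>
      <option name="article" values="definite indefinite"/>
      <option name="accord"/>
    </formatSignature>
  </function>
</registry>
"""


def matching_signatures(function, option_names):
    """Return 1-based positions of format signatures declaring every used option."""
    used = set(option_names)
    positions = []
    for position, signature in enumerate(function.findall("formatSignature"), start=1):
        declared = {option.get("name") for option in signature.findall("option")}
        # Assumption: a signature applies if the used options are a subset of the declared ones.
        if used <= declared:
            positions.append(position)
    return positions


adjective = ET.fromstring(REGISTRY_XML.strip()).find("function")
print(matching_signatures(adjective, ["article", "plural", "case"]))
print(matching_signatures(adjective, ["article", "accord"]))
```

Under this reading, a call that uses only `article` matches both signatures, and a call that mixes `plural` with `accord` matches none; how such cases should be handled is left open by the draft.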
Review comment: In the syntax, function names are restricted to `name` rather than `nmtoken`, so not all `NMTOKEN` values can be valid here. Also, ensuring that function definitions map 1:1 to identifiers seems pretty reasonable?
Review comment: The `ID` type is also subject to a validity constraint about there only being one element of the given name in the XML document: https://www.w3.org/TR/xml/#id. I wasn't sure if we wanted to be this strict. Perhaps it's OK to have functions named the same as regex patterns? Or to have two function definitions with the same name? That's why I went for `name` and `NMTOKEN`.