From 7423e0d3396df0c6cc43aa3e94da2d1a060b75bc Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Stanis=C5=82aw=20Ma=C5=82olepszy?=
Date: Mon, 13 Mar 2023 11:59:12 +0100
Subject: [PATCH 01/12] Draft of the registry specification

This PR includes 2 files:

* `registry.dtd` is the schema for defining registries in XML; it is normative.
* `registry.md` is the non-normative documentation explaining the motivation and the schema. It also includes examples.

This PR is based on my [old spec proposal from January 2022](https://github.com/unicode-org/message-format-wg/discussions/218), and the more recent [presentation](https://github.com/unicode-org/message-format-wg/blob/main/meetings/2023/notes-2023-02-06.md) that I did to resume the work on the design of the registry.

For now, I've focused on describing custom functions by defining their signatures. A single signature corresponds to one set of: the current locale, the argument, and the options bag.

I didn't address all feedback from our February 6 meeting in this PR. Looking at my notes, here are the topics for future discussions:

* [ ] Not all options should be locale-specific.
* [ ] Some options should be common to all signatures of a given function.
* Support other data types besides functions:
    * [ ] markup,
    * [ ] metadata (comments, max length, screenshot URL, etc.),
    * [ ] global variables.
* Describe the interface of runtime arguments and local variables (i.e. the return types of formatting functions). Right now the validation of arguments and option values only applies to literal values.
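
Editorial sketch (not part of the patch): the restriction to literal values can be illustrated with a small checker. It assumes a hypothetical in-memory form of a signature's named options, built from the `values`, `pattern`, and `required` attributes that `registry.md` describes. The names `OptionSpec`, `check_literal_options`, and `runtime_names`, the example enumerations, and the choice of a full-string regex match are all assumptions made for illustration; the draft does not define any of them.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class OptionSpec:
    """Hypothetical in-memory form of one named option of a signature."""
    name: str
    values: frozenset = frozenset()   # finite enumeration (the `values` attribute)
    pattern: Optional[str] = None     # regular expression (the `pattern` attribute)
    required: bool = False            # the `required` attribute


def check_literal_options(specs, literals, runtime_names=frozenset()):
    """Return a list of problems found in the options of one function call.

    `literals` maps option names to literal values. `runtime_names` holds the
    names of options bound to variables: their presence is recorded, but their
    values are not checked, because only literals are known ahead of time.
    """
    known = {spec.name: spec for spec in specs}
    problems = []
    for name in runtime_names:
        if name not in known:
            problems.append(f"unknown option: {name}")
    for name, value in literals.items():
        spec = known.get(name)
        if spec is None:
            problems.append(f"unknown option: {name}")
        elif spec.values and value not in spec.values:
            problems.append(f"{name}: {value!r} is not one of {sorted(spec.values)}")
        elif spec.pattern is not None and re.fullmatch(spec.pattern, value) is None:
            # Assumption: the pattern must match the whole literal.
            problems.append(f"{name}: {value!r} does not match {spec.pattern!r}")
    for spec in specs:
        if spec.required and spec.name not in literals and spec.name not in runtime_names:
            problems.append(f"missing required option: {spec.name}")
    return problems


# Hypothetical option set, loosely modelled on the `:adjective` example.
ADJECTIVE_OPTIONS = [
    OptionSpec("article", values=frozenset({"definite", "indefinite"})),
    OptionSpec("plural", values=frozenset({"one", "other"}), required=True),
    OptionSpec("case", values=frozenset({"nominative", "genitive"}), required=True),
]

if __name__ == "__main__":
    call = {"article": "indefinite", "plural": "one", "case": "nominative"}
    print(check_literal_options(ADJECTIVE_OPTIONS, call))
```

An option whose value is a variable, such as `accord=$obj` in the examples, would be passed in `runtime_names`. The sketch can confirm that such an option exists and satisfies `required`, but it cannot say whether its runtime value is acceptable, which is the gap the last item above points at.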
---
 spec/registry.dtd |  33 ++++++++++
 spec/registry.md  | 162 ++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 195 insertions(+)
 create mode 100644 spec/registry.dtd
 create mode 100644 spec/registry.md

diff --git a/spec/registry.dtd b/spec/registry.dtd
new file mode 100644
index 0000000000..3939c1283d
--- /dev/null
+++ b/spec/registry.dtd
@@ -0,0 +1,33 @@
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
diff --git a/spec/registry.md b/spec/registry.md
new file mode 100644
index 0000000000..b89376391a
--- /dev/null
+++ b/spec/registry.md
@@ -0,0 +1,162 @@
+# WIP DRAFT MessageFormat 2.0 Registry
+
+_This document is non-normative._
+
+The implementations and tooling can greatly benefit from a structured definition of formatting and matching functions available to messages at runtime.
+The _registry_ is a mechanism for storing such declarations in a portable manner.
+
+## Goals
+
+The goal of the registry is to provide a machine-readable description of MessageFormat extensions (custom functions) in order to:
+
+* support the localization roundtrip,
+* validate message correctness, and
+* improve the authoring experience.
+
+## Use Cases
+
+The registry enables the following usage scenarios:
+
+* Generate variant keys for a given locale during XLIFF extraction.
+* Verify the exhaustiveness of variant keys given a selector.
+* Type-check values passed into functions.
+* Validate that matching functions are only called in selectors.
+* Validate that formatting functions are only called in placeholders.
+* Forbid edits to certain function options (e.g. currency options).
+* Autocomplete function and option names.
+* Display on-hover tooltips for function signatures with documentation.
+* Display/edit known message metadata.
+* Restrict input in GUI by providing a dropdown with all viable option values.
+
+## Data Model
+
+The registry contains descriptions of function signatures.
+[`registry.dtd`](./registry.dtd) describes its data model.
+
+The main building block of the registry is the `` element.
+It represents an implementation of a custom function available to translation at runtime.
+A function defines a human-readable _description_ of its behavior
+and one or more machine-readable _signatures_ of how to call it.
+Named `` elements can optionally define regex validation rules for input, optional values, and variant keys.
+
+The `` element defines one particular signature of the custom function,
+i.e. the set of arguments and named options that can be used together in a single call to the function.
+The `type` attribute specifies the calling context of the signature.
+A `type="format"` function can only be called inside a placeholder inside translatable text.
+A `type="match"` function can only be called inside a selector.
+Signatures with a non-empty `locales` attribute are locale-specific and only available in translations in the given languages.
+
+A signature may define the type of input it accepts with zero or more `` elments.
+In MessageFormat 2.0 functions can only ever accept at most one positional argument.
+Multiple `` elements can be used to define different validation rules for this single argument input,
+together with appropriate human-readable descriptions.
+
+A signature may also define one or more `` elements representing _named options_ to the function.
+Parameters are optional by default,
+unless the `required` attribute is present.
+They accept either a finite enumeration of values (the `values` attribute)
+or validate they input with a regular expression (the `pattern` attribute).
+Read-only parameters (the `readonly` attribute) can be displayed to translators in CAT tools, but may not be edited.
+
+Matching-function signatures additionally include one or more `` elements
+to define the keys against which they can match when used as selectors.
+ +## Example + +The following `registry.xml` is an example of a registry file +which may be provided by an implementation to describe its built-in functions. +For the sake of brevity, only `locales="en"` is considered. + +```xml + + + + + + Match the current OS. + + + + + + + + + + + + Format a number. + Match a numerical value against CLDR plural categories or against a number literal. + + + + + + + + + + + + + + + + + + + + + + + + + + +``` + +A localization engineer can then extend the registry by defining the following `customRegistry.xml` file. + +```xml + + + + + + Handle the grammar of a noun. + + + + + + + + + + Handle the grammar of an adjective. + + + + + + + + + + + + + +``` + +Messages can now use the `:noun` and the `:adjective` functions. +The following message references the first signature of `:adjective`, +which expects the `plural` and `case` options: + + {You see {$color adjective article=indefinite plural=one case=nominative} {$object :noun case=nominative}!} + +The following message references the second signature of `:adjective`, +which only expects the `accord` option: + + let $obj = {$object :noun case=nominative} + {You see {$color adjective article=indefinite accord=$obj} {$obj}!} From 8f64eff8ca587894e10232feb71584c569cc5ce4 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Stanis=C5=82aw=20Ma=C5=82olepszy?= Date: Mon, 13 Mar 2023 15:15:30 +0100 Subject: [PATCH 02/12] Apply suggestions from code review Co-authored-by: Eemeli Aro --- spec/registry.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/spec/registry.md b/spec/registry.md index b89376391a..1decb250bf 100644 --- a/spec/registry.md +++ b/spec/registry.md @@ -47,7 +47,7 @@ A `type="match"` function can only be called inside a selector. Signatures with a non-empty `locales` attribute are locale-specific and only available in translations in the given languages. A signature may define the type of input it accepts with zero or more `` elments. 
-In MessageFormat 2.0 functions can only ever accept at most one positional argument. +Functions can only ever accept at most one positional argument. Multiple `` elements can be used to define different validation rules for this single argument input, together with appropriate human-readable descriptions. @@ -153,10 +153,10 @@ Messages can now use the `:noun` and the `:adjective` functions. The following message references the first signature of `:adjective`, which expects the `plural` and `case` options: - {You see {$color adjective article=indefinite plural=one case=nominative} {$object :noun case=nominative}!} + {You see {$color :adjective article=indefinite plural=one case=nominative} {$object :noun case=nominative}!} The following message references the second signature of `:adjective`, which only expects the `accord` option: let $obj = {$object :noun case=nominative} - {You see {$color adjective article=indefinite accord=$obj} {$obj}!} + {You see {$color :adjective article=indefinite accord=$obj} {$obj}!} From 831a9cdaca37aa26d6299ae1680d3ea134f33e99 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Stanis=C5=82aw=20Ma=C5=82olepszy?= Date: Mon, 13 Mar 2023 15:34:25 +0100 Subject: [PATCH 03/12] Add readonly to and only allow one per signature --- spec/registry.dtd | 3 ++- spec/registry.md | 10 +++------- 2 files changed, 5 insertions(+), 8 deletions(-) diff --git a/spec/registry.dtd b/spec/registry.dtd index 3939c1283d..023cd41305 100644 --- a/spec/registry.dtd +++ b/spec/registry.dtd @@ -9,7 +9,7 @@ - + @@ -18,6 +18,7 @@ + diff --git a/spec/registry.md b/spec/registry.md index 1decb250bf..a54a43cf41 100644 --- a/spec/registry.md +++ b/spec/registry.md @@ -46,11 +46,7 @@ A `type="format"` function can only be called inside a placeholder inside transl A `type="match"` function can only be called inside a selector. Signatures with a non-empty `locales` attribute are locale-specific and only available in translations in the given languages. 
-A signature may define the type of input it accepts with zero or more `` elments. -Functions can only ever accept at most one positional argument. -Multiple `` elements can be used to define different validation rules for this single argument input, -together with appropriate human-readable descriptions. - +A signature may define the positional argument of the function with the `` element. A signature may also define one or more `` elements representing _named options_ to the function. Parameters are optional by default, unless the `required` attribute is present. @@ -90,7 +86,7 @@ For the sake of brevity, only `locales="en"` is considered. - + @@ -102,7 +98,7 @@ For the sake of brevity, only `locales="en"` is considered. - + From 9ec9f593dc77331f8498f95bf086e066fbdcb07b Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Stanis=C5=82aw=20Ma=C5=82olepszy?= Date: Mon, 13 Mar 2023 16:25:33 +0100 Subject: [PATCH 04/12] Rename to @@ -132,14 +132,14 @@ A localization engineer can then extend the registry by defining the following ` Handle the grammar of an adjective. - - - + - - + From 39f13a99db4f0391ebabd8534dbf54e190c6c827 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Stanis=C5=82aw=20Ma=C5=82olepszy?= Date: Tue, 14 Mar 2023 13:46:30 +0100 Subject: [PATCH 05/12] Split into and --- spec/registry.dtd | 10 ++++++---- spec/registry.md | 32 +++++++++++++++----------------- 2 files changed, 21 insertions(+), 21 deletions(-) diff --git a/spec/registry.dtd b/spec/registry.dtd index 8dc50d6fc1..3b9c599902 100644 --- a/spec/registry.dtd +++ b/spec/registry.dtd @@ -9,10 +9,12 @@ - - - - + + + + + + diff --git a/spec/registry.md b/spec/registry.md index b9293abed5..4bab069eb3 100644 --- a/spec/registry.md +++ b/spec/registry.md @@ -39,11 +39,9 @@ A function defines a human-readable _description_ of its behavior and one or more machine-readable _signatures_ of how to call it. Named `` elements can optionally define regex validation rules for input, option values, and variant keys. 
-The `` element defines one particular signature of the custom function, -i.e. the set of arguments and named options that can be used together in a single call to the function. -The `type` attribute specifies the calling context of the signature. -A `type="format"` function can only be called inside a placeholder inside translatable text. -A `type="match"` function can only be called inside a selector. +A _signature_ defines one particular set of arguments and named options that can be used together in a single call to the function. +`` corresponds to a function call inside a placeholder inside translatable text. +`` coresponds to a function call inside a selector. Signatures with a non-empty `locales` attribute are locale-specific and only available in translations in the given languages. A signature may define the positional argument of the function with the `` element. @@ -70,9 +68,9 @@ For the sake of brevity, only `locales="en"` is considered. Match the current OS. - + - + @@ -85,7 +83,7 @@ For the sake of brevity, only `locales="en"` is considered. Match a numerical value against CLDR plural categories or against a number literal. - + + - + + ``` @@ -120,27 +118,27 @@ A localization engineer can then extend the registry by defining the following ` Handle the grammar of a noun. - + + Handle the grammar of an adjective. 
- + - + + + ``` From 5f47dd6649abb0900dfc9934baaf824450655f2f Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Stanis=C5=82aw=20Ma=C5=82olepszy?= Date: Fri, 19 May 2023 15:31:48 +0200 Subject: [PATCH 06/12] Functions must define at least one format or match signature --- spec/registry.dtd | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/spec/registry.dtd b/spec/registry.dtd index 3b9c599902..4074361c87 100644 --- a/spec/registry.dtd +++ b/spec/registry.dtd @@ -1,6 +1,6 @@ - + From 5b42dcfb386a5890581c0d0bcc1ab7539efdeac9 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Stanis=C5=82aw=20Ma=C5=82olepszy?= Date: Fri, 19 May 2023 16:05:44 +0200 Subject: [PATCH 07/12] Explain what format and match functions are --- spec/registry.md | 10 +++++++++- 1 file changed, 9 insertions(+), 1 deletion(-) diff --git a/spec/registry.md b/spec/registry.md index 4bab069eb3..9706346e84 100644 --- a/spec/registry.md +++ b/spec/registry.md @@ -39,9 +39,17 @@ A function defines a human-readable _description_ of its behavior and one or more machine-readable _signatures_ of how to call it. Named `` elements can optionally define regex validation rules for input, option values, and variant keys. +MessageFormat functions can be invoked in two contexts: +* inside placeholders, to produce a part of the message's formatted output; + for example, a raw value of `|1.5|` may be formatted to `1,5` in a language which uses commas as decimal separators, +* inside selectors, to contribute to selecting the appropriate variant among all given variants. + +A single _function name_ may be used in both contexts, +regardless of whether it's implemented as one or multiple functions. + A _signature_ defines one particular set of arguments and named options that can be used together in a single call to the function. `` corresponds to a function call inside a placeholder inside translatable text. -`` coresponds to a function call inside a selector. +`` corresponds to a function call inside a selector. 
Signatures with a non-empty `locales` attribute are locale-specific and only available in translations in the given languages. A signature may define the positional argument of the function with the `` element. From 5fa7c1927e105b2892a591a5c65b9ecb1913261f Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Stanis=C5=82aw=20Ma=C5=82olepszy?= Date: Fri, 19 May 2023 16:06:21 +0200 Subject: [PATCH 08/12] Clarify that at most one positional argument is allowed; add a note about nullary functions --- spec/registry.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/spec/registry.md b/spec/registry.md index 9706346e84..cfd42a64e4 100644 --- a/spec/registry.md +++ b/spec/registry.md @@ -47,12 +47,13 @@ MessageFormat functions can be invoked in two contexts: A single _function name_ may be used in both contexts, regardless of whether it's implemented as one or multiple functions. -A _signature_ defines one particular set of arguments and named options that can be used together in a single call to the function. +A _signature_ defines one particular set of at most one argument and any number of named options that can be used together in a single call to the function. `` corresponds to a function call inside a placeholder inside translatable text. `` corresponds to a function call inside a selector. Signatures with a non-empty `locales` attribute are locale-specific and only available in translations in the given languages. A signature may define the positional argument of the function with the `` element. +If the `` element is not present, the function is defined as a nullary function. A signature may also define one or more `