Add syntax proposal with EBNF #230

eemeli · 2022-04-28T08:28:10Z

This PR adds spec/syntax.md as a starting point for our MF2 grammar considerations. This document is based on the syntax specification prepared earlier by @stasm and represents an evolution of the "A" option presented in #229, taking into account comments and other responses to the presentation.

Given the time constraints, not all comments have been taken into account. We are not suggesting that this is a final syntax, but a sufficiently minimal compromise on which further changes may be applied.

Compared to the "A" option, the following changes have been made:

All message-internal comments are removed.
Unicode escapes are removed.
"Plain" messages are added. These allow for the representation of messages that contain no selectors or placeholders with just the message's translatable contents.

The proposed grammar is LL(1), and a parser for it is available. The parser was generated using the REx Parser Generator, with the command-line options -ll 1 -javascript -tree -main.

We'd like the discussion about the syntax design to happen here on GitHub rather than in Google Docs comments, for clarity, discoverability and posterity. Our proposal for next steps and onwards progress is the following:

Agree that this is a sufficiently acceptable starting point for more specific syntax discussions, and merge this PR.
Add issues and/or PRs focusing on specific aspects of the syntax. We're not able to transfer all existing comments from the MF2 Syntax presentation, so we kindly ask you to open new issues here on GitHub with your feedback.

To that end, we ask that you evaluate this proposal as a whole, and hopefully give your approval for merging it so that specific issues and changes may be suggested separately, rather than swamping this initial PR.

As agreed on Monday's call, Staś and I are happy to make ourselves available for in-person discussions before we present this again to the WG on our call on 2022-05-09.

Co-authored-by: Stanisław Małolepszy <sta@malolepszy.org>

stasm · 2022-04-28T21:01:20Z

I’ve compiled a list of topics mentioned in the comments on the slide deck, and grouped them into themes. Most of these are still open questions, and agreeing to merge this PR doesn't mean they cease to be open questions. The intent of this PR is to have a relatively minimal base that we can then extend (or restrict even further) through a structured review process and discussion. Once this PR is merged, please feel free to open new issues to discuss specific feedback about the syntax.

Escaping

What are the general rules of escaping?
Do we need Unicode escape sequences?
What happens when an unknown sequence is encountered, e.g. \a?
How do we make sure the need to escape is clear even when a message is stored in a general purpose container?
Should the sets of escaped characters be the same inside the Plain, Text, and String productions?
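
To make the unknown-sequence question concrete, here is a minimal Python sketch of the two obvious policies. The escapable set (backslash and curly braces) and the name `unescape` are assumptions chosen for illustration; neither is taken from spec/syntax.md.

```python
# Hypothetical sketch: the escapable set is an assumption, not the proposal's.
ESCAPABLE = {"\\", "{", "}"}


def unescape(text: str, strict: bool = True) -> str:
    """Resolve backslash escapes in `text`.

    strict=True:  an unknown sequence such as \\a is reported as a syntax error.
    strict=False: an unknown sequence is kept verbatim, backslash included.
    """
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch != "\\":
            out.append(ch)
            i += 1
            continue
        if i + 1 >= len(text):
            raise ValueError("dangling backslash at end of input")
        nxt = text[i + 1]
        if nxt in ESCAPABLE:
            out.append(nxt)  # known escape: emit the escaped character
        elif strict:
            raise ValueError(f"unknown escape sequence \\{nxt} at index {i}")
        else:
            out.append("\\" + nxt)  # lenient: pass the sequence through unchanged
        i += 2
    return "".join(out)
```

The strict policy surfaces typos early; the lenient one is friendlier to messages stored in containers that apply their own escaping first. Which of the two is right is exactly the open question.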

Markup

How do we ensure non-HTML display elements are first class?
What is the syntax for standalone elements?
Can we use argument-less functions to represent standalone display elements?
Do we need a different sigil for display element names?

Functions

Are we OK with a function call operator ({$arg : func}, whitespace optional), or do we want a sigil attached to function names ({$arg :func})?
How do we expect $foo to differ from $foo: number (if at all)?
Is {:func} sufficient syntax for standalone function calls?

Literals

Do we allow bare literals in placeholders, e.g. {"foo"}?
Do literals in the argument position need to be delimited with quotes at all times? E.g. {"42": number}.

Dynamic option values

Do we want syntax like {$arg func opt=$variable}? What are the use-cases?
How can we make it work such that func is cached/memoized?
Can $variable above be supplied by the application code as an argument to MessageFormat?
Can $variable refer to another placeholder found in the same message?
Can $variable refer to a local variable defined by the message?
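
As a rough illustration of the memoization question, one option is to resolve `$variable` option values first and key the formatter cache on the resolved values. The sketch below assumes this approach; `get_formatter` and `format_placeholder` are hypothetical names, and the resolved option values are assumed to be hashable.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def get_formatter(func_name: str, resolved_options: tuple):
    """Build (and cache) a formatter for one function name and one set of option values."""
    options = dict(resolved_options)

    def formatter(value):
        # Stand-in for real formatting work; the output shape is illustrative only.
        return f"{func_name}({value!r}, {options!r})"

    return formatter


def format_placeholder(func_name: str, value, options: dict, variables: dict) -> str:
    """Resolve `$name` option values against `variables`, then use the cached formatter."""
    resolved = tuple(
        (key, variables[raw[1:]] if isinstance(raw, str) and raw.startswith("$") else raw)
        for key, raw in sorted(options.items())
    )
    return get_formatter(func_name, resolved)(value)
```

With this shape, a formatter is reused as long as the dynamic option resolves to the same value, and a new one is built only when the value changes. Whether that is cheap enough depends on how often option values vary at runtime.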

Naming

Variable / argument / input data.
Local variable / alias / macro.

Comments

Do we want comments inside messages at all?
Where do we put localization notes? How far away do we accept them to be from the things that they describe?

Preamble

If we start in the "code" mode, do we need to delimit the preamble?
If so, what delimiters to use? { … } or perhaps {? … }? Something else?
Is it OK to make all local variable definitions selectors? I.e. the preamble is a single flat list of selectors, some of which are bound to local variables.

Variants

Should variant keys be delimited, too?
Can there be fewer keys for a given pattern than there are selectors in the preamble?

Delimiting patterns

Why use [ … ] to delimit patterns rather than { … }?

srl295

just a comment that it's great to see this shaping up. having not really seen this before, the spec and ebnf sound very reasonable

aphillips · 2022-04-29T14:37:42Z

@stasm I have comments or opinions about nearly everything on your very good list above, but this isn't the place to discuss them. Are you planning to create an issue or issues around them? Or is the idea to just open discussion threads for whatever concerns us?

I'm going to give a positive review here so that we can merge it. I don't see any value in debating the wording before we can look at the doc as an artifact.

srl295

I'm going to say generally LGTM, knowing it's intended as a stake in the ground …

eemeli · 2022-04-29T18:28:10Z

@aphillips It would probably be most helpful for any interested parties to open issues about topics that they'd like for the group to discuss. Staś and I may do so as well, but really this would work best as a collaborative effort.

One reason to prefer issues over discussions is that they should more naturally stay single-threaded on a single topic, and it's possible to explicitly close them either directly, or via a pull request. On GitHub, discussions don't have any clear conclusion.

eemeli · 2022-05-03T13:48:47Z

During the meeting yesterday, we agreed to retarget this PR towards a new branch develop rather than main, as suggested by @srl295 and @stasm.

This will hopefully allow us to soon find consensus to land this, making it easier for further development to take place via issues and further PRs.

echeran

The biggest process concerns have been addressed, so I'll approve.

nit: I explain in #231 why I don't see the benefit of creating yet another branch when we already have experiments serving the same functionality. This feels messy <=> another potential instance of discontinuity.

The intent was to help preserve continuity by creating something in the repo that others can comment on and open PRs against.

Okay. Given that there is a lot of valuable concrete technical discussion already in the comments of the slide deck, I imagine your plan to preserve the continuity of those discussions would be to create PRs for each of the topics listed and copy over those comments? That would SGTM.

markusicu · 2022-05-04T03:56:57Z

Hi there, thanks for writing down a complete proposal. However, the more I look at it, the more I find aspects of the syntax confusing and unintuitive.

First, to a human, there is no clear mental model for how to start reading and writing a message. A message can just be plain text (yeah!), or text in square brackets, or "code". I see how you get that to work in the EBNF, but it's confusing.

I have always liked simple messages to be simple, with no surrounding syntax. However, that should include placeholders; requiring enclosing-delimiter syntax only when there are placeholders is weird, and allowing simple text both with and without delimiters is also weird.

Starting in "text" mode also has problems discussed here (uncertainty about trimming spaces), and we have problems with MF1 (ICU MessageFormat) where we don't want people to write literal text outside of a select-type placeholder.

So while it pains me to say this, bite the bullet, start in "code" mode, and always surround translatable text with delimiters. And please use curly braces for those delimiters; square brackets are common enough in real text. Together with using curly braces for placeholders, only {} are special and need escaping.

So like this:

{Hello world!}

{Hello {$name}!}
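
A minimal Python sketch of how such a delimited pattern could be split into text and placeholders, assuming only `{` and `}` are special. The function name `parse_pattern` is hypothetical, and the sketch deliberately ignores escapes and nested braces inside placeholders.

```python
def parse_pattern(src: str) -> list[tuple[str, str]]:
    """Split a pattern like '{Hello {$name}!}' into ("text", ...) and ("placeholder", ...) parts."""
    if len(src) < 2 or src[0] != "{" or src[-1] != "}":
        raise ValueError("a pattern must be wrapped in { ... }")
    body = src[1:-1]
    parts: list[tuple[str, str]] = []
    text: list[str] = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch == "{":
            end = body.find("}", i)
            if end == -1:
                raise ValueError("unclosed placeholder")
            if text:
                parts.append(("text", "".join(text)))
                text = []
            parts.append(("placeholder", body[i + 1:end]))
            i = end + 1
        elif ch == "}":
            raise ValueError("unexpected '}' in text")
        else:
            text.append(ch)
            i += 1
    if text:
        parts.append(("text", "".join(text)))
    return parts
```

Because the pattern always has explicit delimiters, leading and trailing spaces inside the braces are unambiguous, which addresses the trimming concern raised for "text" mode.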

The next most confusing thing is figuring out which placeholder-looking item becomes an input to the selection, and how those match up to the variant value tuples. You really need to enclose the selectors so that it's clear which ones they are, and the number of selectors needs to be the same as the number of values in each value tuple. I suggest enclosing both the list of selectors and each value tuple in square brackets. Bracketing both selectors and tuples correlates them visually.

Also, it seems like a variable definition can have a side effect of its expression leaking into the selection syntax; don't overload unrelated functionalities.

With that, a message always starts in "code" mode with optional variable definitions; then it's either a single pattern, or a selector head followed by pairs of value tuples and patterns:

$f1={$something}  $f2={$else}
[{$count :plural offset=1} {$gender}]
[few female] {pattern {$f1}}
[_ _] {the default pattern}
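
The rule that every value tuple has exactly as many keys as there are selectors can be sketched in Python as follows. The function name `select_variant`, the first-match strategy, and the treatment of `_` as a catch-all key are assumptions made for illustration.

```python
def select_variant(selector_values: list[str], variants: list[tuple[tuple[str, ...], str]]) -> str:
    """Return the pattern of the first variant whose keys match the selector values.

    selector_values: resolved selector keys, e.g. ["few", "female"].
    variants: (key_tuple, pattern) pairs in source order; "_" matches anything.
    """
    # Enforce the arity rule up front: one key per selector in every tuple.
    for keys, _pattern in variants:
        if len(keys) != len(selector_values):
            raise ValueError(
                f"variant {keys!r} has {len(keys)} keys, expected {len(selector_values)}"
            )
    for keys, pattern in variants:
        if all(k == "_" or k == v for k, v in zip(keys, selector_values)):
            return pattern
    raise LookupError("no variant matched and no default was given")
```

For the example above, the variants would be `[(("few", "female"), "{pattern {$f1}}"), (("_", "_"), "{the default pattern}")]`; selector values `["few", "female"]` pick the first pattern and anything else falls through to the default.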

About the colon as a function prefix -- I am ok with using a colon, but it's a prefix, just like the dollar for variables, so it needs to attach to the function name, and there should be no space allowed between the colon and the name. If you don't like the visuals, then pick a different symbol. (Earlier proposals used an @ sign.)
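
The difference between the two readings of the colon can be sketched with two regular expressions in Python. The `NAME` pattern is a simplified assumption, and both expressions cover only the single-argument, no-options placeholder form.

```python
import re

# Assumption: a simplified identifier pattern, not the proposal's Name production.
NAME = r"[A-Za-z_][A-Za-z0-9_]*"

# Operator reading: ':' is a call operator, whitespace is optional on both sides.
OPERATOR = re.compile(rf"\{{\$({NAME})\s*:\s*({NAME})\}}")

# Sigil reading: ':' is glued to the function name, and at least one space
# separates the argument from the function.
SIGIL = re.compile(rf"\{{\$({NAME}) +:({NAME})\}}")
```

Under the operator reading `{$arg : func}`, `{$arg :func}` and `{$foo: number}` all match; under the sigil reading only `{$arg :func}` does.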

Please don't allow whitespace everywhere. Traditional programming languages are super loose with spaces, but then people debate and enforce style guides for where to put them. Looking at my example above, I suggest not allowing spaces where I didn't put any. (For example, not around the =.)

The syntax so far is not clear about formatting functions and selection functions. It looks like "number" could be both, but they should be separate. "number" should just format, and "plural" should both format and select (remember that the two are intertwined).

Similarly, once a variable has been subject to select-and-format, using it should require different syntax from regular variables, because we need to be clear that the placeholder is replaced consistently in each variant. If you just write {$count} then a developer has to wonder whether their formatting options apply; and you cannot allow formatting options on a select-and-format variable in the variant, because then the output may well not fit the selected variant any more (e.g., changing the number of fraction digits on a plural). So pick a symbol other than the dollar, and don't allow functions and options there.

About the markup syntax: Are these free-form, implementation-dependent tokens? Do you expect them to be HTML or TTS hints or accessibility hints? Without any distinctions, this seems like Unicode private use code points with all of their problems.

Or are the markup tokens registered functions? If so, then why not use the {:function} syntax, and maybe with a literal string before the function name (which could then be something like :html).

If we do need markup with special syntax, then it should fit better into the overall syntax (e.g., always an ASCII symbol after the opening {) and into the framework as a whole (e.g., registered entities).

Speaking of literals, don't enclose them in quotes. One of the stated goals here is to make messages usable as string literals in programming languages. Enclose literals in parentheses {... {(5) :number} books} or angle brackets {... {<5> :number} books} or pairs of pipe symbols {... {|5| :number} books} or similar.
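
The cost of quote-delimited literals shows up as soon as a message is embedded in a host string. The Python sketch below uses JSON as the host format; the two sample messages and the helper name `escape_count` are illustrative assumptions.

```python
import json

# Two hypothetical spellings of the same message.
quoted = '{You have {"5" :number} books}'
piped = '{You have {|5| :number} books}'


def escape_count(message: str) -> int:
    """Count the backslashes JSON needs in order to embed `message` as a string."""
    return json.dumps(message).count("\\")
```

Here `escape_count(quoted)` is 2 (one backslash per double quote) while `escape_count(piped)` is 0, so the pipe-delimited form can be pasted into a JSON string unchanged.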

Do we need variables inside placeholder options? We have seen this before, but I don't remember a good use case being presented. I see one example that suggests using it for grammatical agreement, but I strongly suspect that managing grammatical agreement will require a second pass over the output of the whole "formatting" pass for a message; a second pass so that it can look at everything in context -- rather than trying to make one isolated placeholder agree with the input to another. So unless we have a really good use case for variables in options, let's leave them out.

Phew, this turned out long, sorry :-)

Cc @macchiati

stasm · 2022-05-04T05:32:55Z

Hi @markusicu, thank you for writing this down! I agree with a lot of your points, and with some others I agree but I picked different trade-offs. We should discuss all of them.

The idea behind this PR is to merge a starting point into the develop branch precisely so that we have a base against which we can file more issues. I see material for a new issue in almost every paragraph of your comment:

Start in "code" mode.
- Start in "code" mode exclusively #256
Use {...} as text delimiters rather than [...].
- Delimiting patterns: [ ... ] vs { ... } #255
Enclose selectors so that it's clear what they are.
- Enclose selectors so that it's clear what they are #257
Don't overload selector syntax to define local variables.
- Preamble: unify local variable definitions and selectors? #252
Use colon as a prefix, not an operator.
- Functions: is ':' a sigil (no spaces after), or operator (spaces allowed) #242
Don't allow whitespace everywhere.
- Don't allow whitespace everywhere #259
Make it clear which functions are formatting functions and which are selection functions.
- Make it clear which functions are formatting functions and which are selection functions #260
Use a different namespace to reference local variables.
- Use a different namespace to reference local variables #261
Do markup elements map to registered functions?
- Do markup elements map to registered functions? #262
Pick a delimiter for literals other than the double quote.
- Pick a delimiter for literals other than the double quote #263
Do we need variables as option values?
- Dynamic option values: do we want syntax like {$arg func opt=$value}? #247

Would you mind opening new issues for each of these and tagging them with the syntax label? I can do this for you, too, just let me know.

romulocintra · 2022-05-09T16:49:44Z

Waiting for @mihnita's comments and approval to merge this PR.

macchiati · 2022-05-23T05:10:29Z

I will not be able to attend tomorrow's meeting (Monday and Tuesday we're moving into our new place in SF). I hope to be able to join subsequent discussions.

To make sense of the differences, I tried putting together a comparison at the following location, with my notes:
https://docs.google.com/document/d/1t9ZAgdVUb2ILUu-moEpS0g0T-3WZ07uCHwmw9WkMnt4/edit?usp=sharing

I didn't have the time to be as thorough as I'd like to be, so my apologies for that.

Mark

Add syntax proposal with EBNF

9cb94e6

Co-authored-by: Stanisław Małolepszy <sta@malolepszy.org>

eemeli added the Meetings/Agenda and syntax (Issues related with syntax or ABNF) labels Apr 28, 2022

eemeli added this to the A formal definition of the canonical syntax for representing the data model, with well defined rules for handling text, special characters, escape sequences, whitespace, markup, as well as parsing errors. milestone Apr 28, 2022

eemeli requested review from aphillips, zbraniecki, srl295, echeran, sffc, mihnita, romulocintra, nbouvrette, markusicu and grhoten April 28, 2022 08:28

eemeli assigned stasm and eemeli Apr 28, 2022

This was linked to issues Apr 28, 2022

Defining the canonical syntax #178

Closed

Syntax design should aid reader in what is translatable #51

Closed

Syntax Simplicity #48

Closed

Automatic selection of formatter by argument type #42

Closed

Support messages in HTML #15

Closed

fix: Drop extraneous "Ignore" token from syntax.md

aef24e8

sffc removed their request for review April 28, 2022 10:33

srl295 reviewed Apr 29, 2022

aphillips approved these changes Apr 29, 2022

srl295 approved these changes Apr 29, 2022

eemeli requested a review from echeran May 3, 2022 13:48

echeran approved these changes May 3, 2022

romulocintra approved these changes May 9, 2022

mihnita requested changes May 9, 2022

eemeli commented May 9, 2022

eemeli commented May 10, 2022

Apply suggestions from code review

8e7d743

Co-authored-by: Mihai Nita <nmihai_2000@yahoo.com>

eemeli requested a review from mihnita May 10, 2022 08:32

Apply suggestions from code review

972dc1f

Co-authored-by: Stanisław Małolepszy <sta@malolepszy.org>

mihnita approved these changes May 11, 2022

eemeli merged commit fe595d5 into develop May 11, 2022

eemeli deleted the syntax-proposal branch May 11, 2022 20:40

echeran mentioned this pull request May 20, 2022

MF2.0 compromise syntax #266

Closed
