Add syntax proposal with EBNF #230

Merged
merged 5 commits into from
May 11, 2022

Conversation

eemeli
Collaborator

@eemeli eemeli commented Apr 28, 2022

This PR adds spec/syntax.md as a starting point for our MF2 grammar considerations. This document is based on the syntax specification prepared earlier by @stasm and represents an evolution of the "A" option presented in #229, taking into account comments and other responses to the presentation.

Given the time constraints, not all comments have been taken into account. We are not suggesting that this is a final syntax, but a sufficiently minimal compromise on which further changes may be applied.

Compared to the "A" option, the following changes have been made:

  • All message-internal comments are removed.
  • Unicode escapes are removed.
  • "Plain" messages are added. These allow for the representation of messages that contain no selectors or placeholders with just the message's translatable contents.

The proposed grammar is LL(1), and a parser for it is available. The parser was generated using the REx Parser Generator with the -ll 1 -javascript -tree -main command-line options.

We'd like the discussion about the syntax design to happen here on GitHub rather than in Google Docs comments, for clarity, discoverability and posterity. Our proposal for next steps and onwards progress is the following:

  1. Agree that this is a sufficiently acceptable starting point for more specific syntax discussions, and merge this PR.
  2. Add issues and/or PRs focusing on specific aspects of the syntax. We're not able to transfer all existing comments from the MF2 Syntax presentation, so we kindly ask you to open new issues here on GitHub with your feedback.

To that end, we ask that you evaluate this proposal as a whole, and hopefully give your approval for merging it so that specific issues and changes may be suggested separately, rather than swamping this initial PR.

As agreed on Monday's call, Staś and I are happy to make ourselves available for in-person discussions before we present this again to the WG on our call on 2022-05-09.

Co-authored-by: Stanisław Małolepszy sta@malolepszy.org
@eemeli eemeli added Meetings/Agenda syntax Issues related with syntax or ABNF labels Apr 28, 2022
@eemeli eemeli added this to the A formal definition of the canonical syntax for representing the data model, with well defined rules for handling text, special characters, escape sequences, whitespace, markup, as well as parsing errors. milestone Apr 28, 2022
@sffc sffc removed their request for review April 28, 2022 10:33
@stasm
Collaborator

stasm commented Apr 28, 2022

I’ve compiled a list of topics mentioned in the comments on the slide deck, and grouped them into themes. Most of these are still open questions, and agreeing to merge this PR doesn't mean they cease to be open questions. The intent of this PR is to have a relatively minimal base that we can then extend (or restrict even further) through a structured review process and discussion. Once this PR is merged, please feel free to open new issues to discuss specific feedback about the syntax.

Escaping

  • What are the general rules of escaping?
  • Do we need Unicode escape sequences?
  • What happens when an unknown escape sequence is encountered, e.g. \a?
  • How do we make sure the need to escape is clear even when a message is stored in a general purpose container?
  • Should the sets of escaped characters be the same inside the Plain, Text, and String productions?

Markup

  • How do we ensure that non-HTML display elements are first class?
  • What is the syntax for standalone elements?
  • Can we use argument-less functions to represent standalone display elements?
  • Do we need a different sigil for display element names?

Functions

  • Are we OK with a function call operator ({$arg : func}, whitespace optional), or do we want a sigil attached to function names ({$arg :func})?
  • How do we expect $foo to differ from $foo: number (if at all)?
  • Is {:func} sufficient syntax for standalone function calls?

Literals

  • Do we allow bare literals in placeholders, e.g. {"foo"}?
  • Do literals in the argument position need to be delimited with quotes at all times? E.g. {"42": number}.

Dynamic option values

  • Do we want syntax like {$arg func opt=$variable}? What are the use-cases?
  • How can we make it work such that func is cached/memoized?
  • Can $variable above be supplied by the application code as an argument to MessageFormat?
  • Can $variable refer to another placeholder found in the same message?
  • Can $variable refer to a local variable defined by the message?

Naming

  • Variable / argument / input data.
  • Local variable / alias / macro.

Comments

  • Do we want comments inside messages at all?
  • Where do we put localization notes? How far away do we accept them to be from the things that they describe?

Preamble

  • If we start in the "code" mode, do we need to delimit the preamble?
  • If so, what delimiters to use? { … } or perhaps {? … }? Something else?
  • Is it OK to make all local variable definitions selectors? I.e. the preamble is a single flat list of selectors, some of which are bound to local variables.

Variants

  • Should variant keys be delimited, too?
  • Can there be fewer keys for a given pattern than there are selectors in the preamble?

Delimiting patterns

  • Why use [ … ] to delimit patterns rather than { … }?

Member

@srl295 srl295 left a comment

just a comment that it's great to see this shaping up. having not really seen this before, the spec and ebnf sound very reasonable

@aphillips
Member

@stasm I have comments or opinions about nearly everything on your very good list above, but this isn't the place to discuss them. Are you planning to create an issue or issues around them? Or is the idea to just open discussion threads for whatever concerns us?

I'm going to give a positive review here so that we can merge it. I don't see any value in debating the wording before we can look at the doc as an artifact.

Member

@srl295 srl295 left a comment

I'm going to say generally LGTM, knowing it's intended as a stake in the ground …

@eemeli
Collaborator Author

eemeli commented Apr 29, 2022

@aphillips It would probably be most helpful for any interested parties to open issues about topics that they'd like for the group to discuss. Staś and I may do so as well, but really this would work best as a collaborative effort.

One reason to prefer issues over discussions is that they should more naturally stay single-threaded on a single topic, and it's possible to explicitly close them either directly, or via a pull request. On GitHub, discussions don't have any clear conclusion.

@eemeli
Collaborator Author

eemeli commented May 3, 2022

During the meeting yesterday, we agreed to retarget this PR towards a new branch develop rather than main, as suggested by @srl295 and @stasm.

This will hopefully allow us to soon find consensus to land this, making it easier for further development to take place via issues and further PRs.

@eemeli eemeli requested a review from echeran May 3, 2022 13:48
Collaborator

@echeran echeran left a comment

The biggest process concerns have been addressed, so I'll approve.

nit: I explain in #231 why I don't see the benefit of creating yet another branch when we already have experiments serving the same functionality. This feels messy <=> another potential instance of discontinuity.

The intent was to help preserve continuity by creating something in the repo that others can comment on and open PRs against.

Okay. Given that there is a lot of valuable concrete technical discussion already in the comments of the slide deck, I imagine your plan to preserve the continuity of those discussions would be to create PRs for each of the topics listed and copy over those comments? That would SGTM.

@markusicu
Member

markusicu commented May 4, 2022

Hi there, thanks for writing down a complete proposal. However, the more I look at it, the more I find aspects of the syntax confusing and unintuitive.

First, to a human, there is no clear mental model for how to start reading and writing a message. A message can just be plain text (yeah!), or text in square brackets, or "code". I see how you get that to work in the EBNF, but it's confusing.

I have always liked simple messages to be simple, with no surrounding syntax. However, that should include placeholders: requiring enclosing-delimiter syntax only when there are placeholders is weird, and allowing simple text both with and without delimiters is also weird.

Starting in "text" mode also has problems discussed here (uncertainty about trimming spaces), and we have problems with MF1 (ICU MessageFormat) where we don't want people to write literal text outside of a select-type placeholder.

So while it pains me to say this, bite the bullet, start in "code" mode, and always surround translatable text with delimiters. And please use curly braces for those delimiters; square brackets are common enough in real text. Together with using curly braces for placeholders, only {} are special and need escaping.

So like this:

{Hello world!}

{Hello {$name}!}
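
To make the "only {} are special" idea concrete, here is a minimal Python sketch that splits a brace-delimited pattern such as the two above into text and placeholder parts. It is an illustration only, not the parser from this PR: the function name parse_pattern is hypothetical, and the backslash escape rule is an assumption, since the comment does not say how { and } would be escaped.

```python
def parse_pattern(src: str) -> list[tuple[str, str]]:
    """Split a pattern like '{Hello {$name}!}' into text and placeholder parts."""
    if len(src) < 2 or src[0] != "{" or src[-1] != "}":
        raise ValueError("pattern must be enclosed in { }")
    parts: list[tuple[str, str]] = []
    text: list[str] = []
    i, end = 1, len(src) - 1  # scan only the content between the outer braces
    while i < end:
        ch = src[i]
        if ch == "\\":
            # ASSUMPTION: a backslash makes the next character literal.
            if i + 1 >= end:
                raise ValueError("dangling escape")
            text.append(src[i + 1])
            i += 2
        elif ch == "{":
            # A nested brace pair is a placeholder; nothing else is special.
            close = src.find("}", i + 1, end)
            if close == -1:
                raise ValueError("unterminated placeholder")
            if text:
                parts.append(("text", "".join(text)))
                text = []
            parts.append(("placeholder", src[i + 1:close].strip()))
            i = close + 1
        elif ch == "}":
            raise ValueError("unescaped } in text")
        else:
            text.append(ch)
            i += 1
    if text:
        parts.append(("text", "".join(text)))
    return parts


print(parse_pattern("{Hello world!}"))
print(parse_pattern("{Hello {$name}!}"))
```

Because the scanner only ever reacts to {, } and the assumed escape character, square brackets and quotes in the translatable text pass through untouched.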

The next most confusing thing is figuring out which placeholder-looking item becomes an input to the selection, and how those match up to the variant value tuples. You really need to enclose the selectors so that it's clear which ones they are, and the number of selectors needs to be the same as the number of values in each value tuple. I suggest enclosing both the list of selectors and each value tuple in square brackets. Bracketing both selectors and tuples correlates them visually.

Also, it seems like a variable definition can have a side effect of its expression leaking into the selection syntax; don't overload unrelated functionalities.

With that, a message always starts in "code" mode with optional variable definitions; then it's either a single pattern, or a selector head followed by pairs of value tuples and patterns:

$f1={$something}  $f2={$else}
[{$count :plural offset=1} {$gender}]
[few female] {pattern {$f1}}
[_ _] {the default pattern}
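
The correlation between the selector list and the value tuples can be sketched in Python as follows. This is a hedged illustration under stated assumptions, not a proposed algorithm: the names Variant, validate and select are hypothetical, selector values are assumed to be already resolved to strings such as "few" or "female", _ is treated as a wildcard, and the first matching variant in source order wins.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    keys: list[str]   # one key per selector, e.g. ["few", "female"]
    pattern: str      # the pattern text chosen when the keys match


def validate(selector_count: int, variants: list[Variant]) -> None:
    """Require every value tuple to have exactly one key per selector."""
    for v in variants:
        if len(v.keys) != selector_count:
            raise ValueError(
                f"variant {v.keys} has {len(v.keys)} keys, expected {selector_count}"
            )


def select(selector_values: list[str], variants: list[Variant]) -> str:
    """Return the pattern of the first variant whose keys all match."""
    validate(len(selector_values), variants)
    for v in variants:
        # ASSUMPTION: "_" matches any value; other keys must match exactly.
        if all(k == "_" or k == val for k, val in zip(v.keys, selector_values)):
            return v.pattern
    raise LookupError("no variant matched and no default was given")


variants = [
    Variant(["few", "female"], "{pattern {$f1}}"),
    Variant(["_", "_"], "{the default pattern}"),
]
print(select(["few", "female"], variants))
print(select(["one", "male"], variants))
```

Requiring equal lengths up front is what makes the bracketed head and the bracketed tuples line up visually and mechanically.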

About the colon as a function prefix -- I am ok with using a colon, but it's a prefix, just like the dollar for variables, so it needs to attach to the function name, and there should be no space allowed between the colon and the name. If you don't like the visuals, then pick a different symbol. (Earlier proposals used an @ sign.)

Please don't allow whitespace everywhere. Traditional programming languages are super loose with spaces, but then people debate and enforce style guides for where to put them. Looking at my example above, I suggest not allowing spaces where I didn't put any. (For example, not around the =.)

The syntax so far is not clear about formatting functions and selection functions. It looks like "number" could be both, but they should be separate. "number" should just format, and "plural" should both format and select (remember that the two are intertwined).

Similarly, once a variable has been subject to select-and-format, using it should require different syntax from regular variables, because we need to be clear that the placeholder is replaced consistently in each variant. If you just write {$count} then a developer has to wonder whether their formatting options apply; and you cannot allow formatting options on a select-and-format variable in the variant, because then the output may well not fit the selected variant any more (e.g., changing the number of fraction digits on a plural). So pick a symbol other than the dollar, and don't allow functions and options there.

About the markup syntax: Are these free-form, implementation-dependent tokens? Do you expect them to be HTML or TTS hints or accessibility hints? Without any distinctions, this seems like Unicode private use code points with all of their problems.

Or are the markup tokens registered functions? If so, then why not use the {:function} syntax, and maybe with a literal string before the function name (which could then be something like :html).

If we do need markup with special syntax, then it should fit better into the overall syntax (e.g., always an ASCII symbol after the opening {) and into the framework as a whole (e.g., registered entities).

Speaking of literals, don't enclose them in quotes. One of the stated goals here is to make messages usable as string literals in programming languages. Enclose literals in parentheses {... {(5) :number} books} or angle brackets {... {<5> :number} books} or pairs of pipe symbols {... {|5| :number} books} or similar.
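
The string-literal concern can be illustrated with a short Python sketch. The two message strings are hypothetical, and JSON is used only as a stand-in for any host format whose string literals are delimited by double quotes.

```python
import json

# Hypothetical messages: the same literal, delimited two different ways.
quoted = '{You have {"5" :number} books}'
piped = '{You have {|5| :number} books}'

# Serializing to a double-quoted host literal forces escapes for the
# quote-delimited form, while the pipe-delimited form is embedded verbatim.
print(json.dumps(quoted))
print(json.dumps(piped))
```

The first line of output contains backslash escapes in front of each inner quote; the second is just the message wrapped in quotes.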

Do we need variables inside placeholder options? We have seen this before, but I don't remember a good use case being presented. I see one example that suggests using it for grammatical agreement, but I strongly suspect that managing grammatical agreement will require a second pass over the output of the whole "formatting" pass for a message; a second pass so that it can look at everything in context -- rather than trying to make one isolated placeholder agree with the input to another. So unless we have a really good use case for variables in options, let's leave them out.

Phew, this turned out long, sorry :-)

Cc @macchiati

@stasm
Collaborator

stasm commented May 4, 2022

Hi @markusicu, thank you for writing this down! I agree with a lot of your points, and with some others I agree but I picked different trade-offs. We should discuss all of them.

The idea behind this PR is to merge a starting point into the develop branch precisely so that we have a base against which we can file more issues. I see material for a new issue in almost every paragraph of your comment.

Would you mind opening new issues for each of these and tagging them with the syntax label? I can do this for you, too, just let me know.

@romulocintra
Collaborator

romulocintra commented May 9, 2022

  • Waiting for @mihnita comments and approval to merge this PR

Co-authored-by: Mihai Nita <nmihai_2000@yahoo.com>
@eemeli eemeli requested a review from mihnita May 10, 2022 08:32
Co-authored-by: Stanisław Małolepszy <sta@malolepszy.org>
@macchiati
Member

macchiati commented May 23, 2022 via email

Labels
syntax Issues related with syntax or ABNF
Projects
None yet
10 participants