Add syntax proposal with EBNF #230
Co-authored-by: Stanisław Małolepszy <sta@malolepszy.org>
I’ve compiled a list of topics mentioned in the comments on the slidedeck, and grouped them into themes. Most of these are still open questions, and agreeing to merge this PR doesn't mean they cease to be open questions. The intent of this PR is to have a relatively minimal base that we can then extend (or restrict even further) through a structured review process and discussion. Once this PR is merged, please feel free to open new issues to discuss specific feedback about the syntax.
Escaping
Markup
Functions
Literals
Dynamic option values
Naming
Comments
Preamble
Variants
Delimiting patterns
just a comment that it's great to see this shaping up. having not really seen this before, the spec and ebnf sound very reasonable
@stasm I have comments or opinions about nearly everything on your very good list above, but this isn't the place to discuss them. Are you planning to create an issue or issues around them? Or is the idea to just open discussion threads for whatever concerns us? I'm going to give a positive review here so that we can merge it. I don't see any value in debating the wording before we can look at the doc as an artifact.
I'm going to say generally LGTM, knowing it's intended as a stake in the ground …
@aphillips It would probably be most helpful for any interested parties to open issues about topics that they'd like for the group to discuss. Staś and I may do so as well, but really this would work best as a collaborative effort. One reason to prefer issues over discussions is that they should more naturally stay single-threaded on a single topic, and it's possible to explicitly close them either directly, or via a pull request. On GitHub, discussions don't have any clear conclusion.
The biggest process concerns have been addressed, so I'll approve.
nit: I explain in #231 why I don't see the benefit of creating yet another branch when we already have experiments serving the same functionality. This feels messy <=> another potential instance of discontinuity.
The intent was to help preserve continuity by creating something in the repo that others can comment on and open PRs against.
Okay. Given that there is a lot of valuable concrete technical discussion already in the comments of the slide deck, I imagine your plan to preserve the continuity of those discussions would be to create PRs for each of the topics listed and copy over those comments? That would SGTM.
Hi there, thanks for writing down a complete proposal. However, the more I look at it, the more I find aspects of the syntax confusing and unintuitive.
First, to a human, there is no clear mental model for how to start reading and writing a message. A message can just be plain text (yeah!), or text in square brackets, or "code". I see how you get that to work in the EBNF, but it's confusing.
I have always liked simple messages to be simple, with no surrounding syntax. However, that should include placeholders; requiring enclosing-delimiter syntax only when there are placeholders is weird, and allowing simple text both with and without delimiters is also weird. Starting in "text" mode also has problems discussed here (uncertainty about trimming spaces), and we have problems with MF1 (ICU MessageFormat) where we don't want people to write literal text outside of a select-type placeholder.
So while it pains me to say this, bite the bullet, start in "code" mode, and always surround translatable text with delimiters. And please use curly braces for those delimiters; square brackets are common enough in real text. Together with using curly braces for placeholders, only So like this:
The next most confusing thing is figuring out which placeholder-looking item becomes an input to the selection, and how those match up to the variant value tuples. You really need to enclose the selectors so that it's clear which ones they are, and the number of selectors needs to be the same as the number of values in each value tuple. I suggest enclosing both the list of selectors and each value tuple in square brackets. Bracketing both selectors and tuples correlates them visually. Also, it seems like a variable definition can have a side effect of its expression leaking into the selection syntax; don't overload unrelated functionalities. With that, a message always starts in "code" mode with optional variable definitions; then it's either a single pattern, or a selector head followed by pairs of value tuples and patterns:
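The arity rule described in this comment (each variant's value tuple has exactly as many values as there are selectors) can be illustrated with a rough sketch over a hypothetical data model. This is not the syntax proposed in the comment or in the PR; the names `Message`, `Variant`, and `arity_errors` are assumptions made up for illustration only.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Variant:
    keys: Tuple[str, ...]  # hypothetical: one value per selector
    pattern: str           # translatable text for this variant


@dataclass
class Message:
    # Hypothetical model: either a single pattern, or a selector head
    # followed by (value tuple, pattern) pairs.
    selectors: List[str] = field(default_factory=list)
    variants: List[Variant] = field(default_factory=list)
    pattern: Optional[str] = None


def arity_errors(msg: Message) -> List[str]:
    """Return a list of problems; an empty list means the shape is consistent."""
    errors: List[str] = []
    if not msg.selectors:
        # No selector head: expect exactly one plain pattern and no variants.
        if msg.variants:
            errors.append("variants given without a selector head")
        if msg.pattern is None:
            errors.append("missing pattern")
        return errors
    if msg.pattern is not None:
        errors.append("plain pattern given together with selectors")
    expected = len(msg.selectors)
    for variant in msg.variants:
        if len(variant.keys) != expected:
            errors.append(
                f"variant {variant.keys!r} has {len(variant.keys)} value(s), "
                f"expected {expected}"
            )
    return errors
```

With two selectors, a variant carrying a single value would be reported, while a message consisting of one plain pattern passes the check.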
About the colon as a function prefix -- I am ok with using a colon, but it's a prefix, just like the dollar for variables, so it needs to attach to the function name, and there should be no space allowed between the colon and the name. If you don't like the visuals, then pick a different symbol. (Earlier proposals used an
Please don't allow whitespace everywhere. Traditional programming languages are super loose with spaces, but then people debate and enforce style guides for where to put them. Looking at my example above, I suggest not allowing spaces where I didn't put any. (For example, not around the
The syntax so far is not clear about formatting functions and selection functions. It looks like "number" could be both, but they should be separate. "number" should just format, and "plural" should both format and select (remember that the two are intertwined). Similarly, once a variable has been subject to select-and-format, using it should require different syntax from regular variables, because we need to be clear that the placeholder is replaced consistently in each variant. If you just write
About the markup syntax: Are these free-form, implementation-dependent tokens? Do you expect them to be HTML or TTS hints or accessibility hints? Without any distinctions, this seems like Unicode private use code points with all of their problems. Or are the markup tokens registered functions? If so, then why not use the
If we do need markup with special syntax, then it should fit better into the overall syntax (e.g., always an ASCII symbol after the opening
Speaking of literals, don't enclose them in quotes. One of the stated goals here is to make messages usable as string literals in programming languages. Enclose literals in parentheses
Do we need variables inside placeholder options? We have seen this before, but I don't remember a good use case being presented. I see one example that suggests using it for grammatical agreement, but I strongly suspect that managing grammatical agreement will require a second pass over the output of the whole "formatting" pass for a message; a second pass so that it can look at everything in context -- rather than trying to make one isolated placeholder agree with the input to another. So unless we have a really good use case for variables in options, let's leave them out.
Phew, this turned out long, sorry :-) Cc @macchiati
Hi @markusicu, thank you for writing this down! I agree with a lot of your points, and with some others I agree but I picked different trade-offs. We should discuss all of them. The idea behind this PR is to merge a starting point into the
Would you mind opening new issues for each of these and tagging them with the
Co-authored-by: Mihai Nita <nmihai_2000@yahoo.com>
I will not be able to attend tomorrow's meeting (Monday and Tuesday we're
moving into our new place in SF). I hope to be able to join subsequent
discussions.
To make sense of the differences, I tried putting together a comparison
at the following location, with my notes.
https://docs.google.com/document/d/1t9ZAgdVUb2ILUu-moEpS0g0T-3WZ07uCHwmw9WkMnt4/edit?usp=sharing
I didn't have the time to be as thorough as I'd like to be, so my
apologies for that.
Mark
On Wed, May 11, 2022 at 1:40 PM Eemeli Aro wrote:
Merged #230 into develop.
This PR adds spec/syntax.md as a starting point for our MF2 grammar considerations. This document is based on the syntax specification prepared earlier by @stasm and represents an evolution of the "A" option presented in #229, taking into account comments and other responses to the presentation.
Given the time constraints, not all comments have been taken into account. We are not suggesting that this is a final syntax, but a sufficiently minimal compromise on which further changes may be applied.
Compared to the "A" option, the following changes have been made:
The proposed grammar is LL(1), and a parser for it is available. That was generated using the REx Parser Generator, with -ll 1 -javascript -tree -main command-line options.
We'd like the discussion about the syntax design to happen here on GitHub rather than in Google Docs comments, for clarity, discoverability and posterity. Our proposal for next steps and onwards progress is the following:
To that end, we ask that you evaluate this proposal as a whole, and hopefully give your approval for merging it so that specific issues and changes may be suggested separately, rather than swamping this initial PR.
As agreed on Monday's call, Staś and I are happy to make ourselves available for in-person discussions before we present this again to the WG on our call on 2022-05-09.
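For readers unfamiliar with the LL(1) property mentioned in the description above: it means a parser can always pick the next grammar rule by looking at a single token of lookahead, with no backtracking. The sketch below shows that idea with a hand-written recursive-descent parser for a toy grammar of nested name lists. The toy grammar and the names `tokenize` and `parse` are assumptions for illustration; this is neither the grammar in spec/syntax.md nor the output of the REx Parser Generator.

```python
import re

# Toy grammar (an assumption, NOT the MF2 grammar):
#   value ::= NAME | '[' items ']'
#   items ::= ( value ( ',' value )* )?
TOKEN = re.compile(r"\s*([\[\],]|[A-Za-z_]\w*)")


def tokenize(src):
    """Split the input into '[', ']', ',' and NAME tokens."""
    tokens, pos = [], 0
    src = src.rstrip()
    while pos < len(src):
        match = TOKEN.match(src, pos)
        if not match:
            raise SyntaxError(f"unexpected character at offset {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def parse(src):
    """Parse a toy 'value'; every decision inspects only the next token."""
    tokens = tokenize(src) + ["<eof>"]
    pos = 0

    def peek():
        return tokens[pos]

    def expect(tok):
        nonlocal pos
        if tokens[pos] != tok:
            raise SyntaxError(f"expected {tok!r}, got {tokens[pos]!r}")
        pos += 1

    def value():
        nonlocal pos
        tok = peek()
        if tok == "[":                      # lookahead '[' selects the list rule
            expect("[")
            items = []
            if peek() != "]":               # lookahead ']' selects empty items
                items.append(value())
                while peek() == ",":        # lookahead ',' selects another item
                    expect(",")
                    items.append(value())
            expect("]")
            return items
        if tok in ("]", ",", "<eof>"):
            raise SyntaxError(f"unexpected {tok!r}")
        pos += 1                            # any other token is a NAME
        return tok

    result = value()
    expect("<eof>")
    return result
```

For example, `parse("[a, [b, c], d]")` yields a nested Python list, while an input such as `[a b]` raises `SyntaxError` as soon as the unexpected token becomes the lookahead.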