whitespace in the EBNF #340
Comments
This is something I intended to do over the end-of-the-year break, but got sick instead. I have a WIP branch in which I ran into some issues with defining the whitespace in
Thanks @stasm. In that case, I won't invest more time in fixing the EBNF and will wait on your PR.
I worked on this yesterday and this morning and got a bit stuck. Let me try to document my attempt. My goal was to encode in the EBNF the following two requirements.
There are a few follow-up questions and edge-cases related to (2) which require some more consideration:
I'd like us to agree on all of them before this issue can be fixed.
@stasm Thanks. Why not post the PR? I think most of your judgements in the follow-up questions are correct. The differences in the EBNF should be minor. We can have a hot debate about whether LWSP or plain WSP should be used (and where). I think the statement in (2) is mostly but not entirely correct, e.g. space is sometimes used as a separator (next to I also noted your comments elsewhere about which sort of BNF to use and we should discuss that.
Some more thoughts on context-free grammars. So far we've managed to keep I'm not fond of making our grammar specific to one tool, which is why I had attempted to define whitespace rules in the EBNF explicitly before. However, I'm not sure I know how to do this right. The main issue boils down to the fact that: What looks intuitive is oftentimes not LL(1). For example, I'd like to allow
This representation suffers from the so-called first/first conflict, which is common in LL grammars. When the parser sees a space after This particular issue can be solved by left-factoring the production in question:
This is now LL(1), but arguably is also less readable for a human reader. There are other constructs, however, which I don't know how to refactor to keep the LL(1) requirement. Assuming a slightly simplified syntax, I'd like to define that whitespace is required between function options, and optional at the end of a placeholder.
This is also rather understandable and hopefully readable for a human. It also seems to be a fairly standard way of describing a set of repeated symbols. For example the XML spec defines the start tag as follows:
The problem is that this, again, is not LL(1). This is an example of a first/follow conflict. When the parser is done parsing an I don't know how to refactor this into LL(1), or even if it's possible. With all of the above in mind, perhaps LL(1) is too strict? We haven't documented it as a hard requirement, although we did mention in a few discussions about the syntax that it would be nice to have. I think there are a number of paths forward from here:
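To make the first/first conflict and the left-factoring fix described above concrete, here is a minimal Python sketch. The grammar in it is a hypothetical toy (the names `body`, `tail`, `option`, `NAME`, and `WS` are assumptions for illustration, not productions from the project's EBNF). It computes FIRST sets and reports any nonterminal whose alternatives can start with the same token. It does not compute FOLLOW sets, so it would not detect the first/follow conflict mentioned above.

```python
EPS = "ε"  # marker for "this sequence can derive the empty string"

def first_of_seq(seq, first, grammar):
    """FIRST set of a symbol sequence, given current FIRST sets of nonterminals."""
    out = set()
    for sym in seq:
        if sym not in grammar:           # anything that is not a key is a terminal
            out.add(sym)
            return out
        out |= first[sym] - {EPS}
        if EPS not in first[sym]:
            return out
    out.add(EPS)                         # empty sequence, or every symbol is nullable
    return out

def first_sets(grammar):
    """Fixpoint computation of FIRST for every nonterminal."""
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for nt, alts in grammar.items():
            for alt in alts:
                new = first_of_seq(alt, first, grammar)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first

def first_first_conflicts(grammar):
    """List (nonterminal, overlapping tokens) for alternatives sharing a start token."""
    first = first_sets(grammar)
    conflicts = []
    for nt, alts in grammar.items():
        for i in range(len(alts)):
            for j in range(i + 1, len(alts)):
                overlap = (first_of_seq(alts[i], first, grammar)
                           & first_of_seq(alts[j], first, grammar))
                if overlap:
                    conflicts.append((nt, sorted(overlap)))
    return conflicts

# Hypothetical toy grammar: both alternatives of `body` start with NAME.
unfactored = {
    "body": [("NAME",), ("NAME", "WS", "option")],
    "option": [("NAME", "=", "NAME")],
}

# Left-factored version: the shared NAME prefix is pulled out into one alternative.
factored = {
    "body": [("NAME", "tail")],
    "tail": [("WS", "option"), ()],
    "option": [("NAME", "=", "NAME")],
}

print(first_first_conflicts(unfactored))  # conflict reported on `body`
print(first_first_conflicts(factored))    # no first/first conflicts
```

The factored grammar passes this check at the cost of the extra `tail` production, which is the readability trade-off raised in the comment. A complete LL(1) test would also compare FIRST against FOLLOW for nullable alternatives such as `tail`.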
I got stuck (as you can probably tell from my comments above) and haven't finished it yet. I think what I'd like to do is make changes that make the grammar LL(1) with backtracking, and submit that for discussion.
My opinions:
Whether the value is in braces shouldn't matter; it's more about the context. So I would allow but not require spaces around the
Yes, the space after
Yes, the space between options should be required. Here too it's simpler if the shape of the option value (i.e.
Yes, as with
Yes, spaces should be required around the
I spent some more time thinking about this after the meeting and I think I agree that it's better to require whitespace around options and variant keys at all times. I call this in my head the "xml model" in which, too, the attributes must be separated by whitespace even if I have a stronger conviction that the whitespace around
Agreed on everything except for this:
Specifically, I'm concerned that a line like
appears to associate the
Coming from there, and the spaces needed between
I feel like this is a slippery slope. If we require spaces around the
My general approach to whitespace is that I wouldn't want the syntax to punish users for sloppiness. While I have rather strong opinions on how I'd like to see messages formatted, I don't want to impose them on others. Parsing should be lenient. Linting can be strict. This is why I originally proposed not even requiring whitespace around literals, e.g. (Incidentally, if we switch to For the same reason of leniency in accepting input, I don't want to require whitespace around
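As a rough illustration of the "lenient parsing, strict linting" split argued for above, here is a small Python sketch. It assumes a heavily simplified, hypothetical declaration shape modeled on the `let $foo = (bar)` example from the issue description; the function names `parse_let` and `lint_let` and the regular expression are illustrative only and are not taken from the EBNF or any implementation.

```python
import re

# Lenient grammar: whitespace around "=" is optional (zero or more characters).
# This is a simplification; real literals and variable names are richer.
LET_PATTERN = re.compile(r"^let\s+\$(\w+)\s*=\s*\((\w*)\)$")

def parse_let(src):
    """Parse a simplified `let` declaration, accepting any spacing around '='."""
    match = LET_PATTERN.match(src.strip())
    if match is None:
        raise ValueError(f"not a let declaration: {src!r}")
    return match.group(1), match.group(2)

def lint_let(src):
    """Return style warnings; the linter never rejects what the parser accepts."""
    warnings = []
    if re.search(r"\S=", src):
        warnings.append("missing space before '='")
    if re.search(r"=\S", src):
        warnings.append("missing space after '='")
    return warnings

# Both spellings parse to the same result; only the linter distinguishes them.
print(parse_let("let $foo=(bar)"))
print(parse_let("let $foo = (bar)"))
print(lint_let("let $foo=(bar)"))
print(lint_let("let $foo = (bar)"))
```

Under this split, a stricter grammar would amount to moving the linter's checks into `LET_PATTERN` (for example, replacing `\s*` with `\s+`), which is the choice debated in this thread.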
The words "sloppiness", "punish", and "impose" are doing a lot of work here, and I want to offer an alternative perspective. Only in the past few years, I've worked on projects where source code formatting was strictly enforced. At first, it felt cumbersome for me to include an extra step (to use the tooling), but the payoff was consistency across developers, and no spurious diffs in PR reviews due to formatting. Over time, the side benefit of less cognitive load accrued: I no longer had to worry about things I used to, like manually matching the reviewer's / codebase's subjective preferences of Having tooling to help users authoring messages is very useful, of course. For MessageFormat, @nbouvrette showed us way back when about this community tool from @vanwagonet, Online ICU Message Editor, that interactively validates & demonstrates an ICU MessageFormat v1 message pattern. (Fun anecdote: In cases where the syntax is very regular to begin with, you can create tooling where formatting is 100% predetermined and deterministic.) I find the strictness actually empowering because it allows me to spend less time thinking about syntax and formatting, so I can spend more time on higher-level concerns.
Perhaps sloppy wasn't the right word. Is scrappy better?
@echeran I share the sentiment. I enjoy coding without thinking about the formatting, too, even if sometimes I'd prefer a different particular formatting. But the benefit of not even having to think about and discuss formatting is far greater than that of applying my own preference. I think we can expect similar tooling to emerge for MessageFormat 2. It might be a bit more involved for strings embedded directly in the source code (but still feasible). But we should also expect that some users won't have access to such tooling, either because of the stack they use, or the limitations of the build system, or the limitations of the editor. In their case, the cost of the grammar's strictness would be entirely theirs. Strictness is great once you're past the learning curve and when you have good tooling that can help you comply. Learning is a scrappy process which benefits from lenient parsing. I guess I'm trying to be realistic: I anticipate that developers will want to spend as little time writing MessageFormat syntax as possible. Hence my attempt to relax the grammar and remove as much friction as possible.
Is your feature request related to a problem? Please describe.
The EBNF is a bit handwavy about whitespace. As it is currently written, no whitespace is permitted in places where we often write spaces in our examples, e.g. around the = in statements like let $foo = (bar)
Describe the solution you'd like
Go through the EBNF and ensure that we permit (or disallow!) whitespace appropriately. I intend to file a PR for this.
Note that we should discuss whether we permit LWSP or WSP or just our WhiteSpace production. There are some okay-ish arguments for LWSP or just WSP.
Describe why your solution should shape the standard
Parsers will be written from the EBNF. It should be correct and complete.
Additional context or examples
See above.