(Design) Code Mode Introducer choice #521
This design doc attempts to capture the options for beautifying the code mode introducer.
Co-authored-by: Richard Gibson <richard.gibson@gmail.com>
LGTM
As a translator, I don't want to have to learn special syntax to support features such as declarations.

As a user, I want my messages to be robust.
nit: Define what "robust" means. Does it mean that the visual diff between the simple quoted pattern & a non-simple pattern is minimal? Or does it mean that there is an unambiguous 1:1 correspondence between a message (including its patterns and behavior logic) and its syntactical text representation? Or something else?
I agree with Option D being the overall best choice of those available, but the document should not present `#` as the sigil of choice. As discussed previously, it's such a common comment-start character that using it like this would be problematic.
I'd recommend either `!` or `%` as alternatives.
I'm not sure if it was captured earlier, but we probably shouldn't use |
_What is this proposal trying to achieve?_

It must be possible to reliably parse messages.
"Reliable" is vague. Can this be framed as a property of the grammar? (For example: "The grammar for messages must be LL(k), so that it can be parsed without arbitrary lookahead.")
I will clarify. This has to do with determining whether a message is simple or complex (and only that). Any given message must produce a consistent result for this.
Determining whether a message will have code tokens requires some
special character sequence, either part of the code itself or
prepended to the message.
Suggested change, replacing:
> Determining whether a message will have code tokens requires some
> special character sequence, either part of the code itself or
> prepended to the message.

with:
> For ease of parsing, a distinguished character sequence
> should denote the presence of code tokens in a message
> (either part of the code itself or prepended to the message).
(Strictly speaking, it doesn't require that, since you could just scan for the first code token -- but that's undesirable since it requires arbitrary lookahead.)
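To make the lookahead point concrete, here is a minimal sketch in Python. It assumes a hypothetical one-character introducer `^` and an assumed keyword set (`let`, `match`, `when`); both are illustrative only, not the syntax being decided here, and neither function comes from any MessageFormat implementation.

```python
INTRODUCER = "^"  # hypothetical sigil, chosen only for illustration
CODE_KEYWORDS = ("let", "match", "when")  # assumed keyword set for this sketch


def is_complex_by_prefix(message: str) -> bool:
    """Decide from the start of the message alone (constant lookahead)."""
    return message.startswith(INTRODUCER)


def is_complex_by_scan(message: str) -> bool:
    """Decide by searching the whole message for a sigil-marked keyword.

    The answer can depend on text arbitrarily far from the start,
    which is the unbounded lookahead described in the comment above.
    """
    return any(INTRODUCER + keyword in message for keyword in CODE_KEYWORDS)
```

With the prefix check, a message such as `"Hello " * 1000 + "^match"` is classified as simple after looking at its first character; the scanning variant has to read all of it before it can answer.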
The actual sigil used needs to be an ASCII character in the reserved or private use
set (with syntax adjustments if we use up a private-use one).
Most of the options below have been changed to use `^`, using
Apple's experimental syntax as a model for sigil choice.
Not sure what Apple's experimental syntax is -- can you add a link?
I got this from @grhoten's presentation at the Unicode Technical Workshop this past week. I don't have a link handy, but one will be in the offing I think.
I remember watching https://developer.apple.com/videos/play/wwdc2023/10153/ which demoed a syntax that used `^`: `^[Bienvenido](inflect: true) a tu iPhone`.
Pros:
- Sigil is part of the keyword, not something separate
- Requires minimum additional typing
I wouldn't say "minimum" here. Option B requires the absolute minimum additional typing, since it's just adding one character per message.
Actually, Option D adds zero characters. That's actually the minimum ;-).
I could say "Requires no additional typing". Also, note that this option avoids some of the iteration hazard of other options.
Cons:
- Has no other purpose in the syntax
- Looks like something should happen inside it
Not sure what that means?
`{^}` looks like some sort of placeholder. So does `{}`. The sequence doesn't do anything, but it looks like it might do something. If we chose this option, we would have to tell users why it's there.
Co-authored-by: Tim Chevalier <tjc@igalia.com>
Not available in Italian: https://superuser.com/a/667694
I've added another option, F, which proposes to place all declarations in a dedicated block for code, a preamble.
I'm OK with merging this PR as "proposed". I don't know yet if I agree that Option D is the best. Let's discuss on Monday.
- Closing portion of the syntax adds no value;
  could be a source of unintentional syntax errors
- Messages commonly end with four `}}}}`
- The whole complex message looks similar to a placeholder, which may mislead users into thinking that it's OK to put text before and after it, or even to do things like `You have {{match {$count} when 1 {{one thingy}} when * {{{$count} thingies}}}} and {{match {$count} when 1 {{one gizmo}} when * {{{$count} gizmos}}}}`
There are the following designs being considered:

### Option A. Use Pattern Quotes for Messages (current design)
We could also consider slight variations of Option A which aren't exactly the current `main` syntax. For example: `{# ... #}`, `{% ... %}`, or `{[ ... ]}`. They could help mitigate at least the `}}}}` problem.
Good callout, although it's still a lot of closing syntax: `}}%}`
``` | ||
Sample quoted pattern with no declarations or match: | ||
``` | ||
^{{Pattern}} |
In all options not using `{{` for code-mode, we could allow `{{Pattern}}` to be a valid shorthand for quoted patterns.
We could, but then we'd be into having multiple equivalent representations?
Well, we already have two anyways: `Pattern` and `{{{{Pattern}}}}`. I'm merely describing options that we have. We could, for instance, also consider forbidding complex message syntax without declarations, so that only `Pattern` is valid.
I think the question to ask first is: given that `Pattern` is valid syntax and that we preserve whitespace, do we expect use-cases for quoting simple patterns?
It would be another way of ensuring leading and trailing whitespace is preserved, using only the MF2 syntax, regardless of what the host format offers. Consider:
<string xml:space="preserve">" Hello "</string>
<string>{{{{ Hello }}}}</string>
Depending on how you look at this, it would mean either supporting a pattern within a pattern, or adding a third syntax under the trenchcoat. I'd really rather not. Either we leave the external spaces for the surrounding syntax, or we don't. Let's not do both if we don't need to.
Co-authored-by: Stanisław Małolepszy <sta@malolepszy.org>
I am merging this document as proposed and then creating a ballot around it.
It seems to me that there is a higher-level breakdown, where "special sequence" below means a sequence of one or more ASCII punctuation/symbol characters:
- Alpha: mark the start of a complex message with a special sequence, and either the end of the declarations or the end of the complex message with a special sequence (Options A, F)
- Beta: mark the start of a complex message with a special sequence (Options B, C, E)
- Gamma: mark the start of each keyword that starts a statement with a special sequence (Option D)

Personally, I don't think either Alpha or Gamma adds much value; the key to Beta is to pick a sequence that is distinctive, doesn't collide with any other sequence, and is very unlikely to start a simple message (thus probably a multi-character sequence of symbols instead of a single symbol).
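One rough way to compare candidate sequences for the Beta approach is to check how many existing simple messages would start with each of them. The sketch below assumes a hypothetical sample corpus and a hypothetical helper name; `#`, `^`, and `%` come from this thread, while `^^` and `{#` are made-up multi-character candidates used only for contrast.

```python
def colliding_messages(candidate: str, simple_messages: list[str]) -> list[str]:
    """Return the simple messages that would be misread as complex."""
    return [m for m in simple_messages if m.startswith(candidate)]


# Hypothetical corpus of simple messages; not real project data.
samples = ["#1 in sales", "^ Back to top", "Hello, {$name}!", "% complete"]

# Print the collisions for each candidate introducer.
for candidate in ("#", "^", "%", "^^", "{#"):
    print(repr(candidate), colliding_messages(candidate, samples))
```

In this toy corpus each single-character candidate collides with one message and the two-character candidates collide with none, but a real comparison would need a real corpus.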
Sure, but non-ASCII is a non-starter for syntax. I can think of lots of characters that would be nice markers...
I think the only real impediment for non-ASCII in syntax would be that the standard keyboards in some OS's don't yet provide easy access to even all the Latin 1 characters.
"Easy access" is relative. The first level (easiest) of access is what comes 'engraved' on a physical keyboard. There are software keyboards, but I'd guess MF2 will mostly be written on PCs, not smartphones. Some keyboards have an AltGr key, which (as far as I understand) allows access to another 'shift' level of characters per key. But some keyboards don't have an AltGr key. Mine doesn't. I wouldn't know how to type most Latin-1 characters, except for some by switching to the Japanese IME, and entering the Japanese name of the character. It's a pity, because using a non-ASCII character such as § (we probably wouldn't even need two of them) would be really nice.
(as chair) We're discussing code mode introducers in #526 now, as this PR is closed. Consider moving this discussion there.