(Design) Code Mode Introducer choice #521

Merged
merged 8 commits into from
Nov 13, 2023

aphillips
Member

This design doc attempts to capture the options for beautifying the code mode introducer.

aphillips and others added 2 commits November 10, 2023 12:51
Co-authored-by: Richard Gibson <richard.gibson@gmail.com>
Collaborator

@echeran left a comment

LGTM


As a translator, I don't want to have to learn special syntax to support features such as declarations.

As a user, I want my messages to be robust.
Collaborator

nit: Define what "robust" means. Does it mean that the visual diff between the simple quoted pattern & a non-simple pattern is minimal? Or does it mean that there is an unambiguous 1:1 correspondence between a message (including its patterns and behavior logic) and its syntactical text representation? Or something else?

Collaborator

@eemeli left a comment

I agree with Option D being the overall best choice of those available, but the document should not present # as the sigil of choice. As discussed previously, it's such a common comment-start character that using it like this would be problematic.

I'd recommend either ! or % as alternatives.

@eemeli
Collaborator

eemeli commented Nov 11, 2023

I'm not sure if it was captured earlier, but we probably shouldn't use ^ for anything, given that it's a dead key on a number of non-American keyboards.


_What is this proposal trying to achieve?_

It must be possible to reliably parse messages.
Collaborator

"Reliable" is vague. Can this be framed as a property of the grammar? (For example: "The grammar for messages must be LL(k), so that it can be parsed without arbitrary lookahead.")

Member Author

I will clarify. This has to do with determining whether a message is simple or complex (and only that). Any given message must produce a consistent result for this.

Comment on lines +25 to +27
Determining whether a message will have code tokens requires some
special character sequence, either part of the code itself or
prepended to the message.
Collaborator

Suggested change
- Determining whether a message will have code tokens requires some
- special character sequence, either part of the code itself or
- prepended to the message.
+ For ease of parsing, a distinguished character sequence
+ should denote the presence of code tokens in a message
+ (either part of the code itself or prepended to the message).

(Strictly speaking, it doesn't require that, since you could just scan for the first code token -- but that's undesirable since it requires arbitrary lookahead.)
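
To make the lookahead point concrete, here is a minimal Python sketch contrasting the two ways of deciding whether a message is simple or complex. Everything named here is an assumption for illustration only: the `%%` introducer, the bare `match`/`let` token list, and the function names are hypothetical and are not taken from the design doc or the MF2 grammar.

```python
from typing import Sequence

# Hypothetical values, chosen only for illustration.
INTRODUCER = "%%"
CODE_TOKENS = ("match", "let")


def is_complex_by_prefix(message: str, introducer: str) -> bool:
    """Decide by inspecting a fixed-length prefix (bounded lookahead)."""
    return message.startswith(introducer)


def is_complex_by_scan(message: str, code_tokens: Sequence[str]) -> bool:
    """Decide by naively searching the whole message for any code token.

    The amount of input examined is not bounded in advance: the first
    token may appear arbitrarily far into the message.
    """
    return any(token in message for token in code_tokens)


SIMPLE = "Hello, world!"
COMPLEX = "%%match {$count} when 1 {{one thingy}} when * {{{$count} thingies}}"

print(is_complex_by_prefix(SIMPLE, INTRODUCER), is_complex_by_prefix(COMPLEX, INTRODUCER))
print(is_complex_by_scan(SIMPLE, CODE_TOKENS), is_complex_by_scan(COMPLEX, CODE_TOKENS))
```

Both functions agree on these two inputs, but only the prefix check can answer after reading a fixed number of characters, which is the property the suggested wording is after.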

The actual sigil used needs to be an ASCII character in the reserved or private use
set (with syntax adjustments if we use up a private-use one).
Most of the options below have been changed to use `^`, using
Apple's experimental syntax as a model for sigil choice.
Collaborator

Not sure what Apple's experimental syntax is -- can you add a link?

Member Author

I got this from @grhoten's presentation at the Unicode Technical Workshop this past week. I don't have a link handy, but one will be in the offing I think.

Collaborator

I remember watching https://developer.apple.com/videos/play/wwdc2023/10153/ which demoed a syntax which used ^: ^[Bienvenido](inflect: true) a tu iPhone.


Pros:
- Sigil is part of the keyword, not something separate
- Requires minimum additional typing
Collaborator

I wouldn't say "minimum" here. Option B requires the absolute minimum additional typing, since it's just adding one character per message.

Member Author

Actually, Option D adds zero characters. That's actually the minimum ;-).

I could say "Requires no additional typing". Also, note that this option avoids some of the iteration hazard of other options.


Cons:
- Has no other purpose in the syntax
- Looks like something should happen inside it
Collaborator

Not sure what that means?

Member Author

{^} looks like some sort of placeholder. So does {}. The sequence doesn't do anything, but it looks like it might do something. If we chose this option, we would have to tell users why it's there.

Co-authored-by: Tim Chevalier <tjc@igalia.com>
@aphillips
Member Author

@eemeli noted:

I'm not sure if it was captured earlier, but we probably shouldn't use ^ for anything, given that it's a dead key on a number of non-American keyboards.

Ugh. You're right. Wave-dash/tilde anyone? ~?

@eemeli
Collaborator

eemeli commented Nov 11, 2023

Ugh. You're right. Wave-dash/tilde anyone? ~?

Not available in Italian: https://superuser.com/a/667694
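
As a way of keeping track of the sigil objections raised so far in this thread, the following Python sketch filters a candidate list against them. The objection table only restates comments made above (`#`, `^`, `~`); the candidate string and the function name are assumptions for illustration, and a character surviving the filter only means nobody has objected to it here yet.

```python
# Objections raised in this thread (character -> reason given).
OBJECTIONS = {
    "#": "common comment-start character",
    "^": "dead key on a number of non-American keyboards",
    "~": "not available on the Italian keyboard layout",
}

# Sigils mentioned in the thread so far (illustrative list).
CANDIDATES = "#^~!%"


def remaining_candidates(candidates: str, objections: dict) -> list:
    """Return the candidates that have no recorded objection."""
    return [ch for ch in candidates if ch not in objections]


print(remaining_candidates(CANDIDATES, OBJECTIONS))  # ['!', '%']
```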

Collaborator

@stasm left a comment

I've added another option, F, which proposes to place all declarations in a dedicated block for code, a preamble.

I'm OK with merging this PR as "proposed". I don't know yet if I agree that Option D is the best. Let's discuss on Monday.

- Closing portion of the syntax adds no value;
could be a source of unintentional syntax errors
- Messages commonly end with four `}}}}`

Collaborator

Suggested change
- The whole complex message looks similar to a placeholder, which may mislead users into thinking that it's OK to put text before and after it, or even to do things like `You have {{match {$count} when 1 {{one thingy}} when * {{{$count} thingies}}}} and {{match {$count} when 1 {{one gizmo}} when * {{{$count} gizmos}}}}`


There are the following designs being considered:

### Option A. Use Pattern Quotes for Messages (current design)
Collaborator

We could also consider slight variations of Option A which aren't exactly the current main syntax. For example: {# ... #}, {% ... %}, or {[ ... ]}. They could help mitigate at least the }}}} problem.

Member Author

Good callout, although it's still a lot of closing syntax }}%}
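
The closing-syntax concern can be checked mechanically. The Python sketch below extracts the run of closing characters at the end of a message. The first sample is the `match` message from the suggested change earlier in this review; the `{% ... %}` variant is an assumption about how that wrapper would be applied, and the function name and closer set are hypothetical.

```python
def trailing_closers(message: str, closers: str = "}%#]") -> str:
    """Return the run of closing characters at the very end of the message."""
    start = len(message)
    while start > 0 and message[start - 1] in closers:
        start -= 1
    return message[start:]


# Current syntax (Option A), as written in the review thread.
current = "{{match {$count} when 1 {{one thingy}} when * {{{$count} thingies}}}}"
# Assumed shape of the {% ... %} variation.
variant = "{%match {$count} when 1 {{one thingy}} when * {{{$count} thingies}}%}"

print(trailing_closers(current))  # }}}}
print(trailing_closers(variant))  # }}%}
```

Under these assumptions both forms end in a run of four closing characters, so the variation changes how the run looks but not how long it is.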

Sample quoted pattern with no declarations or match:
```
^{{Pattern}}
```
Collaborator

In all options not using {{ for code-mode, we could allow {{Pattern}} to be a valid shorthand for quoted patterns.

Member Author

We could, but then we'd have multiple equivalent representations?

Collaborator

Well, we already have two anyways: Pattern and {{{{Pattern}}}}. I'm merely describing options that we have. We could, for instance, also consider forbidding complex message syntax without declarations, so that only Pattern is valid.

I think the question to ask first is: given that Pattern is valid syntax and that we preserve whitespace, do we expect use-cases for quoting simple patterns?

It would be another way of ensuring leading and trailing whitespace is preserved, using only the MF2 syntax, regardless of what the host format offers. Consider:

<string xml:space="preserve">"   Hello   "</string>
<string>{{{{   Hello   }}}}</string>

Collaborator

Depending on how you look at this, it would either mean supporting a pattern within a pattern, or adding a third syntax under the trenchcoat. I'd really rather not. Either we leave the external spaces for the surrounding syntax, or we don't. Let's not do both if we don't need to.

aphillips and others added 3 commits November 11, 2023 10:45
Co-authored-by: Stanisław Małolepszy <sta@malolepszy.org>
Co-authored-by: Stanisław Małolepszy <sta@malolepszy.org>
@aphillips
Member Author

I am merging this document as proposed and then creating a ballot around it.

@aphillips merged commit 559808b into main Nov 13, 2023
@aphillips deleted the aphillips-code-mode-introducer branch November 13, 2023 22:57
@macchiati
Member

It seems to me that there is a higher-level breakdown, where 'special sequence' below is a sequence of one or more ASCII punctuation/symbol characters.

Alpha: mark the start of a complex message with a special sequence, and either the end of the declarations or the end of the complex message with a special sequence (Option A, F)

Beta: mark the start of a complex message with a special sequence (Option B, C, E)

Gamma: mark start of each keyword that starts statements with a special sequence (Option D)

Personally, I don't think either Alpha or Gamma adds much value; and the key to Beta is to pick a sequence that is distinctive, doesn't collide with any other sequence, and is very unlikely to start a simple message (thus probably a multi-character sequence of symbols instead of a single symbol).

  • By the way, we'd have a lot more choices for these if we considered non-ASCII — §§ would be a nice separator.
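
To illustrate the point about Beta, that the introducer should be very unlikely to start a simple message, here is a small Python sketch that checks candidate introducers against a list of simple messages. The sample messages and the candidate sequences (`#`, `%`, `%%`) are made up for illustration and are not proposals; the function name is hypothetical.

```python
def collisions(introducer: str, simple_messages: list) -> list:
    """Return the simple messages that would be misread as complex
    because they happen to start with the introducer."""
    return [m for m in simple_messages if m.startswith(introducer)]


# Illustrative simple messages (assumed, not from any real catalog).
SAMPLES = [
    "#1 bestseller this week",
    "% of goal reached: {$pct}",
    "Hello, {$name}!",
]

for candidate in ("#", "%", "%%"):
    print(repr(candidate), collisions(candidate, SAMPLES))
```

On this toy list each single-character candidate collides with one message, while the two-character one collides with none, which is the intuition behind preferring a multi-character sequence. A real evaluation would need an actual message corpus.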

@aphillips
Member Author

By the way, we'd have a lot more choices for these if we considered non-ASCII — §§ would be a nice separator.

Sure, but non-ASCII is a non-starter for syntax. I can think of lots of characters that would be nice markers.......

@macchiati
Member

I think the only real impediment for non-ASCII in syntax would be that the standard keyboards in some OS's don't yet provide easy access to even all the Latin 1 characters.

@duerst

duerst commented Nov 14, 2023

I think the only real impediment for non-ASCII in syntax would be that the standard keyboards in some OS's don't yet provide easy access to even all the Latin 1 characters.

"Easy access" is relative. The first level (easiest) of access is what comes 'engraved' on a physical keyboard. There are software keyboards, but I'd guess MF2 will mostly be written on PCs, not smartphones. Some keyboards have an AltGr key, which (as far as I understand) allows access to another 'shift' level of characters per key. But some keyboards don't have an AltGr key. Mine doesn't. I wouldn't know how to type most Latin-1 characters, except for some by switching to the Japanese IME, and entering the Japanese name of the character. It's a pity, because using a non-ASCII character such as § (we probably wouldn't even need two of them) would be really nice.

@aphillips
Member Author

(as chair)

We're discussing code mode introducers in #526 now, as this PR is closed. Consider moving this discussion there.
