(Design) Code Mode Introducer choice #521

Merged
merged 8 commits into from
Nov 13, 2023
253 changes: 253 additions & 0 deletions exploration/code-mode-introducer.md
@@ -0,0 +1,253 @@
# Design Proposal: Choosing a Code Mode Introducer

Status: **Proposed**

<details>
<summary>Metadata</summary>
<dl>
<dt>Contributors</dt>
<dd>@aphillips</dd>
<dt>First proposed</dt>
<dd>2023-11-10</dd>
<dt>Pull Requests</dt>
<dd>#000</dd>
</dl>
</details>

## Objective

_What is this proposal trying to achieve?_

It must be possible to reliably parse messages.
Collaborator:

"Reliable" is vague. Can this be framed as a property of the grammar? (For example: "The grammar for messages must be LL(k), so that it can be parsed without arbitrary lookahead.")

Member Author:

I will clarify. This has to do with determining whether a message is simple or complex (and only that). Any given message must produce a consistent result for this.


Our current syntax features unquoted patterns for simple messages
and unquoted code tokens with quoted patterns for complex messages.
Determining whether a message will have code tokens requires some
special character sequence, either part of the code itself or
prepended to the message.
Comment on lines +25 to +27
Collaborator:

Suggested change
- Determining whether a message will have code tokens requires some
- special character sequence, either part of the code itself or
- prepended to the message.
+ For ease of parsing, a distinguished character sequence
+ should denote the presence of code tokens in a message
+ (either part of the code itself or prepended to the message).

(Strictly speaking, it doesn't require that, since you could just scan for the first code token -- but that's undesirable since it requires arbitrary lookahead.)

This proposal examines the options for determining code mode.

## Background

_What context is helpful to understand this proposal?_

## Use-Cases

_What use-cases do we see? Ideally, quote concrete examples._

As a developer, I want to create messages with the minimal amount of special syntax.
I don't want to have to type additional characters that add no value.
I want the syntax to be logical and as consistent as possible.

As a translator, I don't want to have to learn special syntax to support features such as declarations.

As a user, I want my messages to be robust.
Collaborator:

nit: Define what "robust" means. Does it mean that the visual diff between the simple quoted pattern & a non-simple pattern is minimal? Or does it mean that there is an unambiguous 1:1 correspondence between a message (including its patterns and behavior logic) and its syntactical text representation? Or something else?

Minor edits and changes should not result in syntax errors.

As a user, I want to be able to see which messages are complex at a glance
and to parse messages into their component parts visually as easily as possible.

## Requirements

_What properties does the solution have to manifest to enable the use-cases above?_

## Constraints

_What prior decisions and existing conditions limit the possible design?_

Some of the options use a new sigil as part of the introducer.
For various reasons, `#` has been used recently as a placeholder for this sigil.
There are concerns that this character is not suitable, since it is used as a comment
introducer in a number of formats.
See for example [#520](https://github.com/unicode-org/message-format-wg/issues/520).
The actual sigil used needs to be an ASCII character in the reserved or private use
set (with syntax adjustments if we use up a private-use one).
Most of the options below have been changed to use `^`, using
Apple's experimental syntax as a model for sigil choice.
Collaborator:

Not sure what Apple's experimental syntax is -- can you add a link?

Member Author:

I got this from @grhoten's presentation at the Unicode Technical Workshop this past week. I don't have a link handy, but one will be in the offing I think.

Collaborator:

I remember watching https://developer.apple.com/videos/play/wwdc2023/10153/ which demoed a syntax that used `^`: `^[Bienvenido](inflect: true) a tu iPhone.`


It should be noted that an introducer sigil should be as rare as possible in normal text.
This tends to rule out common punctuation marks such as `&`, `%`, `!`, and `?`.

```abnf
reserved-start = "!" / "@" / "#" / "%" / "*" / "<" / ">" / "/" / "?" / "~"
private-start = "^" / "&"
```
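
As a rough illustration of the two ABNF sets above, the following Python sketch classifies a candidate sigil character. The set contents are copied from the ABNF; the helper name `sigil_class` is hypothetical and is not part of any MessageFormat implementation.

```python
# Character sets copied from the ABNF above.
RESERVED_START = set("!@#%*<>/?~")
PRIVATE_START = set("^&")


def sigil_class(ch: str) -> str:
    """Return the ABNF set a candidate sigil belongs to, or 'other'.

    Hypothetical helper for illustration only.
    """
    if len(ch) != 1:
        raise ValueError("expected a single character")
    if ch in RESERVED_START:
        return "reserved-start"
    if ch in PRIVATE_START:
        return "private-start"
    return "other"
```

For example, `sigil_class("^")` returns `"private-start"` and `sigil_class("#")` returns `"reserved-start"`.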

## Proposed Design

_Describe the proposed solution. Consider syntax, formatting, errors, registry, tooling, interchange._

We need to choose one of these (or another option not yet considered).
Presentation at UTW did not produce any opinions.

Based on the pros and cons below, I would suggest that Option D is possibly the best option.

## Alternatives Considered

_What other solutions are available?_
_How do they compare against the requirements?_
_What other properties do they have?_

The following designs are being considered:

### Option A. Use Pattern Quotes for Messages (current design)
Collaborator:

We could also consider slight variations of Option A which aren't exactly the current main syntax. For example: `{# ... #}`, `{% ... %}`, or `{[ ... ]}`. They could help mitigate at least the `}}}}` problem.

Member Author:

Good callout, although it's still a lot of closing syntax: `}}%}`


Complex messages are quoted with double curly brackets.
The closing curly brackets might be optional.

Sample pattern:
```
{{
input {$var}
match {$var}
when * {{Pattern}}
}}
```
Sample quoted pattern with no declarations or match:
```
{{{{Pattern}}}}
```

Pros:
- Uses a sigil `{` already present in the syntax
- No additional escapes
- Consistent with other parts of the syntax?

Cons:
- Somewhat verbose
- Closing portion of the syntax adds no value;
could be a source of unintentional syntax errors
- Messages commonly end with four `}}}}`

Collaborator:

Suggested change
- The whole complex message looks similar to a placeholder, which may mislead users into thinking that it's OK to put text before and after it, or even to do things like `You have {{match {$count} when 1 {{one thingy}} when * {{{$count} thingies}}}} and {{match {$count} when 1 {{one gizmo}} when * {{{$count} gizmos}}}}`

> [!NOTE] Other enclosing sequences are also an option, notably `{%...%}` (or similar).
> This does reduce the number of curly brackets in a row.

### Option B. Use a Sigil

Complex messages start with a special sigil character.

```
^input {$var}
match {$var}
when * {{Pattern}}
```
Sample quoted pattern with no declarations or match:
```
^{{Pattern}}
```

Collaborator:

In all options not using `{{` for code-mode, we could allow `{{Pattern}}` to be a valid shorthand for quoted patterns.

Member Author:

We could, but then we'd have multiple equivalent representations?

Collaborator:

Well, we already have two anyways: `Pattern` and `{{{{Pattern}}}}`. I'm merely describing options that we have. We could, for instance, also consider forbidding complex message syntax without declarations, so that only `Pattern` is valid.

I think the question to ask first is: given that `Pattern` is valid syntax and that we preserve whitespace, do we expect use-cases for quoting simple patterns?

It would be another way of ensuring leading and trailing whitespace is preserved, using only the MF2 syntax, regardless of what the host format offers. Consider:

    <string xml:space="preserve">"   Hello   "</string>
    <string>{{{{   Hello   }}}}</string>

Collaborator:

Depending on how you look at this, it would mean either supporting a pattern within a pattern, or adding a third syntax under the trenchcoat. I'd really rather not. Either we leave the external spaces for the surrounding syntax, or we don't. Let's not do both if we don't need to.

Pros:
- Requires minimum additional typing

Cons:
- Requires an additional sigil
- Requires an additional escape for simple pattern start
- Has no other purpose in the syntax
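
A minimal sketch of how Option B lets a parser decide code mode from the first character alone. It assumes the placeholder sigil `^`, assumes the sigil must be the very first character of the message, and assumes a backslash escape (`\^`) for simple patterns that begin with the sigil. The proposal does not specify the escape form, and the function name is hypothetical.

```python
SIGIL = "^"  # placeholder sigil used in this proposal's examples


def is_complex_option_b(message: str) -> bool:
    """Return True if the message starts in code mode under Option B.

    Only the first character is inspected, so no lookahead is needed.
    A simple pattern that begins with the sigil is assumed to be
    escaped (for example as a backslash followed by the sigil), so it
    no longer starts with the bare sigil.
    """
    return message.startswith(SIGIL)
```

Under these assumptions, `^{{Pattern}}` is classified as complex, while `Pattern` and an escaped `\^caret` are classified as simple.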

### Option C. Use a Double Sigil

Like Option B, except the sigil is doubled.

```
^^input {$var}
match {$var}
when * {{Pattern}}
```
Sample quoted pattern with no declarations or match:
```
^^{{Pattern}}
```

Pros:
- Less likely to conflict with a simple pattern

Cons:
- Requires an additional sigil
- Requires an additional escape for simple pattern start
- Has no other purpose in the syntax

### Option D. Sigilized Keywords

Instead of quoting the message, add a sigil to the keywords that
start statements, that is, `.input`, `.local`, and `.match`.
The keyword `when` might be considered separately.

The sigil used was changed to `.` as a result of the 2023-11-13 teleconference
discussion of sigils. Others considered were `~`, `@`, `&`, and `%`.
Originally this was `#` for similarity to `#define` (etc.) in other environments.

```
.input {$var}
.local $foo = {$bar}
.match {$var}
when * {{Pattern}}
```
Sample quoted pattern with no declarations or match:
```
{{Pattern}}
```

Pros:
- Sigil is part of the keyword, not something separate; note that the
need for escaping is reduced by attaching the sigil to the keyword,
since `.input` or `.local` or `.match` are unlikely to be message starters
- Requires minimum additional typing
Collaborator:

I wouldn't say "minimum" here. Option B requires the absolute minimum additional typing, since it's just adding one character per message.

Member Author:

Actually, Option D adds zero characters. That's actually the minimum ;-).

I could say "Requires no additional typing". Also, note that this option avoids some of the iteration hazard of other options.

- Adds no characters to messages that consist of only a quoted pattern;
that is, quoting the pattern consists only of adding the `{{`/`}}` quotes
- Maybe makes single-line messages easier to parse visually???

Cons:
- Requires an additional sigil
- Requires an additional escape for simple pattern start
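
A minimal sketch of code-mode detection under Option D. It assumes that a complex message begins either with one of the sigilized keywords or with a quoted pattern (`{{`), and that detection starts at the first character of the message. Whitespace handling and keyword-boundary checks are deliberately left out, and the function name is hypothetical.

```python
KEYWORDS = (".input", ".local", ".match")


def is_complex_option_d(message: str) -> bool:
    """Classify a message under Option D using a bounded prefix check."""
    # A complex message with no declarations is just a quoted pattern.
    if message.startswith("{{"):
        return True
    # str.startswith accepts a tuple of alternatives.
    return message.startswith(KEYWORDS)
```

With this check, a simple message that merely contains `.input` later in its text, or that starts with a lone `.`, is still treated as simple.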

### Option E. Special Sequence

Like Option A except the sequence is closed locally (not at the end of the message).
The suggested sequence is `{^}` (originally `{#}`), but it might also be `{}` or `{{}}`.

```
{^}input {$var}
match {$var}
when * {{Pattern}}
```
Sample quoted pattern with no declarations or match:
```
{^}{{Pattern}}
```

Pros:
- Less likely to conflict with a simple pattern
- Requires no additional sigil
- Requires no additional escape

Cons:
- Has no other purpose in the syntax
- Looks like something should happen inside it
Collaborator:

Not sure what that means?

Member Author:

{^} looks like some sort of placeholder. So does {}. The sequence doesn't do anything, but it looks like it might do something. If we chose this option, we would have to tell users why it's there.

- Most additional typing
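
The typing-cost claims for Options A through E can be compared on the "quoted pattern with no declarations or match" samples given above. The sketch below counts the characters each sample adds to the bare text `Pattern`. It covers only that one case; messages with declarations have different costs. The names `SAMPLES` and `overhead` are hypothetical.

```python
PATTERN = "Pattern"

# Pattern-only samples copied from Options A through E above.
SAMPLES = {
    "A": "{{{{Pattern}}}}",
    "B": "^{{Pattern}}",
    "C": "^^{{Pattern}}",
    "D": "{{Pattern}}",
    "E": "{^}{{Pattern}}",
}


def overhead(sample: str) -> int:
    """Number of characters the sample adds to the bare pattern text."""
    return len(sample) - len(PATTERN)


OVERHEAD = {option: overhead(sample) for option, sample in SAMPLES.items()}
# OVERHEAD == {"A": 8, "B": 5, "C": 6, "D": 4, "E": 7}
```

In every sample, four of the added characters are the `{{`/`}}` pattern quotes themselves; the remainder is the cost of the introducer.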

### Option F. Preamble

In this option, all declarations are placed in a dedicated block at the beginning of the message.
The preamble is the "front-matter" of the message, containing the message's logic.
`when` clauses are not part of the preamble.

The preamble can be delimited with `{% ... %}`:

    {%input {$var} match {$var}%} when * {{Pattern}}

Alternatively, it can be delimited with a new kind of delimiter, to make it visually distinct from placeholders and patterns:

    [[input {$var} match {$var}]] when * {{Pattern}}

We could also consider dropping the `when` keywords:

    [[input {$var} match {$var}]] * {{Pattern}}

Pros:
- Provides a clear conceptual distinction between declarations and variants.
- Visually, all code is grouped together.
- Unnests variant patterns.

Cons:
- If `[[ ... ]]` is used to delimit the preamble, it will require `[[` to be escaped at the beginning of simple patterns.
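
A minimal sketch of how a parser might separate the preamble from the variants under Option F. It assumes the `[[ ... ]]` delimiter, and it naively assumes that `]]` never occurs inside the preamble itself. The function name `split_preamble` is hypothetical.

```python
from typing import Optional, Tuple


def split_preamble(message: str) -> Tuple[Optional[str], str]:
    """Split a message into (preamble, remainder) under Option F.

    Returns (None, message) for a simple message with no preamble.
    """
    if not message.startswith("[["):
        return None, message
    # Naive search: assumes "]]" cannot appear inside the preamble.
    end = message.find("]]")
    if end == -1:
        raise ValueError("unterminated preamble")
    preamble = message[2:end]
    remainder = message[end + 2:].lstrip()
    return preamble, remainder
```

Applied to the sample `[[input {$var} match {$var}]] when * {{Pattern}}`, this returns the preamble `input {$var} match {$var}` and the remainder `when * {{Pattern}}`.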