
Placeholders: What sigil(s) indicate them? #269



Closed
gibson042 opened this issue May 16, 2022 · 9 comments
Labels
syntax Issues related with syntax or ABNF

Comments

@gibson042
Collaborator

develop syntax uses {…}, but the broader software ecosystem seems to have settled on a paradigm in which interpolated parts of text are indicated by either ${…} (JavaScript template literals and Mako templates) or {{…}} (Mustache, Jinja(2) and Angular).

Unless there is a clear pre-existing convention for single braces within the context of internationalization, I think it would be wise to conform with that external paradigm rather than diverging from it.

@eemeli eemeli added the syntax Issues related with syntax or ABNF label May 16, 2022
@stasm
Collaborator

stasm commented May 16, 2022

I wonder if these same arguments would be good reasons to avoid picking ${...} or {{...}} for MF placeholders, in order to avoid friction and conflicts with the programming languages in which MF strings will be embedded. #263 has the same motivation, but for literals.

@mihnita
Collaborator

mihnita commented May 16, 2022

Almost every combination is used somewhere :-)
https://en.wikipedia.org/wiki/String_interpolation

But one might make the argument that we should do something that is slightly different from the others.
Reason 1: detection. When one finds a string in a generic localization file (.properties, .json, .xml), there should be a way to tell that it is MF2.
Reason 2: avoid conflict. There is a risk that the underlying platform (Python, whatever) might interpret these placeholders before they get to MF2.
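To make Reason 2 concrete in Python: the sketch below assumes a hypothetical host program that mistakenly runs a message through Python's built-in str.format() before handing it to an MF2 formatter. The helper name through_str_format is illustrative only. Each of the three placeholder styles discussed in this thread is affected, in a different way.

```python
def through_str_format(template: str) -> str:
    """Run a template through str.format() with no arguments, as a
    (hypothetical) host program might do by accident before MF2 sees it."""
    return template.format()


# Single braces: "{$name}" is parsed as a replacement field named "$name".
try:
    through_str_format("Hello {$name}")
except KeyError as err:
    print("single braces ->", repr(err))

# Dollar-brace: "$" is plain text, but "{name}" is still a replacement field.
try:
    through_str_format("Hello ${name}")
except KeyError as err:
    print("dollar-brace  ->", repr(err))

# Double braces: "{{" and "}}" are escapes in str.format, so the string is
# silently rewritten instead of failing loudly.
print("double braces ->", through_str_format("Hello {{name}}"))
```

The first two calls fail with KeyError; the last one quietly returns "Hello {name}", which is arguably the worse outcome because nothing signals that the message was altered.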

@gibson042
Collaborator Author

I don't think either of those reasons holds up. The friction/conflict justification applies equally to any kind of string interpolation, but that has not prevented modern templating systems from settling in large part on two common patterns (and even there, one appears to be approaching dominance). As for the detection justification, the signal seems too weak to be useful: unless it is absolutely explicit (e.g., a "MessageFormat2:" prefix), there will be a large number of false positives (e.g., "Lacking enclosing brackets, {$this} is not a Message.") and a moderate number of false negatives (since not every message will even have patterns and/or placeables).
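The false-positive/false-negative argument can be illustrated with a deliberately naive detector. The heuristic below is hypothetical (nobody in this thread proposed it): it flags any string containing a {$name}-shaped placeholder as MF2.

```python
import re

# Hypothetical heuristic: a "{$name}" placeholder is taken as a sign of MF2.
MF2_HINT = re.compile(r"\{\$\w+\}")


def looks_like_mf2(s: str) -> bool:
    """Return True if s contains something shaped like {$name}."""
    return MF2_HINT.search(s) is not None


# False positive: prose that merely mentions the syntax.
print(looks_like_mf2("Lacking enclosing brackets, {$this} is not a Message."))

# False negative: text without placeholders gives the detector nothing to find.
print(looks_like_mf2("Hello, world!"))
```

The first call prints True and the second prints False, which is the pattern described above: the sigil alone cannot tell a message from text that happens to contain one, nor recognize a message that has no placeholders.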

My opinion is that being different from the broader ecosystem for the sake of being different is harmful rather than helpful. However, that does not preclude being different for a supportable reason such as "conforming with a clear pre-existing convention in the narrower scope of internationalization".

@eemeli
Collaborator

eemeli commented May 20, 2022

One option here might be to do something close to what Jinja does:

There are a few kinds of delimiters. The default Jinja delimiters are configured as follows:

  • {% ... %} for Statements
  • {{ ... }} for Expressions to print to the template output
  • {# ... #} for Comments not included in the template output

Specifically, we could consider the start of a placeholder to always be two characters, for example: {$ ... }, {: ... }, {/ ... }, {[ ... ]}, {{ ... }}. In the current syntax, this would require two changes:

  1. Disallow whitespace between the initial { and any subsequent sigil.
  2. Reconsider the markup element syntax, e.g. using {+link} ... {-link} (and if MarkupEmpty is added, {+-link} or {+link-}).

With that change, a { followed by a word character would not need to be considered as syntax. In addition to any sigils that are chosen for initial use, others would need to be reserved for later expansion.
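A minimal sketch of how a tokenizer might apply this proposal, in Python. The opener table is an assumption assembled from the examples in the comment ({$, {:, {/, {[, {{, plus {+ and {- for markup); the closers ]} for {[ and }} for {{ are guesses. Under this lenient variant, { followed by a word character is literal text, and any other character after { (including whitespace) is treated as reserved.

```python
from dataclasses import dataclass

# Hypothetical opener -> closer table based on the examples above.
OPENERS = {
    "{$": "}", "{:": "}", "{/": "}", "{+": "}", "{-": "}",
    "{[": "]}", "{{": "}}",
}


@dataclass
class Placeholder:
    opener: str  # the two-character opener, e.g. "{$"
    body: str    # text between the opener and its closer


def tokenize(src: str) -> list:
    """Split src into literal strings and Placeholder tokens."""
    out, text, i = [], [], 0
    while i < len(src):
        ch = src[i]
        if ch != "{":
            text.append(ch)
            i += 1
            continue
        pair = src[i:i + 2]
        nxt = src[i + 1] if i + 1 < len(src) else ""
        if pair in OPENERS:
            closer = OPENERS[pair]
            end = src.find(closer, i + 2)
            if end < 0:
                raise SyntaxError(f"unterminated {pair!r} at offset {i}")
            if text:
                out.append("".join(text))
                text = []
            out.append(Placeholder(pair, src[i + 2:end]))
            i = end + len(closer)
        elif nxt.isalnum() or nxt == "_":
            # Lenient rule: '{' followed by a word character is plain text.
            text.append(ch)
            i += 1
        else:
            # Whitespace or any unassigned character after '{' is reserved,
            # so "{ $x}" is rejected (no whitespace before the sigil).
            raise SyntaxError(f"reserved sequence {pair!r} at offset {i}")
    if text:
        out.append("".join(text))
    return out


print(tokenize("Hello {$name}, see {+link}here{-link}."))
```

Note how the markup pair {+link} ... {-link} needs no special casing once every placeholder opens with a fixed two-character token.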

@markusicu
Member

With that change, a { followed by a word character would not need to be considered as syntax. In addition to any sigils that are chosen for initial use, others would need to be reserved for later expansion.

I am skeptical about allowing { followed by a non-syntax character just being a literal character. I did that in ICU MessageFormat with the ASCII apostrophe, because it's the best I could do to make normal text mostly work (previously a pair of apostrophes always enclosed literal text, as a terrible kind of escaping syntax), but it still confuses developers.

Disallow whitespace between the initial { and any subsequent sigil.

This I like. I generally favor not allowing white space in more places than necessary.

It means that attempting to use unescaped curly braces as literal text yields a fail-fast syntax error. I don't think we need to complicate the syntax beyond that.
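For contrast, the stricter rule favored here can be sketched as a validator that fails fast on the first { that does not start a placeholder. The sigil set and the backslash escape (\{) are both hypothetical; the thread does not say how a literal brace would be escaped.

```python
# Hypothetical sigils that may directly follow '{' to open a placeholder.
# A set (not a string) so that the empty string is never "in" it.
SIGILS = set("$:/+-[{")


def check_strict(src: str) -> None:
    """Raise SyntaxError at the first '{' that neither opens a placeholder
    nor is escaped with a (hypothetical) backslash."""
    i = 0
    while i < len(src):
        ch = src[i]
        if ch == "\\" and i + 1 < len(src):
            i += 2  # skip the backslash and the character it escapes
            continue
        if ch == "{":
            nxt = src[i + 1] if i + 1 < len(src) else ""
            if nxt not in SIGILS:
                raise SyntaxError(f"unescaped '{{' at offset {i}")
            i += 2  # step over the whole two-character opener
            continue
        i += 1


for s in ["Hello {$name}", "{word}", "{ $x}", "a literal \\{brace}"]:
    try:
        check_strict(s)
        print("ok:   ", s)
    except SyntaxError as err:
        print("error:", s, "->", err)
```

With this rule there is exactly one thing to remember: a bare { is always an error unless it opens a placeholder.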

@macchiati
Copy link
Member

macchiati commented Jun 8, 2022 via email

@aphillips
Copy link
Member

I like @eemeli's suggestion of a Jinja-like syntax. I think it's easier to have a consistent outer marker ({ and }) and then use an inner marker (sigil) to indicate type. This reduces the characters that require escaping to just the curly brackets (or at least the opening bracket).

@mihnita
Copy link
Collaborator

mihnita commented Jun 9, 2022

Disallow whitespace between the initial { and any subsequent sigil.

I think I am OK with this, but we should make sure we describe it properly (to be sure we are talking about the same thing).

For me {% and {$ and {# etc. are not { + sigil.
They are complete, standalone tokens (in parser terms).

So yes, they don't allow spaces.
It is the same as the bit shift a >> 2 in C (and other languages): it is not described as "two greater-than signs, but disallow whitespace in between". It is one single thing, the "shift" operator.
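Treating each opener as one token, as the >> comparison suggests, falls out naturally when the lexer matches both characters together. In this Python sketch (again with a hypothetical sigil set), "{ $b" never produces an opener token, because the pattern requires the sigil immediately after the brace.

```python
import re

# One token per two-character opener; the sigil set is hypothetical.
OPENER = re.compile(r"\{[$:/+\-\[{]")


def opener_tokens(src: str) -> list[str]:
    """Return the opener tokens found in src, left to right."""
    return [m.group() for m in OPENER.finditer(src)]


print(opener_tokens("{$a} {:fn} { $b}"))
```

Here "{$" and "{:" are recognized, while "{ $b" is simply not an opener; whether it is then an error or literal text is the separate question discussed in the next comment.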

@mihnita
Copy link
Collaborator

mihnita commented Jun 9, 2022

And I am with Markus on not allowing { to be treated as a literal when it is followed by a non-syntax character.
This seems convenient, but adds friction because it is inconsistent.
Now, instead of "always escape {", the rule becomes a lot more complicated (escape { if X, but there is no need to escape if Y).
We save typing (one character) at the price of adding extra rules that we now need to understand and remember.

Same as the ; in JavaScript.
I am happy with "always use it", and I am happy with "never use it".
But it should not be "most of the time don't use it, but be careful: if you don't use it in situations A, B, C, then it's a problem".
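The cost of the conditional rule shows up directly in an escaping helper. Both functions below assume a hypothetical backslash escape. The first applies "always escape {"; the second encodes the lenient rule (a { before a word character stays literal) and has to look ahead at every brace, which is exactly the extra rule a writer would need to remember.

```python
def escape_always(text: str) -> str:
    """One rule: escape every '{' (and the escape character itself)."""
    return text.replace("\\", "\\\\").replace("{", "\\{")


def escape_conditional(text: str) -> str:
    """Context-dependent rule: escape '{' only when the next character
    would make it syntax (a sigil, whitespace, or end of text)."""
    out = []
    for i, ch in enumerate(text):
        nxt = text[i + 1] if i + 1 < len(text) else ""
        if ch == "\\":
            out.append("\\\\")
        elif ch == "{" and not (nxt.isalnum() or nxt == "_"):
            out.append("\\{")
        else:
            out.append(ch)
    return "".join(out)


print(escape_always("{word} and {$x}"))
print(escape_conditional("{word} and {$x}"))
```

The two outputs differ only in the first brace, and a reader of the second one has to know the look-ahead rule to tell that "{word}" is literal text.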


8 participants