Explain why we use | and not " and/or ' #465

Closed
wants to merge 1 commit into from

eemeli
Collaborator

@eemeli eemeli commented Sep 2, 2023

Adds an explanation for why the literal syntax uses | delimiters. I think the explanation is pretty flimsy, but we don't have a better one and I've not been able to convince the WG to change it with #414, which should be closed if this is merged.

@eemeli eemeli requested review from aphillips and stasm September 2, 2023 09:46
@eemeli eemeli added the syntax (Issues related with syntax or ABNF) and editorial (Issue is non-normative) labels Sep 2, 2023
Member

@aphillips aphillips left a comment

It is worth writing this down in the spec. Thanks for proposing this.

See if my proposal below helps draw the sting of not adopting quote marks 😉

Comment on lines +599 to +600
The `|` is used as a quote delimiter
because it is not commonly used as a quote delimiter.
Member

Suggested change
- The `|` is used as a quote delimiter
- because it is not commonly used as a quote delimiter.
+ The character `|` is used to delimit _quoted_ literals
+ because it is rarely used in natural language content
+ or in syntaxes used by various programming and scripting languages.

Collaborator

A comment on @aphillips's suggestion:

> or in syntaxes used by various programming and scripting languages.

It isn't just that vertical pipes are rare in other syntaxes, but specifically that they are rare as quote delimiters.

Comment on lines +601 to +606
Using something more ordinary like `'` or `"`
would mean that they would need to be escaped
if MessageFormat 2 syntax needed to be included in a resource format
that used the same quote delimiter.
Supporting both `'` and `"` as quote delimiters would potentially require
the delimiters to be manually escaped when copying them between resource formats.
Member

' and " are not special to us, so we can lump these in with other characters that we also rejected while searching for characters that are not common in natural language or programming syntax to use as a quote character.

Suggested change
- Using something more ordinary like `'` or `"`
- would mean that they would need to be escaped
- if MessageFormat 2 syntax needed to be included in a resource format
- that used the same quote delimiter.
- Supporting both `'` and `"` as quote delimiters would potentially require
- the delimiters to be manually escaped when copying them between resource formats.
+ This is important because the quote delimiting character
+ has to be escaped whenever it appears in _text_ in a _message_
+ or in the body of a literal.
+ Characters such as `'` and `"` are known to be common in ordinary
+ text, as well as having meaning in various syntaxes.
+ In particular, our format will be embedded into various resource formats,
+ many of which use `"` characters to delimit message values.
+ When our escape and quote characters conflict with those of the host syntax,
+ this would require the user to escape our quotes
+ and to double-escape our quoted quotes.
+ These levels of escaping will make the _message_ difficult to read
+ in various resource formats.
+ Experience with special meaning for the apostrophe (`'`) in MF1
+ also informs this choice.

Example

{"Quoted \"literal\""}

when embedded into (say) a JSON file might require the developer to type:

"message": "{\"Quoted \\\"literal\\\"\"}"

Make sense?
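
The escaping in the example above can be checked mechanically. This is a minimal sketch using Python's standard `json` module; the `"`-delimited literal is the hypothetical syntax under discussion, not actual MessageFormat 2 syntax, and the variable names are made up for illustration.

```python
import json

# Hypothetical: a literal delimited by " (not the syntax the spec uses).
quote_delimited = r'{"Quoted \"literal\""}'
# The same literal delimited by |, as in the current syntax.
pipe_delimited = r'{|Quoted "literal"|}'

# json.dumps produces the JSON string value a developer would have to type.
quote_json = json.dumps(quote_delimited)
pipe_json = json.dumps(pipe_delimited)

print(quote_json)  # "{\"Quoted \\\"literal\\\"\"}"
print(pipe_json)   # "{|Quoted \"literal\"|}"

# Count the backslashes each variant needs inside the JSON file.
print(quote_json.count("\\"), pipe_json.count("\\"))  # 8 2
```

With `|` as the delimiter, the only backslashes left are the ones JSON itself requires for the two `"` characters inside the literal.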

Collaborator Author

> When our escape and quote characters conflict with those of the host syntax,
> this would require the user to escape our quotes
> and to double-escape our quoted quotes.

This is why #414 suggests allowing either ' or " as quoted delimiters. That way, your example could be expressed as

{'Quoted "literal"'}

and be embedded in JSON as

"message": "{'Quoted \"literal\"'}"

so only the " characters that are actually part of the literal need to be escaped, exactly as is the case with |.

Hence our explanation ought to say why the above is a worse option than picking a novel quote delimiter |. This is missing from your suggestion.

Member

That works in JSON, but not in many syntaxes (which are more rigid about quotes). And the quote alternation can depend on the exterior quoting regime not visible to e.g. translation tooling. Changes to the quote characters look like differences to processes such as translation memory.
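
To illustrate the point about the exterior quoting regime, here is a rough sketch. It assumes a hypothetical host format that wraps values in one quote character and backslash-escapes that character and backslashes; `embed`, `pick_delimiter` and `alternating_source` are made-up helpers, and the alternation rule is only one reading of what #414 proposes.

```python
def embed(message: str, host_quote: str) -> str:
    """Hypothetical host format: escape backslashes and the host's own quote."""
    escaped = message.replace("\\", "\\\\").replace(host_quote, "\\" + host_quote)
    return host_quote + escaped + host_quote


def pick_delimiter(host_quote: str) -> str:
    """Quote alternation: use whichever of ' and " the host does not use."""
    return "'" if host_quote == '"' else '"'


def alternating_source(text: str, host_quote: str) -> str:
    """Message source when the literal delimiter is chosen to avoid the host's quote."""
    delimiter = pick_delimiter(host_quote)
    return "{" + delimiter + text + delimiter + "}"


for host_quote in ('"', "'"):
    alternating = alternating_source("Hello", host_quote)
    piped = "{|Hello|}"
    print(embed(alternating, host_quote), embed(piped, host_quote))
```

Under these assumptions the alternating source is `{'Hello'}` in a `"`-quoted host and `{"Hello"}` in a `'`-quoted host, so a tool that compares exact source strings sees two different messages, while the `|` form is `{|Hello|}` in both.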

Collaborator

It might be helpful to just embed something like that example, e.g.

For example, message = '{|Eat At Joe\'s|}' (using | as the MessageFormat quote delimiter and using the embedding language's syntax to include a literal ' in a '-delimited string containing the message) is more readable than the alternative message = '{\'Eat At Joe\\\'s\'}' (using ' as the MessageFormat quote delimiter, which must itself be escaped in the embedding language along with both characters of the \' MessageFormat-level sequence for including a raw ' in a '-quoted literal).

Collaborator Author

> That works in JSON, but not in many syntaxes (which are more rigid about quotes).

Could you give an example? I'm not sure how a resource syntax could be more rigid about quotes than JSON.

> And the quote alternation can depend on the exterior quoting regime not visible to e.g. translation tooling. Changes to the quote characters look like differences to processes such as translation memory.

Sure, if you're comparing the exact string representations of messages. However, a reasonable MF2 message comparator would either compare the data model representations of messages, or their understanding of each message's canonical source strings, which would normalise away any whitespace or quotation style differences.

Collaborator

There are 2 conveniences that | gives us:

  1. "inwards", it's easy to include " and ' in literals, without escaping them via \,
  2. "outwards", it's easy to embed messages with string literals in code or container formats.

The combination of these two is particularly interesting: it's when a message both contains quotes in literals and is embedded in code. If we delimited literals with " or ', then we'd often need to solve two problems at the same time:

  • escape quotes inside literals via \, and
  • escape quotes delimiting literals in code or container formats and escape the MF2's \ from the bullet above, because it's a common escape char in many languages.

This is how we get the gnarly \\\. By using |, we sidestep both problems.
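
The two layers can be sketched as two small functions applied in sequence. This is an illustration under stated assumptions, not the spec's grammar: `quote_literal` assumes that a literal backslash-escapes backslashes and its own delimiter, and `embed` assumes a host format that does the same for its quote character. Both names are made up.

```python
def quote_literal(text: str, delimiter: str) -> str:
    """Inner layer (assumed rule): escape backslashes and the delimiter in the literal."""
    escaped = text.replace("\\", "\\\\").replace(delimiter, "\\" + delimiter)
    return "{" + delimiter + escaped + delimiter + "}"


def embed(message: str, host_quote: str) -> str:
    """Outer layer (hypothetical host): escape backslashes and the host's quote."""
    escaped = message.replace("\\", "\\\\").replace(host_quote, "\\" + host_quote)
    return host_quote + escaped + host_quote


# Same literal text, same '-quoted host, two candidate delimiters.
for delimiter in ("'", "|"):
    message = quote_literal("Eat At Joe's", delimiter)
    embedded = embed(message, "'")
    print(delimiter, embedded, embedded.count("\\"))
```

With `'` as the delimiter the inner layer adds one backslash and the outer layer then escapes that backslash and every `'`, which is where the `\\\'` run comes from. With `|` the inner layer adds nothing, so only the host's single `\'` remains.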

@stasm stasm self-assigned this Sep 4, 2023
Collaborator Author

eemeli commented Oct 8, 2023

Closing in favour of #477; a design doc is a better way to rationalise the decision.

@eemeli eemeli closed this Oct 8, 2023
@eemeli eemeli deleted the explain-the-bar branch October 8, 2023 09:59
Labels
editorial Issue is non-normative syntax Issues related with syntax or ABNF
4 participants