Explain why we use `|` and not `"` and/or `'` #465
It is worth writing this down in the spec. Thanks for proposing this.
See if my proposal below helps draw the sting of not adopting quote marks 😉
The `|` is used as a quote delimiter
because it is not commonly used as a quote delimiter.
- The `|` is used as a quote delimiter
- because it is not commonly used as a quote delimiter.
+ The character `|` is used to delimit _quoted_ literals
+ because it is rarely used in natural language content
+ or in syntaxes used by various programming and scripting languages.
A comment on @aphillips's suggestion:
> or in syntaxes used by various programming and scripting languages.
It isn't just that vertical pipes are rare in other syntaxes, but specifically that they are rare as quote delimiters.
Using something more ordinary like `'` or `"`
would mean that they would need to be escaped
if MessageFormat 2 syntax needed to be included in a resource format
that used the same quote delimiter.
Supporting both `'` and `"` as quote delimiters would potentially require
the delimiters to be manually escaped when copying them between resource formats.
`'` and `"` are not special to us, so we can lump these in with other characters that we also rejected while searching for characters that are not common in natural language or programming syntax to use as a quote character.
- Using something more ordinary like `'` or `"`
- would mean that they would need to be escaped
- if MessageFormat 2 syntax needed to be included in a resource format
- that used the same quote delimiter.
- Supporting both `'` and `"` as quote delimiters would potentially require
- the delimiters to be manually escaped when copying them between resource formats.
+ This is important because the quote delimiting character
+ has to be escaped whenever it appears in _text_ in a _message_
+ or in the body of a literal.
+ Characters such as `'` and `"` are known to be common in ordinary
+ text, as well as having meaning in various syntaxes.
+ In particular, our format will be embedded into various resource formats,
+ many of which use `"` characters to delimit message values.
+ When our escape and quote characters conflict with those of the host syntax,
+ this would require the user to escape our quotes
+ and to double-escape our quoted quotes.
+ These levels of escaping will make the _message_ difficult to read
+ in various resource formats.
+ Experience with special meaning for the apostrophe (`'`) in MF1
+ also informs this choice.
Example
{"Quoted \"literal\""}
when embedded into (say) a JSON file might require the developer to type:
"message": "{\"Quoted \\\"literal\\\"\"}"
Make sense?
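The JSON escaping in the example above can be reproduced mechanically. The sketch below assumes a hypothetical MessageFormat 2 syntax that uses `"` as the literal delimiter and `\` as the escape character, and uses Python's standard `json` module as a stand-in for any JSON-based resource format.

```python
import json

# Hypothetical syntax: " delimits the literal and \ escapes a " inside it.
# The Python source doubles each backslash, so the string's actual value is
#   {"Quoted \"literal\""}
message = '{"Quoted \\"literal\\""}'

# JSON string encoding turns every " into \" and every \ into \\.
embedded = json.dumps(message)

print(message)
print('"message": ' + embedded)
```

Each quote that was already escaped at the message level picks up two more backslashes at the JSON level, which is where the triple backslash comes from.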
> When our escape and quote characters conflict with those of the host syntax,
> this would require the user to escape our quotes
> and to double-escape our quoted quotes.
This is why #414 suggests allowing either `'` or `"` as quote delimiters. That way, your example could be expressed as
{'Quoted "literal"'}
and be embedded in JSON as
"message": "{'Quoted \"literal\"'}"
so only the `"` that are actual characters of the literal need to be escaped, exactly as is the case with `|`.
Hence our explanation ought to say why the above is a worse option than picking a novel quote delimiter `|`. This is missing from your suggestion.
That works in JSON, but not in many syntaxes (which are more rigid about quotes). And the quote alternation can depend on the exterior quoting regime not visible to e.g. translation tooling. Changes to the quote characters look like differences to processes such as translation memory.
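To make the point about translation memory concrete, here is a minimal sketch. The helper names and the escaping rules (backslash escapes the delimiter and itself) are assumptions made for illustration, not the specified syntax. It picks whichever of `'` or `"` the host format does not use, and compares the resulting source strings with a plain string comparison.

```python
def quote_alternating(text: str, host_quote: str) -> str:
    # Hypothetical #414-style quoting: avoid the quote character used by the host.
    delim = "'" if host_quote == '"' else '"'
    escaped = text.replace("\\", "\\\\").replace(delim, "\\" + delim)
    return delim + escaped + delim


def quote_pipe(text: str) -> str:
    # Hypothetical |-delimited quoting: the choice does not depend on the host.
    escaped = text.replace("\\", "\\\\").replace("|", "\\|")
    return "|" + escaped + "|"


text = "Eat At Joe's"

# The same literal, authored for a "-quoted host and for a '-quoted host.
for_double_quoted_host = "{" + quote_alternating(text, '"') + "}"
for_single_quoted_host = "{" + quote_alternating(text, "'") + "}"
with_pipe = "{" + quote_pipe(text) + "}"

print(for_double_quoted_host)
print(for_single_quoted_host)
print(for_double_quoted_host == for_single_quoted_host)  # exact-match comparison
print(with_pipe)
```

A tool that compares exact source strings sees two different messages in the alternating case, while the `|` form is identical regardless of the host.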
It might be helpful to just embed something like that example, e.g.
> For example, `message = '{|Eat At Joe\'s|}'` (using `|` as the MessageFormat quote delimiter and using the embedding language's syntax to include a literal `'` in a `'`-delimited string containing the message) is more readable than the alternative `message = '{\'Eat At Joe\\\'s\'}'` (using `'` as the MessageFormat quote delimiter, which must itself be escaped in the embedding language along with both characters of the `\'` MessageFormat-level sequence for including a raw `'` in a `'`-quoted literal).
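The two embedded forms in this example can be derived with a small helper. This is a sketch that assumes an embedding language with `'`-delimited strings and `\` as its escape character; `embed_single_quoted` is a hypothetical name, not part of any MessageFormat tooling.

```python
def embed_single_quoted(message: str) -> str:
    # Host-level escaping: double every backslash, then escape every apostrophe.
    return "'" + message.replace("\\", "\\\\").replace("'", "\\'") + "'"


# Message using | as the MessageFormat quote delimiter: {|Eat At Joe's|}
with_pipe = "{|Eat At Joe's|}"

# Message using ' as a hypothetical delimiter, so the apostrophe inside the
# literal is already escaped at the MessageFormat level: {'Eat At Joe\'s'}
with_apostrophe = "{'Eat At Joe\\'s'}"

print("message = " + embed_single_quoted(with_pipe))
print("message = " + embed_single_quoted(with_apostrophe))
```

Only the apostrophe in "Joe's" is escaped in the first form. In the second, both delimiters and both characters of the MessageFormat-level `\'` sequence are escaped.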
> That works in JSON, but not in many syntaxes (which are more rigid about quotes).
Could you give an example? I'm not sure how a resource syntax could be more rigid about quotes than JSON.
> And the quote alternation can depend on the exterior quoting regime not visible to e.g. translation tooling. Changes to the quote characters look like differences to processes such as translation memory.
Sure, if you're comparing the exact string representations of messages. However, a reasonable MF2 message comparator would either compare the data model representations of messages, or their understanding of each message's canonical source strings, which would normalise away any whitespace or quotation style differences.
There are 2 conveniences that `|` gives us:
- "inwards", it's easy to include `"` and `'` in literals, without escaping them via `\`,
- "outwards", it's easy to embed messages with string literals in code or container formats.
The combination of these two is particularly interesting: it's when a message both contains quotes in literals and is embedded in code. If we delimited literals with `"` or `'`, then we'd often need to solve two problems at the same time:
- escape quotes inside literals via `\`, and
- escape quotes delimiting literals in code or container formats and escape the MF2's `\` from the bullet above, because it's a common escape char in many languages.
This is how we get the gnarly `\\\`. By using `|`, we sidestep both problems.
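The two layers of escaping can be composed in a few lines to see how the backslashes accumulate. This is a sketch under stated assumptions: `quote_literal` is a hypothetical helper in which `\` escapes the delimiter and itself, and JSON (via Python's `json` module) stands in for the code or container format.

```python
import json


def quote_literal(text: str, delim: str) -> str:
    # Inner layer (hypothetical MF2 rule): escape backslashes and the delimiter.
    escaped = text.replace("\\", "\\\\").replace(delim, "\\" + delim)
    return delim + escaped + delim


text = "Joe's \"Diner\""  # literal content containing both ' and "

backslashes = {}
for delim in ('"', "'", "|"):
    message = "{" + quote_literal(text, delim) + "}"
    embedded = json.dumps(message)  # outer layer: JSON string escaping
    backslashes[delim] = embedded.count("\\")
    print(delim, embedded, backslashes[delim])
```

For this input the `"`-delimited form needs eight backslashes in the JSON file, the `'`-delimited form four, and the `|`-delimited form two; the remaining two escape the `"` characters that are part of the literal text itself.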
Closing in favour of #477; a design doc is a better way to rationalise the decision.
Adds an explanation for why the literal syntax uses `|` delimiters. I think the explanation is pretty flimsy, but we don't have a better one and I've not been able to convince the WG to change it with #414, which should be closed if this is merged.